gcc seems allergic to movq in the context of mmx:

% cat movq.c
#include <inttypes.h>
#include <mmintrin.h>

__m64 x;
__m64 y;

uint64_t foo(__m64 m) {
  return _mm_cvtm64_si64(_mm_add_pi32(x, y));
}

% gcc -g -O3 -Wall -std=gnu99 -c -o movq.o movq.c
% objdump -dr movq.o

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   48 89 44 24 f8          mov    %rax,0xfffffffffffffff8(%rsp)
   c:   0f 6f 44 24 f8          movq   0xfffffffffffffff8(%rsp),%mm0
  11:   0f fe 05 00 00 00 00    paddd  0(%rip),%mm0        # 18 <foo+0x18>
                        14: R_X86_64_PC32       y+0xfffffffffffffffc
  18:   0f 7f 44 24 f8          movq   %mm0,0xfffffffffffffff8(%rsp)
  1d:   48 8b 44 24 f8          mov    0xfffffffffffffff8(%rsp),%rax
  22:   c3                      retq

the load of x should use "movq m64,mm". this is true in i386 targets as well.

the transfer of %mm0 to %rax has the option of "movq %mm0,%rax" on x86_64, but should possibly be passed through memory depending on -mtune= settings:

- for intel core2 always use movq directly between the registers, no matter which direction.
- for AMD k8 family 15 always pass through mem
- for AMD k8 family 16+, for gpr->xmm/mmx pass through memory and for xmm/mmx -> gpr always use movd/movq direct between the registers.

-dean

p.s. gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
Thread model: posix
gcc version 4.3.0 20071128 (experimental) (GCC)
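The -mtune= rules in the report can be written down as a small decision function. This is only a sketch of the policy as stated above, not code from GCC; the enum names and use_direct_move() are made up for illustration, and the fallback for an unknown target is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning targets; these names are illustrative, not GCC's. */
enum tune { TUNE_CORE2, TUNE_AMD_FAM15, TUNE_AMD_FAM16_PLUS };

/* Direction of a 64-bit transfer between the register files. */
enum xfer { GPR_TO_VEC, VEC_TO_GPR };

/* Returns true when the transfer should be a direct movd/movq between
   registers, false when it should be passed through memory. */
bool use_direct_move(enum tune t, enum xfer dir)
{
    switch (t) {
    case TUNE_CORE2:
        return true;               /* direct, no matter which direction */
    case TUNE_AMD_FAM15:
        return false;              /* always pass through memory */
    case TUNE_AMD_FAM16_PLUS:
        return dir == VEC_TO_GPR;  /* only xmm/mmx -> gpr is direct */
    }
    return false;                  /* unknown target: assumed conservative default */
}

/* The cases from the report, written as checks. */
void check_move_policy(void)
{
    assert(use_direct_move(TUNE_CORE2, GPR_TO_VEC));
    assert(use_direct_move(TUNE_CORE2, VEC_TO_GPR));
    assert(!use_direct_move(TUNE_AMD_FAM15, VEC_TO_GPR));
    assert(!use_direct_move(TUNE_AMD_FAM16_PLUS, GPR_TO_VEC));
    assert(use_direct_move(TUNE_AMD_FAM16_PLUS, VEC_TO_GPR));
}
```

For foo() above only the VEC_TO_GPR direction matters for the return value, since the result moves from %mm0 to %rax.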
this appears to be a regression between gcc 4.1.x and 4.2.x. i had to switch the intrinsic to _mm_cvtsi64_si64x but it otherwise generates the same code on 4.3.x...

ubuntu 4.1.2:

% objdump -dr movq.o

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   0f 6f 05 00 00 00 00    movq   0(%rip),%mm0        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   0f fe 05 00 00 00 00    paddd  0(%rip),%mm0        # e <foo+0xe>
                        a: R_X86_64_PC32        y+0xfffffffffffffffc
   e:   48 0f 7e c0             movd   %mm0,%rax
  12:   c3                      retq

and 4.2.1:

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   48 89 44 24 f8          mov    %rax,-0x8(%rsp)
   c:   0f 6f 44 24 f8          movq   -0x8(%rsp),%mm0
  11:   0f fe 05 00 00 00 00    paddd  0x0(%rip),%mm0        # 18 <foo+0x18>
                        14: R_X86_64_PC32       y+0xfffffffffffffffc
  18:   0f 7f 44 24 f8          movq   %mm0,-0x8(%rsp)
  1d:   48 8b 44 24 f8          mov    -0x8(%rsp),%rax
  22:   c3                      retq
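All of the listings compute the same value; only the way the data gets in and out of %mm0 differs. For checking any of these code-generation variants against a known answer, here is a scalar sketch of what paddd does to the two 32-bit lanes of a 64-bit value. add_pi32_ref() is a hypothetical helper name, not part of <mmintrin.h>, and it takes plain uint64_t bit patterns rather than __m64.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for a packed 32-bit add (paddd on a 64-bit operand):
   each 32-bit lane is added independently and wraps modulo 2^32. */
uint64_t add_pi32_ref(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;                 /* low lane */
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32); /* high lane */
    return ((uint64_t)hi << 32) | lo;                        /* repack */
}

/* A wrap in the low lane must not carry into the high lane. */
void check_add_pi32_ref(void)
{
    assert(add_pi32_ref(0x00000001ffffffffULL, 0x0000000100000001ULL)
           == 0x0000000200000000ULL);
}
```

The check picks inputs where a plain 64-bit add would give a different answer (0x0000000300000000), which is the property that distinguishes the packed add.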
Created attachment 14653
Patch to adjust mmx move instructions

It looks like the mmx move instructions need some tuning. The attached patch fixes your problems and generates (-march=core2):

foo:
.LFB4:
        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movd    %mm0, %rax
        ret

Since these register allocation (RA) adjustments are very fragile, this patch is not appropriate for stage3.
Confirmed.
*** This bug has been marked as a duplicate of 22076 ***
Subject: Bug 34256

Author: uros
Date: Sat Feb 23 15:24:02 2008
New Revision: 132572

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132572
Log:
        PR target/22076
        PR target/34256
        * config/i386/mmx.md (*mov<mode>_internal_rex64): Use "!y" to
        prevent reload from using MMX registers.
        (*mov<mode>_internal): Ditto.
        (*movv2sf_internal_rex64): Ditto.
        (*movv2sf_internal): Ditto.

testsuite/ChangeLog:

        PR target/22076
        PR target/34256
        * gcc.target/i386/pr22076.c: New test.
        * gcc.target/i386/pr34256.c: New test.
        * gcc.target/i386/vecinit-5.c: New test.
        * gcc.target/i386/vecinit-6.c: New test.
        * gcc.target/i386/vecinit-[1-4].c: Check that no MMX register is used.
        * g++.dg/compat/struct-layout-1.h: Do not include <mmintrin.h> and
        <xmmintrin.h>, define __m64 and __m128 directly.
        * g++.dg/compat/struct-layout-1_generate.c: Add -mno-mmx for x86.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr22076.c
    trunk/gcc/testsuite/gcc.target/i386/pr34256.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-5.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/mmx.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1.h
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-1.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-2.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-3.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-4.c
The gcc.target/i386/pr34256.c test case fails on i686-apple-darwin9 at -m64 as follows...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081115/gcc-4.4-20081115/gcc/testsuite/gcc.target/i386/pr34256.c -O2 -march=core2 -S -m64 -o pr34256.s (timeout = 300)
PASS: gcc.target/i386/pr34256.c (test for excess errors)
FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4
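For anyone reading the FAIL line: scan-assembler-times counts how many times a pattern matches anywhere in the generated .s file and compares that with the expected number, so any extra or missing mov variant changes the result. The real check is done by the DejaGnu testsuite with a regular expression; the sketch below only emulates it for a literal pattern using strstr(), and count_substr() is a hypothetical helper name. This comment does not say which instructions cause the mismatch on Darwin, and the sketch does not try to reproduce that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the literal string pat in text.
   An empty pattern is treated as zero matches. */
size_t count_substr(const char *text, const char *pat)
{
    size_t n = 0;
    size_t len = strlen(pat);

    if (len == 0)
        return 0;
    for (const char *p = strstr(text, pat); p != NULL; p = strstr(p + len, pat))
        n++;
    return n;
}

/* The -march=core2 listing quoted with the patch earlier in this bug
   contains two matches for "mov": one movq and one movd. */
void check_count_substr(void)
{
    const char *listing =
        "foo:\n"
        ".LFB4:\n"
        "\tmovq\tx(%rip), %mm0\n"
        "\tpaddd\ty(%rip), %mm0\n"
        "\tmovd\t%mm0, %rax\n"
        "\tret\n";

    assert(count_substr(listing, "mov") == 2);
}
```

Note that the pattern "mov" matches mov, movq and movd alike, so the count depends on the total number of move instructions in the file rather than on any particular one.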
Created attachment 16691
assembly file generated for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9
The gcc.target/i386/pr34256.c test case is still failing as...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081116/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081116/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081116/gcc-4.4-20081116/gcc/testsuite/gcc.target/i386/pr34256.c -O2 -fomit-frame-pointer -march=core2 -S -m64 -o pr34256.s (timeout = 300)
PASS: gcc.target/i386/pr34256.c (test for excess errors)
FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4
UNSUPPORTED: gcc.target/i386/pr34312.c
UNSUPPORTED: gcc.target/i386/pr34522.c
UNSUPPORTED: gcc.target/i386/pr35083.c
Created attachment 16704
assembly file for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9 with -fomit-frame-pointer
On i686-apple-darwin9, I have been using...

Using built-in specs.
Target: i686-apple-darwin9
Configured with: ../gcc-4.4-20081213/configure --prefix=/sw --prefix=/sw/lib/gcc4.4 --mandir=/sw/share/man --infodir=/sw/share/info --enable-languages=c,c++,fortran,objc,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --with-arch=nocona --with-tune=generic --build=i686-apple-darwin9 --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.4.0 20081213 (experimental) (GCC)

when the testsuite produces the failure...

FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4
(In reply to comment #10)
> FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4

PR 37364