#include <mmintrin.h>

int foo( __m64 x )
{
  int y;
  __builtin_memcpy( &y, &x, sizeof( y ) );
  return y;
}

gcc movd.c -O2 -S produces:

foo:
        movq    %xmm0, -24(%rsp)
        movl    -24(%rsp), %eax
        ret

while 'movd %xmm0, %eax' was expected.
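For what it's worth, the same bitcast can also be written with the corresponding intrinsic from mmintrin.h. A minimal sketch, equivalent to the reproducer above (foo_intrin is just an illustrative name):

  #include <mmintrin.h>

  /* Extracts the low 32 bits of the __m64 argument; this intrinsic is
     documented to correspond to the movd instruction the report asks
     for.  */
  int foo_intrin( __m64 x )
  {
    return _mm_cvtsi64_si32( x );
  }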
You need -mtune=core2 to generate "movd %xmm0, %rax". GCC 4.4 works.
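For reference, a sketch of the suggested invocation and the key instruction it should produce (the movd line is taken from the comment above; the exact output depends on the GCC version):

  $ gcc movd.c -O2 -S -mtune=core2

  foo:
          movd    %xmm0, %rax     # direct SSE-to-general move, no stack round-trip
          ret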
(In reply to comment #1)
> You need -mtune=core2 to generate "movd %xmm0, %rax". GCC 4.4 works.

Is movd faster only on the core2 architecture? And what about 32 bits?

$ /opt/gcc44/bin/gcc movd.c -O2 -S -march=core2 -m32

foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $16, %esp
        movq    %mm0, -16(%ebp)   <===\  movd %mm0, %eax
        movl    -16(%ebp), %eax   <===/
        leave
        ret
(In reply to comment #2)
> And what about 32 bits?

The relevant code in i386.c:

  /* ??? This is a lie.  We do have moves between mmx/general, and for
     mmx/sse2.  But by saying we need secondary memory we discourage the
     register allocator from using the mmx registers unless needed.  */
  if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
    return true;
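To make the effect concrete, here is a hypothetical, self-contained rendering of that logic (the enum and MMX_CLASS_P below are stand-ins for GCC's internal definitions, not the real i386.c interface):

  #include <stdbool.h>

  /* Stand-ins for GCC's internal register classes (hypothetical; the
     real definitions live in the i386 backend).  */
  enum reg_class { GENERAL_REGS, SSE_REGS, MMX_REGS };

  static bool MMX_CLASS_P( enum reg_class c )
  {
    return c == MMX_REGS;
  }

  /* Sketch of the decision quoted above: answering "true" forces any
     move between an MMX class and a non-MMX class to go through a
     stack slot, which is why the 32-bit code in comment #2 spills
     %mm0 to memory instead of using a single movd.  */
  static bool secondary_memory_needed( enum reg_class class1,
                                       enum reg_class class2 )
  {
    if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
      return true;
    return false;
  }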
The problem from comment #2 won't be fixed. Too little gain for too much pain. And %mm registers are evil, as they alias the x87 registers.
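Given that, a practical way for user code to sidestep the issue entirely is to stay in SSE registers. A sketch using the SSE2 intrinsic from emmintrin.h (a workaround suggestion, not something proposed in this thread):

  #include <emmintrin.h>

  /* Uses __m128i instead of __m64, so no %mm register (and hence no
     x87 aliasing) is involved; _mm_cvtsi128_si32 extracts the low 32
     bits and should compile to a single movd.  */
  int foo_sse2( __m128i x )
  {
    return _mm_cvtsi128_si32( x );
  }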