Compiling the following function on x86 with options -O -fomit-frame-pointer -msse -S seems to result in erroneous code. #include <xmmintrin.h> __m128 bug(__m128 a, __m128 b) { __m128 c = _mm_sub_ps(a, b); return _mm_move_ss(c, a); } This is what the function in the resulting .s file looke like: .file "gccbug.c" .text .align 2 .globl __Z3bugU8__vectorfS_ .def __Z3bugU8__vectorfS_; .scl 2; .type 32; .endef __Z3bugU8__vectorfS_: subss %xmm1, %xmm0 ret According to http://www.intel.com/software/products/compilers/clin/docs/ug_cpp/comm1030.htm _mm_move_ss passes the upper three values through, but the code generated doesn't even calculate those values. I had expected the code to look more like this: movaps %xmm0, %xmm2 subps %xmm1, %xmm0 movss %xmm2, %xmm0 Hope that turned out right, I'm not experienced with AT&T syntax. Is this a bug, or have I misunderstood something? I get similar results with 3.3.4. Torgeir
*** Bug 18504 has been marked as a duplicate of this bug. ***
No you are reading the asm backwards, the corresponding intel asm is: bug: subss %xmm0, %xmm1 ret Which you can get with -masm=intel.
Subject: Re: _mm_move_ss SSE intrinsic causes erroneous pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-15 15:47 ------- > No you are reading the asm backwards, the corresponding intel asm is: > bug: subss %xmm0, %xmm1 > ret > > Which you can get with -masm=intel. > Yes, I know AT&T syntax is backwards compared to Intel, have I misinterpreted the intrinsics or resulting code somehow? To be more clear, I expected the C function to return the following vector: v0 = a0 v1 = a1 - b1 v2 = a2 - b2 v3 = a3 - b3 Instead it seems to return: v0 = a0 - b0 v1 = a1 v2 = a2 v3 = a3 _mm_move_ss should the three upper values from its first argument, and the lower from its second, no? Torgeir
Created attachment 7557 [details] Testcase for SSE bugs Running the program compiled by gcc -O gives the following output: -0.800000 1.300000 1.300000 1.300000 Without optimization enabled, the program simply crashes, probably because it contains movaps instructions that access unalligned memory. The same program compiled by Microsoft C++ gives the following output with or without optimization: 1.300000 -0.800000 -0.800000 -0.800000
Why was this bug closed without being resolved? I have attached an improved testcase that uncovers another bug that occurs when optimization is disabled, so hopefully it's okay that I reopen this report.
The problem was that "sse_movss" pattern has wrong vec_merge constant, so it was wrongly combined to "vmsubv4sf3" pattern. Instead of 0x1, "sse_movss" pattern should have 0x14: (define_insn "sse_movss" [(set (match_operand:V4SF 0 "register_operand" "=x") (vec_merge:V4SF (match_operand:V4SF 1 "register_operand" "0") (match_operand:V4SF 2 "register_operand" "x") (const_int 14)))] <- here! "TARGET_SSE" "movss\t{%2, %0|%0, %2}" [(set_attr "type" "ssemov") (set_attr "mode" "SF")]) With this change, the result from your testcase is OK. ASM code looks like: bug: movaps %xmm0, %xmm2 subps %xmm1, %xmm2 pushl %ebp movl %esp, %ebp movaps %xmm2, %xmm1 movss %xmm0, %xmm1 popl %ebp movaps %xmm1, %xmm0 ret BTW: "sse2_movsd" pattern has the same problem... And "sse_loadss_1", "sse_storess" and their sse2 equivalents. Bug is confirmed, a patch will follow soon... Thanks, Uros.
.
Patch here: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01377.html
Patch, rev. 2, waiting review: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01469.html
Subject: Bug 18503 CVSROOT: /cvs/gcc Module name: gcc Changes by: uros@gcc.gnu.org 2004-12-13 06:38:47 Modified files: gcc : ChangeLog gcc/config/i386: i386.md Log message: PR target/14941 PR target/18503 * config/i386/i386.md (sse_movss, sse2_movsd, sse2_movhpd): Fix wrong vec_merge selector bitmask. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6798&r2=2.6799 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.571&r2=1.572
Fixed.