Bug 18503 - _mm_move_ss SSE intrinsic causes erroneous
Summary: _mm_move_ss SSE intrinsic causes erroneous
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.4.2
Importance: P2 normal
Target Milestone: 4.0.0
Assignee: Not yet assigned to anyone
URL:
Keywords: patch, wrong-code
Duplicates: 18504
Depends on:
Blocks:
 
Reported: 2004-11-15 15:38 UTC by Torgeir Strand Henriksen
Modified: 2004-12-13 07:09 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.0.0
Last reconfirmed: 2004-11-17 09:19:16


Attachments
Testcase for SSE bugs (231 bytes, text/x-csrc)
2004-11-16 13:21 UTC, Torgeir Strand Henriksen

Description Torgeir Strand Henriksen 2004-11-15 15:38:59 UTC
Compiling the following function on x86 with options -O -fomit-frame-pointer
-msse -S seems to result in erroneous code.

#include <xmmintrin.h>

__m128 bug(__m128 a, __m128 b) {
	__m128 c = _mm_sub_ps(a, b);
	return _mm_move_ss(c, a);
}

This is what the function in the resulting .s file looks like:

	.file	"gccbug.c"
	.text
	.align 2
.globl __Z3bugU8__vectorfS_
	.def	__Z3bugU8__vectorfS_;	.scl	2;	.type	32;	.endef
__Z3bugU8__vectorfS_:
	subss	%xmm1, %xmm0
	ret

According to
http://www.intel.com/software/products/compilers/clin/docs/ug_cpp/comm1030.htm
_mm_move_ss passes the upper three values through, but the code generated
doesn't even calculate those values. I had expected the code to look more like this:

	movaps	%xmm0, %xmm2
	subps	%xmm1, %xmm0
	movss	%xmm2, %xmm0

Hope that turned out right; I'm not experienced with AT&T syntax. Is this a bug,
or have I misunderstood something? I get similar results with 3.3.4.

Torgeir
Comment 1 Andrew Pinski 2004-11-15 15:41:07 UTC
*** Bug 18504 has been marked as a duplicate of this bug. ***
Comment 2 Andrew Pinski 2004-11-15 15:47:08 UTC
No, you are reading the asm backwards; the corresponding Intel asm is:
bug:        subss   %xmm0, %xmm1
ret

Which you can get with -masm=intel.
Comment 3 Torgeir Strand Henriksen 2004-11-15 16:26:14 UTC
Subject: Re: _mm_move_ss SSE intrinsic causes erroneous

pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-15 15:47 -------
> No, you are reading the asm backwards; the corresponding Intel asm is:
> bug:        subss   %xmm0, %xmm1
> ret
> 
> Which you can get with -masm=intel.
> 

Yes, I know AT&T syntax is backwards compared to Intel. Have I
misinterpreted the intrinsics or the resulting code somehow? To be more
clear, I expected the C function to return the following vector:

   v0 = a0
   v1 = a1 - b1
   v2 = a2 - b2
   v3 = a3 - b3

Instead it seems to return:

   v0 = a0 - b0
   v1 = a1
   v2 = a2
   v3 = a3

_mm_move_ss should take the three upper values from its first argument, and
the lower value from its second, no?

Torgeir
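
The expected behaviour described above can be written out as a plain scalar sketch, assuming the Intel documentation's definition of _mm_move_ss (lowest element taken from its second argument, upper three passed through from its first):

	/* Scalar model of bug(a, b): _mm_move_ss(c, a) keeps element 0 of a
	   and elements 1..3 of c = a - b (per the Intel documentation). */
	void bug_scalar(const float a[4], const float b[4], float out[4])
	{
		out[0] = a[0];
		out[1] = a[1] - b[1];
		out[2] = a[2] - b[2];
		out[3] = a[3] - b[3];
	}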

Comment 4 Torgeir Strand Henriksen 2004-11-16 13:21:11 UTC
Created attachment 7557
Testcase for SSE bugs

Running the program compiled by gcc -O gives the following output:
-0.800000 1.300000 1.300000 1.300000
Without optimization enabled, the program simply crashes, probably because it
contains movaps instructions that access unaligned memory. The same program
compiled by Microsoft C++ gives the following output with or without
optimization:
1.300000 -0.800000 -0.800000 -0.800000
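
The attachment itself is not reproduced here, but a minimal test program along these lines (the 1.3 and 2.1 inputs are illustrative values chosen so the subtraction yields -0.8, matching the output quoted above) exercises the same code path:

	#include <stdio.h>
	#include <xmmintrin.h>

	__m128 bug(__m128 a, __m128 b) {
		__m128 c = _mm_sub_ps(a, b);
		return _mm_move_ss(c, a);
	}

	int main(void)
	{
		float out[4];
		__m128 a = _mm_set1_ps(1.3f);	/* a = {1.3, 1.3, 1.3, 1.3} */
		__m128 b = _mm_set1_ps(2.1f);	/* b = {2.1, 2.1, 2.1, 2.1} */

		/* unaligned store, to sidestep the alignment issue mentioned above */
		_mm_storeu_ps(out, bug(a, b));
		/* correct output: 1.300000 -0.800000 -0.800000 -0.800000 */
		printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
		return 0;
	}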
Comment 5 Torgeir Strand Henriksen 2004-11-16 13:23:04 UTC
Why was this bug closed without being resolved? I have attached an improved
testcase that uncovers another bug that occurs when optimization is disabled, so
hopefully it's okay that I reopen this report.
Comment 6 Uroš Bizjak 2004-11-17 09:17:25 UTC
The problem was that the "sse_movss" pattern had a wrong vec_merge constant, so it
was wrongly combined into the "vmsubv4sf3" pattern. Instead of 1, the "sse_movss"
pattern should have 14:

(define_insn "sse_movss"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
	(vec_merge:V4SF
	 (match_operand:V4SF 1 "register_operand" "0")
	 (match_operand:V4SF 2 "register_operand" "x")
	 (const_int 14)))]                               <- here!
  "TARGET_SSE"
  "movss\t{%2, %0|%0, %2}"
  [(set_attr "type" "ssemov")
   (set_attr "mode" "SF")])
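
In RTL, a vec_merge takes element i from its first vector operand when bit i of the mask is set and from the second otherwise, so mask 14 (binary 1110) keeps the upper three elements of operand 1 (the destination) and takes the low element from operand 2, which is exactly what movss does; mask 1 selects the opposite elements. A rough C model of the selection (illustrative only):

	/* C model of RTL vec_merge: element i comes from x when bit i of
	   the mask is set, otherwise from y. */
	void vec_merge4(const float x[4], const float y[4], int mask, float out[4])
	{
		int i;
		for (i = 0; i < 4; i++)
			out[i] = (mask & (1 << i)) ? x[i] : y[i];
	}

	/* movss semantics: vec_merge4(dest, src, 14, out)
	   gives out = {src[0], dest[1], dest[2], dest[3]}. */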

With this change, the result from your testcase is OK. The generated asm code looks like:

bug:
        movaps %xmm0, %xmm2
        subps %xmm1, %xmm2
        pushl %ebp
        movl %esp, %ebp
        movaps %xmm2, %xmm1
        movss %xmm0, %xmm1
        popl %ebp
        movaps %xmm1, %xmm0
        ret

BTW: the "sse2_movsd" pattern has the same problem... and so do "sse_loadss_1",
"sse_storess" and their SSE2 equivalents.

Bug is confirmed, a patch will follow soon...

Thanks,
Uros.
Comment 7 uros 2004-11-17 09:19:15 UTC
.
Comment 8 uros 2004-11-17 10:10:14 UTC
Patch here: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01377.html
Comment 9 Uroš Bizjak 2004-11-22 06:36:28 UTC
Patch, rev. 2, awaiting review:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01469.html
Comment 10 GCC Commits 2004-12-13 06:38:53 UTC
Subject: Bug 18503

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	uros@gcc.gnu.org	2004-12-13 06:38:47

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.md 

Log message:
	PR target/14941
	PR target/18503
	* config/i386/i386.md (sse_movss, sse2_movsd, sse2_movhpd):
	Fix wrong vec_merge selector bitmask.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6798&r2=2.6799
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.571&r2=1.572

Comment 11 uros 2004-12-13 07:09:37 UTC
Fixed.