18503 – _mm_move_ss SSE intrinsic causes erroneous

Bug 18503 - _mm_move_ss SSE intrinsic causes erroneous

Summary: _mm_move_ss SSE intrinsic causes erroneous

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	3.4.2

Importance:	P2 normal
Target Milestone:	4.0.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	patch, wrong-code

Duplicates (1):	18504 (view as bug list)
Depends on:
Blocks:

Reported:	2004-11-15 15:38 UTC by Torgeir Strand Henriksen
Modified:	2004-12-13 07:09 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.0.0
Last reconfirmed:	2004-11-17 09:19:16

Attachments
Testcase for SSE bugs (231 bytes, text/x-csrc) 2004-11-16 13:21 UTC, Torgeir Strand Henriksen	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Torgeir Strand Henriksen 2004-11-15 15:38:59 UTC

Compiling the following function on x86 with options -O -fomit-frame-pointer
-msse -S seems to result in erroneous code.

#include <xmmintrin.h>

__m128 bug(__m128 a, __m128 b) {
	__m128 c = _mm_sub_ps(a, b);
	return _mm_move_ss(c, a);
}

This is what the function in the resulting .s file looke like:

	.file	"gccbug.c"
	.text
	.align 2
.globl __Z3bugU8__vectorfS_
	.def	__Z3bugU8__vectorfS_;	.scl	2;	.type	32;	.endef
__Z3bugU8__vectorfS_:
	subss	%xmm1, %xmm0
	ret

According to
http://www.intel.com/software/products/compilers/clin/docs/ug_cpp/comm1030.htm
_mm_move_ss passes the upper three values through, but the code generated
doesn't even calculate those values. I had expected the code to look more like this:

	movaps	%xmm0, %xmm2
	subps	%xmm1, %xmm0
	movss	%xmm2, %xmm0

Hope that turned out right, I'm not experienced with AT&T syntax. Is this a bug,
or have I misunderstood something? I get similar results with 3.3.4.

Torgeir

Comment 1 Andrew Pinski 2004-11-15 15:41:07 UTC

*** Bug 18504 has been marked as a duplicate of this bug. ***

Comment 2 Andrew Pinski 2004-11-15 15:47:08 UTC

No you are reading the asm backwards, the corresponding intel asm is:
bug:        subss   %xmm0, %xmm1
ret

Which you can  get with -masm=intel.

Comment 3 Torgeir Strand Henriksen 2004-11-15 16:26:14 UTC

Subject: Re:  _mm_move_ss SSE intrinsic causes
 erroneous

pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-15 15:47 -------
> No you are reading the asm backwards, the corresponding intel asm is:
> bug:        subss   %xmm0, %xmm1
> ret
> 
> Which you can  get with -masm=intel.
> 

Yes, I know AT&T syntax is backwards compared to Intel, have I 
misinterpreted the intrinsics or resulting code somehow? To be more 
clear, I expected the C function to return the following vector:

   v0 = a0
   v1 = a1 - b1
   v2 = a2 - b2
   v3 = a3 - b3

Instead it seems to return:

   v0 = a0 - b0
   v1 = a1
   v2 = a2
   v3 = a3

_mm_move_ss should the three upper values from its first argument, and 
the lower from its second, no?

Torgeir

Comment 4 Torgeir Strand Henriksen 2004-11-16 13:21:11 UTC

Created attachment 7557 [details]
Testcase for SSE bugs

Running the program compiled by gcc -O gives the following output:
-0.800000 1.300000 1.300000 1.300000
Without optimization enabled, the program simply crashes, probably because it
contains movaps instructions that access unalligned memory. The same program
compiled by Microsoft C++ gives the following output with or without
optimization:
1.300000 -0.800000 -0.800000 -0.800000

Comment 5 Torgeir Strand Henriksen 2004-11-16 13:23:04 UTC

Why was this bug closed without being resolved? I have attached an improved
testcase that uncovers another bug that occurs when optimization is disabled, so
hopefully it's okay that I reopen this report.

Comment 6 Uroš Bizjak 2004-11-17 09:17:25 UTC

The problem was that "sse_movss" pattern has wrong vec_merge constant, so it was
wrongly combined to "vmsubv4sf3" pattern. Instead of 0x1, "sse_movss" pattern
should have 0x14:

(define_insn "sse_movss"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
	(vec_merge:V4SF
	 (match_operand:V4SF 1 "register_operand" "0")
	 (match_operand:V4SF 2 "register_operand" "x")
	 (const_int 14)))]                               <- here!
  "TARGET_SSE"
  "movss\t{%2, %0|%0, %2}"
  [(set_attr "type" "ssemov")
   (set_attr "mode" "SF")])

With this change, the result from your testcase is OK. ASM code looks like:

bug:
        movaps %xmm0, %xmm2
        subps %xmm1, %xmm2
        pushl %ebp
        movl %esp, %ebp
        movaps %xmm2, %xmm1
        movss %xmm0, %xmm1
        popl %ebp
        movaps %xmm1, %xmm0
        ret

BTW: "sse2_movsd" pattern has the same problem... And "sse_loadss_1",
"sse_storess" and their sse2 equivalents.

Bug is confirmed, a patch will follow soon...

Thanks,
Uros.

Comment 7 uros 2004-11-17 09:19:15 UTC

Comment 8 uros 2004-11-17 10:10:14 UTC

Patch here: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01377.html

Comment 9 Uroš Bizjak 2004-11-22 06:36:28 UTC

Patch, rev. 2, waiting review:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01469.html

Comment 10 GCC Commits 2004-12-13 06:38:53 UTC

Subject: Bug 18503

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	uros@gcc.gnu.org	2004-12-13 06:38:47

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.md 

Log message:
	PR target/14941
	PR target/18503
	* config/i386/i386.md (sse_movss, sse2_movsd, sse2_movhpd):
	Fix wrong vec_merge selector bitmask.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6798&r2=2.6799
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.571&r2=1.572

Comment 11 uros 2004-12-13 07:09:37 UTC

Fixed.