27869 – "-O -fregmove" handles SSE scalar instructions incorrectly

Summary: "-O -fregmove" handles SSE scalar instructions incorrectly

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.1.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Jan Hubicka

URL:
Keywords:	ssemmx, wrong-code

Depends on:
Blocks:

Reported:	2006-06-01 23:35 UTC by Tijl Coosemans
Modified:	2008-02-03 14:32 UTC
CC List:	6 users

See Also:
Host:
Target:	i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2007-04-04 12:47:19

Attachments
proposed patch (442 bytes, patch) 2006-06-02 11:02 UTC, Tijl Coosemans
ssealts (681 bytes, text/plain) 2007-04-06 17:01 UTC, Jan Hubicka

Description Tijl Coosemans 2006-06-01 23:35:33 UTC

Consider the following C program using SSE intrinsics:

//----------
#include <stdio.h>
#include <xmmintrin.h>

int main(int argc, const char **argv) {
	__m128 v;
	v = _mm_setr_ps( 1.0f, 2.0f, 3.0f, 4.0f );

	v = _mm_rsqrt_ss( v );
	v = _mm_add_ss( v, _mm_movehl_ps( v, v ));
	v = _mm_add_ss( v, _mm_shuffle_ps( v, v, _MM_SHUFFLE( 0, 0, 0, 1 )));

	printf( "%e %e %e %e\n", ((float *)&v)[0], ((float *)&v)[1], ((float *)&v)[2], ((float *)&v)[3] );
	return 0;
}
//----------

Compiling and running this gives different results depending on whether -fregmove is specified or not.

tijl@kalimero regmove% gcc41 -Wall -O -fno-regmove -march=pentium4m -o test main.c
tijl@kalimero regmove% ./test
5.999756e+00 2.000000e+00 3.000000e+00 4.000000e+00
tijl@kalimero regmove% gcc41 -Wall -O -fregmove -march=pentium4m -o test main.c
tijl@kalimero regmove% ./test
7.999756e+00 4.000000e+00 3.000000e+00 4.000000e+00

The first case (-fno-regmove) is the correct one.

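When you look at the assembly output for both cases, the problem is an "addss %xmm1, %xmm0" that is changed to "addss %xmm0, %xmm1". This is incorrect. The addss instruction is not commutative: it adds only the lowest element and takes the upper three elements from the destination operand, so swapping the operands changes which upper elements end up in the result (unlike addps, which sums over the entire vector).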

The same problem occurs when _mm_add_ss in the code above is replaced by _mm_mul_ss (mulss instruction), but not with _mm_sub_ss, since subtraction is never treated as commutative. So I suppose this can be fixed by handling addss and mulss the same way as subss.

I suppose other instructions could be affected too.

Comment 1 Andrew Pinski 2006-06-02 00:48:52 UTC

(define_insn "sse_vmaddv4sf3"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
        (vec_merge:V4SF
          (plus:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "%0")
                     (match_operand:V4SF 2 "nonimmediate_operand" "xm"))
          (match_dup 1)
          (const_int 1)))]
  "TARGET_SSE && ix86_binary_operator_ok (PLUS, V4SFmode, operands)"
  "addss\t{%2, %0|%0, %2}"
  [(set_attr "type" "sseadd")
   (set_attr "mode" "SF")])


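The % is incorrect here. It marks operands 1 and 2 as commutative, but the (match_dup 1) inside the vec_merge takes the upper elements from operand 1, so swapping the two operands changes the result.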

Comment 2 Tijl Coosemans 2006-06-02 11:02:32 UTC

Created attachment 11578
proposed patch

This patch fixes my problems, but I'm not sure I got all cases and I'm not sure if the _finite versions of maxss and minss need fixing at all.

Comment 3 Steven Bosscher 2007-04-04 12:17:32 UTC

Richi, Honza, is anyone looking at this problem?

Comment 4 Richard Biener 2007-04-04 12:35:43 UTC

No.

Comment 5 Steven Bosscher 2007-04-04 12:47:19 UTC

Investigating...

Comment 6 Eric Christopher 2007-04-05 23:56:09 UTC

Actually, I'll go ahead and take this. It was reported internally here as well, and I've got a patch in testing :)

Comment 7 Jan Hubicka 2007-04-06 16:07:44 UTC

Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> Investigating...
The attached patch to remove '%' seems correct to me.  The merge operation
wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
is wrong.  Or am I missing something?

Honza

Comment 8 stevenb.gcc@gmail.com 2007-04-06 16:43:19 UTC

Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> The attached patch to remove '%' seems correct to me.  The merge operation
> wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
> is wrong.  Or am I missing something?

The commutative alternative asm output should also be removed.

Comment 9 Jan Hubicka 2007-04-06 17:01:15 UTC

Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> > The attached patch to remove '%' seems correct to me.  The merge operation
> > wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
> > is wrong.  Or am I missing something?
> 
> The commutative alternative asm output should also be removed.

I don't think there are alternative asm outputs, just Intel-syntax variants,
unless I missed something.  The commutative min/max variant should be
removed, however; I am testing the attached patch.

Honza

Comment 10 Jan Hubicka 2007-04-06 17:01:15 UTC

Created attachment 13334
ssealts

Comment 11 Eric Christopher 2007-04-06 20:31:51 UTC

Jan,
Yeah, that's exactly the patch I had. When it finishes testing OK (it did for me on i386), would you please commit it to the 4.2 branch as well?

Comment 12 Jan Hubicka 2007-04-10 00:06:27 UTC

Subject: Bug 27869

Author: hubicka
Date: Tue Apr 10 00:06:16 2007
New Revision: 123682

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123682
Log:

	PR target/27869
	* config/i386/sse.md
	(sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier.
	(sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove.
	(sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier.
	(sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md

Comment 13 Eric Christopher 2007-04-10 01:57:00 UTC

Any hope of getting this in 4.2 as well? It's not a regression, but is a fairly longstanding bug that's easier to trip than we'd like.

Comment 14 Mark Wielaard 2007-04-10 11:02:25 UTC

Assuming the other Mark should be CCed to make the 4.2 decision.

Comment 15 Mark Mitchell 2007-04-10 18:05:08 UTC

Yes, this is OK for 4.2.

Comment 16 Jan Hubicka 2007-04-16 17:07:34 UTC

Subject: Bug 27869

Author: hubicka
Date: Mon Apr 16 17:07:19 2007
New Revision: 123876

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123876
Log:
	PR target/27869
	* config/i386/sse.md
	(sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier.
	(sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove.
	(sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier.
	(sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove.

Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/config/i386/sse.md

Comment 17 Steven Bosscher 2008-02-03 14:32:02 UTC

Honza forgot to close this, it seems.