Bug 27869 - "-O -fregmove" handles SSE scalar instructions incorrectly
Summary: "-O -fregmove" handles SSE scalar instructions incorrectly
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.1.2
: P3 normal
Target Milestone: ---
Assignee: Jan Hubicka
URL:
Keywords: ssemmx, wrong-code
Depends on:
Blocks:
 
Reported: 2006-06-01 23:35 UTC by Tijl Coosemans
Modified: 2008-02-03 14:32 UTC (History)
6 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-04-04 12:47:19


Attachments
proposed patch (442 bytes, patch)
2006-06-02 11:02 UTC, Tijl Coosemans
Details | Diff
ssealts (681 bytes, text/plain)
2007-04-06 17:01 UTC, Jan Hubicka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tijl Coosemans 2006-06-01 23:35:33 UTC
Consider the following C program using SSE intrinsics:

//----------
#include <stdio.h>
#include <xmmintrin.h>

int main(int argc, const char **argv) {
	__m128 v;
	v = _mm_setr_ps( 1.0f, 2.0f, 3.0f, 4.0f );

	v = _mm_rsqrt_ss( v );
	v = _mm_add_ss( v, _mm_movehl_ps( v, v ));
	v = _mm_add_ss( v, _mm_shuffle_ps( v, v, _MM_SHUFFLE( 0, 0, 0, 1 )));

	printf( "%e %e %e %e\n", ((float *)&v)[0], ((float *)&v)[1], ((float *)&v)[2], ((float *)&v)[3] );
	return 0;
}
//----------

Compiling and running this gives different results depending on whether -fregmove is specified or not.

tijl@kalimero regmove% gcc41 -Wall -O -fno-regmove -march=pentium4m -o test main.c
tijl@kalimero regmove% ./test
5.999756e+00 2.000000e+00 3.000000e+00 4.000000e+00
tijl@kalimero regmove% gcc41 -Wall -O -fregmove -march=pentium4m -o test main.c
tijl@kalimero regmove% ./test
7.999756e+00 4.000000e+00 3.000000e+00 4.000000e+00

The first case (-fno-regmove) is the correct one.

When you take a look at the assembly output for both cases the problem is with an "addss %xmm1, %xmm0" that is changed to "addss %xmm0, %xmm1". This is incorrect. The addss instruction is not commutative (unlike addps which sums over the entire vector).

The same problem occurs with _mm_add_ss in the code above replaced by _mm_mul_ss (mulss instruction), but not with _mm_sub_ss for instance (obviously), so I suppose this can be fixed by handling addss and mulss the same way as subss.

I suppose other instructions could be affected too.
Comment 1 Andrew Pinski 2006-06-02 00:48:52 UTC
(define_insn "sse_vmaddv4sf3"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
        (vec_merge:V4SF
          (plus:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "%0")
                     (match_operand:V4SF 2 "nonimmediate_operand" "xm"))
          (match_dup 1)
          (const_int 1)))]
  "TARGET_SSE && ix86_binary_operator_ok (PLUS, V4SFmode, operands)"
  "addss\t{%2, %0|%0, %2}"
  [(set_attr "type" "sseadd")
   (set_attr "mode" "SF")])


The % is incorrect here.
Comment 2 Tijl Coosemans 2006-06-02 11:02:32 UTC
Created attachment 11578 [details]
proposed patch

This patch fixes my problems, but I'm not sure I got all cases and I'm not sure if the _finite versions of maxss and minss need fixing at all.
Comment 3 Steven Bosscher 2007-04-04 12:17:32 UTC
Richi, Honza, is anyone looking at this problem? 
Comment 4 Richard Biener 2007-04-04 12:35:43 UTC
No.
Comment 5 Steven Bosscher 2007-04-04 12:47:19 UTC
Investigating...
Comment 6 Eric Christopher 2007-04-05 23:56:09 UTC
Actually, I'll go ahead and take this, it was reported internally as well here and I've got a patch in testing :)
Comment 7 Jan Hubicka 2007-04-06 16:07:44 UTC
Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> Investigating...
The attached patch to remove '%' seems correct to me.  Merge operating
wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
is wrong.  Or am I missing something?

Honza
Comment 8 stevenb.gcc@gmail.com 2007-04-06 16:43:19 UTC
Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> The attached patch to remove '%' seems correct to me.  Merge operating
> wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
> is wrong.  Or am I missing something?

The commutative alternative asm output should also be removed.
Comment 9 Jan Hubicka 2007-04-06 17:01:15 UTC
Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> 
> 
> ------- Comment #8 from stevenb dot gcc at gmail dot com  2007-04-06 16:43 -------
> Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly
> 
> > The attached patch to remove '%' seems correct to me.  Merge operating
> > wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
> > is wrong.  Or am I missing something?
> 
> The commutative alternative asm output should also be removed.

I don't think there are alternative asm outputs, just intel variants,
unless I missed something.  The min/max commutative variant should be
removed however, I am testing the attached patch.

Honza
Comment 10 Jan Hubicka 2007-04-06 17:01:15 UTC
Created attachment 13334 [details]
ssealts
Comment 11 Eric Christopher 2007-04-06 20:31:51 UTC
Jan,
Yeah, that's exactly the patch I had when it finishes testing ok (it did for me on i386), would you please commit it to the 4.2 branch as well?
Comment 12 Jan Hubicka 2007-04-10 00:06:27 UTC
Subject: Bug 27869

Author: hubicka
Date: Tue Apr 10 00:06:16 2007
New Revision: 123682

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123682
Log:

	PR target/27869
	* config/i386/sse.md
	(sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier.
	(sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove.
	(sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier.
	(sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md

Comment 13 Eric Christopher 2007-04-10 01:57:00 UTC
Any hope of getting this in 4.2 as well? It's not a regression, but is a fairly longstanding bug that's easier to trip than we'd like.
Comment 14 Mark Wielaard 2007-04-10 11:02:25 UTC
Assuming other mark should be CCed to make 4.2 decision.
Comment 15 Mark Mitchell 2007-04-10 18:05:08 UTC
Yes, this is OK for 4.2.
Comment 16 Jan Hubicka 2007-04-16 17:07:34 UTC
Subject: Bug 27869

Author: hubicka
Date: Mon Apr 16 17:07:19 2007
New Revision: 123876

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123876
Log:
	PR target/27869
	* config/i386/sse.md
	(sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier.
	(sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove.
	(sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier.
	(sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove.

Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/config/i386/sse.md

Comment 17 Steven Bosscher 2008-02-03 14:32:02 UTC
Honza forgot to close this, it seems.