Consider the following C program using SSE intrinsics: //---------- #include <stdio.h> #include <xmmintrin.h> int main(int argc, const char **argv) { __m128 v; v = _mm_setr_ps( 1.0f, 2.0f, 3.0f, 4.0f ); v = _mm_rsqrt_ss( v ); v = _mm_add_ss( v, _mm_movehl_ps( v, v )); v = _mm_add_ss( v, _mm_shuffle_ps( v, v, _MM_SHUFFLE( 0, 0, 0, 1 ))); printf( "%e %e %e %e\n", ((float *)&v)[0], ((float *)&v)[1], ((float *)&v)[2], ((float *)&v)[3] ); return 0; } //---------- Compiling and running this gives different results depending on whether -fregmove is specified or not. tijl@kalimero regmove% gcc41 -Wall -O -fno-regmove -march=pentium4m -o test main.c tijl@kalimero regmove% ./test 5.999756e+00 2.000000e+00 3.000000e+00 4.000000e+00 tijl@kalimero regmove% gcc41 -Wall -O -fregmove -march=pentium4m -o test main.c tijl@kalimero regmove% ./test 7.999756e+00 4.000000e+00 3.000000e+00 4.000000e+00 The first case (-fno-regmove) is the correct one. When you take a look at the assembly output for both cases the problem is with an "addss %xmm1, %xmm0" that is changed to "addss %xmm0, %xmm1". This is incorrect. The addss instruction is not commutative (unlike addps which sums over the entire vector). The same problem occurs with _mm_add_ss in the code above replaced by _mm_mul_ss (mulss instruction), but not with _mm_sub_ss for instance (obviously), so I suppose this can be fixed by handling addss and mulss the same way as subss. I suppose other instructions could be affected too.
(define_insn "sse_vmaddv4sf3" [(set (match_operand:V4SF 0 "register_operand" "=x") (vec_merge:V4SF (plus:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "%0") (match_operand:V4SF 2 "nonimmediate_operand" "xm")) (match_dup 1) (const_int 1)))] "TARGET_SSE && ix86_binary_operator_ok (PLUS, V4SFmode, operands)" "addss\t{%2, %0|%0, %2}" [(set_attr "type" "sseadd") (set_attr "mode" "SF")]) The % is incorrect here.
Created attachment 11578 [details] proposed patch This patch fixes my problems, but I'm not sure I got all cases and I'm not sure if the _finite versions of maxss and minss need fixing at all.
Richi, Honza, is anyone looking at this problem?
No.
Investigating...
Actually, I'll go ahead and take this, it was reported internally as well here and I've got a patch in testing :)
Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > Investigating... The attached patch to remove '%' seems correct to me. Merge operating wrapping the (commutative) plus/mult/min/max is not commutative, so '%' is wrong. Or am I missing something? Honza
Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > The attached patch to remove '%' seems correct to me. Merge operating > wrapping the (commutative) plus/mult/min/max is not commutative, so '%' > is wrong. Or am I missing something? The commutative alternative asm output should also be removed.
Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > > > ------- Comment #8 from stevenb dot gcc at gmail dot com 2007-04-06 16:43 ------- > Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > > > The attached patch to remove '%' seems correct to me. Merge operating > > wrapping the (commutative) plus/mult/min/max is not commutative, so '%' > > is wrong. Or am I missing something? > > The commutative alternative asm output should also be removed. I don't think there are alternative asm outputs, just intel variants, unless I missed something. The min/max commutative variant should be removed however, I am testing the attached patch. Honza
Created attachment 13334 [details] ssealts
Jan, Yeah, that's exactly the patch I had when it finishes testing ok (it did for me on i386), would you please commit it to the 4.2 branch as well?
Subject: Bug 27869 Author: hubicka Date: Tue Apr 10 00:06:16 2007 New Revision: 123682 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123682 Log: PR target/27869 * config/i386/sse.md (sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier. (sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove. (sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier. (sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
Any hope of getting this in 4.2 as well? It's not a regression, but is a fairly longstanding bug that's easier to trip than we'd like.
Assuming other mark should be CCed to make 4.2 decision.
Yes, this is OK for 4.2.
Subject: Bug 27869 Author: hubicka Date: Mon Apr 16 17:07:19 2007 New Revision: 123876 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123876 Log: PR target/27869 * config/i386/sse.md (sse_vmaddv4sf3, sse_vmmulv4sf3): Remove '%' modifier. (sse_vmsmaxv4sf3_finite, sse_vmsminv4sf3_finite): Remove. (sse2_vmaddv2df3, sse2_vmmulv2df3): Remove '%' modifier. (sse2_vmsmaxv2df3_finite, sse2_vmsminv2df3_finite): Remove. Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/config/i386/sse.md
Honza forgot to close this, it seems.