Bug 49133 - [4.6/4.7 Regression] modification of aliased __m128d miscompiles
Summary: [4.6/4.7 Regression] modification of aliased __m128d miscompiles
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.1
: P3 normal
Target Milestone: 4.6.1
Assignee: Uroš Bizjak
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-23 21:55 UTC by Matthias Kretz (Vir)
Modified: 2011-05-25 16:39 UTC (History)
1 user (show)

See Also:
Host:
Target: x86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-05-24 08:22:04


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Kretz (Vir) 2011-05-23 21:55:18 UTC
Compile the following testcase with "g++ -msse2 -O2"

#include <xmmintrin.h>

typedef double double_a __attribute__((__may_alias__));

struct V
{
    __m128d data;
};

int main()
{
    V a;
    __m128d b;
    b      = _mm_set_pd(1., 0.);
    a.data = _mm_set_pd(1., 0.);
    a.data = _mm_add_pd(a.data, _mm_and_pd(_mm_cmpeq_pd(a.data, _mm_set1_pd(0.)), _mm_set1_pd(2.)));
    reinterpret_cast<double_a *>(&a.data)[1] += 1.;
    b = _mm_add_pd(b, _mm_and_pd(_mm_cmpeq_pd(b, _mm_set1_pd(0.)), _mm_set1_pd(1.)));
    b = _mm_add_pd(b, _mm_and_pd(_mm_cmpeq_pd(b, _mm_set1_pd(1.)), _mm_set1_pd(1.)));
    if (_mm_movemask_pd(_mm_cmpeq_pd(a.data, b)) != 0x3) {
        abort();
    }
    return 0;
}

GCC 4.6.[01] calculate the correct values for a.data[0] and a.data[1] but fail to combine the results correctly. I.e. the resulting shufpd $0x1 is wrong. GCC 4.5.x uses unpacklpd, which gives the correct result, but emits unnecessary stores to the stack.
Comment 1 Uroš Bizjak 2011-05-24 08:22:04 UTC
Confirmed, proposed patch to sse2_loadhpd:

Index: sse.md
===================================================================
--- sse.md	(revision 174102)
+++ sse.md	(working copy)
@@ -4284,7 +4284,7 @@
    vmovhpd\t{%2, %1, %0|%0, %1, %2}
    unpcklpd\t{%2, %0|%0, %2}
    vunpcklpd\t{%2, %1, %0|%0, %1, %2}
-   shufpd\t{$1, %1, %0|%0, %1, 1}
+   shufpd\t{$0, %1, %0|%0, %1, 0}
    #
    #
    #"
Comment 2 Matthias Kretz (Vir) 2011-05-24 10:16:41 UTC
I applied the patch to the latest 4.6 snapshot. I confirm that it fixes the bug. Also, there are no regressions in my testsuite.

Just for confirmation, the patched sse.md looks like this for me now (starting from line 4952):
(define_insn "sse2_loadhpd"
  [(set (match_operand:V2DF 0 "nonimmediate_operand"     "=x,x,x,o,o,o")
    (vec_concat:V2DF
      (vec_select:DF
        (match_operand:V2DF 1 "nonimmediate_operand" " 0,0,x,0,0,0")
        (parallel [(const_int 0)]))
      (match_operand:DF 2 "nonimmediate_operand"     " m,x,0,x,*f,r")))]
  "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
  "@
   movhpd\t{%2, %0|%0, %2}
   unpcklpd\t{%2, %0|%0, %2}
   shufpd\t{$0, %1, %0|%0, %1, 0}
   #

Question, why not use unpcklpd instead of shufpd $0? On older CPUs unpcklpd should be slightly faster than shufpd.
Comment 3 Uroš Bizjak 2011-05-24 10:51:25 UTC
(In reply to comment #2)
> I applied the patch to the latest 4.6 snapshot. I confirm that it fixes the
> bug. Also, there are no regressions in my testsuite.
> 
> Just for confirmation, the patched sse.md looks like this for me now (starting
> from line 4952):
> (define_insn "sse2_loadhpd"
>   [(set (match_operand:V2DF 0 "nonimmediate_operand"     "=x,x,x,o,o,o")
>     (vec_concat:V2DF
>       (vec_select:DF
>         (match_operand:V2DF 1 "nonimmediate_operand" " 0,0,x,0,0,0")
>         (parallel [(const_int 0)]))
>       (match_operand:DF 2 "nonimmediate_operand"     " m,x,0,x,*f,r")))]
>   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>    movhpd\t{%2, %0|%0, %2}
>    unpcklpd\t{%2, %0|%0, %2}
>    shufpd\t{$0, %1, %0|%0, %1, 0}
>    #
> 
> Question, why not use unpcklpd instead of shufpd $0? On older CPUs unpcklpd
> should be slightly faster than shufpd.

OTOH, it looks that this alternative is wrong entirely. Unmodified operand can only be passed in lower half (operand 1 in the pattern above). GCC will then generate unpcklpd, as suggested.
Comment 4 Uroš Bizjak 2011-05-24 10:53:10 UTC
(In reply to comment #3)

> > I applied the patch to the latest 4.6 snapshot. I confirm that it fixes the
> > bug. Also, there are no regressions in my testsuite.

> OTOH, it looks that this alternative is wrong entirely. Unmodified operand can
> only be passed in lower half (operand 1 in the pattern above). GCC will then
> generate unpcklpd, as suggested.

Forgot to say, that unpcklpd will be generated with removed referred alternative.
Comment 5 uros 2011-05-24 15:31:15 UTC
Author: uros
Date: Tue May 24 15:31:12 2011
New Revision: 174122

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174122
Log:
	PR target/49133
	* config/i386/sse.md (sse2_loadhpd): Remove shufpd alternative.

testsuite/ChangeLog:

	PR target/49133
	* g++.dg/other/pr49133.C: New test.


Added:
    trunk/gcc/testsuite/g++.dg/other/pr49133.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 6 uros 2011-05-24 18:41:33 UTC
Author: uros
Date: Tue May 24 18:41:31 2011
New Revision: 174131

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174131
Log:
	PR target/49133
	* config/i386/sse.md (sse2_loadhpd): Remove shufpd alternative.

testsuite/ChangeLog:

	PR target/49133
	* g++.dg/other/pr49133.C: New test.


Added:
    branches/gcc-4_6-branch/gcc/testsuite/g++.dg/other/pr49133.C
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/i386/sse.md
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2011-05-25 07:42:31 UTC
Fixed.
Comment 8 uros 2011-05-25 13:26:46 UTC
Author: uros
Date: Wed May 25 13:26:42 2011
New Revision: 174195

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174195
Log:
	PR target/49133
	* config/i386/sse.md (sse2_loadhpd): Remove shufpd alternative.

testsuite/ChangeLog:

	PR target/49133
	* g++.dg/other/pr49133.C: New test.


Added:
    branches/gcc-4_5-branch/gcc/testsuite/g++.dg/other/pr49133.C
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/config/i386/sse.md
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Comment 9 uros 2011-05-25 16:39:25 UTC
Author: uros
Date: Wed May 25 16:39:22 2011
New Revision: 174215

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174215
Log:
	PR target/49133
	* config/i386/sse.md (sse2_loadhpd): Remove shufpd alternative.

testsuite/ChangeLog:

	PR target/49133
	* g++.dg/other/pr49133.C: New test.


Added:
    branches/gcc-4_4-branch/gcc/testsuite/g++.dg/other/pr49133.C
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/i386/sse.md
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog