The attached program works incorrectly when compiled for i686 with '-O3 -msse2' or '-O1 -ftree-vectorize -msse2'. GCC compiles the function #define N 1024 static unsigned int A[N]; double func (void) { unsigned int sum = 0; unsigned i; double t; for (i = 0; i < N; i++) sum += A[i]; t = sum; /* uint32 -> double */ return t; } into the following: 01: pxor %xmm0, %xmm0 02: movl $_A, %eax 03: L2: 04: paddd (%eax), %xmm0 05: addl $16, %eax 06: cmpl $_A+4096, %eax 07: jne L2 08: movdqa %xmm0, %xmm1 09: subl $28, %esp 10: psrldq $8, %xmm1 11: paddd %xmm1, %xmm0 12: movdqa %xmm0, %xmm1 13: psrldq $4, %xmm1 14: paddd %xmm1, %xmm0 15: movq %xmm0, 8(%esp) 16: fildq 8(%esp) 17: addl $28, %esp 18: ret After the line 07: xmm0 contains four partial sums. After the line 14: lower 4 bytes of xmm0 contain the total sum, the rest 12 bytes contain garbage. Lines 15 and 16: *eight* bytes from xmm0 are stored in memory and then loaded into an FPU register. According to the message from Robert Kausch ( http://lists.xiph.org/pipermail/flac-dev/2014-June/004723.html ) this bug exists since GCC 4.4.
Created attachment 32897 [details] test program
Confirmed. Works with -mfpmath=sse. Initial RTL looks ok to me: (insn 21 20 22 (set (reg:SI 99 [ stmp_sum_5.8 ]) (vec_select:SI (reg:V4SI 98 [ vect_sum_5.9 ]) (parallel [ (const_int 0 [0]) ]))) -1 (nil)) (insn 22 21 23 (parallel [ (set (reg:DF 97 [ t ]) (unsigned_float:DF (reg:SI 99 [ stmp_sum_5.8 ]))) (clobber (mem/c:DI (plus:SI (reg/f:SI 78 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 S8 A64])) (clobber (scratch:SI)) ]) t.c:18 -1 (nil)) (insn 23 22 24 (set (reg:DF 94 [ <retval> ]) (reg:DF 97 [ t ])) t.c:19 -1 (nil)) So it must be a bogus *floatunssidf2_1 pattern (insn 22 21 28 4 (parallel [ (set (reg:DF 8 st [orig:97 t ] [97]) (unsigned_float:DF (reg:SI 21 xmm0 [orig:99 stmp_sum_5.8 ] [99]))) (clobber (mem/c:DI (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 S8 A64])) (clobber (scratch:SI)) ]) t.c:18 211 {*floatunssidf2_1} (nil)) split to (insn 39 38 40 4 (set (mem/c:DI (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 S8 A64]) (reg:DI 21 xmm0 [orig:99 stmp_sum_5.8 ] [99])) t.c:18 89 {*movdi_internal} (nil)) (insn 40 39 28 4 (set (reg:DF 8 st [orig:97 t ] [97]) (float:DF (mem/c:DI (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 S8 A64]))) t.c:18 206 {*floatdidf2_i387} (nil)) note the use of a DImode memory but the missing zero-extend of xmm0:DI.
Looking into it.
Created attachment 32901 [details] Proposed patch This patch fies invalid code gen, but unfortunately uncovers a problem in REE pass and fails bootstrap when configured "--with-arch=core-avx-i --with-cpu=core-avx-i".
Created attachment 32903 [details] Testcase that fails in REE pass This testcase fails with patched gcc in REE pass: cc1 -O2 -m32 -march=corei7 libgcc2.i /home/uros/gcc-svn/trunk/libgcc/libgcc2.c: In function ‘__fixunssfdi’: /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1492:1: error: insn does not satisfy its constraints: } ^ (insn 54 11 47 2 (set (reg:DI 0 ax) (reg:DI 21 xmm0)) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 89 {*movdi_internal} (expr_list:REG_UNUSED (reg:DI 0 ax) (nil))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1492:1: internal compiler error: in copyprop_hardreg_forward_1, at regcprop.c:776 ... _split2 pass produces correct sequence: (insn 46 11 47 2 (set (reg:DI 21 xmm0 [118]) (zero_extend:DI (reg/v:SI 0 ax [orig:85 hi ] [85]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 133 {*zero_extendsidi2} (nil)) (insn 47 46 48 2 (set (mem/c:DI (reg/f:SI 7 sp) [0 S8 A64]) (reg:DI 21 xmm0 [118])) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 89 {*movdi_internal} (nil)) (insn 48 47 13 2 (set (reg:DF 9 st(1) [orig:101 D.6895 ] [101]) (float:DF (mem/c:DI (reg/f:SI 7 sp) [0 S8 A64]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 206 {*floatdidf2_i387} (nil)) and _ree pass converts this to: (insn 11 45 54 2 (set (reg:DI 21 xmm0) (zero_extend:DI (reg:SI 2 cx [99]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 133 {*zero_extendsidi2} (nil)) (insn 54 11 47 2 (set (reg:DI 0 ax) (reg:DI 21 xmm0)) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 -1 (nil)) (insn 47 54 48 2 (set (mem/c:DI (reg/f:SI 7 sp) [0 S8 A64]) (reg:DI 21 xmm0 [118])) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 89 {*movdi_internal} (nil)) (insn 48 47 13 2 (set (reg:DF 9 st(1) [orig:101 D.6895 ] [101]) (float:DF (mem/c:DI (reg/f:SI 7 sp) [0 S8 A64]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 206 {*floatdidf2_i387} (nil)) Please note ivalid (insn 54).
Jeff, can you please look at what goes wrong in REE pass?
Author: uros Date: Fri Jun 6 17:45:10 2014 New Revision: 211321 URL: http://gcc.gnu.org/viewcvs?rev=211321&root=gcc&view=rev Log: PR target/61423 * config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New define_insn_and_split pattern, merged from *floatunssi<mode>2_1 and corresponding splitters. Zero extend general register or memory input operand to XMM temporary. Enable for TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only. (floatunssi<mode>2): Update expander predicate. testsuite/ChangeLog: PR target/61423 * gcc.target/i386/pr61423.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr61423.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md trunk/gcc/testsuite/ChangeLog
(In reply to Uroš Bizjak from comment #5) > Created attachment 32903 [details] > Testcase that fails in REE pass > > This testcase fails with patched gcc in REE pass: This is now PR61446.
Shouldn't this patch be reverted until REE is fixed?
Either way will cause a failure. This is #1 on my hit list after the regex problem that's affecting mysql.
I expect the REE bug to be fixed tomorrow AM (my time). It's a trivial issue.
Author: uros Date: Tue Jun 17 05:00:52 2014 New Revision: 211723 URL: https://gcc.gnu.org/viewcvs?rev=211723&root=gcc&view=rev Log: Backport from mainline 2014-06-06 Uros Bizjak <ubizjak@gmail.com> PR target/61423 * config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New define_insn_and_split pattern, merged from *floatunssi<mode>2_1 and corresponding splitters. Zero extend general register or memory input operand to XMM temporary. Enable for TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only. (floatunssi<mode>2): Update expander predicate. testsuite/ChangeLog: Backport from mainline 2014-06-06 Uros Bizjak <ubizjak@gmail.com> PR target/61423 * gcc.target/i386/pr61423.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr61423.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/config/i386/i386.md branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Author: uros Date: Wed Jun 18 20:01:37 2014 New Revision: 211803 URL: https://gcc.gnu.org/viewcvs?rev=211803&root=gcc&view=rev Log: Backport from mainline 2014-06-06 Uros Bizjak <ubizjak@gmail.com> PR target/61423 * config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New define_insn_and_split pattern, merged from *floatunssi<mode>2_1 and corresponding splitters. Zero extend general register or memory input operand to XMM temporary. Enable for TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only. (floatunssi<mode>2): Update expander predicate. testsuite/ChangeLog: Backport from mainline 2014-06-13 Ilya Enkovich <ilya.enkovich@intel.com> PR rtl-optimization/61094 PR rtl-optimization/61446 * gcc.target/i386/pr61446.c : New. Backport from mainline 2014-06-06 Uros Bizjak <ubizjak@gmail.com> PR target/61423 * gcc.target/i386/pr61423.c: New test. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61423.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61446.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/config/i386/i386.md branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Fixed in all release branches.