Created attachment 37954 [details] Reproducer. Test case produces incorrect result starting with -O1 optimization level. Test case fails with -march=nehalem, haswell, knl and skylake-avx512 options and with gcc version 4.8.5 (Revision=226194), gcc version 4.9.4 20160311 (prerelease) (Revision=234148) and gcc version 5.3.1 20160311 (Revision=234148). Output: > g++ repr.cpp -o out -O1 -march=haswell; ./out -262143 > g++ repr.cpp -o out -O0 -march=haswell; ./out -1 GCC version: > g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/export/users/vlivinsk/gcc-trunk/bin/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /export/users/vlivinsk/gcc-trunk/gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-checking=release --enable-languages=c,c++,lto --with-gmp=/export/users/vlivinsk/gcc-trunk/gmp-6.1.0/bin --with-mpfr=/export/users/vlivinsk/gcc-trunk/mpfr-3.1.3/bin --with-mpc=/export/users/vlivinsk/gcc-trunk/mpc-1.0.3/bin --prefix=/export/users/vlivinsk/gcc-trunk/bin Thread model: posix gcc version 6.0.0 20160314 (experimental) (Revision=234175)
Confirmed. -O2 is fine again. void abort (void); int a = 1; unsigned int b = 2; int c = 0; int d = 0; void foo () { int e = ((-(c >= c)) < b) > ((int)(((unsigned long long int)(-1)) >> ((a / a) * 15))); d = -e; } int main () { foo (); if (d != -1) abort (); return 0; }
Inlining hides the RTL opt bug. Disabling RTL combine fixes it.
Yeah, it is combine. First of all, the -1ULL >> (... * 15) is properly optimized into (insn 12 11 13 2 (set (reg:DI 102) (const_int 8589934591 [0x1ffffffff])) pr70222.c:9 85 {*movdi_internal} (nil)) but the (insn 13 12 14 2 (parallel [ (set (reg:SI 104 [ e ]) (lshiftrt:SI (subreg:SI (reg:DI 102) 0) (const_int 31 [0x1f]))) (clobber (reg:CC 17 flags)) ]) pr70222.c:9 562 {*lshrsi3_1} (expr_list:REG_DEAD (reg:DI 102) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) which is supposed to yield 1 or 0 (in this case 1) is mis-simplified into: (insn 13 12 14 2 (parallel [ (set (subreg:DI (reg:SI 104 [ e ]) 0) (lshiftrt:DI (reg:DI 102) (subreg:QI (reg:SI 101) 0))) (clobber (reg:CC 17 flags)) ]) pr70222.c:9 564 {*lshrdi3_1} (expr_list:REG_DEAD (reg:SI 101) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_DEAD (reg:DI 102) (nil)))))
Looking at this.
So, the bug is apparently in simplify_shift_const_1 (code=LSHIFTRT, result_mode=SImode, varop=0x7ffff184de70, orig_count=31) where varop is: (subreg:SI (lshiftrt:DI (const_int -1 [0xffffffffffffffff]) (subreg:QI (reg:SI 100) 0)) 0) We optimize the inner shift (mode == DImode) into (lshiftrt:DI (const_int 8589934591 [0x1ffffffff]) (subreg:QI (reg:SI 100) 0)) 0) which is fine (assuming we do the final masking), and turn shift count into 0. shift_mode is still SImode, because nonzero_bits doesn't say the upper bits are 0, and we don't have OUTER_OP. For this to really work, we need to mask off the bits though. We have: /* If we were doing an LSHIFTRT in a wider mode than it was originally, turn off all the bits that the shift would have turned off. */ if (orig_code == LSHIFTRT && result_mode != shift_mode) x = simplify_and_const_int (NULL_RTX, shift_mode, x, GET_MODE_MASK (result_mode) >> orig_count); that does this, unfortunately it doesn't trigger here, because shift_mode == result_mode == SImode. So, one possibility is to do if (orig_code == LSHIFTRT && (result_mode != shift_mode || (result_mode != mode && count == 0))) (and maybe "&& orig_count != 0" too?), to say that we need to mask even if the shift in shift_mode actually isn't performed at all (or shall it be orig_count != count ?). Or we'd need to force outer_op in this case somewhere, but not sure what all cases would need to do that.
Created attachment 37956 [details] gcc6-pr70222.patch This untested patch works. Though, if that is the right way to go, I'd bootstrap/regtest it with some statistics gathering hack to see how often does it trigger.
The #c6 patch bootstrapped/regtested fine on x86_64-linux and i686-linux. I've additionally gathered statistics using: --- gcc/combine.c.jj 2016-03-14 14:00:24.000000000 +0100 +++ gcc/combine.c 2016-03-14 18:05:47.401401665 +0100 @@ -10838,6 +10838,19 @@ simplify_shift_const_1 (enum rtx_code co turn off all the bits that the shift would have turned off. Similarly do this if if we've optimized varop so that we don't perform any shift. */ +if (orig_code == LSHIFTRT && (mode != result_mode || result_mode != shift_mode) +&& ((orig_varop != varop && !rtx_equal_p (orig_varop, varop)) || orig_count != count || result_mode != shift_mode)) +{ +FILE *f = fopen ("/tmp/shifts", "a"); +fprintf (f, "@@@ %d %s %s %s %s %s %s %d %d\n--- ", (int) BITS_PER_WORD, main_input_filename ? main_input_filename : "-", +current_function_name (), GET_RTX_NAME (code), GET_MODE_NAME (result_mode), GET_MODE_NAME (mode), GET_MODE_NAME (shift_mode), +orig_count, count); +print_inline_rtx (f, orig_varop, 0); +fprintf (f, "\n+++ "); +print_inline_rtx (f, varop, 0); +fprintf (f, "\n"); +fclose (f); +} if (orig_code == LSHIFTRT && (result_mode != shift_mode || (result_mode != mode && count == 0))) across the two bootstraps/regtests. Out of 25808 records, the most common last 5 columns are (first number is from sort | uniq -c | sort -n, then code, result_mode, mode, shift_mode, orig_count, count): 112 QI SI SI 1 1 131 SI DI DI 8 8 170 QI HI HI 3 3 182 QI HI HI 1 1 189 HI DI DI 8 0 196 QI HI HI 4 4 200 HI SI SI 3 3 204 HI SI HI 8 8 216 HI SI SI 4 4 234 QI SI SI 7 7 250 SI DI DI 15 15 479 HI SI SI 8 8 599 SI DI DI 24 16 755 SI DI DI 16 0 816 HI SI SI 15 0 1372 HI SI SI 2 2 1673 HI SI SI 1 1 2509 SI DI DI 24 0 2743 SI DI SI 24 24 2839 SI DI DI 24 24 8238 HI SI SI 8 0 In addition, those where orig_count != count && count != 0 are (the 599 line above and): 2 HI SI SI 12 3 2 SI DI DI 21 20 3 SI DI SI 30 31 4 HI DI DI 15 1 4 SI DI DI 24 8 4 SI DI DI 28 16 4 SI DI DI 29 2 4 SI DI DI 3 1 4 SI DI DI 31 7 5 SI DI DI 16 8 7 HI SI SI 5 3 7 HI SI SI 6 2 9 HI SI SI 12 6 9 HI SI SI 12 9 9 SI DI DI 27 3 10 SI DI SI 28 31 12 SI DI SI 24 16 15 HI DI DI 14 2 15 HI DI DI 14 4 18 HI SI SI 11 8 34 HI SI SI 8 2 34 HI SI SI 8 3 35 SI DI SI 21 31 48 HI SI SI 15 8 91 HI SI SI 8 1 If (shift_mode != result_mode), combine.c used to do the masking already previously, so for us it is interesting only if shift_mode == result_mode (and mode != result_mode). Those are: 3 HI DI HI 15 15 3 SI DI SI 30 31 8 HI DI HI 8 8 10 SI DI SI 28 31 12 SI DI SI 24 16 15 SI DI SI 31 31 17 QI SI QI 7 7 24 QI DI QI 2 2 26 SI DI SI 6 6 32 HI SI HI 15 15 35 SI DI SI 21 31 48 QI SI QI 2 2 57 SI DI SI 31 0 204 HI SI HI 8 8 2743 SI DI SI 24 24 If the count is non-zero, then we've supposedly only changed the varop, but are shifting in the narrower mode anyway (will look at one or two examples just to be sure), so the interesting cases are: 57 SI DI SI 31 0 (this pr70222.c testcase (45 times, plus some LTO unidentifiable ones, bet this testcase with -flto)), and also interesting are ones where the shift count is different: 3 SI DI SI 30 31 10 SI DI SI 28 31 12 SI DI SI 24 16 35 SI DI SI 21 31 Will look at those now.
Example of lshiftrt SI DI SI 24 16 is e.g. gcc.c-torture/execute/20030408-1.c where we have SImode LSHIFTRT of (subreg:SI (ashift:DI (reg:DI 125) (const_int 8 [0x8])) 0) by 24, and is optimized into (subreg:SI (reg:DI 125) 0) shift down by 16, which is equivalent and we don't need masking. Another example is ashiftrt SI DI SI 28 31 in objc.dg/gnu-encoding/struct-layout-encoding-1_generate.c, where we have SImode LSHIFTRT of (ashiftrt:SI (subreg:SI (reg:DI 91 [ _12 ]) 0) (const_int 31 [0x1f])) by 28, and optimize this into: ASHIFTRT of (subreg:SI (reg:DI 91 [ _12 ]) 0) by 31. That would be wrong, if we didn't have outer_op of AND and outer_const 15, so we are fine too. So, looking at lines with result_mode == shift_mode && result_mode != mode && (orig_count != count || code != LSHIFTRT), I'm seeing: 2 @@@ 64 /tmp/ccf9ftjT.o test1 lshiftrt SI DI SI 24 16 2 @@@ 64 /tmp/ccq197Ke.ltrans0.o main lshiftrt SI DI SI 24 16 3 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gfortran.dg/function_optimize_11.f90 main ashiftrt SI DI SI 30 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Hash ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Get ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Remove ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Set ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Getxnb ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Presentxnb ashiftrt SI DI SI 21 31 5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Removexnb ashiftrt SI DI SI 21 31 6 @@@ 64 /tmp/ccJJfd4X.o foo lshiftrt SI DI SI 31 0 6 @@@ 64 /tmp/ccJJfd4X.o main lshiftrt SI DI SI 31 0 8 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/20030408-1.c test1 lshiftrt SI DI SI 24 16 8 @@@ 64 pem_decrypt_test.go x509.CreateCertificate rotate QI DI QI 2 2 10 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/objc.dg/gnu-encoding/struct-layout-encoding-1_generate.c generate_fields ashiftrt SI DI SI 28 31 16 @@@ 32 ../../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI SI QI 2 2 16 @@@ 32 ../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI SI QI 2 2 16 @@@ 32 pem_decrypt_test.go x509.CreateCertificate rotate QI SI QI 2 2 16 @@@ 64 ../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI DI QI 2 2 18 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr70222.c main lshiftrt SI DI SI 31 0 27 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr70222.c foo lshiftrt SI DI SI 31 0 pr70222.c is the debugged case, ashiftrt SI DI SI 28 31 analyzed above, lshiftrt SI DI SI 24 16 too, and the rest are similar to the analyzed ones. Thus, my conclusion is that a) the #c6 patch doesn't hopefully trigger that often (only on the testcase during x86_64/i686 bootstrap/regtest), and b) haven't found another wrong-code issue. Thus I think the patch is, while perhaps incomplete, a step in the right direction.
Author: jakub Date: Tue Mar 15 16:11:48 2016 New Revision: 234222 URL: https://gcc.gnu.org/viewcvs?rev=234222&root=gcc&view=rev Log: PR rtl-optimization/70222 * combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT optimization if mode is different from result_mode, queue up masking of the result in outer_op. Formatting fix. * gcc.c-torture/execute/pr70222-1.c: New test. * gcc.c-torture/execute/pr70222-2.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c trunk/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c trunk/gcc/testsuite/ChangeLog
Fixed for 6+ so far.
Author: jakub Date: Wed Mar 30 12:41:40 2016 New Revision: 234563 URL: https://gcc.gnu.org/viewcvs?rev=234563&root=gcc&view=rev Log: Backported from mainline 2016-03-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/70222 * combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT optimization if mode is different from result_mode, queue up masking of the result in outer_op. Formatting fix. * gcc.c-torture/execute/pr70222-1.c: New test. * gcc.c-torture/execute/pr70222-2.c: New test. Added: branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/combine.c branches/gcc-5-branch/gcc/testsuite/ChangeLog
Fixed for 5.4+ too.
GCC 6.1 has been released.
Author: jakub Date: Thu Jul 7 21:49:58 2016 New Revision: 238140 URL: https://gcc.gnu.org/viewcvs?rev=238140&root=gcc&view=rev Log: Backported from mainline 2016-03-15 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/70222 * combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT optimization if mode is different from result_mode, queue up masking of the result in outer_op. Formatting fix. * gcc.c-torture/execute/pr70222-1.c: New test. * gcc.c-torture/execute/pr70222-2.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c branches/gcc-4_9-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/combine.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Fixed.