Bug 70222 - Test miscompiled with -O1
Summary: Test miscompiled with -O1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: 6.2
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2016-03-14 09:16 UTC by Vsevolod Livinskii
Modified: 2021-11-01 23:07 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64
Build:
Known to work:
Known to fail: 4.0.4, 4.1.2, 4.2.4, 4.3.6, 4.8.5, 4.9.4, 5.3.1
Last reconfirmed: 2016-03-14 00:00:00


Attachments
Reproducer. (177 bytes, text/x-csrc)
2016-03-14 09:16 UTC, Vsevolod Livinskii
Details
gcc6-pr70222.patch (854 bytes, text/plain)
2016-03-14 13:06 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2016-03-14 09:16:03 UTC
Created attachment 37954 [details]
Reproducer.

Test case produces incorrect result starting with -O1 optimization level. Test case fails with -march=nehalem, haswell, knl and skylake-avx512 options and with gcc version 4.8.5 (Revision=226194), gcc version 4.9.4 20160311 (prerelease) (Revision=234148) and gcc version 5.3.1 20160311 (Revision=234148).

Output:
>  g++ repr.cpp -o out -O1 -march=haswell; ./out
-262143
>  g++ repr.cpp -o out -O0 -march=haswell; ./out
-1

GCC version:
> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/export/users/vlivinsk/gcc-trunk/bin/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /export/users/vlivinsk/gcc-trunk/gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-checking=release --enable-languages=c,c++,lto --with-gmp=/export/users/vlivinsk/gcc-trunk/gmp-6.1.0/bin --with-mpfr=/export/users/vlivinsk/gcc-trunk/mpfr-3.1.3/bin --with-mpc=/export/users/vlivinsk/gcc-trunk/mpc-1.0.3/bin --prefix=/export/users/vlivinsk/gcc-trunk/bin
Thread model: posix
gcc version 6.0.0 20160314 (experimental) (Revision=234175)
Comment 1 Richard Biener 2016-03-14 10:44:17 UTC
Confirmed.  -O2 is fine again.

void abort (void);

int a = 1;
unsigned int b = 2;
int c = 0;
int d = 0;

void foo () 
{
  int e = ((-(c >= c)) < b) > ((int)(((unsigned long long int)(-1)) >> ((a / a) * 15)));
  d = -e;
}

int main ()
{
  foo ();
  if (d != -1)
    abort ();
  return 0;
}
Comment 2 Richard Biener 2016-03-14 10:48:23 UTC
Inlining hides the RTL opt bug.  Disabling RTL combine fixes it.
Comment 3 Jakub Jelinek 2016-03-14 11:03:02 UTC
Yeah, it is combine.
First of all, the -1ULL >> (... * 15) is properly optimized into
(insn 12 11 13 2 (set (reg:DI 102)
        (const_int 8589934591 [0x1ffffffff])) pr70222.c:9 85 {*movdi_internal}
     (nil))
but the
(insn 13 12 14 2 (parallel [
            (set (reg:SI 104 [ e ])
                (lshiftrt:SI (subreg:SI (reg:DI 102) 0)
                    (const_int 31 [0x1f])))
            (clobber (reg:CC 17 flags))
        ]) pr70222.c:9 562 {*lshrsi3_1}
     (expr_list:REG_DEAD (reg:DI 102)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
which is supposed to yield 1 or 0 (in this case 1) is mis-simplified into:
(insn 13 12 14 2 (parallel [
            (set (subreg:DI (reg:SI 104 [ e ]) 0)
                (lshiftrt:DI (reg:DI 102)
                    (subreg:QI (reg:SI 101) 0)))
            (clobber (reg:CC 17 flags))
        ]) pr70222.c:9 564 {*lshrdi3_1}
     (expr_list:REG_DEAD (reg:SI 101)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_DEAD (reg:DI 102)
                (nil)))))
Comment 4 Jakub Jelinek 2016-03-14 11:15:09 UTC
Looking at this.
Comment 5 Jakub Jelinek 2016-03-14 12:54:18 UTC
So, the bug is apparently in simplify_shift_const_1 (code=LSHIFTRT, result_mode=SImode, varop=0x7ffff184de70, orig_count=31)
where varop is:
(subreg:SI (lshiftrt:DI (const_int -1 [0xffffffffffffffff])
        (subreg:QI (reg:SI 100) 0)) 0)
We optimize the inner shift (mode == DImode) into
(lshiftrt:DI (const_int 8589934591 [0x1ffffffff])
        (subreg:QI (reg:SI 100) 0)) 0)
which is fine (assuming we do the final masking), and turn shift count into 0.  shift_mode is still SImode, because nonzero_bits doesn't say the upper bits are 0, and we don't have OUTER_OP.  For this to really work, we need to mask off the bits though.

We have:

  /* If we were doing an LSHIFTRT in a wider mode than it was originally,  
     turn off all the bits that the shift would have turned off.  */
  if (orig_code == LSHIFTRT && result_mode != shift_mode)
    x = simplify_and_const_int (NULL_RTX, shift_mode, x,
                                GET_MODE_MASK (result_mode) >> orig_count);

that does this, unfortunately it doesn't trigger here, because shift_mode == result_mode == SImode.

So, one possibility is to do
  if (orig_code == LSHIFTRT
      && (result_mode != shift_mode
          || (result_mode != mode && count == 0)))
(and maybe "&& orig_count != 0" too?), to say that we need to mask even if the shift in shift_mode actually isn't performed at all (or shall it be orig_count != count ?).

Or we'd need to force outer_op in this case somewhere, but not sure what all cases would need to do that.
Comment 6 Jakub Jelinek 2016-03-14 13:06:59 UTC
Created attachment 37956 [details]
gcc6-pr70222.patch

This untested patch works.  Though, if that is the right way to go, I'd bootstrap/regtest it with some statistics gathering hack to see how often does it trigger.
Comment 7 Jakub Jelinek 2016-03-15 09:40:05 UTC
The #c6 patch bootstrapped/regtested fine on x86_64-linux and i686-linux.  I've additionally gathered statistics using:
--- gcc/combine.c.jj	2016-03-14 14:00:24.000000000 +0100
+++ gcc/combine.c	2016-03-14 18:05:47.401401665 +0100
@@ -10838,6 +10838,19 @@ simplify_shift_const_1 (enum rtx_code co
      turn off all the bits that the shift would have turned off.
      Similarly do this if if we've optimized varop so that we don't perform
      any shift.  */
+if (orig_code == LSHIFTRT && (mode != result_mode || result_mode != shift_mode)
+&& ((orig_varop != varop && !rtx_equal_p (orig_varop, varop)) || orig_count != count || result_mode != shift_mode))
+{
+FILE *f = fopen ("/tmp/shifts", "a");
+fprintf (f, "@@@ %d %s %s %s %s %s %s %d %d\n--- ", (int) BITS_PER_WORD, main_input_filename ? main_input_filename : "-",
+current_function_name (), GET_RTX_NAME (code), GET_MODE_NAME (result_mode), GET_MODE_NAME (mode), GET_MODE_NAME (shift_mode),
+orig_count, count);
+print_inline_rtx (f, orig_varop, 0);
+fprintf (f, "\n+++ ");
+print_inline_rtx (f, varop, 0);
+fprintf (f, "\n");
+fclose (f);
+}
   if (orig_code == LSHIFTRT
       && (result_mode != shift_mode
 	  || (result_mode != mode && count == 0)))
across the two bootstraps/regtests.  Out of 25808 records, the most common last 5 columns are (first number is from sort | uniq -c | sort -n, then code, result_mode, mode, shift_mode, orig_count, count):
    112 QI SI SI 1 1
    131 SI DI DI 8 8
    170 QI HI HI 3 3
    182 QI HI HI 1 1
    189 HI DI DI 8 0
    196 QI HI HI 4 4
    200 HI SI SI 3 3
    204 HI SI HI 8 8
    216 HI SI SI 4 4
    234 QI SI SI 7 7
    250 SI DI DI 15 15
    479 HI SI SI 8 8
    599 SI DI DI 24 16
    755 SI DI DI 16 0
    816 HI SI SI 15 0
   1372 HI SI SI 2 2
   1673 HI SI SI 1 1
   2509 SI DI DI 24 0
   2743 SI DI SI 24 24
   2839 SI DI DI 24 24
   8238 HI SI SI 8 0

In addition, those where orig_count != count && count != 0 are (the 599 line above and):
      2 HI SI SI 12 3
      2 SI DI DI 21 20
      3 SI DI SI 30 31
      4 HI DI DI 15 1
      4 SI DI DI 24 8
      4 SI DI DI 28 16
      4 SI DI DI 29 2
      4 SI DI DI 3 1
      4 SI DI DI 31 7
      5 SI DI DI 16 8
      7 HI SI SI 5 3
      7 HI SI SI 6 2
      9 HI SI SI 12 6
      9 HI SI SI 12 9
      9 SI DI DI 27 3
     10 SI DI SI 28 31
     12 SI DI SI 24 16
     15 HI DI DI 14 2
     15 HI DI DI 14 4
     18 HI SI SI 11 8
     34 HI SI SI 8 2
     34 HI SI SI 8 3
     35 SI DI SI 21 31
     48 HI SI SI 15 8
     91 HI SI SI 8 1

If (shift_mode != result_mode), combine.c used to do the masking already previously, so for us it is interesting only if shift_mode == result_mode (and mode != result_mode).
Those are:
      3 HI DI HI 15 15
      3 SI DI SI 30 31
      8 HI DI HI 8 8
     10 SI DI SI 28 31
     12 SI DI SI 24 16
     15 SI DI SI 31 31
     17 QI SI QI 7 7
     24 QI DI QI 2 2
     26 SI DI SI 6 6
     32 HI SI HI 15 15
     35 SI DI SI 21 31
     48 QI SI QI 2 2
     57 SI DI SI 31 0
    204 HI SI HI 8 8
   2743 SI DI SI 24 24
If the count is non-zero, then we've supposedly only changed the varop, but are shifting in the narrower mode anyway (will look at one or two examples just to be sure), so the interesting cases are:
     57 SI DI SI 31 0
(this pr70222.c testcase (45 times, plus some LTO unidentifiable ones, bet this testcase with -flto)), and also interesting are ones where the shift count is different:
      3 SI DI SI 30 31
     10 SI DI SI 28 31
     12 SI DI SI 24 16
     35 SI DI SI 21 31
Will look at those now.
Comment 8 Jakub Jelinek 2016-03-15 10:22:09 UTC
Example of lshiftrt SI DI SI 24 16 is e.g. gcc.c-torture/execute/20030408-1.c where we have SImode LSHIFTRT of
(subreg:SI (ashift:DI (reg:DI 125)
        (const_int 8 [0x8])) 0)
by 24, and is optimized into
(subreg:SI (reg:DI 125) 0)
shift down by 16, which is equivalent and we don't need masking.
Another example is ashiftrt SI DI SI 28 31 in objc.dg/gnu-encoding/struct-layout-encoding-1_generate.c, where we have SImode LSHIFTRT of
(ashiftrt:SI (subreg:SI (reg:DI 91 [ _12 ]) 0)
    (const_int 31 [0x1f]))
by 28, and optimize this into:
ASHIFTRT of
(subreg:SI (reg:DI 91 [ _12 ]) 0)
by 31.  That would be wrong, if we didn't have outer_op of AND and outer_const 15, so we are fine too.
So, looking at lines with result_mode == shift_mode && result_mode != mode && (orig_count != count || code != LSHIFTRT), I'm seeing:
      2 @@@ 64 /tmp/ccf9ftjT.o test1 lshiftrt SI DI SI 24 16
      2 @@@ 64 /tmp/ccq197Ke.ltrans0.o main lshiftrt SI DI SI 24 16
      3 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gfortran.dg/function_optimize_11.f90 main ashiftrt SI DI SI 30 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Hash ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Get ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Remove ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Set ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Getxnb ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Presentxnb ashiftrt SI DI SI 21 31
      5 @@@ 64 /home/jakub/src/gcc/gcc/ada/makeutl.adb Makeutl.Queue.Marks.Tab.Removexnb ashiftrt SI DI SI 21 31
      6 @@@ 64 /tmp/ccJJfd4X.o foo lshiftrt SI DI SI 31 0
      6 @@@ 64 /tmp/ccJJfd4X.o main lshiftrt SI DI SI 31 0
      8 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/20030408-1.c test1 lshiftrt SI DI SI 24 16
      8 @@@ 64 pem_decrypt_test.go x509.CreateCertificate rotate QI DI QI 2 2
     10 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/objc.dg/gnu-encoding/struct-layout-encoding-1_generate.c generate_fields ashiftrt SI DI SI 28 31
     16 @@@ 32 ../../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI SI QI 2 2
     16 @@@ 32 ../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI SI QI 2 2
     16 @@@ 32 pem_decrypt_test.go x509.CreateCertificate rotate QI SI QI 2 2
     16 @@@ 64 ../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificate rotate QI DI QI 2 2
     18 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr70222.c main lshiftrt SI DI SI 31 0
     27 @@@ 64 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr70222.c foo lshiftrt SI DI SI 31 0
pr70222.c is the debugged case, ashiftrt SI DI SI 28 31 analyzed above, lshiftrt SI DI SI 24 16 too, and the rest are similar to the analyzed ones.
Thus, my conclusion is that a) the #c6 patch doesn't hopefully trigger that often (only on the testcase during x86_64/i686 bootstrap/regtest), and b) haven't found another wrong-code issue.
Thus I think the patch is, while perhaps incomplete, a step in the right direction.
Comment 9 Jakub Jelinek 2016-03-15 16:12:20 UTC
Author: jakub
Date: Tue Mar 15 16:11:48 2016
New Revision: 234222

URL: https://gcc.gnu.org/viewcvs?rev=234222&root=gcc&view=rev
Log:
	PR rtl-optimization/70222
	* combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT
	optimization if mode is different from result_mode, queue up masking
	of the result in outer_op.  Formatting fix.

	* gcc.c-torture/execute/pr70222-1.c: New test.
	* gcc.c-torture/execute/pr70222-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
Comment 10 Jakub Jelinek 2016-03-15 17:05:53 UTC
Fixed for 6+ so far.
Comment 11 Jakub Jelinek 2016-03-30 12:42:12 UTC
Author: jakub
Date: Wed Mar 30 12:41:40 2016
New Revision: 234563

URL: https://gcc.gnu.org/viewcvs?rev=234563&root=gcc&view=rev
Log:
	Backported from mainline
	2016-03-15  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/70222
	* combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT
	optimization if mode is different from result_mode, queue up masking
	of the result in outer_op.  Formatting fix.

	* gcc.c-torture/execute/pr70222-1.c: New test.
	* gcc.c-torture/execute/pr70222-2.c: New test.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c
    branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/combine.c
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
Comment 12 Jakub Jelinek 2016-03-30 13:29:40 UTC
Fixed for 5.4+ too.
Comment 13 Jakub Jelinek 2016-04-27 10:56:26 UTC
GCC 6.1 has been released.
Comment 14 Jakub Jelinek 2016-07-07 21:50:30 UTC
Author: jakub
Date: Thu Jul  7 21:49:58 2016
New Revision: 238140

URL: https://gcc.gnu.org/viewcvs?rev=238140&root=gcc&view=rev
Log:
	Backported from mainline
	2016-03-15  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/70222
	* combine.c (simplify_shift_const_1): For A >> B >> C LSHIFTRT
	optimization if mode is different from result_mode, queue up masking
	of the result in outer_op.  Formatting fix.

	* gcc.c-torture/execute/pr70222-1.c: New test.
	* gcc.c-torture/execute/pr70222-2.c: New test.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-1.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.c-torture/execute/pr70222-2.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/combine.c
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Comment 15 Jakub Jelinek 2016-07-08 07:21:16 UTC
Fixed.