Bug 68730 - [5 Regression] wrong code at -O3 on x86_64-linux-gnu (in 32-bit mode)
Summary: [5 Regression] wrong code at -O3 on x86_64-linux-gnu (in 32-bit mode)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 6.0
: P1 normal
Target Milestone: 5.4
Assignee: Bernd Schmidt
URL:
Keywords: ra, wrong-code
Depends on:
Blocks:
 
Reported: 2015-12-05 21:27 UTC by Zhendong Su
Modified: 2016-02-16 22:12 UTC (History)
4 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-12-07 00:00:00


Attachments
gcc6-pr68730.patch (903 bytes, patch)
2015-12-10 11:46 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zhendong Su 2015-12-05 21:27:07 UTC
The current gcc trunk miscompiles the following code on x86_64-linux-gnu at -O3 in the 32-bit mode (but not in the 64-bit mode). 

This is a regression from 5.3.x. 

-fno-tree-vrp makes the miscompilation disappear. 


$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 6.0.0 20151204 (experimental) [trunk revision 231299] (GCC) 
$ 
$ gcc-trunk -m32 -O2 small.c; ./a.out
-28479
-28479
1
$ gcc-trunk -m64 -O3 small.c; ./a.out
-28479
-28479
1
$ gcc-5.3 -m32 -O3 small.c; ./a.out
-28479
-28479
1
$ 
$ gcc-trunk -m32 -O3 -fno-tree-vrp small.c; ./a.out
-28479
-28479
1
$ 
$ gcc-trunk -m32 -O3 small.c
$ timeout -s 9 10 ./a.out
-28479
-28479
-32319
Killed
$ 


---------------------------------------------------


int printf (const char *, ...);

int b, d, e;
unsigned long long c = 4100543410106915;

void
fn1 ()
{
  short f, g = 4 % c;
  int h = c;
  if (h)
    {
      int i = ~c;
      if (~c)
	i = 25662;
      f = g = i;
      h = c - g + ~-f;
      c = ~(c * h - f);
    }
  f = g;
  unsigned long long k = g || c;
  short l = c ^ g ^ k;
  if (g > 25662 || c == 74074520320 || !(g < 2))
    {
      k = c;
      l = g;
      c = ~((k && c) + ~l);
      f = ~(f * (c ^ k) | l);
      if (c > k)
	printf ("%d\n", f);
    }
  short m = -f;
  unsigned long long n = c;
  c = m * f | n % c;
  if (n)
    printf ("%d\n", f);
  while (f < -31807)
    ;
  c = ~(n | c) | f;
  if (n < c)
    printf ("%lld\n", (long long) f);
  for (; d;)
    for (; e;)
      for (;;)
	;
  c = h;
  c = l % c;
}

int
main ()
{
  for (; b < 2; b++)
    fn1 ();
  return 0;
}
Comment 1 Richard Biener 2015-12-07 09:24:32 UTC
Confirmed.  It ICEs with -fno-if-conversion

> ./xgcc -B. t.c -O3 -m32 -fno-if-conversion
t.c: In function ‘fn1’:
t.c:48:1: error: dominator of 13 should be 36, not 35
 }
 ^

t.c:48:1: error: dominator of 35 should be 38, not 37
t.c:48:1: error: dominator of 36 status unknown
t.c:48:1: internal compiler error: Segmentation fault
0xdcdd12 crash_signal
        /space/rguenther/src/svn/trunk3/gcc/toplev.c:334
0x949327 verify_dominators(cdi_direction)
        /space/rguenther/src/svn/trunk3/gcc/dominance.c:1033
0x947120 checking_verify_dominators
Comment 2 Jakub Jelinek 2015-12-07 16:35:46 UTC
Weird, can't reproduce either the hang (with -O3 -m32) nor the ICE (-O3 -m32 -fno-if-conversion).
Comment 3 Marek Polacek 2015-12-07 16:41:32 UTC
And I can reproduce the hang but not the ICE ;).
Comment 4 Marek Polacek 2015-12-09 16:29:16 UTC
And now I can only reproduce the ICE but not the hang.  Eh.
Comment 5 Marek Polacek 2015-12-09 16:33:29 UTC
Note that if you want to reproduce this by running cc1, add -march=x86-64:
$ ./cc1 -quiet v.c -O2 -fno-if-conversion -m32
$ ./cc1 -quiet v.c -O2 -fno-if-conversion -m32 -march=x86-64
v.c: In function ‘fn1’:
v.c:48:1: error: dominator of 13 should be 36, not 35
 }
 ^

v.c:48:1: error: dominator of 28 should be 31, not 30
v.c:48:1: error: dominator of 31 status unknown
v.c:48:1: internal compiler error: Segmentation fault
0xdc88c2 crash_signal
	/home/marek/src/gcc/gcc/toplev.c:334
0x95036a verify_dominators(cdi_direction)
	/home/marek/src/gcc/gcc/dominance.c:1032
0x94e111 checking_verify_dominators
	/home/marek/src/gcc/gcc/dominance.h:71
0x94f8a7 calculate_dominance_info(cdi_direction)
	/home/marek/src/gcc/gcc/dominance.c:664
0xb8811c ira
	/home/marek/src/gcc/gcc/ira.c:5155
0xb88a52 execute
	/home/marek/src/gcc/gcc/ira.c:5511
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
Comment 6 Marek Polacek 2015-12-09 17:02:37 UTC
The dominator ICE started with r228231.
Comment 7 Jakub Jelinek 2015-12-10 11:46:04 UTC
Created attachment 36980 [details]
gcc6-pr68730.patch

Untested fix for the dominator issue.  The problem is that the stv pass added by i386 after combiner computes CDI_DOMINATORS, but does not free them, and various passes in between the combiner and ira just assume dominators are not computed (e.g. pass_outof_cfg_layout).  So the fix can be either to do what I wrote in the patch, or use something like:
  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);

  if (dom_calculated_here)
    calculate_dominance_info (CDI_DOMINATORS);
...
  if (dom_calculated_here)
    free_dominance_info (CDI_DOMINATORS);
in pass_stv::execute, or track down all the passes between combiner and ira that could break dominators and free_dominance_info (CDI_DOMINATORS) in them.  Any preferences?
Comment 8 Jakub Jelinek 2015-12-10 15:55:23 UTC
As for the miscompilation, it looks like RA issue to me. Note, -m32 -O3 -march=x86-64 is needed on the testcase, and I've been actually looking at:
int printf (const char *, ...);

int b, d, e;
unsigned long long c = 4100543410106915;

void
fn1 ()
{
  short f, g = 4 % c;
  int h = c;
  if (h)
    {
      int i = ~c;
      if (~c)
	i = 25662;
      f = g = i;
      h = c - g + ~-f;
      c = ~(c * h - f);
    }
  f = g;
  unsigned long long k = g || c;
  short l = c ^ g ^ k;
  if (g > 25662 || c == 74074520320 || !(g < 2))
    {
      k = c;
      l = g;
      c = ~((k && c) + ~l);
      f = ~(f * (c ^ k) | l);
      if (c > k)
	printf ("X %d\n", f);
    }
  short m = -f;
  unsigned long long n = c;
  c = m * f | n % c;
  if (n)
    printf ("Y %d %d %d\n", f, g, l);
  while (f < -31807)
    ;
  c = ~(n | c) | f;
  if (n < c)
    printf ("Z %lld %d %d\n", (long long) f, h, l);
  for (; d;)
    for (; e;)
      for (;;)
	;
  c = h;
  c = l % c;
}

int
main ()
{
  for (; b < 2; b++)
    fn1 ();
  return 0;
}

testcase instead (following stuff is from that) with the same options (as makes it more clear what the bug is, for the code path taken in the first call to fn1, we end up with l being equal to f (i.e. -28479) instead of the right value, l being equal to g (i.e. 25662) as can be seen already at the Y printf.
We have in *.ira:
(code_label 294 125 127 16 24 "" [1 uses])
(note 127 294 128 16 [bb 16] NOTE_INSN_BASIC_BLOCK)
(insn 128 127 13 16 (set (reg:SI 107 [ _52 ])
        (sign_extend:SI (reg:HI 132 [ g.2_118 ]))) 147 {extendhisi2}
     (nil))
(insn 13 128 14 16 (set (reg:DI 130 [ c.0_114 ])
        (reg:DI 100 [ _42 ])) pr68730-2.c:29 85 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 100 [ _42 ])
        (nil)))
(insn 14 13 15 16 (set (reg/v:HI 88 [ l ])
        (reg/v:HI 87 [ g ])) pr68730-2.c:29 88 {*movhi_internal}
     (expr_list:REG_DEAD (reg/v:HI 87 [ g ])
        (nil)))
(insn 15 14 348 16 (set (reg/v:HI 87 [ g ])
        (reg:HI 132 [ g.2_118 ])) pr68730-2.c:28 88 {*movhi_internal}
     (nil))
(jump_insn 348 15 349 16 (set (pc)
        (label_ref 140)) 663 {jump}
     (nil)
where (reg:HI 132 [ g.2_118 ]) is apparently the RTL for the partition including both g.2_118 and _50 at least:
  g.2_206 = (unsigned short) g_202;
...
  # g.2_118 = PHI <g.2_206(25), g.2_206(10), 0(7)>
...
  _39 = ~prephitmp_197;
  _40 = iftmp.3_7 + _39;
  _41 = ~_40;
  _42 = (long long unsigned int) _41;
  c = _42;
  _45 = (unsigned short) g_202;
  _46 = _42 ^ c.0_114;
  _47 = (unsigned short) _46;
  _48 = _45 * _47;
  _49 = _48 | g.2_118;
  _50 = ~_49;
  f_51 = (short int) _50;
so while reg:HI 132 is initially equal to the current value of the g variable, it is reused for something else afterwards.
This particular problematic RTL bb is likely coming just from PHIs:
  # f_1 = PHI <g_202(9), f_51(12), f_51(13), g_202(8)>
  # l_5 = PHI <l_203(9), g_202(12), g_202(13), l_110(8)>
  # prephitmp_222 = PHI <g.2_206(9), _219(12), _220(13), 0(8)>
  # prephitmp_224 = PHI <prephitmp_199(9), _42(12), pretmp_223(13), prephitmp_199(8)>
  # prephitmp_228 = PHI <prephitmp_197(9), _226(12), _52(13), prephitmp_197(8)>
on certain edge.  At this point reg:HI 132 should contain one value (the current value of f after f = ~(f * (c ^ k) | l);, and
reg:HI 87 should contain the current value of g variable.  But in *.reload dump this is already:
(code_label 294 125 127 16 24 "" [1 uses])
(note 127 294 458 16 [bb 16] NOTE_INSN_BASIC_BLOCK)
(insn 458 127 128 16 (set (reg:HI 0 ax [orig:132 g.2_118 ] [132])
        (reg:HI 6 bp [orig:132 g.2_118 ] [132])) 88 {*movhi_internal}
     (nil))
(insn 128 458 419 16 (set (reg:SI 3 bx [orig:107 _52 ] [107])
        (sign_extend:SI (reg:HI 0 ax [orig:132 g.2_118 ] [132]))) 147 {extendhisi2}
     (nil))
(insn 419 128 13 16 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 16 [0x10])) [4 %sfp+-32 S4 A64])
        (reg:SI 3 bx [orig:107 _52 ] [107])) 86 {*movsi_internal}
     (nil))
(insn 13 419 479 16 (set (reg:DI 2 cx [orig:130 c.0_114 ] [130])
        (reg:DI 4 si [orig:100 _42 ] [100])) pr68730-2.c:29 85 {*movdi_internal}
     (nil))
(insn 479 13 474 16 (set (reg/v:HI 5 di [orig:87 g ] [87])
        (reg:HI 6 bp [orig:132 g.2_118 ] [132])) pr68730-2.c:29 88 {*movhi_internal}
     (nil))
(note 474 479 14 16 NOTE_INSN_DELETED)
(insn 14 474 15 16 (set (mem/c:HI (plus:SI (reg/f:SI 7 sp)
                (const_int 24 [0x18])) [4 %sfp+-24 S2 A32])
        (reg/v:HI 5 di [orig:87 g ] [87])) pr68730-2.c:29 88 {*movhi_internal}
     (nil))
(insn 15 14 348 16 (set (mem/c:HI (plus:SI (reg/f:SI 7 sp)
                (const_int 10 [0xa])) [4 %sfp+-38 S2 A16])
        (reg:HI 0 ax [orig:132 g.2_118 ] [132])) pr68730-2.c:28 88 {*movhi_internal}
     (nil))
which means that both l contains the g.2_118 value (aka f), instead of the desired one (g).
Vlad, can you please have a look?
Comment 9 Jakub Jelinek 2015-12-14 08:05:09 UTC
Author: jakub
Date: Mon Dec 14 08:04:37 2015
New Revision: 231606

URL: https://gcc.gnu.org/viewcvs?rev=231606&root=gcc&view=rev
Log:
	PR rtl-optimization/68730
	* cfgrtl.c (cfg_layout_finalize): Free dominators.

	* gcc.dg/pr68730.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr68730.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgrtl.c
    trunk/gcc/testsuite/ChangeLog
Comment 10 Jakub Jelinek 2016-01-23 20:22:10 UTC
The wrong-code issue went away with r232413.  Was that a fix for this, or just something that made the issue latent?
Comment 11 Jakub Jelinek 2016-01-25 11:00:16 UTC
Reverting r232500 and r232413 results in the miscompilation again.  So if those changes are optimizations only we have a latent issue.
Comment 12 Ilya Enkovich 2016-01-25 12:19:42 UTC
(In reply to Jakub Jelinek from comment #11)
> Reverting r232500 and r232413 results in the miscompilation again.  So if
> those changes are optimizations only we have a latent issue.

Those changes fix performance regression and were not supposed to resolve any stability issue.
Comment 13 Bernd Schmidt 2016-01-26 15:43:17 UTC
Investigating a bit.
Comment 14 Bernd Schmidt 2016-01-27 12:35:30 UTC
It does look like an issue with lra-remat...
Comment 15 Bernd Schmidt 2016-02-08 15:31:39 UTC
Author: bernds
Date: Mon Feb  8 15:31:08 2016
New Revision: 233215

URL: https://gcc.gnu.org/viewcvs?rev=233215&root=gcc&view=rev
Log:
Fix latent LRA remat issue (PR68730)

	PR rtl-optimization/68730
	* lra-remat.c (insn_to_cand_activation): New static variable.
	(lra_remat): Allocate and free it.
	(create_cand): New arg activation. Initialize a field in
	insn_to_cand_activation if it is nonnull.
	(create_cands): Pass the activation insn to create_cand when making
	a candidate involving an output reload.  Reorganize code a little.
	(do_remat): Keep track of active status of candidates in a separate
	bitmap.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-remat.c
Comment 16 Jeffrey A. Law 2016-02-09 18:44:06 UTC
I verified Bernd's committed patch fixes the issue (by going back to a Dec compiler to reproduce the runtime failure, then applying Bernd's patch and verifying the runtime behaviour is correct).  Closing...
Comment 17 Bernd Schmidt 2016-02-09 21:02:56 UTC
I was going to keep it open and ask for a 5.0 backport if it doesn't show negative effects in a week or two...
Comment 18 Jeffrey A. Law 2016-02-09 21:09:27 UTC
Ah, then reopened with a gcc-5 regression marker.
Comment 19 Bernd Schmidt 2016-02-16 21:14:31 UTC
Author: bernds
Date: Tue Feb 16 21:13:59 2016
New Revision: 233475

URL: https://gcc.gnu.org/viewcvs?rev=233475&root=gcc&view=rev
Log:
Backport lra-remat fix from mainline, PR68730

	PR rtl-optimization/68730
	* lra-remat.c (insn_to_cand_activation): New static variable.
	(lra_remat): Allocate and free it.
	(create_cand): New arg activation. Initialize a field in
	insn_to_cand_activation if it is nonnull.
	(create_cands): Pass the activation insn to create_cand when making
	a candidate involving an output reload.  Reorganize code a little.
	(do_remat): Keep track of active status of candidates in a separate
	bitmap.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/lra-remat.c
Comment 20 Bernd Schmidt 2016-02-16 22:12:24 UTC
Fixed everywhere.