Bug 69123 - [6 Regression] --with-build-config='bootstrap-O3 bootstrap-debug' miscompiled stage 2
Summary: [6 Regression] --with-build-config='bootstrap-O3 bootstrap-debug' miscompiled...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: bootstrap (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: 6.0
Assignee: Alexandre Oliva
URL:
Keywords:
: 69036 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-01-03 15:10 UTC by H.J. Lu
Modified: 2016-01-11 10:45 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-01-05 00:00:00


Attachments
A testcase (334.18 KB, application/octet-stream)
2016-01-05 13:29 UTC, H.J. Lu
Details
A smaller testcase (197.18 KB, application/octet-stream)
2016-01-05 14:55 UTC, H.J. Lu
Details
Patch I'm testing to fix the bug (1.81 KB, patch)
2016-01-09 01:37 UTC, Alexandre Oliva
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2016-01-03 15:10:45 UTC
On x86-64, --with-build-config='bootstrap-O3 bootstrap-debug' miscompiled
stage 2 into an infinite loop:

(gdb) bt
#0  0x0000000000e01141 in get_ref_base_and_extent(tree_node*, long*, long*, long*, bool*) ()
#1  0x0000000000ebb061 in ao_ref_base(ao_ref*) ()
#2  0x00000000008db7b2 in ao_ref_from_mem(ao_ref*, rtx_def const*) ()
#3  0x00000000008dbd2d in rtx_refs_may_alias_p(rtx_def const*, rtx_def const*, bool) ()
#4  0x00000000008e001a in true_dependence_1(rtx_def const*, machine_mode, rtx_def*, rtx_def const*, rtx_def*, bool) ()
#5  0x0000000001094a6d in drop_overlapping_mem_locs(variable**, overlapping_mems*) ()
#6  0x0000000001094bd9 in clobber_overlapping_mems(dataflow_set*, rtx_def*) ()
#7  0x0000000001099bf3 in val_store(dataflow_set*, rtx_def*, rtx_def*, rtx_insn*, bool) ()
#8  0x000000000109e086 in compute_bb_dataflow(basic_block_def*) [clone .isra.135] ()
#9  0x000000000109f4ca in vt_find_locations() ()
#10 0x00000000010a15e0 in variable_tracking_main_1() ()
#11 0x00000000010a178c in (anonymous namespace)::pass_variable_tracking::execute(function*) ()
#12 0x0000000000cb7d74 in execute_one_pass(opt_pass*) ()
#13 0x0000000000cb8a55 in execute_pass_list_1(opt_pass*) ()
#14 0x0000000000cb8cea in execute_pass_list(function*, opt_pass*) ()
#15 0x0000000000964254 in cgraph_node::expand() ()
---Type <return> to continue, or q <return> to quit---
#16 0x0000000000965c07 in symbol_table::compile() [clone .part.49] ()
#17 0x0000000000968293 in symbol_table::finalize_compilation_unit() ()
#18 0x0000000000dad023 in compile_file() ()
#19 0x00000000005d2186 in toplev::main(int, char**) ()
#20 0x00000000005d3db7 in main ()
(gdb)
Comment 1 H.J. Lu 2016-01-03 22:38:44 UTC
It was caused by r231674.
Comment 2 Andrew Pinski 2016-01-04 18:58:46 UTC
(In reply to H.J. Lu from comment #1)
> It was caused by r231674.

In this case I think exposed by is the correct term as this just changes the cost algorithm.

Also do you have a short testcase of what is being miscompiled?
Comment 3 H.J. Lu 2016-01-05 12:58:26 UTC
tree-vect-slp.o is miscompiled.  When I replaced tree-vect-slp.o from
r231672 build, cc1plus finished.
Comment 4 H.J. Lu 2016-01-05 13:08:35 UTC
r231674 may have exposed a latent bug in var-tracking.c since the same
file compiles without -g.
Comment 5 H.J. Lu 2016-01-05 13:29:02 UTC
Created attachment 37226 [details]
A testcase

./xgcc -B./ -S -O3 -g /tmp/x.ii -fno-rtti -fno-exceptions

never finishes.
Comment 6 Jakub Jelinek 2016-01-05 14:46:38 UTC
I bet vt_find_location must be oscillating for some reason and thus not converging.  This is on the get_address_cost routine.  Alex, do you think you could have a look?  Thanks.
Comment 7 H.J. Lu 2016-01-05 14:55:10 UTC
Created attachment 37228 [details]
A smaller testcase
Comment 8 H.J. Lu 2016-01-05 17:08:39 UTC
A small testcase:

[hjl@gnu-mic-2 i386]$ cat /tmp/x.ii
struct xxx_def;
typedef xxx_def *xxx;

union rtxxx
{
  const char *rt_str;
  xxx rt_xxx;
};

struct xxx_def {
  union u {
    rtxxx fld[1];
  } u;
};

extern xxx bar (void);
extern int foo1 (xxx);

static inline xxx
foo2 (xxx arg0, xxx arg1)
{
  xxx rt;
  rt = bar ();
  (((rt)->u.fld[0]).rt_xxx) = arg0;
  (((rt)->u.fld[1]).rt_xxx) = arg1;
  return rt;
}

static inline xxx
foo4 (const char *arg0 )
{
  xxx rt;
  rt = bar ();
  (((rt)->u.fld[0]).rt_str) = arg0;
  (((rt)->u.fld[1]).rt_xxx) = (xxx) 0;
  return rt;
}

extern xxx foo5 (long);

struct address_cost_data
{
  unsigned costs[2][2][2][2];
};

void
get_address_cost (address_cost_data *data)
{
  unsigned acost;
  long i;
  long rat, off = 0;
  unsigned sym_p, var_p, off_p, rat_p;
  xxx addr, base;
  xxx reg0, reg1;

  reg1 = bar ();
  addr = foo2 (reg1, (xxx) 0);
  rat = 1;
  acost = 0;
  reg0 = bar ();
  reg1 = bar ();

  for (i = 0; i < 16; i++)
    {
      sym_p = i & 1;
      var_p = (i >> 1) & 1;
      off_p = (i >> 2) & 1;
      rat_p = (i >> 3) & 1;

      addr = reg0;
      if (rat_p)
	addr = foo2 (addr, foo5 (rat)) ;

      if (var_p)
	addr = foo2 (addr, reg1);

      if (sym_p)
	base = foo4 ("");
      else if (off_p)
	base = foo5 (off);
      else
	base = (xxx) 0;

      if (base)
	addr = foo2 (addr, base);

      acost = foo1 (addr);
      data->costs[sym_p][var_p][off_p][rat_p] = acost;
    }
}
[hjl@gnu-mic-2 i386]$ ./cc1plus -fpreprocessed /tmp/x.ii -quiet -dumpbase x.ii -mtune=generic -march=x86-64 -auxbase x -g -O3 -version -fno-rtti -fno-exceptions -o x.s
Comment 9 H.J. Lu 2016-01-05 17:59:03 UTC
(insn:TI 248 289 246 8 (set (reg:V2DI 21 xmm0 [130])
        (mem/c:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [9 %sfp+-32 S16 A128])) /tmp/x.ii:24 1215 {*movv2di_internal}
     (nil))
...
(insn:TI 250 118 120 9 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [9 %sfp+-32 S8 A128])
        (reg/v/f:DI 0 ax [orig:120 base ] [120])) /tmp/x.ii:33 85 {*movdi_internal}
     (nil))

seem to confuse canonicalize_values_star and lead to infinite loop.
Comment 10 H.J. Lu 2016-01-05 18:21:07 UTC
This change makes it to compile:

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 07eea84..43a85b7 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -4968,7 +4968,7 @@ dataflow_set_different (dataflow_set *old_set, dataflow_set *new_set)
   if (old_set->vars == new_set->vars)
     return false;
 
-  if (shared_hash_htab (old_set->vars)->elements ()
+  if (0 && shared_hash_htab (old_set->vars)->elements ()
       != shared_hash_htab (new_set->vars)->elements ())
     return true;
Comment 11 Gary Funck 2016-01-05 23:14:37 UTC
Might be a duplicate of a bug I reported, PR69036
Comment 12 H.J. Lu 2016-01-05 23:15:48 UTC
(In reply to H.J. Lu from comment #10)
> This change makes it to compile:
> 
> diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
> index 07eea84..43a85b7 100644
> --- a/gcc/var-tracking.c
> +++ b/gcc/var-tracking.c
> @@ -4968,7 +4968,7 @@ dataflow_set_different (dataflow_set *old_set,
> dataflow_set *new_set)
>    if (old_set->vars == new_set->vars)
>      return false;
>  
> -  if (shared_hash_htab (old_set->vars)->elements ()
> +  if (0 && shared_hash_htab (old_set->vars)->elements ()
>        != shared_hash_htab (new_set->vars)->elements ())
>      return true;

This patch bootstraps GCC using -O3 without any regressions.
Comment 13 H.J. Lu 2016-01-05 23:16:52 UTC
*** Bug 69036 has been marked as a duplicate of this bug. ***
Comment 14 Alexandre Oliva 2016-01-07 03:48:05 UTC
Mine.

H.J., thanks for the reduced testcase.  I'm afraid the patch is not right, though.  All the test you removed does is to allow an optimization in the comparison between two dataflow sets, namely, to refrain from comparing each element of the second set with corresponding element in the first, after comparing each element of the first set with the corresponding element in the second.  There's a comment at the end of dataflow_set_different_p to that effect, and removing the optimization will result in incorrect dataflow analysis, because GCC won't notice when entries have been added to the new dataflow set.

I'm looking into it to figure out why it's oscillating.
Comment 15 Alexandre Oliva 2016-01-09 01:37:49 UTC
Created attachment 37284 [details]
Patch I'm testing to fix the bug

The problem arises because we used to drop overwritten MEMs from loc lists of VALUEs, but not of other onepart variables, and it just so happens that, by doing so, block 6 has no D#5 in its output in the first pass, because the MEM holding its (previous) value was correctly dropped from value 88:88, but gains it in pass 2 because D#5 has the MEM location incoming directly in its loc list, rather than indirectly in a VALUE.  This incorrect binding enables other blocks to believe they have a tentative binding for D#5 in some cycles, but others, still operating on the early conclusion, believe there isn't, and they oscillate.

Since we check for escaping MEMs in clobbers, we won't lose anything relevant by dropping call-clobbered or overwritten MEMs in all onepart variables.

The attached patch also includes some changes to the dataflow set compare function to make it more verbose when requested, which helped me figure out what was going on.  I'll probably split it out into a separate patch when I submit the patch.
Comment 16 Alexandre Oliva 2016-01-11 10:40:44 UTC
Author: aoliva
Date: Mon Jan 11 10:40:12 2016
New Revision: 232217

URL: https://gcc.gnu.org/viewcvs?rev=232217&root=gcc&view=rev
Log:
[PR69123] make dataflow_set_different details more verbose

for  gcc/ChangeLog

	PR bootstrap/69123
	* var-tracking.c (dump_onepart_variable_differences): New.
	(dataflow_set_different): If a detailed dump is requested,
	delay early returns and dump differences between onepart
	variables present before and after, and added variables.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c
Comment 17 Alexandre Oliva 2016-01-11 10:41:04 UTC
Author: aoliva
Date: Mon Jan 11 10:40:33 2016
New Revision: 232218

URL: https://gcc.gnu.org/viewcvs?rev=232218&root=gcc&view=rev
Log:
[PR69123] fix handling of MEMs in VTA to avoid dataflow oscillation

The problem arises because we used to drop overwritten MEMs from loc
lists of VALUEs, but not of other onepart variables, and it just so
happens that, by doing so, block 6 in the testcase has no D#5 in its
output in the first pass, because the MEM holding its (previous) value
was correctly dropped from value 88:88, but gains it in the second
pass because D#5 has the MEM location incoming directly in its loc
list, rather than indirectly in a VALUE.

This incorrect binding enables other blocks to believe they have a
tentative binding for D#5 in some cycles, but others, still operating
on the early conclusion, believe there isn't, and they oscillate from
that.

Since we check for escaping MEMs in clobbers, we won't lose anything
relevant by dropping call-clobbered or overwritten MEMs in all onepart
variables, and this ensures the loc intersection operation in onepart
vars won't let a MEM through that wasn't present in earlier
iterations.

for  gcc/ChangeLog

	PR bootstrap/69123
	* var-tracking.c (drop_overlapping_mem_locs): Operate on all
	onepart vars.  Fix typo in comment.  Fix reversed condition in
	unshare test.
	(dataflow_set_remove_mem_locs): Operate on all onepart vars.

for gcc/testsuite/ChangeLog

	PR bootstrap/69123
	* g++.dg/pr69123.C: New.

Added:
    trunk/gcc/testsuite/g++.dg/pr69123.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/var-tracking.c
Comment 18 Alexandre Oliva 2016-01-11 10:45:45 UTC
Fixed