On x86-64, --with-build-config='bootstrap-O3 bootstrap-debug' miscompiled stage 2 into an infinite loop: (gdb) bt #0 0x0000000000e01141 in get_ref_base_and_extent(tree_node*, long*, long*, long*, bool*) () #1 0x0000000000ebb061 in ao_ref_base(ao_ref*) () #2 0x00000000008db7b2 in ao_ref_from_mem(ao_ref*, rtx_def const*) () #3 0x00000000008dbd2d in rtx_refs_may_alias_p(rtx_def const*, rtx_def const*, bool) () #4 0x00000000008e001a in true_dependence_1(rtx_def const*, machine_mode, rtx_def*, rtx_def const*, rtx_def*, bool) () #5 0x0000000001094a6d in drop_overlapping_mem_locs(variable**, overlapping_mems*) () #6 0x0000000001094bd9 in clobber_overlapping_mems(dataflow_set*, rtx_def*) () #7 0x0000000001099bf3 in val_store(dataflow_set*, rtx_def*, rtx_def*, rtx_insn*, bool) () #8 0x000000000109e086 in compute_bb_dataflow(basic_block_def*) [clone .isra.135] () #9 0x000000000109f4ca in vt_find_locations() () #10 0x00000000010a15e0 in variable_tracking_main_1() () #11 0x00000000010a178c in (anonymous namespace)::pass_variable_tracking::execute(function*) () #12 0x0000000000cb7d74 in execute_one_pass(opt_pass*) () #13 0x0000000000cb8a55 in execute_pass_list_1(opt_pass*) () #14 0x0000000000cb8cea in execute_pass_list(function*, opt_pass*) () #15 0x0000000000964254 in cgraph_node::expand() () ---Type <return> to continue, or q <return> to quit--- #16 0x0000000000965c07 in symbol_table::compile() [clone .part.49] () #17 0x0000000000968293 in symbol_table::finalize_compilation_unit() () #18 0x0000000000dad023 in compile_file() () #19 0x00000000005d2186 in toplev::main(int, char**) () #20 0x00000000005d3db7 in main () (gdb)
It was caused by r231674.
(In reply to H.J. Lu from comment #1) > It was caused by r231674. In this case I think exposed by is the correct term as this just changes the cost algorithm. Also do you have a short testcase of what is being miscompiled?
tree-vect-slp.o is miscompiled. When I replaced tree-vect-slp.o from r231672 build, cc1plus finished.
r231674 may have exposed a latent bug in var-tracking.c since the same file compiles without -g.
Created attachment 37226 [details] A testcase ./xgcc -B./ -S -O3 -g /tmp/x.ii -fno-rtti -fno-exceptions never finishes.
I bet vt_find_location must be oscillating for some reason and thus not converging. This is on the get_address_cost routine. Alex, do you think you could have a look? Thanks.
Created attachment 37228 [details] A smaller testcase
A small testcase: [hjl@gnu-mic-2 i386]$ cat /tmp/x.ii struct xxx_def; typedef xxx_def *xxx; union rtxxx { const char *rt_str; xxx rt_xxx; }; struct xxx_def { union u { rtxxx fld[1]; } u; }; extern xxx bar (void); extern int foo1 (xxx); static inline xxx foo2 (xxx arg0, xxx arg1) { xxx rt; rt = bar (); (((rt)->u.fld[0]).rt_xxx) = arg0; (((rt)->u.fld[1]).rt_xxx) = arg1; return rt; } static inline xxx foo4 (const char *arg0 ) { xxx rt; rt = bar (); (((rt)->u.fld[0]).rt_str) = arg0; (((rt)->u.fld[1]).rt_xxx) = (xxx) 0; return rt; } extern xxx foo5 (long); struct address_cost_data { unsigned costs[2][2][2][2]; }; void get_address_cost (address_cost_data *data) { unsigned acost; long i; long rat, off = 0; unsigned sym_p, var_p, off_p, rat_p; xxx addr, base; xxx reg0, reg1; reg1 = bar (); addr = foo2 (reg1, (xxx) 0); rat = 1; acost = 0; reg0 = bar (); reg1 = bar (); for (i = 0; i < 16; i++) { sym_p = i & 1; var_p = (i >> 1) & 1; off_p = (i >> 2) & 1; rat_p = (i >> 3) & 1; addr = reg0; if (rat_p) addr = foo2 (addr, foo5 (rat)) ; if (var_p) addr = foo2 (addr, reg1); if (sym_p) base = foo4 (""); else if (off_p) base = foo5 (off); else base = (xxx) 0; if (base) addr = foo2 (addr, base); acost = foo1 (addr); data->costs[sym_p][var_p][off_p][rat_p] = acost; } } [hjl@gnu-mic-2 i386]$ ./cc1plus -fpreprocessed /tmp/x.ii -quiet -dumpbase x.ii -mtune=generic -march=x86-64 -auxbase x -g -O3 -version -fno-rtti -fno-exceptions -o x.s
(insn:TI 248 289 246 8 (set (reg:V2DI 21 xmm0 [130]) (mem/c:V2DI (plus:DI (reg/f:DI 7 sp) (const_int 16 [0x10])) [9 %sfp+-32 S16 A128])) /tmp/x.ii:24 1215 {*movv2di_internal} (nil)) ... (insn:TI 250 118 120 9 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 16 [0x10])) [9 %sfp+-32 S8 A128]) (reg/v/f:DI 0 ax [orig:120 base ] [120])) /tmp/x.ii:33 85 {*movdi_internal} (nil)) seem to confuse canonicalize_values_star and lead to infinite loop.
This change makes it to compile: diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index 07eea84..43a85b7 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -4968,7 +4968,7 @@ dataflow_set_different (dataflow_set *old_set, dataflow_set *new_set) if (old_set->vars == new_set->vars) return false; - if (shared_hash_htab (old_set->vars)->elements () + if (0 && shared_hash_htab (old_set->vars)->elements () != shared_hash_htab (new_set->vars)->elements ()) return true;
Might be a duplicate of a bug I reported, PR69036
(In reply to H.J. Lu from comment #10) > This change makes it to compile: > > diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c > index 07eea84..43a85b7 100644 > --- a/gcc/var-tracking.c > +++ b/gcc/var-tracking.c > @@ -4968,7 +4968,7 @@ dataflow_set_different (dataflow_set *old_set, > dataflow_set *new_set) > if (old_set->vars == new_set->vars) > return false; > > - if (shared_hash_htab (old_set->vars)->elements () > + if (0 && shared_hash_htab (old_set->vars)->elements () > != shared_hash_htab (new_set->vars)->elements ()) > return true; This patch bootstraps GCC using -O3 without any regressions.
*** Bug 69036 has been marked as a duplicate of this bug. ***
Mine. H.J., thanks for the reduced testcase. I'm afraid the patch is not right, though. All the test you removed does is to allow an optimization in the comparison between two dataflow sets, namely, to refrain from comparing each element of the second set with corresponding element in the first, after comparing each element of the first set with the corresponding element in the second. There's a comment at the end of dataflow_set_different_p to that effect, and removing the optimization will result in incorrect dataflow analysis, because GCC won't notice when entries have been added to the new dataflow set. I'm looking into it to figure out why it's oscillating.
Created attachment 37284 [details] Patch I'm testing to fix the bug The problem arises because we used to drop overwritten MEMs from loc lists of VALUEs, but not of other onepart variables, and it just so happens that, by doing so, block 6 has no D#5 in its output in the first pass, because the MEM holding its (previous) value was correctly dropped from value 88:88, but gains it in pass 2 because D#5 has the MEM location incoming directly in its loc list, rather than indirectly in a VALUE. This incorrect binding enables other blocks to believe they have a tentative binding for D#5 in some cycles, but others, still operating on the early conclusion, believe there isn't, and they oscillate. Since we check for escaping MEMs in clobbers, we won't lose anything relevant by dropping call-clobbered or overwritten MEMs in all onepart variables. The attached patch also includes some changes to the dataflow set compare function to make it more verbose when requested, which helped me figure out what was going on. I'll probably split it out into a separate patch when I submit the patch.
Author: aoliva Date: Mon Jan 11 10:40:12 2016 New Revision: 232217 URL: https://gcc.gnu.org/viewcvs?rev=232217&root=gcc&view=rev Log: [PR69123] make dataflow_set_different details more verbose for gcc/ChangeLog PR bootstrap/69123 * var-tracking.c (dump_onepart_variable_differences): New. (dataflow_set_different): If a detailed dump is requested, delay early returns and dump differences between onepart variables present before and after, and added variables. Modified: trunk/gcc/ChangeLog trunk/gcc/var-tracking.c
Author: aoliva Date: Mon Jan 11 10:40:33 2016 New Revision: 232218 URL: https://gcc.gnu.org/viewcvs?rev=232218&root=gcc&view=rev Log: [PR69123] fix handling of MEMs in VTA to avoid dataflow oscillation The problem arises because we used to drop overwritten MEMs from loc lists of VALUEs, but not of other onepart variables, and it just so happens that, by doing so, block 6 in the testcase has no D#5 in its output in the first pass, because the MEM holding its (previous) value was correctly dropped from value 88:88, but gains it in the second pass because D#5 has the MEM location incoming directly in its loc list, rather than indirectly in a VALUE. This incorrect binding enables other blocks to believe they have a tentative binding for D#5 in some cycles, but others, still operating on the early conclusion, believe there isn't, and they oscillate from that. Since we check for escaping MEMs in clobbers, we won't lose anything relevant by dropping call-clobbered or overwritten MEMs in all onepart variables, and this ensures the loc intersection operation in onepart vars won't let a MEM through that wasn't present in earlier iterations. for gcc/ChangeLog PR bootstrap/69123 * var-tracking.c (drop_overlapping_mem_locs): Operate on all onepart vars. Fix typo in comment. Fix reversed condition in unshare test. (dataflow_set_remove_mem_locs): Operate on all onepart vars. for gcc/testsuite/ChangeLog PR bootstrap/69123 * g++.dg/pr69123.C: New. Added: trunk/gcc/testsuite/g++.dg/pr69123.C Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/var-tracking.c
Fixed