[511] % gcctk -v Using built-in specs. COLLECT_GCC=gcctk COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib Thread model: posix Supported LTO compression algorithms: zlib gcc version 12.0.0 20210528 (experimental) [master revision 4774807e6e5:0e2d976f72b:cd62d089f6021fd1ad4537b8182836d15b14514f] (GCC) [512] % [512] % gcctk -O0 small.c; ./a.out [513] % [513] % gcctk -O1 small.c [514] % ./a.out Aborted [515] % [515] % cat small.c int a, b = 1, c = 1, e, f = 1, g, h, j; volatile int d; static void k() { int i; h = b; if (c && a >= 0) { while (a) { i++; h--; } if (g) for (h = 0; h < 2; h++) ; if (!b) i &&d; } } static void l() { for (; j < 1; j++) if (!e && c && f) k(); } int main() { if (f) l(); if (h != 1) __builtin_abort(); return 0; }
Started with r12-397-gda9e6e63d1ae22e530ec7baf59f6ed028bf05776.
Isn't the variable "a" uninitialized in this code. If "a" happens to have the value zero, the code works fine; but if "a" is non-zero, the value of "h" changes value in calls to k() and the program aborts?
a is not an automatic variable and as such it is zero initialized when it doesn't have explicit initializer.
I believe this bug occurs during the .195t.ccp4 pass that was introduced by the commit identified above, where tree-ssa-propagate.c's substitute_and_fold_engine appears not to correctly handle the situation of a completely empty basic block (or the CFG flags describing it are incorrect). Things are fine (but unusual) with the incoming GIMPLE: <bb 11> [local count: 69772953]: # i_30 = PHI <i_16(10)> # h_lsm.23_40 = PHI <h_lsm.23_29(10)> if (g.12_20 != 0) goto <bb 12>; [50.00%] else goto <bb 13>; [50.00%] <bb 12> [local count: 282263306]: <bb 13> [local count: 69772953]: # h_lsm.23_27 = PHI <h_lsm.23_40(11), 2(12)> _31 = i_30 != 0; _14 = _10 & _31; if (_14 != 0) goto <bb 14>; [25.00%] else goto <bb 15>; [75.00%] Notice that the only purpose of the empty bb12 is to define the phi eddges into bb13, one edge sets h_lsm to 2, the other doesn't. Alas, this logic gets overlooked by ccp4, where TDF_DETAILS reports: Folding PHI node: h_lsm.23_40 = PHI <h_lsm.23_29(10)> No folding possible Folding PHI node: h_lsm.23_27 = PHI <h_lsm.23_40(11), 2(12)> Queued PHI for removal. Folds to: 2 ... Removing basic block 12 ;; basic block 12, loop depth 1 ;; pred: ;; succ: 13 Merging blocks 11 and 13 And things do downhill from there. Alas I'm still trying to figure exactly how the "Folds to: 2" is (mis)deduced, but I thought I'd share my analysis so far so that a real tree-ssa expert can confirm what's supposed to happen. A useful workaround for debugging is: diff --git a/gcc/passes.def b/gcc/passes.def index 26d86df..c2806bc 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -339,7 +339,9 @@ along with GCC; see the file COPYING3. If not see /* Threading can leave many const/copy propagations in the IL. Clean them up. Instead of just copy_prop, we use ccp to compute alignment and nonzero bits. */ +#if 0 NEXT_PASS (pass_ccp, true /* nonzero_p */); +#endif NEXT_PASS (pass_warn_restrict); NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce, true /* update_address_taken_p */); Another workaround, for this particular testcase, is -fno-tree-loop-sm, but the real culprit (as shown above) is inside pass_ccp (or in the invariants it's relying on).
# i_16 = PHI <i_25(D)(21), i_17(9)> .... _22 = (unsigned int) i_16; _48 = -_22; _49 = _39 - _22; h_lsm.23_29 = (int) _49; Somehow or another h gets its value from i which causes the incorrect code. ccp is not at fault.
GCC does warn about an unitialized variable i even in "i && d" For: if (!b) i &&d; b is even set to 1. Note setting b to be 0 still get the wrong code.
(In reply to Andrew Pinski from comment #6) > GCC does warn about an unitialized variable i even in "i && d" > For: > if (!b) > i &&d; > > b is even set to 1. Note setting b to be 0 still get the wrong code. The bug is IV-OPTS which selects a variable which is based on an unintialized value. CCP is not the problem, it was just happen to find that h_lsm.23_29 is based on an unintialized value and props the undefinedness forward.
Patch proposed. https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html
Btw, while in the abstract machine sense for the testcase 'i' is never actually read the SSA GIMPLE already exposes an uninitialized read of it: void k () { ... <bb 3> : a.9_3 = a; if (a.9_3 >= 0) goto <bb 5>; [INV] else goto <bb 15>; [INV] <bb 4> : i_25 = i_13 + 1; h.10_4 = h; _5 = h.10_4 + -1; h = _5; <bb 5> : # i_13 = PHI <i_20(D)(3), i_25(4)> a.11_6 = a; if (a.11_6 != 0) goto <bb 4>; [INV] else goto <bb 6>; [INV] that's because for while (a) { } which is not entered because 'a' is zero we have the PHI node for 'i' since the condition is part of the loop and SSA form requires us to represent the value of 'i' on the entry edge. That means those uses are not real uses and thus do not invoke undefined behavior (at the point of execution of the PHI node). That would be different from tem_2 = i_20(D); <bbN> # tem_3 = PHI <tem_2, ..> but copy propagation will turn this into the other case (but that would be conservatively OK). Note that for the testcase IVOPTs inserts the rewritten uses in a place where i is now used unconditionally turning if (a.9_15 >= 0) goto <bb 20>; [59.00%] else goto <bb 15>; [41.00%] <bb 20> [local count: 139545903]: goto <bb 8>; [100.00%] <bb 7> [local count: 564526613]: i_17 = i_16 + 1; _19 = h_lsm.23_10 + -1; <bb 8> [local count: 634299566]: # i_16 = PHI <i_24(D)(20), i_17(7)> # h_lsm.23_10 = PHI <b.7_13(20), _19(7)> if (a.9_15 != 0) goto <bb 7>; [89.00%] else goto <bb 29>; [11.00%] <bb 29> [local count: 69772953]: into if (a.9_15 >= 0) goto <bb 20>; [59.00%] else goto <bb 15>; [41.00%] <bb 20> [local count: 139545903]: goto <bb 8>; [100.00%] <bb 7> [local count: 564526613]: i_17 = i_16 + 1; <bb 8> [local count: 634299566]: # i_16 = PHI <i_24(D)(20), i_17(7)> _28 = (unsigned int) b.7_13; _41 = (unsigned int) i_24(D); _51 = _28 + _41; _22 = (unsigned int) i_16; _48 = -_22; _49 = _48 + _51; h_lsm.23_10 = (int) _49; if (a.9_15 != 0) goto <bb 7>; [89.00%] else goto <bb 29>; [11.00%] <bb 29> [local count: 69772953]: when it eliminates the h_lsm.23 IV and rewrites it in terms of i.
I've posted a(nother) patch.
*** Bug 102902 has been marked as a duplicate of this bug. ***
Testcase from PR102902, since it runtime times-out difficult to adapt to the testsuite bug fixed by both proposed patches. [576] % gcctk -O2 small.c; ./a.out [577] % [577] % gcctk -O3 small.c [578] % timeout -s 9 10 ./a.out Killed [579] % [579] % cat small.c int printf (const char *, ...); int a, b, c, d, e, f; int main() { int g; short h = 1; for (; e < 2; e++) { L1: f = 1; while (b > 0 || a > 0) { g++; h++; printf("%d", g); } L2: if (!h && (!c || a)) goto L1; if (c) goto L2; } return 0; }
*** Bug 105337 has been marked as a duplicate of this bug. ***
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:ab91c10792cd3a1ba1495aa30a34ca17b043bafb commit r12-8241-gab91c10792cd3a1ba1495aa30a34ca17b043bafb Author: Richard Biener <rguenther@suse.de> Date: Fri Apr 1 11:30:00 2022 +0200 tree-optimization/100810 - avoid undefs in IVOPT rewrites The following attempts to avoid IVOPTs rewriting uses using IV candidates that involve undefined behavior by using uninitialized SSA names. First we restrict the set of candidates we produce for such IVs to the original ones and mark them as not important. Second we try to only allow expressing uses with such IV if they originally use them. That is to avoid rewriting all such uses in terms of other IVs. Since cand->iv and use->iv seem to never exactly match up we resort to comparing the IV bases. The approach ends up similar to the one posted by Roger at https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html but it marks IV candidates rather than use groups and the cases we allow in determine_group_iv_cost_generic are slightly different. 2022-01-04 Richard Biener <rguenther@suse.de> PR tree-optimization/100810 * tree-ssa-loop-ivopts.cc (struct iv_cand): Add involves_undefs flag. (find_ssa_undef): New function. (add_candidate_1): Avoid adding derived candidates with undefined SSA names and mark the original ones. (determine_group_iv_cost_generic): Reject rewriting uses with a different IV when that involves undefined SSA names. * gcc.dg/torture/pr100810.c: New testcase. * gcc.dg/torture/pr105337.c: Likewise.
Fixed.
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>: https://gcc.gnu.org/g:be2861fe8c527a5952257462ceca899bb43b1452 commit r13-972-gbe2861fe8c527a5952257462ceca899bb43b1452 Author: Alexandre Oliva <oliva@adacore.com> Date: Fri Jun 3 03:59:03 2022 -0300 [PR105665] ivopts: check defs of names in base for undefs The patch for PR 100810 tested for undefined SSA_NAMEs appearing directly in the base expression of the potential IV candidate, but that's not enough. The testcase for PR105665 shows an undefined SSA_NAME has the same ill effect if it's referenced as an PHI_NODE arg in the referenced SSA_NAME. The variant of that test shows it can be further removed from the referenced SSA_NAME. To avoid deep recursion, precompute maybe-undefined SSA_NAMEs: start from known-undefined nonvirtual default defs, and propagate them to any PHI nodes reached by a maybe-undefined arg, as long as there aren't intervening non-PHI uses, that would imply the maybe-undefined name must be defined at that point, otherwise it would invoke undefined behavior. Also test for intervening non-PHI uses of DEFs in the base expr. The test for intervening uses implemented herein relies on dominance; this could be further extended, regarding conditional uses in every path leading to a point as an unconditional use dominating that point, but I haven't implemented that. for gcc/ChangeLog PR tree-optimization/105665 PR tree-optimization/100810 * tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): New. (ssa_name_any_use_dominates_bb_p, mark_ssa_maybe_undefs): New. (find_ssa_undef): Check precomputed flag and intervening uses. (tree_ssa_iv_optimize): Call mark_ssa_maybe_undefs. for gcc/testsuite/ChangeLog PR tree-optimization/105665 PR tree-optimization/100810 * gcc.dg/torture/pr105665.c: New.
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:4ca164a6372412a1b3dc62b6fbde77d338e22bd4 commit r12-8624-g4ca164a6372412a1b3dc62b6fbde77d338e22bd4 Author: Alexandre Oliva <oliva@adacore.com> Date: Fri Jun 3 03:59:03 2022 -0300 [PR105665] ivopts: check defs of names in base for undefs The patch for PR 100810 tested for undefined SSA_NAMEs appearing directly in the base expression of the potential IV candidate, but that's not enough. The testcase for PR105665 shows an undefined SSA_NAME has the same ill effect if it's referenced as an PHI_NODE arg in the referenced SSA_NAME. The variant of that test shows it can be further removed from the referenced SSA_NAME. To avoid deep recursion, precompute maybe-undefined SSA_NAMEs: start from known-undefined nonvirtual default defs, and propagate them to any PHI nodes reached by a maybe-undefined arg, as long as there aren't intervening non-PHI uses, that would imply the maybe-undefined name must be defined at that point, otherwise it would invoke undefined behavior. Also test for intervening non-PHI uses of DEFs in the base expr. The test for intervening uses implemented herein relies on dominance; this could be further extended, regarding conditional uses in every path leading to a point as an unconditional use dominating that point, but I haven't implemented that. for gcc/ChangeLog PR tree-optimization/105665 PR tree-optimization/100810 * tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): New. (ssa_name_any_use_dominates_bb_p, mark_ssa_maybe_undefs): New. (find_ssa_undef): Check precomputed flag and intervening uses. (tree_ssa_iv_optimize): Call mark_ssa_maybe_undefs. for gcc/testsuite/ChangeLog PR tree-optimization/105665 PR tree-optimization/100810 * gcc.dg/torture/pr105665.c: New. (cherry picked from commit be2861fe8c527a5952257462ceca899bb43b1452)