Bug 100810 - [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r12-397-gda9e6e63d1ae22e530ec7baf59f6ed028bf05776
Summary: [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r12-397...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Richard Biener
URL:
Keywords: patch, wrong-code
: 102902 105337 (view as bug list)
Depends on:
Blocks:
 
Reported: 2021-05-28 08:59 UTC by Zhendong Su
Modified: 2023-12-31 09:18 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-05-28 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zhendong Su 2021-05-28 08:59:21 UTC
[511] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210528 (experimental) [master revision 4774807e6e5:0e2d976f72b:cd62d089f6021fd1ad4537b8182836d15b14514f] (GCC) 
[512] % 
[512] % gcctk -O0 small.c; ./a.out
[513] % 
[513] % gcctk -O1 small.c
[514] % ./a.out
Aborted
[515] % 
[515] % cat small.c
int a, b = 1, c = 1, e, f = 1, g, h, j;
volatile int d;
static void k() {
  int i;
  h = b;
  if (c && a >= 0) {
    while (a) {
      i++;
      h--;
    }
    if (g)
      for (h = 0; h < 2; h++)
        ;
    if (!b)
      i &&d;
  }
}
static void l() {
  for (; j < 1; j++)
    if (!e && c && f)
      k();
}
int main() {
  if (f)
    l();
  if (h != 1)
    __builtin_abort();
  return 0;
}
Comment 1 Martin Liška 2021-05-28 09:23:27 UTC
Started with r12-397-gda9e6e63d1ae22e530ec7baf59f6ed028bf05776.
Comment 2 Roger Sayle 2021-06-11 08:27:00 UTC
Isn't the variable "a" uninitialized in this code.  If "a" happens to have the value zero, the code works fine; but if "a" is non-zero, the value of "h" changes value in calls to k() and the program aborts?
Comment 3 Jakub Jelinek 2021-06-14 09:37:47 UTC
a is not an automatic variable and as such it is zero initialized when it doesn't have explicit initializer.
Comment 4 Roger Sayle 2021-08-03 21:21:16 UTC
I believe this bug occurs during the .195t.ccp4 pass that was introduced by the commit identified above, where tree-ssa-propagate.c's substitute_and_fold_engine appears not to correctly handle the situation of a completely empty basic block (or the CFG flags describing it are incorrect).

Things are fine (but unusual) with the incoming GIMPLE:

  <bb 11> [local count: 69772953]:
  # i_30 = PHI <i_16(10)>
  # h_lsm.23_40 = PHI <h_lsm.23_29(10)>
  if (g.12_20 != 0)
    goto <bb 12>; [50.00%]
  else
    goto <bb 13>; [50.00%]

  <bb 12> [local count: 282263306]:

  <bb 13> [local count: 69772953]:
  # h_lsm.23_27 = PHI <h_lsm.23_40(11), 2(12)>
  _31 = i_30 != 0;
  _14 = _10 & _31;
  if (_14 != 0)
    goto <bb 14>; [25.00%]
  else
    goto <bb 15>; [75.00%]

Notice that the only purpose of the empty bb12 is to define the phi eddges
into bb13, one edge sets h_lsm to 2, the other doesn't.  Alas, this logic gets overlooked by ccp4, where TDF_DETAILS reports:

Folding PHI node: h_lsm.23_40 = PHI <h_lsm.23_29(10)>
No folding possible
Folding PHI node: h_lsm.23_27 = PHI <h_lsm.23_40(11), 2(12)>
Queued PHI for removal.  Folds to: 2
...
Removing basic block 12
;; basic block 12, loop depth 1
;;  pred:
;;  succ:       13
Merging blocks 11 and 13

And things do downhill from there. Alas I'm still trying to figure exactly how the "Folds to: 2" is (mis)deduced, but I thought I'd share my analysis so far so that a real tree-ssa expert can confirm what's supposed to happen.

A useful workaround for debugging is:
diff --git a/gcc/passes.def b/gcc/passes.def
index 26d86df..c2806bc 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -339,7 +339,9 @@ along with GCC; see the file COPYING3.  If not see
       /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
+#if 0
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
+#endif
       NEXT_PASS (pass_warn_restrict);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_cd_dce, true /* update_address_taken_p */);

Another workaround, for this particular testcase, is -fno-tree-loop-sm, but the real culprit (as shown above) is inside pass_ccp (or in the invariants it's relying on).
Comment 5 Andrew Pinski 2021-08-03 21:36:51 UTC
  # i_16 = PHI <i_25(D)(21), i_17(9)>
....
  _22 = (unsigned int) i_16;
  _48 = -_22;
  _49 = _39 - _22;
  h_lsm.23_29 = (int) _49;

Somehow or another h gets its value from i which causes the incorrect code.
ccp is not at fault.
Comment 6 Andrew Pinski 2021-08-03 21:41:15 UTC
GCC does warn about an unitialized variable i even in "i && d"
For:
    if (!b)
      i &&d;

b is even set to 1.  Note setting b to be 0 still get the wrong code.
Comment 7 Andrew Pinski 2021-08-03 21:47:39 UTC
(In reply to Andrew Pinski from comment #6)
> GCC does warn about an unitialized variable i even in "i && d"
> For:
>     if (!b)
>       i &&d;
> 
> b is even set to 1.  Note setting b to be 0 still get the wrong code.

The bug is IV-OPTS which selects a variable which is based on an unintialized value.
CCP is not the problem, it was just happen to find that h_lsm.23_29 is based on an unintialized value and props the undefinedness forward.
Comment 8 Roger Sayle 2021-08-31 06:23:26 UTC
Patch proposed.  https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html
Comment 9 Richard Biener 2022-04-01 11:06:21 UTC
Btw, while in the abstract machine sense for the testcase 'i' is never actually read the SSA GIMPLE already exposes an uninitialized read of it:

void k ()
{
...
  <bb 3> :
  a.9_3 = a;
  if (a.9_3 >= 0)
    goto <bb 5>; [INV]
  else
    goto <bb 15>; [INV]

  <bb 4> :
  i_25 = i_13 + 1;
  h.10_4 = h;
  _5 = h.10_4 + -1;
  h = _5;

  <bb 5> :
  # i_13 = PHI <i_20(D)(3), i_25(4)>
  a.11_6 = a;
  if (a.11_6 != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 6>; [INV]

that's because for

  while (a) {
  }

which is not entered because 'a' is zero we have the PHI node for 'i' since
the condition is part of the loop and SSA form requires us to represent
the value of 'i' on the entry edge.

That means those uses are not real uses and thus do not invoke undefined
behavior (at the point of execution of the PHI node).  That would be
different from

  tem_2 = i_20(D);

<bbN>
  # tem_3 = PHI <tem_2, ..>

but copy propagation will turn this into the other case (but that would
be conservatively OK).


Note that for the testcase IVOPTs inserts the rewritten uses in a place
where i is now used unconditionally turning

  if (a.9_15 >= 0)
    goto <bb 20>; [59.00%]
  else
    goto <bb 15>; [41.00%]

  <bb 20> [local count: 139545903]:
  goto <bb 8>; [100.00%]

  <bb 7> [local count: 564526613]:
  i_17 = i_16 + 1;
  _19 = h_lsm.23_10 + -1;

  <bb 8> [local count: 634299566]:
  # i_16 = PHI <i_24(D)(20), i_17(7)>
  # h_lsm.23_10 = PHI <b.7_13(20), _19(7)>
  if (a.9_15 != 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 29>; [11.00%]

  <bb 29> [local count: 69772953]:

into

  if (a.9_15 >= 0)
    goto <bb 20>; [59.00%]
  else
    goto <bb 15>; [41.00%]

  <bb 20> [local count: 139545903]:
  goto <bb 8>; [100.00%]

  <bb 7> [local count: 564526613]:
  i_17 = i_16 + 1;

  <bb 8> [local count: 634299566]:
  # i_16 = PHI <i_24(D)(20), i_17(7)>
  _28 = (unsigned int) b.7_13;
  _41 = (unsigned int) i_24(D);
  _51 = _28 + _41;
  _22 = (unsigned int) i_16;
  _48 = -_22;
  _49 = _48 + _51;
  h_lsm.23_10 = (int) _49;
  if (a.9_15 != 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 29>; [11.00%]

  <bb 29> [local count: 69772953]:

when it eliminates the h_lsm.23 IV and rewrites it in terms of i.
Comment 10 Richard Biener 2022-04-01 11:08:56 UTC
I've posted a(nother) patch.
Comment 11 Richard Biener 2022-04-04 12:59:59 UTC
*** Bug 102902 has been marked as a duplicate of this bug. ***
Comment 12 Richard Biener 2022-04-04 13:00:51 UTC
Testcase from PR102902, since it runtime times-out difficult to adapt to the testsuite bug fixed by both proposed patches.

[576] % gcctk -O2 small.c; ./a.out
[577] % 
[577] % gcctk -O3 small.c
[578] % timeout -s 9 10 ./a.out
Killed
[579] % 
[579] % cat small.c
int printf (const char *, ...);
int a, b, c, d, e, f;
int main() {
  int g;
  short h = 1;
  for (; e < 2; e++) {
  L1:
    f = 1;
    while (b > 0 || a > 0) {
      g++;
      h++;
      printf("%d", g);
    }
  L2:
    if (!h && (!c || a))
      goto L1;
    if (c)
      goto L2;
  }
  return 0;
}
Comment 13 Richard Biener 2022-04-22 06:11:54 UTC
*** Bug 105337 has been marked as a duplicate of this bug. ***
Comment 14 GCC Commits 2022-04-25 07:52:43 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ab91c10792cd3a1ba1495aa30a34ca17b043bafb

commit r12-8241-gab91c10792cd3a1ba1495aa30a34ca17b043bafb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Apr 1 11:30:00 2022 +0200

    tree-optimization/100810 - avoid undefs in IVOPT rewrites
    
    The following attempts to avoid IVOPTs rewriting uses using
    IV candidates that involve undefined behavior by using uninitialized
    SSA names.  First we restrict the set of candidates we produce
    for such IVs to the original ones and mark them as not important.
    Second we try to only allow expressing uses with such IV if they
    originally use them.  That is to avoid rewriting all such uses
    in terms of other IVs.  Since cand->iv and use->iv seem to never
    exactly match up we resort to comparing the IV bases.
    
    The approach ends up similar to the one posted by Roger at
    https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html
    but it marks IV candidates rather than use groups and the cases
    we allow in determine_group_iv_cost_generic are slightly different.
    
    2022-01-04  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/100810
            * tree-ssa-loop-ivopts.cc (struct iv_cand): Add involves_undefs flag.
            (find_ssa_undef): New function.
            (add_candidate_1): Avoid adding derived candidates with
            undefined SSA names and mark the original ones.
            (determine_group_iv_cost_generic): Reject rewriting
            uses with a different IV when that involves undefined SSA names.
    
            * gcc.dg/torture/pr100810.c: New testcase.
            * gcc.dg/torture/pr105337.c: Likewise.
Comment 15 Richard Biener 2022-04-25 07:54:01 UTC
Fixed.
Comment 16 GCC Commits 2022-06-03 07:05:05 UTC
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>:

https://gcc.gnu.org/g:be2861fe8c527a5952257462ceca899bb43b1452

commit r13-972-gbe2861fe8c527a5952257462ceca899bb43b1452
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Fri Jun 3 03:59:03 2022 -0300

    [PR105665] ivopts: check defs of names in base for undefs
    
    The patch for PR 100810 tested for undefined SSA_NAMEs appearing
    directly in the base expression of the potential IV candidate, but
    that's not enough.  The testcase for PR105665 shows an undefined
    SSA_NAME has the same ill effect if it's referenced as an PHI_NODE arg
    in the referenced SSA_NAME.  The variant of that test shows it can be
    further removed from the referenced SSA_NAME.
    
    To avoid deep recursion, precompute maybe-undefined SSA_NAMEs: start
    from known-undefined nonvirtual default defs, and propagate them to
    any PHI nodes reached by a maybe-undefined arg, as long as there
    aren't intervening non-PHI uses, that would imply the maybe-undefined
    name must be defined at that point, otherwise it would invoke
    undefined behavior.  Also test for intervening non-PHI uses of DEFs in
    the base expr.
    
    The test for intervening uses implemented herein relies on dominance;
    this could be further extended, regarding conditional uses in every
    path leading to a point as an unconditional use dominating that point,
    but I haven't implemented that.
    
    
    for  gcc/ChangeLog
    
            PR tree-optimization/105665
            PR tree-optimization/100810
            * tree-ssa-loop-ivopts.cc
            (ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): New.
            (ssa_name_any_use_dominates_bb_p, mark_ssa_maybe_undefs): New.
            (find_ssa_undef): Check precomputed flag and intervening uses.
            (tree_ssa_iv_optimize): Call mark_ssa_maybe_undefs.
    
    for  gcc/testsuite/ChangeLog
    
            PR tree-optimization/105665
            PR tree-optimization/100810
            * gcc.dg/torture/pr105665.c: New.
Comment 17 GCC Commits 2022-07-27 09:24:10 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:4ca164a6372412a1b3dc62b6fbde77d338e22bd4

commit r12-8624-g4ca164a6372412a1b3dc62b6fbde77d338e22bd4
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Fri Jun 3 03:59:03 2022 -0300

    [PR105665] ivopts: check defs of names in base for undefs
    
    The patch for PR 100810 tested for undefined SSA_NAMEs appearing
    directly in the base expression of the potential IV candidate, but
    that's not enough.  The testcase for PR105665 shows an undefined
    SSA_NAME has the same ill effect if it's referenced as an PHI_NODE arg
    in the referenced SSA_NAME.  The variant of that test shows it can be
    further removed from the referenced SSA_NAME.
    
    To avoid deep recursion, precompute maybe-undefined SSA_NAMEs: start
    from known-undefined nonvirtual default defs, and propagate them to
    any PHI nodes reached by a maybe-undefined arg, as long as there
    aren't intervening non-PHI uses, that would imply the maybe-undefined
    name must be defined at that point, otherwise it would invoke
    undefined behavior.  Also test for intervening non-PHI uses of DEFs in
    the base expr.
    
    The test for intervening uses implemented herein relies on dominance;
    this could be further extended, regarding conditional uses in every
    path leading to a point as an unconditional use dominating that point,
    but I haven't implemented that.
    
    for  gcc/ChangeLog
    
            PR tree-optimization/105665
            PR tree-optimization/100810
            * tree-ssa-loop-ivopts.cc
            (ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): New.
            (ssa_name_any_use_dominates_bb_p, mark_ssa_maybe_undefs): New.
            (find_ssa_undef): Check precomputed flag and intervening uses.
            (tree_ssa_iv_optimize): Call mark_ssa_maybe_undefs.
    
    for  gcc/testsuite/ChangeLog
    
            PR tree-optimization/105665
            PR tree-optimization/100810
            * gcc.dg/torture/pr105665.c: New.
    
    (cherry picked from commit be2861fe8c527a5952257462ceca899bb43b1452)