Link to the Compiler Explorer: https://godbolt.org/z/51v98a6fd Reproducer: #include <stdio.h> long long int arr_248 = 3623214276426624192LL; unsigned short arr_255; void test() __attribute__((noipa)); char a = 42; const long long &min(const long long &c, const long long &d) { return c < d ? c : d; } void test() { arr_255 = min(a, min(a, arr_248) + 5713568809962283044LL); } int main() { test(); printf("%u\n", arr_255); if (arr_255 != 42) __builtin_abort(); } Error: >$ g++ -O1 small.cpp && ./a.out 42 >$ g++ -O2 small.cpp && ./a.out 3300 Aborted (core dumped) gcc version 12.0.1 20220401 (git://gcc.gnu.org/git/gcc.git:master 15d683d4f0b390b27c54a7c92c6e4f33195bdc93)
Started with r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc
long long int c = 3623214276426624192LL; unsigned short b; char a = 42; const long long &min(const long long &x, const long long &y) { return x < y ? x : y; } __attribute__((noipa)) void test() { b = min(a, min(a, c) + 5713568809962283044LL); } int main() { test(); if (b != 42) __builtin_abort(); } ifcombine "optimizes" the IMO correct: <bb 2> [local count: 1073741824]: a.1_1 = a; _2 = (long long int) a.1_1; _16 = c; if (_2 < _16) goto <bb 4>; [50.00%] else goto <bb 3>; [50.00%] <bb 3> [local count: 536870912]: _4 = _16 + 5713568809962283044; if (_4 > _2) goto <bb 4>; [20.00%] else goto <bb 5>; [80.00%] <bb 4> [local count: 536870912]: <bb 5> [local count: 1073741824]: # _5 = PHI <_2(4), _4(3)> _6 = (short unsigned int) _5; b = _6; which for the testcase jumps from bb 2 to bb4 into: <bb 2> [local count: 1073741824]: a.1_1 = a; _2 = (long long int) a.1_1; _16 = c; _4 = _16 + 5713568809962283044; if (_2 < _4) goto <bb 3>; [60.00%] else goto <bb 4>; [40.00%] <bb 3> [local count: 536870912]: <bb 4> [local count: 1073741824]: # _5 = PHI <_2(3), _4(2)> _6 = (short unsigned int) _5; b = _6; which invokes UB on the _16 + 5713568809962283044 addition (signed integer overflow). So, either it needs to make sure the no longer conditional addition is performed in unsigned type, or it needs to give up.
Let me take a stab at this. We can enhance if (tree_ssa_ifcombine_bb (bb)) { /* Clear range info from all stmts in BB which is now executed conditional on a always true/false condition. */ reset_flow_sensitive_info_in_bb (bb); cfg_changed |= true; or adjust bb_no_side_effects_p (which already checks gimple_uses_undefined_value_p).
Hmm, I think the error happens earlier already when we simplify _2 < _16 && (_16 + 5713568809962283044 > _2) to _2 < _4 I do have a patch to do the rewrite into defined overflow which definitely fixes a latent issue but not this bug.
If so, that must be still during ifcombine though. Because what I see is that we have effectively MIN_EXPR <a, MIN_EXPR <a, c> + large_cst> until threadfull1, which decides to thread it based on the a < c comparison, if a < c, then the result is just a and the addition is optimized away as unused, otherwise effectively MIN_EXPR <c + large_cst, a> is done. I believe that is a correct transformation, because for undefined overflow arithmetics (with no wrap around) a < c && a < c + large_cst is equivalent.
I mean ... is equivalent to a < c.
(In reply to Richard Biener from comment #4) > Hmm, I think the error happens earlier already when we simplify > > _2 < _16 && (_4 > _2) to _2 < _4 > > I do have a patch to do the rewrite into defined overflow which definitely > fixes a latent issue but not this bug. Which happens through Applying pattern match.pd:6775, gimple-match.cc:54944 Applying pattern match.pd:3153, gimple-match.cc:177685 _2 < _16 & _2 < _4 match.pd:6775 -> _2 < min (_16, _4) and min (_16, _16 + 5713568809962283044) match.pd:3153 -> _16 + 5713568809962283044 (aka _4) which is all "nice" and validates that we can replace the inner condition with itself but it ignores that it only subsumes the outer condition if the outer condition is true. That's a real tough nut to crack. It basically means that we cannot really combine two conditions if the definition chain of the inner condition is under the outer condition (and involves undefined overflow), unless we somehow can decide whether the outer condition tests for overflow of any of the guarded stmts. In practice this means not simplifying interesting cases I guess and likely fold-const.cc code is susceptible to the very same issue. A "simple" workaround would pass in context BBs to maybe_fold_and_comparisons and adjust maybe_fold_comparisons_from_match_pd with a custom follow_all_ssa_edges variant, disallowing (signed) operation expansion from the inner BB. I will experiment with that to see what the fallout would be. Note combining will have similar issues with using flow-sensitive range info from inner BB defined stmts constrained by the outer check.
Created attachment 52744 [details] patch to rewrite undefined overflow stmts Here's the initial patch, not solving the testcase.
Created attachment 52745 [details] avoid combining the conditions Like this prototype which fixes the testcase and avoids combining from stmts defined in the middle BB. It could be enhanced by only considering not combining from stmts with undefined overflow behavior (or flow-sensitive info). This patch is also incomplete, maybe_fold_and_comparisons and maybe_fold_or_comparisons is used by if-conversion, ifcombine and reassoc and more importantly through recursion with itself and related simplifications in gimple-fold.cc. It's not clear where to extract the context BB from and I didn't follow all the flow through the various routines. Esp. if-conversion looks susceptible, reassoc might restrict itself to conditions from the same BB, not sure.
(In reply to Richard Biener from comment #9) > Created attachment 52745 [details] > avoid combining the conditions > > Like this prototype which fixes the testcase and avoids combining from stmts > defined in the middle BB. It could be enhanced by only considering not > combining from stmts with undefined overflow behavior (or flow-sensitive > info). > > This patch is also incomplete, maybe_fold_and_comparisons and > maybe_fold_or_comparisons is used by if-conversion, ifcombine and reassoc > and more importantly through recursion with itself and related > simplifications > in gimple-fold.cc. It's not clear where to extract the context BB from and > I didn't follow all the flow through the various routines. Esp. > if-conversion > looks susceptible, reassoc might restrict itself to conditions from the same > BB, not sure. surprisingly(?) the patch has no negative effect on the testsuite (OTOH it is incomplete and likely test coverage for vectorizer if-conversion is broader). I'm unsure whether we want to rush things for GCC 12 and to what extent we want to allow combination from the middle-block(s) (exposing, besides undefined overflow, other flow-sensitive info that depends on the outer condition being short-circuiting).
Created attachment 52746 [details] patch to rewrite undefined overflow stmts Fixed patch #1 (doesn't fix this testcase). Bootstrapped/tested OK on x86_64, I'm queuing this for stage1 for now.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:fc8d9e4497032dd295aac9414042163f92250b77 commit r12-8012-gfc8d9e4497032dd295aac9414042163f92250b77 Author: Richard Biener <rguenther@suse.de> Date: Mon Apr 4 12:23:28 2022 +0200 tree-optimization/105142 - wrong code with maybe_fold_{and,or}_comparisons The following avoids expanding definitions in regions conditionally executed under the condition A when simplifying A && B or A || B. This is done by passing down the basic-block of the outer condition to maybe_fold_{and,or}_comparisons, through the various helpers in gimple-fold.cc that might call back to maybe_fold_{and,or}_comparisons and ultimatively to maybe_fold_comparisons_from_match_pd where the fix is to provide a custom valueization hook to gimple_match_op::resimplify that avoids looking at definitions that do not dominate the outer block. For the testcase this avoids combining a stmt that invokes undefined integer overflow when the outer condition is false but it also aovids combining stmts with range information that is derived from the outer condition. The new parameter to maybe_fold_{and,or}_comparisons is defaulted to nullptr and I only adjusted the if-combine to pass down the outer block. I think other callers like tree-if-conv have the same issue but it's not straight-forward as to what to do there. 2022-04-05 Richard Biener <rguenther@suse.de> PR tree-optimization/105142 * gimple-fold.h (maybe_fold_and_comparisons): Add defaulted basic-block parameter. (maybe_fold_or_comparisons): Likewise. * gimple-fold.cc (follow_outer_ssa_edges): New. (maybe_fold_comparisons_from_match_pd): Use follow_outer_ssa_edges when an outer condition basic-block is specified. (and_comparisons_1, and_var_with_comparison, and_var_with_comparison_1, or_comparisons_1, or_var_with_comparison, or_var_with_comparison_1): Receive and pass down the outer condition basic-block. * tree-ssa-ifcombine.cc (ifcombine_ifandif): Pass down the basic-block of the outer condition. * g++.dg/torture/pr105142.C: New testcase.
The specific instance of wrong-code has been fixed, there are still latent issues though. I've queued the undefined overflow stmt rewrite for stage1.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:4b3874d803e7961f38b22fa798517a63171bb985 commit r13-3904-g4b3874d803e7961f38b22fa798517a63171bb985 Author: Richard Biener <rguenther@suse.de> Date: Tue Jul 26 11:52:49 2022 +0200 tree-optimization/105142 - improve maybe_fold_comparisons_from_match_pd fix The following improves on the fix for PR105142 which restricted the expression lookup used for maybe_fold_comparisons_from_match_pd to avoid picking up flow-sensitive info for use in places where guarding conditions do not hold. Instead of not allowing to expand SSA definitions there the following temporarily clears flow-sensitive info on the SSA names and restores it when finished matching. PR tree-optimization/105142 * gimple-fold.cc (fosa_unwind): New global. (follow_outer_ssa_edges): When the SSA definition to follow is does not dominate fosa_bb, temporarily clear flow-sensitive info. Make sure to not expand stmts with not defined overflow. (maybe_fold_comparisons_from_match_pd): Set up unwind stack for follow_outer_ssa_edges and unwind flow-sensitive info clearing after matching.