Suppose we have a test:

typedef struct { unsigned data; } s1;
s1 g_x[4];
extern void foo (s1 *x1, s1 *x2, int a, int b)
{
  int i;
  for (i = 0; i < a; i++)
    if (i == b)
      g_x[i] = *x1;
    else
      g_x[i] = *x2;
}

$ ./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc -O3 -Wall -S x1.c
x1.c: In function ‘foo’:
x1.c:9:10: warning: array subscript is above array bounds [-Warray-bounds]
     g_x[i] = *x1;
          ^

which comes from the second pass of VRP:

  # i_14 = PHI <4(14)>
  if (b_6(D) == 4)
    goto <bb 17>;
  else
    goto <bb 18>;

<bb 17>:
  g_x[4] = *x1_7(D);
  i_11 = 5;
  goto <bb 3>;

<bb 18>:
  __builtin_unreachable ();

The VRP checker rightly warns that g_x[4] is an out-of-bounds memory reference. This code was generated by tree-ssa-loop-ivcanon.c. I do not completely understand the logic of the cunroll pass, but the dumps say:

Loop 1 iterates at most 4 times.
...
x1.c:7:3: note: loop with 5 iterations completely unrolled
Last iteration exit edge was proved true.
Forced statement unreachable: g_x[i_14] = *x2_9(D);

In the function `try_unroll_loop_completely' the argument `maxiter' equals 4, which is correct. Then (through the n_unroll variable) this value is passed to `gimple_duplicate_loop_to_header_edge', which performs the peeling. As far as I understand, this routine peels `n_unroll' copies off the original loop, resulting in (n_unroll + 1) copies overall (including the original loop).

I can think of two solutions:
- Reduce the peel number by 1.
- Improve remove_exits_and_undefined_stmts (), which correctly inserts `gcc_unreachable' for the `false' arm of the if-stmt on the 5th iteration.

It obviously does not warn when the limit is reduced:

./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc -O3 -Wall -msse3 -c -m32 x1.c --param max-completely-peel-times=3
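For reference, a runnable harness built from the reduced testcase shows that any valid call (a <= 4) stays within g_x's bounds, so the warning is a false positive for correct callers. The harness values are mine, not part of the original report:

```c
#include <assert.h>

typedef struct { unsigned data; } s1;
s1 g_x[4];

/* Reduced testcase from the report.  For a <= 4 every store is in
   bounds; the warned-about g_x[4] store can only be reached by an
   invalid call (a > 4), which is undefined behavior anyway.  */
void foo (s1 *x1, s1 *x2, int a, int b)
{
  int i;
  for (i = 0; i < a; i++)
    if (i == b)
      g_x[i] = *x1;
    else
      g_x[i] = *x2;
}
```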
Actually it is a bad interaction between DOM and complete unrolling.

<bb 20>:
  # i_14 = PHI <i_36(19)>
  if (i_14 == b_6(D))
    goto <bb 21>;
  else
    goto <bb 22>;

<bb 21>:
  g_x[b_6(D)] = *x1_7(D);
  i_11 = i_14 + 1;
  goto <bb 3>;

<bb 22>:
  __builtin_unreachable ();
--- CUT ---

Basic block 21 should have been a __builtin_unreachable () also. BB 21 was bb 5 after DOM. BB 5 was changed all the way back in DOM to:

<bb 5>:
  g_x[b_6(D)] = *x1_7(D);
  goto <bb 7>;

Before DOM it was:

<bb 5>:
  g_x[i_14] = *x2_9(D);
Confirmed.
As far as I understand, `remove_exits_and_undefined_stmts' iterates over the loop's bounds set, marking stmts with impossible bounds as `unreachable'. For the example we have:

- for the true edge:

basic block 6, loop depth 1
 pred:       5
g_x[b_6(D)] = *x1_7(D);
goto <bb 8>;
 succ:       8

- for the false edge:

basic block 7, loop depth 1
 pred:       5
g_x[i_14] = *x2_9(D);
 succ:       8

I suspect the problem is that `b' was propagated along the `true' edge and replaced the use of `i'. This removed the reference to g_x from the bounds analysis for that edge: no IV is used there explicitly, only implicitly via b == i. Hence this stmt did not make it into the loop's bounds set and was not marked as unreachable.

BTW, this code survives the rest of the optimizations:

	movl	(%edx), %edx
	cmpl	$4, %eax
	movl	%edx, g_x+12
	jle	.L1
	movl	(%ebx), %eax
	movl	%eax, g_x+16	;; REDUNDANT

So it is not just a spurious warning; unnecessary code is generated as well.
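A minimal C sketch of the substitution in question (buffer and function names are invented): under the guard i == b the two stores are interchangeable, which is exactly why DOM may rewrite the subscript — and why, after the rewrite, the loop-bounds machinery no longer sees the IV in the array reference:

```c
#include <assert.h>

unsigned buf_iv[8], buf_prop[8];

/* Before DOM: the subscript uses the induction variable i, so an
   access like g_x[i] participates in loop-bounds analysis.  */
void store_with_iv (int i, int b, unsigned val)
{
  if (i == b)
    buf_iv[i] = val;
}

/* After DOM's copy propagation: the subscript uses b.  Semantically
   identical under the guard, but the IV is now only implicit.  */
void store_propagated (int i, int b, unsigned val)
{
  if (i == b)
    buf_prop[b] = val;
}
```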
GCC 4.9.2 has been released.
Kirill, you are correct WRT propagation of "b" for "i". Prior to DOM1 we have:

;;   basic block 3, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE)
;;    pred:       7 [91.0%]  (TRUE_VALUE,EXECUTABLE)
  if (i_1 == b_6(D))
    goto <bb 4>;
  else
    goto <bb 5>;
;;    succ:       4 [0.0%]  (TRUE_VALUE,EXECUTABLE)
;;                5 [100.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 1, count 0, freq 2, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE)
;;    pred:       3 [0.0%]  (TRUE_VALUE,EXECUTABLE)
  g_x[i_1] = *x1_7(D);
  goto <bb 6>;
;;    succ:       6 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 5, loop depth 1, count 0, freq 9098, maybe hot
;;    prev block 4, next block 6, flags: (NEW, REACHABLE)
;;    pred:       3 [100.0%]  (FALSE_VALUE,EXECUTABLE)
  g_x[i_1] = *x2_9(D);
;;    succ:       6 [100.0%]  (FALLTHRU,EXECUTABLE)

DOM records that i_1 and b_6 are equivalent on the edge bb3->bb4, so in bb4 it replaces i_1 with b_6. Presumably that's causing problems downstream in the optimization pipeline. The easiest way to think about this is that we record that i_1 can legitimately be replaced with b_6 in that context. We could just as easily have recorded that b_6 can be replaced with i_1.

I don't think we have any heuristics for which of those two equivalences to record; it's strictly based on the order of appearance (which is likely determined elsewhere by canonicalization rules).

If you want to propose some heuristics, I'm all ears. One might be to put the object with the least number of references on the lhs. The idea would be to try and ultimately get that use count to 0/1, which may allow that object to die at the comparison. There may be some reasonable heuristic around the loop depth of the names as well, i.e. do we want to replace uses of a non-loop object with a loop object or vice versa?

Anyway, open to suggestions here...
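A toy sketch of the use-count heuristic suggested above (the type, names, and signature are invented for illustration; this is not GCC's record_equality): given x == y, keep the name with fewer remaining uses as the canonical replacement, so the busier name's use count can drop toward 0/1:

```c
#include <assert.h>

typedef struct { const char *name; int num_uses; } ssa_name;

/* Return the operand to treat as canonical (i.e. the replacement);
   ties keep the first operand, mirroring order-of-appearance.  */
const ssa_name *pick_canonical (const ssa_name *x, const ssa_name *y)
{
  return (x->num_uses <= y->num_uses) ? x : y;
}
```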
(In reply to Jeffrey A. Law from comment #5)
> If you want to propose some heuristics, I'm all ears.

The rule is simple - we should always replace with the more dominating definition, because that's what value numbering would do in order to make the other defs unused.
But replacement with the most dominating name (presumably a default def dominates everything) isn't going to help here.

In many ways we'd be better off if we didn't propagate from those equality comparisons -- unless they allowed some other later simplification. But we don't have a good way to make that determination. Which is ultimately what led to the uncprop pass, which we run very late to try and put things back the way they were.

I wonder if running uncprop between DOM1 & the late unroller would be worth the extra pass. And if so, where does it make the most sense.
On Fri, 13 Feb 2015, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62217
>
> But replacement with the most dominating name (presumably a default def
> dominates everything) isn't going to help here.
>
> In many ways we'd be better off if we didn't propagate from those equality
> comparisons -- unless they allowed some other later simplification. But we
> don't have a good way to make that determination.

That's true. We can also always choose the most immediately dominating definition, for the reason that it might carry more information. But that might be a loss in some cases as well.

> Which is ultimately what led to the uncprop pass, which we run very late to
> try and put things back the way they were.
>
> I wonder if running uncprop between DOM1 & the late unroller would be worth
> the extra pass. And if so, where does it make the most sense.

uncprop is mostly there to help out-of-SSA PHI coalescing, so it's something different (and should better be integrated with the out-of-SSA machinery).
Yes, any particular choice has the potential to regress in one way or another. These are heuristics after all. We're just looking for a reasonable refinement if we can find one.

Dominance doesn't seem to be the right thing to be looking at to me, since the crux of this issue is that propagating the "copy" implied by the equality comparison just changes which SSA_NAME we reference and as a result ultimately hides stuff from later passes. It doesn't (in this case) enable further simplifications or allow either SSA_NAME to become unused. A dominance test between the args of the equality comparison just doesn't seem helpful here.

In fact, because both names are used in the equality test, these propagations can never cause an SSA_NAME to become unused. At best the propagation will expose some further simplification on one of the paths, or it will result in one SSA_NAME having a single use (the comparison). We have no good way of guessing whether the former will happen, but we can encourage the latter.

But as I mentioned earlier, I really wonder if we should allow these context-sensitive equivalences to be expressed in the gimple if they didn't prove useful. And that was the whole purpose behind uncprop -- to find context-sensitive propagations that ultimately didn't prove useful and which result in poor coalescing or unwanted constant initializations, and un-propagate them. It wouldn't directly help this problem because the use is in a normal statement, but it's definitely closely related and it wouldn't be hard to repurpose that code. In fact, it might be a good thing to do in general.

Essentially what that code does is build a map of values back to the SSA_NAMEs which hold those values by way of an equality comparison. At each use of the value, we can look at the list of SSA_NAMEs that hold that value and select the one that appears to be least cost based on whatever metrics we see fit.
On Mon, 16 Feb 2015, law at redhat dot com wrote:

> Yes, any particular choice has the potential to regress in one way or
> another. These are heuristics after all. We're just looking for a
> reasonable refinement if we can find one.

Well, there is also canonicalization for CSE / jump threading. Consider

  if (i == 2)
    {
      if (i != j)
        ...
      else if (j == 2)
        ...

or

  y = 2 * i;
  if (i == j)
    x = i + j;

but yes, these are followup transforms. Unfortunately(?) DOM performs copy/constant propagation for all recorded equalities.

> Dominance doesn't seem to be the right thing to be looking at [...]
> We have no good way of guessing whether the former will happen, but we
> can encourage the latter.

As you say, we don't know - we only know that properly canonicalizing will expose the followup transforms if there are any. It looks like we are basically taking the original order of the EQ_EXPR operands (thus eventually relying on tree_swap_operands canonicalization) plus the "correctness" thing of taking loop depth into account (which is kind of a dominance relation).
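To make the first example above concrete (the function and its return values are mine, chosen only to exercise the paths): once the dominating i == 2 test lets 2 be propagated for i, the inner i != j test becomes 2 != j, and on its else arm j == 2 is then known true, so the final arm is dead:

```c
#include <assert.h>

int classify (int i, int j)
{
  if (i == 2)
    {
      if (i != j)        /* with i -> 2: (2 != j) */
        return 1;
      else if (j == 2)   /* here j == i == 2, so this is always taken */
        return 2;
      else
        return 3;        /* dead once the equality is propagated */
    }
  return 0;
}
```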
We are also introducing SSA_NAME_VALUE "chains" in record_equality, as we set the SSA_NAME_VALUE of x to y rather than to SSA_NAME_VALUE (y) (we seem to do this in multiple places for some odd reason...; only tree-ssa-threadedge.c:record_temporary_equivalence seems to get this right).

> But as I mentioned earlier, I really wonder if we should allow these
> context-sensitive equivalences to be expressed in the gimple if they didn't
> prove useful. And that was the whole purpose behind uncprop -- to find
> context-sensitive propagations that ultimately didn't prove useful and which
> result in poor coalescing or unwanted constant initializations, and
> un-propagate them.

Yes (but on GIMPLE we don't care).

> It wouldn't directly help this problem because the use is in a normal
> statement, but it's definitely closely related and it wouldn't be hard to
> repurpose that code. In fact, it might be a good thing to do in general.
>
> Essentially what it's doing is building a map of values back to SSA_NAMEs
> which hold those values by way of an equality comparison. At each use of the
> value, we can look at the list of SSA_NAMEs that hold that value and select
> the one that appears to be least cost based on whatever metrics we see fit.
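A sketch of the chain issue with an invented array representation (not GCC's actual SSA_NAME_VALUE): recording VALUE(x) = y instead of VALUE(x) = VALUE(y) forces lookups to walk x -> y -> z chains, while collapsing at record time keeps every lookup a single step:

```c
#include <assert.h>

#define NNAMES 8
int value_of[NNAMES];  /* value_of[n] == n means "no recorded value" */

void init_values (void)
{
  for (int i = 0; i < NNAMES; i++)
    value_of[i] = i;
}

/* The problematic variant: may build chains.  */
void record_equality_naive (int x, int y)
{
  value_of[x] = y;
}

/* The collapsed variant (what record_temporary_equivalence gets right):
   record the value of the value, so chains never form.  */
void record_equality_collapsed (int x, int y)
{
  value_of[x] = value_of[y];
}
```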
Note that the diagnostic part is fixed for GCC 5. We are still somehow not removing dead code.
Index: gcc/tree-ssa-dom.c
===================================================================
--- gcc/tree-ssa-dom.c	(revision 220755)
+++ gcc/tree-ssa-dom.c	(working copy)
@@ -2291,11 +2291,16 @@ cprop_operand (gimple stmt, use_operand_
   if (!may_propagate_copy (op, val))
     return;
 
-  /* Do not propagate copies into simple IV increment statements.
-     See PR23821 for how this can disturb IV analysis.  */
-  if (TREE_CODE (val) != INTEGER_CST
-      && simple_iv_increment_p (stmt))
-    return;
+  /* Do not propagate copies into BIVs.
+     See PR23821 and PR62217 for how this can disturb IV and
+     number of iteration analysis.  */
+  if (TREE_CODE (val) != INTEGER_CST)
+    {
+      gimple def = SSA_NAME_DEF_STMT (op);
+      if (gimple_code (def) == GIMPLE_PHI
+	  && gimple_bb (def)->loop_father->header == gimple_bb (def))
+	return;
+    }
 
   /* Dump details.  */
   if (dump_file && (dump_flags & TDF_DETAILS))

This fixes the warning on the branch; not sure yet whether it fixes the missed optimization on the trunk. It extends an existing heuristic from not replacing a BIV use in an increment to not replacing any BIV use (??? Best if we knew whether the equivalence is only temporary...)
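The guard added by the patch can be paraphrased with invented stand-in types (these are simplified mock-ups, not GCC's real gimple/loop structures): an SSA name counts as a BIV when its defining statement is a PHI sitting in the header block of its containing loop:

```c
#include <assert.h>

struct loop_info;
typedef struct block { struct loop_info *loop_father; } block;
typedef struct loop_info { block *header; } loop_info;
typedef struct stmt { int is_phi; block *bb; } stmt;

/* Mirror of the patch's condition: a loop-header PHI defines a BIV,
   and copy propagation into its uses should be refused.  */
int is_biv_def (const stmt *def)
{
  return def->is_phi && def->bb->loop_father->header == def->bb;
}
```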
On 02/17/15 02:44, rguenther at suse dot de wrote:

> Well, there is also canonicalization for CSE / jump threading. Consider
>
>   if (i == 2)
>     {
>       if (i != j)
>         ...
>       else if (j == 2)
>         ...
>
> or
>
>   y = 2 * i;
>   if (i == j)
>     x = i + j;
>
> but yes, these are followup transforms. Unfortunately(?) DOM performs
> copy/constant propagation for all recorded equalities.

Yea, I've been mulling over what it would take to instead build equivalence classes, then when we find a use, walk through the class and see which member is "best". I suspect the vast majority of the time the "best" is always going to be the same -- the major exception will be these temporary equivalences.

I've also been pondering somehow marking the equivalent values with some kind of tag indicating their scope and using that to guide propagation. But I haven't prototyped anything...

> As you say, we don't know - we only know that properly canonicalizing
> will expose the followup transforms if there are any. It looks like
> we are basically taking the original order of the EQ_EXPR operands
> (thus eventually relying on tree_swap_operands canonicalization) plus
> the "correctness" thing of taking loop depth into account (which is
> kind of a dominance relation).

Correct, we take it from the original order and thus from the earlier canonicalization via tree_swap_operands.
> We are also introducing SSA_NAME_VALUE "chains" in record_equality,
> as we set the SSA_NAME_VALUE of x to y rather than to SSA_NAME_VALUE (y)
> (we seem to do this in multiple places for some odd reason...;
> only tree-ssa-threadedge.c:record_temporary_equivalence seems to get
> this right).

Even if we set it to SSA_NAME_VALUE (y) we can end up with chains. But this change might eliminate the benefit of the chain walk, which would be good.

Jeff
WRT the patch in c#12, it looks reasonable, for the same reasons we avoid propagating in PR23821. I can confirm that it prevents the unwanted cprop into the array reference. By DOM2 we have the following array references:

  g_x[0] = *x2_9(D);
  g_x[0] = *x1_7(D);
  g_x[1] = *x2_9(D);
  g_x[1] = *x1_7(D);
  g_x[2] = *x2_9(D);
  g_x[2] = *x1_7(D);
  g_x[3] = *x1_7(D);
  g_x[3] = *x2_9(D);

Assuming it bootstraps and passes regression testing, I'd go with it.
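As a sanity check (hand-written C, not the actual GIMPLE), the fully unrolled form with one guarded store per index computes the same result as the original loop for every in-bounds (a, b) combination:

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned data; } s1;
s1 g_loop[4], g_unroll[4];

/* The original loop, for comparison.  */
void foo_loop (s1 *x1, s1 *x2, int a, int b)
{
  for (int i = 0; i < a; i++)
    g_loop[i] = (i == b) ? *x1 : *x2;
}

/* Hand-unrolled rendering of the DOM2 dump: for each index exactly
   one of the two stores executes.  */
void foo_unrolled (s1 *x1, s1 *x2, int a, int b)
{
  if (a > 0) g_unroll[0] = (b == 0) ? *x1 : *x2;
  if (a > 1) g_unroll[1] = (b == 1) ? *x1 : *x2;
  if (a > 2) g_unroll[2] = (b == 2) ? *x1 : *x2;
  if (a > 3) g_unroll[3] = (b == 3) ? *x1 : *x2;
}
```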
Author: rguenth
Date: Wed Feb 18 09:48:57 2015
New Revision: 220785

URL: https://gcc.gnu.org/viewcvs?rev=220785&root=gcc&view=rev
Log:
2015-02-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62217
	* tree-ssa-dom.c (cprop_operand): Avoid propagating copies
	into BIVs.

	* gcc.dg/tree-ssa/cunroll-11.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/cunroll-11.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-dom.c
Fixed on trunk so far.
GCC 4.9.3 has been released.
Author: rguenth
Date: Thu Feb 11 13:40:31 2016
New Revision: 233344

URL: https://gcc.gnu.org/viewcvs?rev=233344&root=gcc&view=rev
Log:
2016-02-11  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2015-02-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62217
	* tree-ssa-dom.c (cprop_operand): Avoid propagating copies
	into BIVs.

	* gcc.dg/tree-ssa/cunroll-11.c: New testcase.

2015-06-18  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2015-06-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66375
	* tree-scalar-evolution.c (follow_ssa_edge_binary): First
	add to the evolution before following SSA edges.

	* gcc.dg/torture/pr66375.c: New testcase.

2015-06-23  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2015-06-09  Richard Biener  <rguenther@suse.de>

	PR middle-end/66413
	* tree-inline.c (insert_init_debug_bind): Unshare value.

	* gcc.dg/torture/pr66413.c: New testcase.

2015-07-08  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66794
	* gimple-ssa-isolate-paths.c (gimple_ssa_isolate_erroneous_paths):
	Free post-dominators.

	* gcc.dg/torture/pr66794.c: New testcase.

2015-07-10  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66823
	* tree-if-conv.c (memrefs_read_or_written_unconditionally):
	Fix inverted predicate.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr66375.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr66413.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr66794.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/tree-ssa/cunroll-11.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/gimple-ssa-isolate-paths.c
    branches/gcc-4_9-branch/gcc/gimple.c
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/tree-if-conv.c
    branches/gcc-4_9-branch/gcc/tree-inline.c
    branches/gcc-4_9-branch/gcc/tree-scalar-evolution.c
    branches/gcc-4_9-branch/gcc/tree-ssa-dom.c
Fixed.