Created attachment 53945 [details]
Code

cat case.c #230636
int b;
void foo();
void(a)();
int main() {
  int c;
  int *d = &c;
  *d = a && 8;
  b = 0;
  for (; b < 9; ++b)
    *d ^= 3;
  if (*d)
    ;
  else
    foo();
}

`gcc-e4faee8d02ec5d65bf418612f7181823eb08c078 (trunk) -Os` can not eliminate
`foo`, but `gcc-releases/gcc-12.2.0 -Os` can.

`gcc-e4faee8d02ec5d65bf418612f7181823eb08c078 (trunk) -Os -S -o /dev/stdout case.c`
--------- OUTPUT ---------
main:
.LFB0:
        .cfi_startproc
        movl    $10, %edx
        xorl    %ecx, %ecx
        movl    $1, %eax
.L2:
        decl    %edx
        je      .L12
        xorl    $3, %eax
        movb    $1, %cl
        jmp     .L2
.L12:
        testb   %cl, %cl
        jne     .L4
        xorl    %esi, %esi
        movl    %esi, b(%rip)
        jmp     .L5
.L4:
        movl    $9, b(%rip)
.L5:
        testl   %eax, %eax
        jne     .L8
        pushq   %rdx
        .cfi_def_cfa_offset 16
        call    foo
        xorl    %eax, %eax
        popq    %rcx
        .cfi_def_cfa_offset 8
        ret
.L8:
        xorl    %eax, %eax
        ret
---------- END OUTPUT ---------

`gcc-releases/gcc-12.2.0 -Os -S -o /dev/stdout case.c`
--------- OUTPUT ---------
main:
.LFB0:
        .cfi_startproc
        movl    $9, b(%rip)
        xorl    %eax, %eax
        ret
---------- END OUTPUT ---------

Bisects to:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e7310e24b1c0ca67b1bb507c1330b2bf39e59e32
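For reference, the gcc-12.2.0 output above amounts to roughly the following
hand-written C, reconstructed from the assembly rather than taken from any
compiler dump: the loop is folded away, foo() is proven unreachable, and only
the final store to b survives.

/* Hand-written equivalent of the gcc-12.2.0 -Os output above
   (illustration only): b ends at 9 and main returns 0.  */
int b;
int main() { b = 9; return 0; }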
Confirmed.  Same with -O2.

Visiting conditional with predicate: if (c_13 != 0)
With known ranges
        c_13: [irange] int [-INF, +INF] NONZERO 0x3

Predicate evaluates to: DON'T KNOW
Not folded

vs. handling the cycle PHI for c:

  <bb 2> [local count: 118111600]:
  b = 0;
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 955630225]:
  _1 = c_10 ^ 3;
  _2 = b.1_3 + 1;
  b = _2;

  <bb 4> [local count: 1073741824]:
  # c_10 = PHI <1(2), _1(3)>
  b.1_3 = b;
  if (b.1_3 <= 8)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
  # c_13 = PHI <c_10(4)>
  if (c_13 != 0)

Value ranges after VRP:

_1: int [1, 2]
c_2: int [1, 2]

This is yet another case where proper propagation is important.  I'm
questioning whether the on-demand ranger is a good solution and whether
replacing VRP1 with it was premature.  I'm asking again what the plan is for
cases like this.

bit-CCP propagation arrives with

Simulating statement: c_10 = PHI <1(2), _1(3)>

Visiting PHI node: c_10 = PHI <1(2), _1(3)>
    Argument #0 (2 -> 4 executable)
        1       Value: CONSTANT 1
    Argument #1 (3 -> 4 executable)
        _1      Value: CONSTANT 0x0 (0x3)

PHI node value: CONSTANT 0x0 (0x3)

but it doesn't know that either bit zero or bit one is always set.
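To spell out the invariant that is being missed, here is a small standalone
check (my own addition, not part of the testcase) that the looped value only
ever takes the values 1 and 2, i.e. it can never be 0 after the loop, which
is exactly what is needed to delete the call to foo():

#include <assert.h>

int main(void)
{
  int c = 1;                      /* *d = a && 8: a's address is nonzero, so this is 1 */
  for (int b = 0; b < 9; ++b)
    {
      c ^= 3;                     /* 1 ^ 3 == 2, 2 ^ 3 == 1 */
      assert (c == 1 || c == 2);
    }
  assert (c != 0);                /* so the else branch calling foo() is unreachable */
  return 0;
}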
Sorry, I've been out for a few weeks.

This isn't an on-demand issue.  All versions of VRP do a full walk and
resolve during the walk.  This issue is a combination of the lack of
iteration and the lack of optimistic evaluations on back edges.  We do use
on-demand queries on the back edge to alleviate a number of issues caused by
the lack of iteration.

The problem is that, without iteration, we can't take the fully optimistic
approach of assuming that the value of c_10 is [1, 1] for a full iteration of
the loop and then determining during the second iteration that it is always
either 1 or 2.

  <bb 3> [local count: 955630225]:
  _1 = c_10 ^ 3;

  <bb 4> [local count: 1073741824]:
  # c_10 = PHI <1(2), _1(3)>
  b.1_3 = b;
  if (b.1_3 <= 8)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
  # c_13 = PHI <c_10(4)>
  if (c_13 != 0)

When we first try to evaluate

  # c_10 = PHI <1(2), _1(3)>

the back edge 3->4 has not been processed yet, so it is evaluated and _1
needs to be calculated.  Unfortunately the value of c_10 has not been
resolved yet, so it defaults to something pessimistically safe, assumes it is
varying, and we get the results you see.

We have a few possible approaches.

In an ideal world, we could use the path ranger to answer questions on the
back edge instead of fully evaluating it.  This would allow us to do a
"pretend" first iteration and use the value of [1, 1] for c_10, which would
produce a result of [1, 2] for _1, which in turn would cause us to evaluate
c_10 as [1, 2].  It is too late in the cycle to experiment with that
approach, I think.

I have also experimented (and still am experimenting) with changing that
initial value to be more optimistic.  When we first evaluate a statement, we
store an initial value so that if it is encountered again during evaluation
we can avoid a cycle.  That initial value is what gets used by any
calculations along a back edge.  When we return from resolving all the
inputs, we do a final evaluation of the statement and store the real global
value.  We have existing infrastructure using time stamps that should allow
us to calculate one value when processing the back edge, and then recalculate
it for real when we actually process that block.  I have not convinced myself
that this is 100% safe yet.

A third option would be to recognize that a 2-input PHI node like this is a
fairly common pattern, and before setting the initial value we could walk the
use-def chains, see whether the PHI is feeding itself, and make some other
determinations about the range (i.e. increasing, decreasing, specific values,
etc.) to use as the initial estimate instead.  This would give us results
similar to what we got before and is likely safer than depending on
timestamps to do recalculations; it is just not quite as general.

I'm experimenting with the latter two approaches this week to determine what
seems safest at this point in the cycle.
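As a rough model of the "pretend first iteration" idea described above (an
assumption-laden sketch in plain C, not ranger code; the small value-set
representation is my own), seed the PHI with only its incoming constant, push
values through the cycle's single update statement, and union the results
back in until nothing changes; for this testcase the set stabilizes at
{1, 2}:

#include <stdbool.h>
#include <stdio.h>

int main (void)
{
  bool in_set[4] = { false, true, false, false };   /* optimistic start: c_10 = {1} */
  bool changed = true;

  while (changed)
    {
      changed = false;
      for (int v = 0; v < 4; ++v)
        if (in_set[v])
          {
            int next = v ^ 3;                       /* the cycle's update: _1 = c_10 ^ 3 */
            if (!in_set[next])
              {
                in_set[next] = true;                /* union the back-edge value into the PHI */
                changed = true;
              }
          }
    }

  printf ("c_10 can be:");
  for (int v = 0; v < 4; ++v)
    if (in_set[v])
      printf (" %d", v);
  printf ("\n");                                    /* prints: c_10 can be: 1 2 */
  return 0;
}

Compiled and run, this prints "c_10 can be: 1 2", matching the [1, 2] range
the old VRP computed.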
Re-confirmed.  More importantly, it's not only -Os that regresses but also
-O2.

All VRP passes are now "EVRP" style, doing a single walk of the CFG via the
dominator walk in substitute_and_fold and triggering ranger "on demand" from
there.  That's a fundamental regression, but it was a conscious one, perhaps
made without fully anticipating the effect.  SCEV can only help in a subset
(though probably the most important subset) of cases here.  It might be
possible to have ranger itself mimic what the old SSA-propagator VRP did for
SSA SCCs (which was tuned to be good enough but not too expensive).
I don't see us realistically fixing this for GCC 13, so postponing to GCC 14
and downgrading from P1 to P2.

Besides bringing back a pass that performs propagation itself, possibly using
the value-range and range-op helpers, it might be possible to teach the
on-demand ranger to process SSA SCCs in the special way the old code did (but
only when asked for?).  Teaching ranger about SCCs probably requires
extensive changes in the code doing the global "propagation", though, and I'm
not sure that's a) desirable or b) even possible in some reasonable manner.
Adding ad-hoc SCC discovery on top of the existing "propagation" probably
does not make much sense.
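For illustration only, here is what generic SCC discovery (a textbook Tarjan
walk over a hand-encoded use-def graph, nothing resembling ranger's
internals) would report for the reduced IL in this PR: the PHI cycle
{c_10, _1} is the only non-trivial SCC, and that is the piece any SCC-aware
propagation would have to iterate.

#include <stdio.h>

#define N 4                  /* 0: c_10, 1: _1, 2: b.1_3, 3: c_13 */
static const char *name[N] = { "c_10", "_1", "b.1_3", "c_13" };
/* Edge u -> v means the definition of u uses v.  */
static const int adj[N][N] = { {0,1,0,0},   /* c_10 = PHI <1, _1>  */
                               {1,0,0,0},   /* _1   = c_10 ^ 3     */
                               {0,0,0,0},   /* b.1_3 = b           */
                               {1,0,0,0} }; /* c_13 = PHI <c_10>   */

static int idx[N], low[N], onstk[N], stk[N], sp, counter;

static void strongconnect (int v)
{
  idx[v] = low[v] = ++counter;
  stk[sp++] = v; onstk[v] = 1;
  for (int w = 0; w < N; ++w)
    if (adj[v][w])
      {
        if (!idx[w]) { strongconnect (w); if (low[w] < low[v]) low[v] = low[w]; }
        else if (onstk[w] && idx[w] < low[v]) low[v] = idx[w];
      }
  if (low[v] == idx[v])        /* v is the root of an SCC: pop its members */
    {
      printf ("SCC:");
      int w;
      do { w = stk[--sp]; onstk[w] = 0; printf (" %s", name[w]); } while (w != v);
      printf ("\n");
    }
}

int main (void)
{
  for (int v = 0; v < N; ++v)
    if (!idx[v])
      strongconnect (v);       /* reports c_10 and _1 together as one SCC */
  return 0;
}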
(In reply to Richard Biener from comment #4)
> I don't see us realistically fixing this for GCC 13, so postponing to GCC 14
> and downgrading from P1 to P2.

I am currently developing a PHI analyzer for ranges which provides a better
initial range for PHI cycles.  It attempts to locate an initial value and an
update statement which are the sole influences on a PHI group, and analyzes
their impact on the range.  It should catch cases like this in the next
release.  It is a self-contained class and would be simple to backport to any
ranger version of GCC once complete and tested.
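A toy structural sketch of the pattern such an analyzer would look for, using
my own simplified data model (update_stmt, analyze_phi and friends are
hypothetical names, not the real gimple-range-phi interfaces): a PHI with one
constant initial value whose other argument is defined by a single statement
that modifies the PHI result by a constant.

#include <stdio.h>

enum op { OP_XOR, OP_PLUS };

/* One update statement: result = <PHI value> <code> <constant>.  */
struct update_stmt { enum op code; int uses_phi; int constant; };

/* A PHI with a constant initial value and one SSA argument whose
   defining statement is stmts[update_def] (or -1 if there is none).  */
struct phi { int has_const_init; int init; int update_def; };

struct phi_group { int init; enum op code; int step; };

/* Return 1 and fill *g if the PHI matches the init + single-update pattern. */
static int
analyze_phi (const struct phi *p, const struct update_stmt *stmts,
             struct phi_group *g)
{
  if (!p->has_const_init || p->update_def < 0)
    return 0;
  const struct update_stmt *u = &stmts[p->update_def];
  if (!u->uses_phi)                 /* the update must feed from the PHI itself */
    return 0;
  g->init = p->init;
  g->code = u->code;
  g->step = u->constant;
  return 1;
}

int main (void)
{
  /* Model of this PR: c_10 = PHI <1, _1>,  _1 = c_10 ^ 3.  */
  struct update_stmt stmts[1] = { { OP_XOR, 1, 3 } };
  struct phi c_10 = { 1, 1, 0 };
  struct phi_group g;
  if (analyze_phi (&c_10, stmts, &g))
    printf ("PHI group: init %d, update: phi %s %d\n",
            g.init, g.code == OP_XOR ? "^" : "+", g.step);
  return 0;
}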
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Wed May 24 09:52:26 2023 -0400

    Gimple range PHI analyzer and testcases

    Provide a PHI analyzer framework to provide better initial values for
    PHI nodes which form groups with initial values and single statements
    which modify the PHI values in some predictable way.

            PR tree-optimization/107822
            PR tree-optimization/107986

    gcc/
            * Makefile.in (OBJS): Add gimple-range-phi.o.
            * gimple-range-cache.h (ranger_cache::m_estimate): New
            phi_analyzer pointer member.
            * gimple-range-fold.cc (fold_using_range::range_of_phi): Use
            phi_analyzer if no loop info is available.
            * gimple-range-phi.cc: New file.
            * gimple-range-phi.h: New file.
            * tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

    gcc/testsuite/
            * gcc.dg/pr107822.c: New.
            * gcc.dg/pr107986-1.c: New.
Fixed.