Bug 107822 - [13/14/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)
Summary: [13/14/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 13.0
: P2 normal
Target Milestone: 14.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2022-11-22 17:51 UTC by Yann Girsberger
Modified: 2023-08-09 04:27 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.2.1
Known to fail: 13.0
Last reconfirmed: 2023-02-27 00:00:00


Attachments
Code (115 bytes, text/plain)
2022-11-22 17:51 UTC, Yann Girsberger
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yann Girsberger 2022-11-22 17:51:01 UTC
Created attachment 53945 [details]
Code

cat case.c #230636
int b;
void foo();
void(a)();
int main() {
  int c;
  int *d = &c;
  *d = a && 8;
  b = 0;
  for (; b < 9; ++b)
    *d ^= 3;
  if (*d)
    ;
  else
    foo();
}

`gcc-e4faee8d02ec5d65bf418612f7181823eb08c078 (trunk) -Os` can not eliminate `foo` but `gcc-releases/gcc-12.2.0 -Os` can.

`gcc-e4faee8d02ec5d65bf418612f7181823eb08c078 (trunk) -Os -S -o /dev/stdout case.c`
--------- OUTPUT ---------
main:
.LFB0:
	.cfi_startproc
	movl	$10, %edx
	xorl	%ecx, %ecx
	movl	$1, %eax
.L2:
	decl	%edx
	je	.L12
	xorl	$3, %eax
	movb	$1, %cl
	jmp	.L2
.L12:
	testb	%cl, %cl
	jne	.L4
	xorl	%esi, %esi
	movl	%esi, b(%rip)
	jmp	.L5
.L4:
	movl	$9, b(%rip)
.L5:
	testl	%eax, %eax
	jne	.L8
	pushq	%rdx
	.cfi_def_cfa_offset 16
	call	foo
	xorl	%eax, %eax
	popq	%rcx
	.cfi_def_cfa_offset 8
	ret
.L8:
	xorl	%eax, %eax
	ret
---------- END OUTPUT ---------


`gcc-releases/gcc-12.2.0 -Os -S -o /dev/stdout case.c`
--------- OUTPUT ---------
main:
.LFB0:
	.cfi_startproc
	movl	$9, b(%rip)
	xorl	%eax, %eax
	ret
---------- END OUTPUT ---------


Bisects to: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e7310e24b1c0ca67b1bb507c1330b2bf39e59e32
Comment 1 Richard Biener 2022-12-21 13:48:41 UTC
Confirmed.  Same with -O2.

Visiting conditional with predicate: if (c_13 != 0)

With known ranges 
        c_13: [irange] int [-INF, +INF] NONZERO 0x3

Predicate evaluates to: DON'T KNOW
Not folded

vs. handling the cycle PHI for c:

  <bb 2> [local count: 118111600]:
  b = 0;
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 955630225]:
  _1 = c_10 ^ 3;
  _2 = b.1_3 + 1;
  b = _2;

  <bb 4> [local count: 1073741824]:
  # c_10 = PHI <1(2), _1(3)>
  b.1_3 = b;
  if (b.1_3 <= 8)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
  # c_13 = PHI <c_10(4)>
  if (c_13 != 0)

Value ranges after VRP:

_1: int [1, 2]
c_2: int [1, 2]


this is yet another case where proper propagation is important.  I'm questioning the idea that on-demand ranger is a good solution and replacing VRP1 was premature?  I'm asking again as to what the plan was for cases like this?

bit-CCP propagation arrives with

Simulating statement: c_10 = PHI <1(2), _1(3)>

Visiting PHI node: c_10 = PHI <1(2), _1(3)>
        Argument #0 (2 -> 4 executable)
        1       Value: CONSTANT 1
        Argument #1 (3 -> 4 executable)
        _1      Value: CONSTANT 0x0 (0x3)

    PHI node value: CONSTANT 0x0 (0x3)

but it doesn't know that always either bit zero or one is set.
Comment 2 Andrew Macleod 2023-01-03 17:46:55 UTC
Sorry, I've been out for a few weeks.

This isn't an on-demand issue. All versions of VRP do a full walk and resolve during the walk. This issue is a combination of lack of iteration and not optimistic evaluations on back edges. 

We do use on-demand to query the back edge to alleviate a number of issues with lack of iteration. The problem is that with a lack of iteration, we can't take the fully optimistic approach and assume that the value of c_10 is [1, 1] for a full iteration of the loop and then determine during the second iteration that it is always either 1 or 2.

    <bb 3> [local count: 955630225]:
    _1 = c_10 ^ 3;

    <bb 4> [local count: 1073741824]:
    # c_10 = PHI <1(2), _1(3)>
    b.1_3 = b;
    if (b.1_3 <= 8)
      goto <bb 3>; [89.00%]
    else
      goto <bb 5>; [11.00%]

    <bb 5> [local count: 118111600]:
    # c_13 = PHI <c_10(4)>
    if (c_13 != 0)

When we first try to evaluate 
# c_10 = PHI <1(2), _1(3)>
the back edge 3->4 has not been processed, so it is evaluated and _1 needs to be calculated. Unfortunately the value of c_10 has not been resolved yet, so it defaults to something pessimistically safe and assumes it is varying and we get the results you see.


We have a few possible approaches. In an ideal world, we could use the path ranger to answer questions on the back edge instead of fully evaluating it.  This would allow us to do a "pretend" first iteration and use the value of [1,1] for c_10, and produce _1 =  result of [1,2], which in turn would then cause us to evaluate c_10 as [1,2].  It is too late in the cycle to experiment with that approach I think.

I have also (still am) experimented with changing that initial value to be more optimistic.  When we first evaluate a statement, We store an initial value so that if it is encountered again during evaluation, we can avoid a cycle. That initial value is what gets used by any calculations along a back edge. When we return from resolving all the inputs, we do a final evaluation of the statement and store the real global value.

We have existing infrastructure which uses time stamps that should allow us to calculate one value when doing the back edge, and then recalculate it for real when we actually process that block.  I have not convinced myself that this is 100% safe yet.

A third option would be to recognize this is a fairly common pattern that we have a 2 input PHI node like this. and before setting the initial value we could walk the use-def chains and see if the PHI is feeding itself and make some other determinations about the range (ie, increasing, decreasing, specific values, etc) and use that for the initial estimate instead.   This would give us similar results to what we did before and is likely safer than depending on timestamps to do recalculations. It is just not quite as general.      

Im experimenting with the latter 2 approaches this week to determine what seems safest at this point in the cycle.
Comment 3 Richard Biener 2023-02-27 10:15:04 UTC
Re-confirmed.  More importantly it's not only -Os which regresses but also -O2.

All VRP passes are now "EVRP" style, doing a single walk of the CFG
via the dominator walk in substitute_and_fold and triggering ranger
"on demand" from there.

That's a fundamental regression but it was a concious one maybe not
fully anticipating the effect.  SCEV can only help in a subset (but
probably the most important set) of cases here.

It might be possible to have ranger itself mimic what the old SSA propagator
VRP did for SSA SCCs (which was tuned to be good enough but not too
expensive).
Comment 4 Richard Biener 2023-03-24 08:28:16 UTC
I don't see us realistically fixing this for GCC 13, so postponing to GCC 14
and downgrading from P1 to P2.

Besides bringing back a pass that performs propagation itself possibly using
value-range and range-op helpers it might be possible to teach the on-demand
ranger to process SSA SCCs in the special way the old code did (but
only when asked for?).  But teaching ranger about SCCs probably requires
extensive changes in the code doing the global "propagation", and I'm not
sure that's a) desirable or b) even possible in some reasonable manner.
Adding ad-hoc SCC discovery ontop of the existing "propagation" probably
does not make much sense.
Comment 5 Andrew Macleod 2023-03-27 20:47:57 UTC
(In reply to Richard Biener from comment #4)
> I don't see us realistically fixing this for GCC 13, so postponing to GCC 14
> and downgrading from P1 to P2.
> 

I am currently developing a phi analyzer for ranges which provides better initial range for phi cycles.  It attempts to locate a an initial value and an update statement which are the sole impact of a PHI group and anaylze their impact on the range. 

It should catch cases like this in the next release.  It is a self contained class, and would be simple to backport to any ranger version of GCC when complete and tested.
Comment 6 GCC Commits 2023-05-24 21:19:11 UTC
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Wed May 24 09:52:26 2023 -0400

    Gimple range PHI analyzer and testcases
    
    Provide a PHI analyzer framework to provive better initial values for
    PHI nodes which formk groups with initial values and single statements
    which modify the PHI values in some predicatable way.
    
            PR tree-optimization/107822
            PR tree-optimization/107986
            gcc/
            * Makefile.in (OBJS): Add gimple-range-phi.o.
            * gimple-range-cache.h (ranger_cache::m_estimate): New
            phi_analyzer pointer member.
            * gimple-range-fold.cc (fold_using_range::range_of_phi): Use
            phi_analyzer if no loop info is available.
            * gimple-range-phi.cc: New file.
            * gimple-range-phi.h: New file.
            * tree-vrp.cc (execute_ranger_vrp): Utililze a phi_analyzer.
    
            gcc/testsuite/
            * gcc.dg/pr107822.c: New.
            * gcc.dg/pr107986-1.c: New.
Comment 7 Andrew Pinski 2023-08-09 04:27:26 UTC
Fixed.