Bug 116546 - Missed optimization of redundant comparison
Summary: Missed optimization of redundant comparison
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
Reported: 2024-08-30 22:49 UTC by Peter Bergner
Modified: 2024-12-28 23:36 UTC
CC List: 1 user

See Also:
Host:
Target: powerpc64le-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-08-30 00:00:00


Description Peter Bergner 2024-08-30 22:49:46 UTC
In the following test case, the "n & 4" test is redundant and should be eliminated, since the "n &= 7;" statement limits n's potential values to [0,7].  The "n >= 4" test further narrows its potential values to [4,7], making the "n & 4" test always true, allowing the elimination of the code path calling bar().  

bergner@ltcden2-lp1:~$ cat test.c 
extern long foo (void);
extern long bar (void);

long
test (long n)
{
  n &= 7;
  if (n >= 4) {
    if (n & 4)
      return foo ();
    else
      return bar ();
  }
  return 0;
}

Current GCC at -O2 does optimize this on powerpc64le-linux and removes the bar() code path. However, if we change the "n >= 4" test to either "n > 4" or "n == 4", then we fail to eliminate the code path to bar(), even though the "n & 4" test remains redundant.
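The range argument can be checked mechanically. The following is a minimal C sketch that is not part of the original report; the helper names (cond_ge, cond_gt, cond_eq, inner_test_redundant) are hypothetical. Because n & 7 depends only on the low three bits of n, any run of eight or more consecutive values already covers every case; the loop assumes GCC's two's-complement behavior for negative n.

```c
#include <assert.h>

/* Hypothetical helpers: the three outer conditions discussed in the
   report, applied to the masked value m = n & 7.  */
static int cond_ge (long m) { return m >= 4; }
static int cond_gt (long m) { return m > 4; }
static int cond_eq (long m) { return m == 4; }

/* Return 1 if, for every sampled n, passing COND on n & 7 implies that
   bit 2 is set (the inner "n & 4" test is true), i.e. bar () can never
   be reached.  Return 0 as soon as a counterexample is found.  */
static int
inner_test_redundant (int (*cond) (long))
{
  for (long n = -1024; n <= 1024; n++)
    {
      long m = n & 7;
      assert (m >= 0 && m <= 7);   /* the range implied by n &= 7 */
      if (cond (m) && !(m & 4))
        return 0;                  /* this value would reach bar () */
    }
  return 1;
}
```

Calling inner_test_redundant with each of the three conditions should return 1, confirming that all three variants make the bar() path dead. A weaker condition such as m >= 3 would return 0, because m == 3 passes it with bit 2 clear.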

Compiling with -O2 -mcpu=power10 (-mcpu=power10 allows tail calls, which makes the resulting asm easier to read, but it isn't required) produces the following for the "n >= 4" test:

test:
	andi. 3,3,0x4
	bne 0,.L4
	li 3,0
	blr
	.p2align 4,,15
.L4:
	b foo@notoc

Whereas the "n > 4" test results in:

test:
	rldicl 2,3,0,61
	cmpdi 0,2,4
	ble 0,.L2
	andi. 3,3,0x4
	beq 0,.L3
	b foo@notoc
	.p2align 4,,15
.L2:
	li 3,0
	blr
	.p2align 4,,15
.L3:
	b bar@notoc

Interestingly to me, for the "n >= 4" test the optimized code keeps the inner, redundant "n & 4" test (andi. 3,3,0x4) and removes the outer tests instead. I didn't expect that.
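For reference, here is a hand-written sketch of the code one would hope GCC produces for the "n > 4" variant. It is an assumption about the ideal result, not GCC output. foo and bar are replaced by hypothetical stubs returning 1 and 2 so the snippet links on its own, and test_gt_folded and folds_agree are hypothetical names.

```c
/* Hypothetical stand-ins for the report's extern foo () and bar ().  */
static long foo (void) { return 1; }
static long bar (void) { return 2; }

/* The "n > 4" variant from the report.  */
static long
test_gt (long n)
{
  n &= 7;
  if (n > 4)
    {
      if (n & 4)
        return foo ();
      else
        return bar ();
    }
  return 0;
}

/* Hoped-for result (assumption): the redundant inner test is gone and
   the bar () path has been eliminated.  */
static long
test_gt_folded (long n)
{
  return (n & 7) > 4 ? foo () : 0;
}

/* Return 1 if test_gt and test_gt_folded agree on every n in [lo, hi].  */
static int
folds_agree (long lo, long hi)
{
  for (long n = lo; n <= hi; n++)
    if (test_gt (n) != test_gt_folded (n))
      return 0;
  return 1;
}
```

folds_agree (-1024, 1024) should return 1. For comparison, the "n >= 4" output above corresponds to return (n & 4) ? foo () : 0;, since (n & 7) >= 4 holds exactly when bit 2 of n is set.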
Comment 1 Andrew Pinski 2024-08-30 23:06:32 UTC
I think this is a dup ...
Comment 2 Andrew Pinski 2024-08-30 23:28:52 UTC
Disabling the forwprop1 pass allows VRP to handle this:

`-fdisable-tree-forwprop1`

Basically, forwprop is combining `(n & 4) & 7`, which then confuses VRP.
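One possible reading of this, sketched at the C level rather than in GIMPLE and offered as an assumption rather than a description of GCC internals: before the fold, the inner test reads the masked value, whose range on the "n > 4" path is [5,7]; after it, the inner test reads the unmasked n, for which no such range is recorded. The helper names below are hypothetical.

```c
/* Assumed shape before forwprop1: the inner test reads the masked value
   m, and on this path m is already known to lie in [5, 7], so m & 4 is
   provably nonzero.  */
static int
inner_test_before (long n)
{
  long m = n & 7;
  return m > 4 && (m & 4) != 0;
}

/* Assumed shape after (n & 7) & 4 is folded to n & 4: the inner test now
   reads n itself.  The [5, 7] range belongs to m, not to n, so proving
   n & 4 nonzero would require relating the known bits of n & 7 back to n.  */
static int
inner_test_after (long n)
{
  long m = n & 7;
  return m > 4 && (n & 4) != 0;
}

/* The fold itself is always valid: the bit tested by 4 lies inside the
   mask 7.  */
static int
fold_is_valid (long n)
{
  return ((n & 7) & 4) == (n & 4);
}
```

Both versions return the same result for every n, and fold_is_valid holds for all n, so the fold itself is correct; under this reading, what is lost is only information the range analysis could otherwise use.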