This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
- From: "ktkachov at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 18 Sep 2015 14:32:23 +0000
- Subject: [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628
Bug ID: 67628
Summary: [tree-optimization] (a && b) && c shows better codegen
than a && (b && c)
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Consider the two functions:
int
foo1 (int a, int b, int c, int d)
{
return a > b && b <= c && c > d;
}
int
foo2 (int a, int b, int c, int d)
{
return a > b && (b <= c && c > d);
}
On aarch64 foo1 generates:
foo1:
cmp w1, w2
ccmp w2, w3, 4, le
ccmp w0, w1, 4, gt
cset w0, gt
ret
but for foo2 generates:
foo2:
cmp w0, w1
ble .L4
cmp w1, w2
cset w1, le
cmp w2, w3
cset w0, gt
and w0, w1, w0
ret
Something similar is observed on x86_64 where foo2 contains a conditional
branch instruction where foo1 is a single basic block
In foo2 we end up generating multiple basic blocks whereas for foo1 we manage
to merge them all into 1 basic block which ends up going through the
conditional-compare pass nicely.
Looking at the final .optimized tree dump the foo1 tree is:
_BoolD.2673 _1;
_BoolD.2673 _4;
_BoolD.2673 _6;
_BoolD.2673 _10;
_BoolD.2673 _11;
intD.7 _12;
;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;; prev block 0, next block 1, flags: (NEW, REACHABLE)
;; pred: ENTRY [100.0%] (FALLTHRU,EXECUTABLE)
# RANGE [0, 1]
_4 = a_2(D) > b_3(D);
# RANGE [0, 1]
_6 = b_3(D) <= c_5(D);
# RANGE [0, 1]
_10 = c_5(D) > d_8(D);
# RANGE [0, 1]
_1 = _6 & _10;
# RANGE [0, 1]
_11 = _1 & _4;
# RANGE [0, 1] NONZERO 1
_12 = (intD.7) _11;
# VUSE <.MEM_9(D)>
return _12;
;; succ: EXIT [100.0%]
whereas for foo2 it's more complex:
intD.7 iftmp.0_1;
_BoolD.2673 _5;
_BoolD.2673 _7;
_BoolD.2673 _8;
intD.7 _10;
_BoolD.2673 _11;
;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;; prev block 0, next block 3, flags: (NEW, REACHABLE)
;; pred: ENTRY [100.0%] (FALLTHRU,EXECUTABLE)
if (a_2(D) > b_3(D))
goto <bb 3>;
else
goto <bb 4>;
;; succ: 3 [50.0%] (TRUE_VALUE,EXECUTABLE)
;; 4 [50.0%] (FALSE_VALUE,EXECUTABLE)
;; basic block 3, loop depth 0, count 0, freq 5000, maybe hot
;; prev block 2, next block 4, flags: (NEW, REACHABLE)
;; pred: 2 [50.0%] (TRUE_VALUE,EXECUTABLE)
# RANGE [0, 1]
_5 = b_3(D) <= c_4(D);
# RANGE [0, 1]
_7 = c_4(D) > d_6(D);
# RANGE [0, 1]
_8 = _5 & _7;
;; succ: 4 [100.0%] (FALLTHRU,EXECUTABLE)
;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot
;; prev block 3, next block 1, flags: (NEW, REACHABLE)
;; pred: 3 [100.0%] (FALLTHRU,EXECUTABLE)
;; 2 [50.0%] (FALSE_VALUE,EXECUTABLE)
# _11 = PHI <_8(3), 0(2)>
# RANGE [0, 1] NONZERO 1
iftmp.0_1 = (intD.7) _11;
# VUSE <.MEM_9(D)>
return iftmp.0_1;
;; succ: EXIT [100.0%]
If we were to pick some kind of canonicalization for these equivalent
expressions it would make life easier for later passes to generate consistent
code.