Bug 96703 - Failure to optimize combined comparison of variables and of variable with 0 to two comparisons with 0
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Arjun Shankar
URL:
Keywords: easyhack, missed-optimization
Depends on:
Blocks: 19987
Reported: 2020-08-19 10:15 UTC by Gabriel Ravier
Modified: 2023-09-04 04:31 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-08-24 00:00:00


Description Gabriel Ravier 2020-08-19 10:15:55 UTC
bool f(int x, int y)
{
    return x > y && y == 0;
}

This can be optimized to `return (y == 0) && (x > 0);`. This transformation doesn't by itself make the code faster, but it probably helps on pipelined CPUs (the first comparison no longer depends on both variables) and would most likely make other optimizations easier. LLVM performs this transformation; GCC does not.
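To make the claimed equivalence concrete, here is a minimal sketch that compares the two source forms on a small grid of inputs, including the extremes of `int`. The names `f_original`, `f_transformed` and `forms_agree` are hypothetical and introduced only for this illustration. It checks source-level equivalence on sample values and says nothing about the code either compiler emits.

```cpp
#include <cassert>
#include <climits>

// Original form from the report: the first comparison reads both x and y.
bool f_original(int x, int y)
{
    return x > y && y == 0;
}

// Proposed form: each comparison reads one variable and the constant 0.
bool f_transformed(int x, int y)
{
    return (y == 0) && (x > 0);
}

// Hypothetical helper (not part of the report): returns true when both forms
// give the same result for every pair drawn from a small sample set.
bool forms_agree()
{
    const int samples[] = {INT_MIN, INT_MIN + 1, -2, -1, 0,
                           1, 2, INT_MAX - 1, INT_MAX};
    for (int x : samples)
        for (int y : samples)
            if (f_original(x, y) != f_transformed(x, y))
                return false;
    return true;
}
```

The two forms agree because whenever `y == 0` holds, `x > y` and `x > 0` are the same test, and whenever it does not hold, both expressions are false.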
Comment 1 Andrew Pinski 2020-08-24 22:36:35 UTC
Confirmed, a small one.
Comment 2 Andrew Pinski 2023-09-04 04:31:29 UTC
Hmm for
```
#define cst 0x1234

bool f(int x, int y)
{
    return x > y && y == cst;
}

bool f0(int x, int y)
{
    return x > cst && y == cst;
}
```

currently for GCC on aarch64:
```
f:
        cmp     w0, w1
        mov     w2, 4660
        ccmp    w1, w2, 0, gt
        cset    w0, eq
        ret
f0:
        mov     w2, 4660
        cmp     w0, w2
        ccmp    w1, w2, 0, gt
        cset    w0, eq
        ret
```
Here f is actually better because its first cmp is independent of the mov.
So on a dual-issue CPU, f would almost always be better, even if the move does not occupy an issue slot.
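Since `f` and `f0` return the same value for every input, the choice between them only affects code generation. A minimal sketch that spot-checks this for the constant used above follows. It assumes `constexpr` in place of the `#define`, and `same_result_near_cst` is a hypothetical helper added only for this illustration.

```cpp
#include <cassert>
#include <climits>

// Same constant as in the comment, written as constexpr instead of a macro.
constexpr int cst = 0x1234;

// Untransformed form: the first comparison uses x and y directly.
bool f(int x, int y)
{
    return x > y && y == cst;
}

// Transformed form: both comparisons are against the constant.
bool f0(int x, int y)
{
    return x > cst && y == cst;
}

// Hypothetical helper: compares f and f0 on values around cst and on a few
// extreme values of int. Returns true when they agree everywhere sampled.
bool same_result_near_cst()
{
    for (int x = cst - 3; x <= cst + 3; ++x)
        for (int y = cst - 3; y <= cst + 3; ++y)
            if (f(x, y) != f0(x, y))
                return false;

    const int extremes[] = {INT_MIN, -1, 0, INT_MAX};
    for (int x : extremes)
        for (int y : extremes)
            if (f(x, y) != f0(x, y))
                return false;
    return true;
}
```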

For RISC-V, not doing the transformation is actually better:
        li      a5,4096
        addi    a5,a5,564
        sub     a5,a1,a5
        sgt     a0,a0,a1
        seqz    a5,a5
        and     a0,a5,a0
        ret

vs
        li      a5,4096
        addi    a5,a5,564
        sub     a1,a1,a5
        seqz    a1,a1
        sgt     a0,a0,a5
        and     a0,a1,a0
        ret

Without the transformation, the sgt is independent of forming the constant.

Now 0 could be handled as a special case because most targets handle 0 nicely.

I see that doing it is better for Power, but I don't know whether that is true in general or only for the constants I tried.