Bug 106327

Summary: side-effect-free _x variant not optimized to unpredicated instruction
Product: gcc
Reporter: Yichao Yu <yyc1992>
Component: target
Assignee: Not yet assigned to anyone <unassigned>
Status: WAITING ---
Severity: normal
CC: rsandifo
Priority: P3
Keywords: aarch64-sve, missed-optimization
Version: 12.1.0
Target Milestone: ---
Target: aarch64
Last reconfirmed: 2022-08-31 00:00:00

Description Yichao Yu 2022-07-16 20:11:38 UTC
Related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106326 .

According to the Arm C Language Extensions (ACLE) for SVE, when the `_x` ("don't care") form of an intrinsic is used, the inactive lanes of the result are unspecified:

> The compiler can then pick whichever form of instruction seems to give the best code. This includes using unpredicated instructions, where available and suitable

Because of this, I expect the following to be optimized to a single unpredicated add instruction, as if an `svptrue_b64()` predicate had been used.

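```
#include <arm_sve.h>

svfloat64_t add(svfloat64_t a, svfloat64_t b)
{
    // Active lanes: a >= b (false when a < b or either operand is NaN).
    svbool_t und_ok = svcmpge(svptrue_b64(), a, b);
    // _x: the inactive lanes of the result are "don't care".
    return svadd_x(und_ok, a, b);
}
```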

However, GCC compiles this as if `_m` had been used and generates:

```
        ptrue   p0.b, all
        fcmge   p0.d, p0/z, z0.d, z1.d
        fadd    z0.d, p0/m, z0.d, z1.d
```

In general, is there any reason not to treat `svadd_x` (and other side-effect-free `_x` intrinsics) with an unknown predicate as the unpredicated instruction?
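
For reference, the three ACLE predication suffixes differ only in what the inactive lanes of the result contain: `_m` keeps the first operand, `_z` writes zero, and `_x` leaves them unspecified. The following is a hypothetical scalar model of those lane semantics, not ACLE code: the fixed 4-lane vector and the names `Vec`, `Pred`, `cmpge`, `add_m`, `add_z` and `add_x` are assumptions for illustration. Its `add_x` makes the choice asked for above and ignores the predicate entirely, which is permitted as far as the lane values are concerned.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed vector length; real SVE vector length is
// implementation-defined and only known at run time.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;
using Pred = std::array<bool, kLanes>;

// Models svcmpge: a lane is active when a[i] >= b[i]
// (false when a[i] < b[i] or either operand is NaN).
Pred cmpge(const Vec& a, const Vec& b) {
    Pred p{};
    for (std::size_t i = 0; i < kLanes; ++i) p[i] = a[i] >= b[i];
    return p;
}

// _m ("merge"): inactive lanes keep the value of the first operand.
Vec add_m(const Pred& pg, const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = pg[i] ? a[i] + b[i] : a[i];
    return r;
}

// _z ("zero"): inactive lanes are set to zero.
Vec add_z(const Pred& pg, const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = pg[i] ? a[i] + b[i] : 0.0;
    return r;
}

// _x ("don't care"): inactive lanes are unspecified, so one permitted
// choice for the lane values is to ignore the predicate and add every
// lane, which is what an unpredicated FADD does.
Vec add_x(const Pred& /*pg*/, const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
    return r;
}
```

With `a = {1, 2, 3, 4}` and `b = {0, 5, 1, 9}`, lanes 0 and 2 are active; `add_m` gives `{1, 2, 4, 4}`, `add_z` gives `{1, 0, 4, 0}`, and this `add_x` gives `{1, 7, 4, 13}`.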

Comment 1 Richard Sandiford 2022-08-31 11:38:41 UTC
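This is because performing the addition on the inactive lanes
could trigger an IEEE exception (for example, the invalid-operation
exception raised by adding infinities of opposite sign) that the
predicated code would never raise.  The code is optimised to an
unpredicated FADD with -ffast-math.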
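
The point about inactive lanes can be illustrated on a single lane. In the reporter's function the predicate is `a >= b`, so a lane with `a = -inf` and `b = +inf` is inactive: a predicated FADD never touches it, but an unpredicated FADD computes `-inf + inf` and raises the invalid-operation flag. The following is a minimal scalar sketch in standard C++ using `<cfenv>`, not GCC or ACLE code; `lane_raises_invalid`, `honor_predicate` and `kInf` are hypothetical names, and it assumes a host that implements IEEE 754 exception flags (for example x86-64 or AArch64) and is compiled without `-ffast-math`.

```cpp
#include <cassert>
#include <cfenv>
#include <limits>

const double kInf = std::numeric_limits<double>::infinity();

// Hypothetical scalar model of one lane of the reporter's add(): the lane
// is active when a >= b. Returns true if evaluating the lane raised the
// IEEE invalid-operation flag (FE_INVALID).
//   honor_predicate = true  -> predicated FADD: inactive lanes are not computed
//   honor_predicate = false -> unpredicated FADD: every lane is computed
bool lane_raises_invalid(double a, double b, bool honor_predicate) {
    // volatile keeps the compiler from folding or speculating the addition.
    volatile double va = a;
    volatile double vb = b;
    std::feclearexcept(FE_ALL_EXCEPT);
    bool active = va >= vb;  // models svcmpge; raises nothing for non-NaN inputs
    if (active || !honor_predicate) {
        volatile double sum = va + vb;  // -inf + +inf raises FE_INVALID
        (void)sum;
    }
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Here `lane_raises_invalid(-kInf, kInf, true)` returns `false` and `lane_raises_invalid(-kInf, kInf, false)` returns `true`: the lane is inactive, so only the unpredicated model performs the addition that raises the exception.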