Bug 113882 - V4SF->V4HI could be implemented using V4SF->V4SI and then truncation to V4HI
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-02-12 02:13 UTC by Andrew Pinski
Modified: 2024-06-26 16:55 UTC
CC List: 1 user

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-05-11 00:00:00


Description Andrew Pinski 2024-02-12 02:13:41 UTC
Take:
```
void f(short *a, float *b)
{
        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
        a[3] = b[3];
}

void f1(float *a, short *b)
{
        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
        a[3] = b[3];
}
```
GCC can SLP-vectorize f1 (which does V4HI->V4SF) but not f (which does V4SF->V4HI).
LLVM can, though:
```
f:
        ldr     q0, [x1]
        fcvtzs  v0.4s, v0.4s
        xtn     v0.4h, v0.4s
        str     d0, [x0]
        ret
```
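The two-step lowering LLVM emits above (fcvtzs to 32-bit lanes, then xtn to narrow to 16-bit lanes) can be sketched portably with GCC/Clang vector extensions. This is a minimal sketch, assuming a compiler that provides `__builtin_convertvector` (GCC 9 or later, or Clang); the type names `v4sf`, `v4si`, `v4hi` and the function `f_two_step` are hypothetical, chosen only to mirror GCC's V4SF/V4SI/V4HI mode names.

```c
#include <string.h>

/* Hypothetical vector types mirroring GCC's V4SF, V4SI and V4HI modes.
   Assumes GCC/Clang vector extensions and __builtin_convertvector. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* Same result as f() for in-range inputs: convert four floats to 32-bit
   integers (truncating toward zero, like fcvtzs), then narrow each lane
   to 16 bits (like xtn). */
void f_two_step(short *a, const float *b)
{
    v4sf vf;
    memcpy(&vf, b, sizeof vf);                    /* 128-bit load, no alignment assumed */
    v4si vi = __builtin_convertvector(vf, v4si);  /* V4SF -> V4SI */
    v4hi vh = __builtin_convertvector(vi, v4hi);  /* V4SI -> V4HI (narrowing) */
    memcpy(a, &vh, sizeof vh);                    /* 64-bit store of four shorts */
}
```

On aarch64 this should correspond to the fcvtzs/xtn pair shown above, but the generated assembly is worth checking rather than assuming.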
Comment 1 Richard Biener 2024-02-12 09:06:23 UTC
The vectorizer has some of these tricks, but the intermediate conversions it allows are somewhat hard-coded. I think the C standard says an SF -> HI conversion invokes undefined behavior on overflow, so going through SI first should be valid.
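The argument can be checked concretely: C leaves a float-to-integer conversion undefined when the truncated value does not fit the destination type, so the only inputs that matter are those whose integer part lies in [SHRT_MIN, SHRT_MAX]. For those, converting through `int` first yields the same value. The hypothetical helper `two_step_matches_direct` below samples every in-range integer plus a few fractional offsets (all exactly representable in float); it is an illustration of the reasoning, not an exhaustive proof over every float.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if float -> short and
   float -> int -> short agree on every sampled input whose truncated
   value fits in short (the inputs for which the direct conversion is
   defined by the C standard). */
bool two_step_matches_direct(void)
{
    static const float fracs[] = {0.0f, 0.25f, 0.5f, 0.75f};
    for (int i = SHRT_MIN; i <= SHRT_MAX; i++) {
        for (int k = 0; k < 4; k++) {
            /* Offset away from zero so truncation toward zero still gives i. */
            float f = (float)i + (i >= 0 ? fracs[k] : -fracs[k]);
            short direct  = (short)f;        /* SF -> HI */
            short via_int = (short)(int)f;   /* SF -> SI -> HI */
            if (direct != via_int || direct != i)
                return false;
        }
    }
    return true;
}
```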
Comment 2 Andrew Pinski 2024-05-11 22:51:49 UTC
I have someone working on this.
Comment 3 Pengxuan Zheng 2024-06-26 16:55:56 UTC
In fact, GCC is able to vectorize through intermediate conversions if we pass -fno-trapping-math. There's an open bug (PR54192) discussing whether -fno-trapping-math should be the default.
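To try this on one function without changing flags for the whole translation unit, GCC's `optimize` function attribute can apply `-fno-trapping-math` locally. A minimal sketch, assuming GCC (Clang ignores this attribute with a warning); `f_no_trap` is a hypothetical name. Whether it is actually vectorized depends on the GCC version and target, which `-fopt-info-vec` can report, and GCC's documentation recommends the `optimize` attribute for debugging rather than production code.

```c
/* Same body as f() from the description, compiled as if -fno-trapping-math
   were given for this function only (GCC-specific; hypothetical name). */
__attribute__((optimize("no-trapping-math")))
void f_no_trap(short *a, const float *b)
{
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
    a[3] = b[3];
}
```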