Bug 116475 - autovect: may be optimized for min/max
Status: RESOLVED DUPLICATE of bug 102512
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-08-24 04:36 UTC by YunQiang Su
Modified: 2024-08-24 04:57 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description YunQiang Su 2024-08-24 04:36:22 UTC
Suppose we need the minimum of 8 floats in an array.
We might write code like this:

float min(float *x) {
        float ret = x[0];
        for (int i=0; i<8; i++) {       // loop starts at 0
                ret = ret<x[i] ? ret : x[i];
        }
        return ret;
}

If we compile it with
   aarch64-linux-gnu-gcc -O3 -ffast-math -S xx.c
we get
        ldp     q0, q1, [x0]
        ld1r    {v31.4s}, [x0]      # <-- not needed
        fminnm  v31.4s, v1.4s, v31.4s  # <-- not needed
        fminnm  v0.4s, v31.4s, v0.4s
        fminnmv s0, v0.4s
        ret
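
For comparison, here is a rough sketch in portable C of the reduction that the shorter sequence (without the two marked instructions) would correspond to: one lane-wise min over the two 4-float halves, then one across-lane min. The names min8_scalar and min8_lanes are made up for this illustration, and the sketch assumes NaN-free input, as -ffast-math does.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: same loop as in the report (starts at 0). */
float min8_scalar(const float *x) {
    float ret = x[0];
    for (int i = 0; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Hypothetical hand-written form of the shorter sequence.
   Assumes x points at 8 floats with no NaNs. */
float min8_lanes(const float *x) {
    assert(x != NULL);
    float lanes[4];
    /* Lane-wise min of x[0..3] and x[4..7], like a single fminnm. */
    for (int i = 0; i < 4; i++)
        lanes[i] = x[i] < x[i + 4] ? x[i] : x[i + 4];
    /* Across-lane min, like fminnmv. */
    float ret = lanes[0];
    for (int i = 1; i < 4; i++)
        ret = ret < lanes[i] ? ret : lanes[i];
    return ret;
}
```

Since x[0] already sits in lane 0 of the first half, broadcasting it into a separate vector and taking another min adds nothing to the result.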




Alternatively, we could start the loop at 1:
float min(float *x) {
        float ret = x[0];
        for (int i=1; i<8; i++) {             // loop starts at 1
                ret = ret<x[i] ? ret : x[i];
        }
        return ret;
}


The generated code is even worse:
        ldr     q31, [x0, 4]
        ld1r    {v30.4s}, [x0]
        ldp     s0, s29, [x0, 20]
        fminnm  v31.4s, v31.4s, v30.4s
        ldr     s30, [x0, 28]
        fminnm  s0, s0, s29
        fminnmv s31, v31.4s
        fminnm  s31, s30, s31
        fminnm  s0, s0, s31
        ret
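
The two source forms compute the same value: with ret initialized to x[0], the i=0 iteration only evaluates min(x[0], x[0]). A small sketch of that equivalence follows, assuming NaN-free input as under -ffast-math; min_from0, min_from1 and check_same are names invented for this illustration, not anything in GCC.

```c
#include <assert.h>

/* Loop starting at 0: the first iteration compares x[0] with itself. */
float min_from0(const float *x) {
    float ret = x[0];
    for (int i = 0; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Loop starting at 1: skips the redundant first comparison. */
float min_from1(const float *x) {
    float ret = x[0];
    for (int i = 1; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Hypothetical helper: both forms must agree on NaN-free input. */
void check_same(const float *x) {
    assert(min_from0(x) == min_from1(x));
}
```

So, at least in principle, both loops could be reduced over the same two full 4-float vectors rather than one unaligned vector plus three scalar tail elements.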
Comment 1 Andrew Pinski 2024-08-24 04:55:03 UTC
I thought I saw this before.
Comment 2 Andrew Pinski 2024-08-24 04:57:05 UTC
Yep, PR 102512.

*** This bug has been marked as a duplicate of bug 102512 ***