116475 – autovect: may be optimized for min/max

Status:	RESOLVED DUPLICATE of bug 102512

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	15.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer

Reported:	2024-08-24 04:36 UTC by YunQiang Su
Modified:	2024-08-24 04:57 UTC
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description YunQiang Su 2024-08-24 04:36:22 UTC

Suppose we need the minimum of 8 floats in an array.
We might write code like this:

float min(float *x) {
        float ret = x[0];
        for (int i=0; i<8; i++) {       // from 0 in this line
                ret = ret<x[i] ? ret : x[i];
        }
        return ret;
}

If we compile it with
   aarch64-linux-gnu-gcc -O3 -ffast-math -S xx.c
we get
        ldp     q0, q1, [x0]
        ld1r    {v31.4s}, [x0]      # <-- not needed
        fminnm  v31.4s, v1.4s, v31.4s  # <-- not needed
        fminnm  v0.4s, v31.4s, v0.4s
        fminnmv s0, v0.4s
        ret




Alternatively, we could start the loop at index 1, since ret is already initialized from x[0]:
float min(float *x) {
        float ret = x[0];
        for (int i=1; i<8; i++) {             // from 1 in this line
                ret = ret<x[i] ? ret : x[i];
        }
        return ret;
}


The generated code for this variant is even worse:
        ldr     q31, [x0, 4]
        ld1r    {v30.4s}, [x0]
        ldp     s0, s29, [x0, 20]
        fminnm  v31.4s, v31.4s, v30.4s
        ldr     s30, [x0, 28]
        fminnm  s0, s0, s29
        fminnmv s31, v31.4s
        fminnm  s31, s30, s31
        fminnm  s0, s0, s31
        ret
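
To illustrate why the extra ld1r/fminnm pair should not be needed, here is a minimal C sketch of the reduction shape that would be enough for both loops. It assumes, as -ffast-math does, that NaNs and signed zeros need not be honored, so the min operations can be reassociated freely. The function names (min_from0, min_from1, min_lanes) are made up for this illustration and are not taken from GCC or from the test case above.

```c
/* Loop from the report, starting at index 0 (x[0] is visited twice). */
float min_from0(const float *x) {
    float ret = x[0];
    for (int i = 0; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Loop from the report, starting at index 1. */
float min_from1(const float *x) {
    float ret = x[0];
    for (int i = 1; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Hypothetical lane-wise form: one element-wise min of the two
 * 4-float halves (a single vector fminnm), then a horizontal min
 * across the 4 lanes (a single fminnmv). No broadcast of x[0] is
 * required, because x[0] already sits in lane 0 of the first half. */
float min_lanes(const float *x) {
    float lane[4];
    for (int j = 0; j < 4; j++)
        lane[j] = x[j] < x[j + 4] ? x[j] : x[j + 4];

    float ret = lane[0];
    for (int j = 1; j < 4; j++)
        ret = ret < lane[j] ? ret : lane[j];
    return ret;
}
```

For inputs without NaNs, all three functions return the same value, which suggests that both source forms could in principle be compiled to ldp + fminnm + fminnmv.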

Comment 1 Andrew Pinski 2024-08-24 04:55:03 UTC

I thought I saw this before.

Comment 2 Andrew Pinski 2024-08-24 04:57:05 UTC

Yep, PR 102512.

*** This bug has been marked as a duplicate of bug 102512 ***