100874 – [12 Regression] slight missed optimization with min<a,CST>-CST on aarch64 (subs_compare_2.c)

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	12.0

Importance:	P3 normal
Target Milestone:	12.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization, testsuite-fail

Duplicates (1):	100873
Depends on:
Blocks:

Reported:	2021-06-02 09:32 UTC by Andrew Pinski
Modified:	2022-02-15 18:15 UTC
CC List:	2 users

See Also:
Host:
Target:	aarch64--
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Andrew Pinski 2021-06-02 09:32:35 UTC

Take:
int f(int a) { return a < 4 ? a - 4 : 0; }
int f1(int a) { int x = a - 4; if (a < 4) return x; return 0; }
int f2(int a) { int t = a < 4 ? a : 4; return t - 4; }

In GCC 11, we produce:
f(int):
        mov     w1, 4
        cmp     w0, w1
        csel    w0, w0, w1, le
        sub     w0, w0, #4
        ret
f1(int):
        subs    w0, w0, 4
        csel    w0, w0, wzr, lt
        ret
f2(int):
        mov     w1, 4
        cmp     w0, w1
        csel    w0, w0, w1, le
        sub     w0, w0, #4
        ret

On the trunk, all three now produce the same code (due to PHI-OPT being improved), but it is the longer sequence that f and f2 used to give.
All three should instead produce what f1 produced in GCC 11.
This is gcc.target/aarch64/subs_compare_2.c
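
For reference, a minimal sketch showing that the three functions above compute the same value, namely min(a, 4) - 4, which is why a single SUBS+CSEL sequence is valid for all of them. The helper check_all_agree is hypothetical (not part of the testsuite), and the range it is called with should stay away from INT_MIN so that a - 4 does not overflow:

```c
#include <assert.h>

/* The three functions from the description, unchanged. */
int f(int a) { return a < 4 ? a - 4 : 0; }
int f1(int a) { int x = a - 4; if (a < 4) return x; return 0; }
int f2(int a) { int t = a < 4 ? a : 4; return t - 4; }

/* Hypothetical helper: asserts that f, f1 and f2 agree on [lo, hi].
   The caller must keep lo >= INT_MIN + 4 to avoid signed overflow. */
void check_all_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        int expected = f(a);
        assert(f1(a) == expected);
        assert(f2(a) == expected);
    }
}
```

For example, check_all_agree(-1000, 1000) passes, and f(-3), f1(-3) and f2(-3) all return -7.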

Comment 1 Andrew Pinski 2021-06-02 09:33:58 UTC

Note that clang even produces the same code for all three:
https://godbolt.org/z/hq357jdMd

I also described how we could fix this in https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571699.html

Comment 2 Andrew Pinski 2021-06-02 09:35:24 UTC

*** Bug 100873 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Pinski 2021-06-02 09:39:58 UTC

Note here is a 4th variant of the function:
int f1a(int a) { int x = a - 4; return  (a < 4) ? x : 0; }

Comment 4 GCC Commits 2022-02-15 18:10:22 UTC

The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:8e84b2b37a541b27feea69769fc314d534464ebd

commit r12-7249-g8e84b2b37a541b27feea69769fc314d534464ebd
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Tue Feb 15 18:09:35 2022 +0000

    aarch64: Fix subs_compare_2.c regression [PR100874]
    
    subs_compare_2.c tests that we can use a SUBS+CSEL sequence for:
    
    unsigned int
    foo (unsigned int a, unsigned int b)
    {
      unsigned int x = a - 4;
      if (a < 4)
        return x;
      else
        return 0;
    }
    
    As Andrew notes in the PR, this is effectively MIN (x, 4) - 4,
    and it is now recognised as such by phiopt.  Previously it was
    if-converted in RTL instead.
    
    I tried to look for ways to generalise this to other situations
    and to other ?:-style operations, not just max and min.  However,
    for general ?: we tend to push an outer ‘- CST’ into the arms of
    the ?: -- at least if one of them simplifies -- so I didn't find
    any useful abstraction.
    
    This patch therefore adds a pattern specifically for
    max/min(a,cst)-cst.  I'm not thrilled at having to do this,
    but it seems like the least worst fix in the circumstances.
    Also, max(a,cst)-cst for unsigned a is a useful saturating
    subtraction idiom and so is arguably worth its own code
    for that reason.
    
    gcc/
            PR target/100874
            * config/aarch64/aarch64-protos.h (aarch64_maxmin_plus_const):
            Declare.
            * config/aarch64/aarch64.cc (aarch64_maxmin_plus_const): New function.
            * config/aarch64/aarch64.md (*aarch64_minmax_plus): New pattern.
    
    gcc/testsuite/
            * gcc.target/aarch64/max_plus_1.c: New test.
            * gcc.target/aarch64/max_plus_2.c: Likewise.
            * gcc.target/aarch64/max_plus_3.c: Likewise.
            * gcc.target/aarch64/max_plus_4.c: Likewise.
            * gcc.target/aarch64/max_plus_5.c: Likewise.
            * gcc.target/aarch64/max_plus_6.c: Likewise.
            * gcc.target/aarch64/max_plus_7.c: Likewise.
            * gcc.target/aarch64/min_plus_1.c: Likewise.
            * gcc.target/aarch64/min_plus_2.c: Likewise.
            * gcc.target/aarch64/min_plus_3.c: Likewise.
            * gcc.target/aarch64/min_plus_4.c: Likewise.
            * gcc.target/aarch64/min_plus_5.c: Likewise.
            * gcc.target/aarch64/min_plus_6.c: Likewise.
            * gcc.target/aarch64/min_plus_7.c: Likewise.
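
A small sketch of the two idioms the commit message describes, written as plain C rather than as the aarch64 pattern itself. The names foo_cond, foo_min, sat_sub, min_u and max_u are hypothetical and chosen for illustration; foo_cond mirrors foo from the commit message with the unused parameter b dropped:

```c
#include <assert.h>

/* Hypothetical helpers for unsigned min and max. */
unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* Conditional form, as in the commit message's foo (without b). */
unsigned int foo_cond(unsigned int a)
{
    unsigned int x = a - 4;   /* wraps around when a < 4 */
    if (a < 4)
        return x;
    else
        return 0;
}

/* The min(a, cst) - cst form of the same computation. */
unsigned int foo_min(unsigned int a) { return min_u(a, 4) - 4; }

/* max(a, cst) - cst for unsigned a: subtraction that saturates at 0. */
unsigned int sat_sub(unsigned int a, unsigned int cst)
{
    return max_u(a, cst) - cst;
}
```

Here foo_cond(a) and foo_min(a) agree for every a, and sat_sub(10, 4) gives 6 while sat_sub(3, 4) gives 0 instead of wrapping.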

Comment 5 Richard Sandiford 2022-02-15 18:15:43 UTC

Fixed.