Bug 112600 - Failed to optimize saturating addition using __builtin_add_overflow
Summary: Failed to optimize saturating addition using __builtin_add_overflow
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 14.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-11-17 21:40 UTC by Jonathan Wakely
Modified: 2024-01-22 12:46 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-11-19 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Wakely 2023-11-17 21:40:48 UTC
These two implementations of C++26 saturating addition (std::add_sat<unsigned>) have equivalent behaviour:

unsigned
add_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_add_overflow(x, y, &z))
	    return z;
    return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x + y;
    res |= -(res < x);
    return res;
}


For -O3 on x86_64 GCC uses a branch for the first one:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        jc      .L3
        mov     eax, edi
        ret
.L3:
        or      eax, -1
        ret

For the second one we get better code:

add_sat2(unsigned int, unsigned int):
        add     edi, esi
        sbb     eax, eax
        or      eax, edi
        ret



Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        mov     eax, -1
        cmovae  eax, edi
        ret
Comment 1 Jonathan Wakely 2023-11-17 21:41:47 UTC
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
        adds    w0, w0, w1
        bcs     .L7
        ret
.L7:
        mov     w0, -1
        ret
add_sat2(unsigned int, unsigned int):
        adds    w0, w0, w1
        csinv   w0, w0, wzr, cc
        ret
Comment 2 Jonathan Wakely 2023-11-17 21:45:28 UTC
For similar saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_sub_overflow(x, y, &z))
	    return z;
    return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x - y;
    res &= -(res <= x);;
    return res;
}

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
        sub     edi, esi
        jb      .L3
        mov     eax, edi
        ret
.L3:
        xor     eax, eax
        ret
sub_sat2(unsigned int, unsigned int):
        sub     edi, esi
        mov     eax, 0
        cmovnb  eax, edi
        ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
        subs    w2, w0, w1
        mov     w3, 0
        cmp     w0, w1
        csel    w0, w2, w3, cs
        ret
sub_sat2(unsigned int, unsigned int):
        subs    w0, w0, w1
        csel    w0, w0, wzr, cs
        ret


Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
sub_sat2(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
Comment 3 Andrew Pinski 2023-11-19 21:43:40 UTC
Confirmed.
Comment 4 Richard Biener 2023-11-20 09:23:27 UTC
Note we don't have a good middle-end representation for (integer) saturation.

Maybe having variants of .ADD_OVERFLOW and friends specifying an alternate
value (or the value in case the actual value is left unspecified when
overflow occurs) as additional argument would work.  So have the first
fold into

  <bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR (_8);
  return _1;

of course that defers the code-generation problem to RTL expansion and
would require to pattern match

    res = x + y;
    res |= -(res < x);

to the same for canonicalization purposes.  I would expect that some
targets implement saturating integer arithmetic (not sure about
multiplication or division though).
Comment 5 Tamar Christina 2024-01-22 12:46:12 UTC
Yeah, this is hurting us a lot on vectors as well:

https://godbolt.org/z/ecnGadxcG

The first one isn't vectorizable and the second one we generates too complicated code as the pattern vec_cond is expanded to something quite complicated.

It was too complicated for the intern we had at the time, but I think basically we should still do the conclusion of this thread no? https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html

i.e. we should just make proper saturating IFN.

The only remaining question is, should we make them optab backed or can we do something reasonably better for most target with better fallback code.

This seems to indicate yes since the REALPART_EXPR seems to screw things up a bit.