These two implementations of C++26 saturating addition (std::add_sat<unsigned>) have equivalent behaviour: unsigned add_sat(unsigned x, unsigned y) noexcept { unsigned z; if (!__builtin_add_overflow(x, y, &z)) return z; return -1u; } unsigned add_sat2(unsigned x, unsigned y) noexcept { unsigned res; res = x + y; res |= -(res < x); return res; } For -O3 on x86_64 GCC uses a branch for the first one: add_sat(unsigned int, unsigned int): add edi, esi jc .L3 mov eax, edi ret .L3: or eax, -1 ret For the second one we get better code: add_sat2(unsigned int, unsigned int): add edi, esi sbb eax, eax or eax, edi ret Clang compiles them both to the same code: add_sat(unsigned int, unsigned int): add edi, esi mov eax, -1 cmovae eax, edi ret
Similar results for aarch64 with GCC: add_sat(unsigned int, unsigned int): adds w0, w0, w1 bcs .L7 ret .L7: mov w0, -1 ret add_sat2(unsigned int, unsigned int): adds w0, w0, w1 csinv w0, w0, wzr, cc ret
For similar saturating subtraction functions: unsigned sub_sat(unsigned x, unsigned y) noexcept { unsigned z; if (!__builtin_sub_overflow(x, y, &z)) return z; return 0; } unsigned sub_sat2(unsigned x, unsigned y) noexcept { unsigned res; res = x - y; res &= -(res <= x);; return res; } GCC x86_64 gives: sub_sat(unsigned int, unsigned int): sub edi, esi jb .L3 mov eax, edi ret .L3: xor eax, eax ret sub_sat2(unsigned int, unsigned int): sub edi, esi mov eax, 0 cmovnb eax, edi ret GCC aarch64 gives: sub_sat(unsigned int, unsigned int): subs w2, w0, w1 mov w3, 0 cmp w0, w1 csel w0, w2, w3, cs ret sub_sat2(unsigned int, unsigned int): subs w0, w0, w1 csel w0, w0, wzr, cs ret Clang x86_64 gives: sub_sat(unsigned int, unsigned int): xor eax, eax sub edi, esi cmovae eax, edi ret sub_sat2(unsigned int, unsigned int): xor eax, eax sub edi, esi cmovae eax, edi ret
Confirmed.
Note we don't have a good middle-end representation for (integer) saturation. Maybe having variants of .ADD_OVERFLOW and friends specifying an alternate value (or the value in case the actual value is left unspecified when overflow occurs) as additional argument would work. So have the first fold into <bb 2> : _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u); _1 = REALPART_EXPR (_8); return _1; of course that defers the code-generation problem to RTL expansion and would require to pattern match res = x + y; res |= -(res < x); to the same for canonicalization purposes. I would expect that some targets implement saturating integer arithmetic (not sure about multiplication or division though).
Yeah, this is hurting us a lot on vectors as well: https://godbolt.org/z/ecnGadxcG The first one isn't vectorizable and the second one we generates too complicated code as the pattern vec_cond is expanded to something quite complicated. It was too complicated for the intern we had at the time, but I think basically we should still do the conclusion of this thread no? https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html i.e. we should just make proper saturating IFN. The only remaining question is, should we make them optab backed or can we do something reasonably better for most target with better fallback code. This seems to indicate yes since the REALPART_EXPR seems to screw things up a bit.