Bug 112600 - Failed to optimize saturating addition using __builtin_add_overflow
Summary: Failed to optimize saturating addition using __builtin_add_overflow
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-11-17 21:40 UTC by Jonathan Wakely
Modified: 2024-09-12 18:50 UTC
CC: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-11-19 00:00:00


Description Jonathan Wakely 2023-11-17 21:40:48 UTC
These two implementations of C++26 saturating addition (std::add_sat<unsigned>) have equivalent behaviour:

unsigned
add_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_add_overflow(x, y, &z))
        return z;
    return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x + y;
    res |= -(res < x);
    return res;
}
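
The branchless form works because, with unsigned wraparound, an overflowed sum is strictly smaller than either operand: (res < x) is then 1, -(unsigned)1 is all-ones, and the OR saturates the result to UINT_MAX. A minimal standalone check of that reasoning (a plain C sketch; the helper name is chosen only for illustration):

/* No overflow: res >= x, so (res < x) == 0 and OR-ing with -(0u) == 0 keeps the sum.
   Overflow:    res wraps below x, so (res < x) == 1 and OR-ing with -(1u) == UINT_MAX saturates.  */
#include <assert.h>
#include <limits.h>

static unsigned
add_sat_check (unsigned x, unsigned y)
{
    unsigned res = x + y;
    res |= -(unsigned)(res < x);
    return res;
}

int
main (void)
{
    assert (add_sat_check (1u, 2u) == 3u);                    /* no overflow */
    assert (add_sat_check (UINT_MAX, 1u) == UINT_MAX);        /* saturates */
    assert (add_sat_check (UINT_MAX, UINT_MAX) == UINT_MAX);  /* saturates */
    return 0;
}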


For -O3 on x86_64, GCC uses a branch for the first one:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        jc      .L3
        mov     eax, edi
        ret
.L3:
        or      eax, -1
        ret

For the second one we get better code:

add_sat2(unsigned int, unsigned int):
        add     edi, esi
        sbb     eax, eax
        or      eax, edi
        ret



Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        mov     eax, -1
        cmovae  eax, edi
        ret
Comment 1 Jonathan Wakely 2023-11-17 21:41:47 UTC
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
        adds    w0, w0, w1
        bcs     .L7
        ret
.L7:
        mov     w0, -1
        ret
add_sat2(unsigned int, unsigned int):
        adds    w0, w0, w1
        csinv   w0, w0, wzr, cc
        ret
Comment 2 Jonathan Wakely 2023-11-17 21:45:28 UTC
For similar saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_sub_overflow(x, y, &z))
        return z;
    return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x - y;
    res &= -(res <= x);
    return res;
}
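
The same trick works in the other direction: when y > x the wrapped difference is larger than x, so (res <= x) is 0 and the AND clamps the result to 0. A minimal standalone check (a plain C sketch; the helper name is illustrative only):

/* No underflow: res = x - y <= x, so (res <= x) == 1 and AND-ing with -(1u) keeps the difference.
   Underflow:    res wraps above x, so (res <= x) == 0 and AND-ing with 0 clamps to 0.  */
#include <assert.h>
#include <limits.h>

static unsigned
sub_sat_check (unsigned x, unsigned y)
{
    unsigned res = x - y;
    res &= -(unsigned)(res <= x);
    return res;
}

int
main (void)
{
    assert (sub_sat_check (5u, 3u) == 2u);        /* no underflow */
    assert (sub_sat_check (3u, 5u) == 0u);        /* clamps to 0 */
    assert (sub_sat_check (0u, UINT_MAX) == 0u);  /* clamps to 0 */
    return 0;
}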

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
        sub     edi, esi
        jb      .L3
        mov     eax, edi
        ret
.L3:
        xor     eax, eax
        ret
sub_sat2(unsigned int, unsigned int):
        sub     edi, esi
        mov     eax, 0
        cmovnb  eax, edi
        ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
        subs    w2, w0, w1
        mov     w3, 0
        cmp     w0, w1
        csel    w0, w2, w3, cs
        ret
sub_sat2(unsigned int, unsigned int):
        subs    w0, w0, w1
        csel    w0, w0, wzr, cs
        ret


Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
sub_sat2(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
Comment 3 Andrew Pinski 2023-11-19 21:43:40 UTC
Confirmed.
Comment 4 Richard Biener 2023-11-20 09:23:27 UTC
Note we don't have a good middle-end representation for (integer) saturation.

Maybe having variants of .ADD_OVERFLOW and friends that take an alternate
value as an additional argument (the value to use when overflow occurs,
since the actual result is left unspecified in that case) would work.
So the first function would fold into

  <bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR (_8);
  return _1;

Of course that defers the code-generation problem to RTL expansion and
would require pattern matching

    res = x + y;
    res |= -(res < x);

to the same for canonicalization purposes.  I would expect that some
targets implement saturating integer arithmetic (not sure about
multiplication or division though).
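
In source terms, the proposed three-operand variant would behave roughly like the sketch below (add_overflow_or is a made-up name purely for illustration; the actual IFN signature is not specified here):

/* Hypothetical source-level semantics of a three-operand .ADD_OVERFLOW
   variant: produce the sum, or the supplied alternate value when the
   addition overflows.  The name is illustrative only.  */
static unsigned
add_overflow_or (unsigned x, unsigned y, unsigned alt)
{
  unsigned sum;
  if (__builtin_add_overflow (x, y, &sum))
    return alt;
  return sum;
}

/* add_sat from comment #0 would then be add_overflow_or (x, y, -1u), and the
   canonicalization question above is matching
     res = x + y;  res |= -(res < x);
   to the same form.  */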
Comment 5 Tamar Christina 2024-01-22 12:46:12 UTC
Yeah, this is hurting us a lot on vectors as well:

https://godbolt.org/z/ecnGadxcG

The first one isn't vectorizable, and for the second one we generate overly complicated code, as the vec_cond pattern is expanded to something quite involved.

It was too complicated for the intern we had at the time, but I think we should basically still implement the conclusion of this thread, no? https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html

i.e. we should just add proper saturating IFNs.

The only remaining question is whether we should make them optab-backed, or whether we can do something reasonably better for most targets with improved fallback code.

This seems to indicate yes, since the REALPART_EXPR seems to screw things up a bit.
Comment 6 GCC Commits 2024-05-16 12:09:10 UTC
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183

commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:05 2024 +0800

    Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
    
    This patch would like to add the middle-end representation for
    saturating addition, i.e. the result of the addition is set to the
    maximum value on overflow.  It matches a pattern similar to the one
    below.
    
    SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
    
    Taking uint8_t as an example, we have:
    
    * SAT_ADD (1, 254)   => 255.
    * SAT_ADD (1, 255)   => 255.
    * SAT_ADD (2, 255)   => 255.
    * SAT_ADD (255, 255) => 255.
    
    Given the example below for the unsigned scalar integer uint64_t:
    
    uint64_t sat_add_u64 (uint64_t x, uint64_t y)
    {
      return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
    }
    
    Before this patch:
    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      long unsigned int _1;
      _Bool _2;
      long unsigned int _3;
      long unsigned int _4;
      uint64_t _7;
      long unsigned int _10;
      __complex__ long unsigned int _11;
    
    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
      _1 = REALPART_EXPR <_11>;
      _10 = IMAGPART_EXPR <_11>;
      _2 = _10 != 0;
      _3 = (long unsigned int) _2;
      _4 = -_3;
      _7 = _1 | _4;
      return _7;
    ;;    succ:       EXIT
    
    }
    
    After this patch:
    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _7;
    
    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
      return _7;
    ;;    succ:       EXIT
    }
    
    The tests below passed for this patch:
    1. The riscv full regression tests.
    2. The x86 bootstrap tests.
    3. The x86 full regression tests.
    
            PR target/51492
            PR target/112600
    
    gcc/ChangeLog:
    
            * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
            to the return true switch case(s).
            * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
            * match.pd: Add unsigned SAT_ADD match(es).
            * optabs.def (OPTAB_NL): Remove fixed-point limitation for
            us/ssadd.
            * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
            extern func decl generated in match.pd match.
            (match_saturation_arith): New func impl to match the saturation arith.
            (math_opts_dom_walker::after_dom_children): Try match saturation
            arith when IOR expr.
    
    Signed-off-by: Pan Li <pan2.li@intel.com>
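
Note that for types narrower than int the casts back to TYPE matter, because the addition itself is carried out in int after the usual integer promotions. A standalone sketch checking the uint8_t example values quoted above:

/* The uint8_t pattern from the commit message, written out directly; without
   the cast back to uint8_t before the compare, the int-typed sum would never
   appear to wrap.  */
#include <assert.h>
#include <stdint.h>

static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < x));
}

int
main (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
  return 0;
}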
Comment 7 GCC Commits 2024-05-16 12:09:44 UTC
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010

commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:06 2024 +0800

    Vect: Support new IFN SAT_ADD for unsigned vector int
    
    For vectorization, we leverage the existing vect pattern recognition to
    find a pattern similar to the scalar one and let the vectorizer perform
    the rest for the standard name usadd<mode>3 in vector mode.
    The riscv vector backend has the insn "Vector Single-Width Saturating
    Add and Subtract", which can be leveraged when expanding usadd<mode>3
    in vector mode.  For example:
    
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;
    
      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
    }
    
    Before this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
      ivtmp_58 = _80 * 8;
      vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
      vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
      vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
      mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
      vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
        ... }, vect__7.11_66);
      .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
      vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
      vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
      vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
      ivtmp_79 = ivtmp_78 - _80;
      ...
    }
    
    After this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
      ivtmp_46 = _62 * 8;
      vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
      vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
      vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
      .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
      ...
    }
    
    The test suites below passed for this patch:
    * The riscv full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.
    
            PR target/51492
            PR target/112600
    
    gcc/ChangeLog:
    
            * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
            func decl generated by match.pd match.
            (vect_recog_sat_add_pattern): New func impl to recog the pattern
            for unsigned SAT_ADD.
    
    Signed-off-by: Pan Li <pan2.li@intel.com>
Comment 8 GCC Commits 2024-05-17 14:57:10 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:b59de4113262f2bee14147eb17eb3592f03d9556

commit r15-634-gb59de4113262f2bee14147eb17eb3592f03d9556
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Fri May 17 09:55:49 2024 +0200

    i386: Rename sat_plusminus expanders to standard names [PR112600]
    
    Rename the <sse2_avx2>_<insn><mode>3<mask_name> expander to the standard
    ssadd, usadd, sssub and ussub names to enable the corresponding optab
    expansion.
    
    Also add named expanders for MMX modes.
    
            PR middle-end/112600
    
    gcc/ChangeLog:
    
            * config/i386/mmx.md (<insn><mode>3): New expander.
            * config/i386/sse.md
            (<sse2_avx2>_<sat_plusminus:insn><mode>3<mask_name>):
            Rename expander to <sat_plusminus:insn><mode>3<mask_name>.
            (<umaxmin:code><mode>3): Update for rename.
            * config/i386/i386-builtin.def: Update for rename.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-1a.c: New test.
            * gcc.target/i386/pr112600-1b.c: New test.
Comment 9 GCC Commits 2024-05-18 02:17:02 UTC
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:34ed2b4593fa98b613632d0dde30b6ba3e7ecad9

commit r15-642-g34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
Author: Pan Li <pan2.li@intel.com>
Date:   Fri May 17 18:49:46 2024 +0800

    RISC-V: Implement IFN SAT_ADD for both the scalar and vector
    
    The patch implements SAT_ADD in the riscv backend as a sample for both
    the scalar and vector cases.  Given the vector example below:
    
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;
    
      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
    }
    
    Before this patch:
    vec_sat_add_u64:
      ...
      vsetvli a5,a3,e64,m1,ta,ma
      vle64.v v0,0(a1)
      vle64.v v1,0(a2)
      slli    a4,a5,3
      sub     a3,a3,a5
      add     a1,a1,a4
      add     a2,a2,a4
      vadd.vv v1,v0,v1
      vmsgtu.vv       v0,v0,v1
      vmerge.vim      v1,v1,-1,v0
      vse64.v v1,0(a0)
      ...
    
    After this patch:
    vec_sat_add_u64:
      ...
      vsetvli a5,a3,e64,m1,ta,ma
      vle64.v v1,0(a1)
      vle64.v v2,0(a2)
      slli    a4,a5,3
      sub     a3,a3,a5
      add     a1,a1,a4
      add     a2,a2,a4
      vsaddu.vv       v1,v1,v2  <=  Vector Single-Width Saturating Add
      vse64.v v1,0(a0)
      ...
    
    The test suites below passed for this patch:
    * The riscv full regression tests.
    * The aarch64 full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.
    
            PR target/51492
            PR target/112600
    
    gcc/ChangeLog:
    
            * config/riscv/autovec.md (usadd<mode>3): New pattern expand for
            the unsigned SAT_ADD in vector mode.
            * config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
            to expand usadd<mode>3 pattern.
            (expand_vec_usadd): Ditto but for vector.
            * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
            the vsadd insn.
            (expand_vec_usadd): New func impl to expand usadd<mode>3 for vector.
            * config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
            expand usadd<mode>3 for scalar.
            * config/riscv/riscv.md (usadd<mode>3): New pattern expand for
            the unsigned SAT_ADD in scalar mode.
            * config/riscv/vector.md: Allow VLS mode for vsaddu.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
            * gcc.target/riscv/sat_arith.h: New test.
            * gcc.target/riscv/sat_u_add-1.c: New test.
            * gcc.target/riscv/sat_u_add-2.c: New test.
            * gcc.target/riscv/sat_u_add-3.c: New test.
            * gcc.target/riscv/sat_u_add-4.c: New test.
            * gcc.target/riscv/sat_u_add-run-1.c: New test.
            * gcc.target/riscv/sat_u_add-run-2.c: New test.
            * gcc.target/riscv/sat_u_add-run-3.c: New test.
            * gcc.target/riscv/sat_u_add-run-4.c: New test.
            * gcc.target/riscv/scalar_sat_binary.h: New test.
    
    Signed-off-by: Pan Li <pan2.li@intel.com>
Comment 10 GCC Commits 2024-06-05 08:38:45 UTC
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li <pan2.li@intel.com>
Date:   Tue May 28 15:37:44 2024 +0800

    Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
    
    This patch would like to add the middle-end representation for
    saturating subtraction, i.e. the result of the subtraction is set to the
    minimum value on underflow.  It matches a pattern similar to the one
    below.
    
    SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
    
    For example for uint8_t, we have
    
    * SAT_SUB (255, 0)   => 255
    * SAT_SUB (1, 2)     => 0
    * SAT_SUB (254, 255) => 0
    * SAT_SUB (0, 255)   => 0
    
    Given the SAT_SUB below for uint64_t:
    
    uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
    {
      return (x - y) & (-(uint64_t)(x >= y));
    }
    
    Before this patch:
    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      _Bool _1;
      long unsigned int _3;
      uint64_t _6;
    
    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _1 = x_4(D) >= y_5(D);
      _3 = x_4(D) - y_5(D);
      _6 = _1 ? _3 : 0;
      return _6;
    ;;    succ:       EXIT
    }
    
    After this patch:
    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _6;
    
    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
      return _6;
    ;;    succ:       EXIT
    }
    
    The tests below were run for this patch:
    * The riscv full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.
    
            PR target/51492
            PR target/112600
    
    gcc/ChangeLog:
    
            * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
            * match.pd: Add new match for SAT_SUB.
            * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
            * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
            new decl for generated in match.pd.
            (build_saturation_binary_arith_call): Add new helper function
            to build the gimple call to binary SAT alu.
            (match_saturation_arith): Rename from.
            (match_unsigned_saturation_add): Rename to.
            (match_unsigned_saturation_sub): Add new func to match the
            unsigned sat sub.
            (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
            try when COND_EXPR.
    
    Signed-off-by: Pan Li <pan2.li@intel.com>
Comment 11 Uroš Bizjak 2024-06-05 20:42:51 UTC
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat<unsigned>) have equivalent behaviour:
> 
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
>     unsigned z;
>     if (!__builtin_add_overflow(x, y, &z))
> 	    return z;
>     return -1u;
> }

[...]

> For -O3 on x86_64 GCC uses a branch for the first one:
> 
> add_sat(unsigned int, unsigned int):
>         add     edi, esi
>         jc      .L3
>         mov     eax, edi
>         ret
> .L3:
>         or      eax, -1
>         ret

The reason for the failed if-conversion to a cmove is the "weird" compare arguments, a consequence of the addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:SI (reg:SI 106)
                        (reg:SI 107))
                    (reg:SI 106)))
            (set (reg:SI 104)
                (plus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

The noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at ../../git/gcc/gcc/ifcvt.cc:1774
1774                return NULL_RTX;
(gdb) list
1766          /* Don't even try if the comparison operands are weird
1767             except that the target supports cbranchcc4.  */
1768          if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769              || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770            {
1771              if (!have_cbranchcc4
1772                  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773                  || cmp_b != const0_rtx)
1774                return NULL_RTX;
1775            }
1776
1777          target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode },
1778                                          vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at ../../git/gcc/gcc/ifcvt.cc:1774
#1  0x00000000020d995b in noce_try_cmove (if_info=0x7fffffffd750) at ../../git/gcc/gcc/ifcvt.cc:1884
#2  0x00000000020dec37 in noce_process_if_block (if_info=0x7fffffffd750) at ../../git/gcc/gcc/ifcvt.cc:4149
#3  0x00000000020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80, then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
    at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x00000000020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at ../../git/gcc/gcc/ifcvt.cc:4921
#5  0x00000000020e3255 in if_convert (after_combine=true) at ../../git/gcc/gcc/ifcvt.cc:6068

(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
    (reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails the general_operand check.

Please note that a similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
    unsigned z;
    return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

        subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
        movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
        cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
        ret             # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
            (set (reg:CC 17 flags)
                (compare:CC (reg:SI 106)
                    (reg:SI 107)))
            (set (reg:SI 104)
                (minus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":28:12 416 {*subsi_3}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

So, either the addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove should be improved to handle the above "weird" operand form.

Let's ask Jakub.
Comment 12 GCC Commits 2024-06-06 17:45:41 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:366d45c8d4911dc7874d2e64cf2583c0133b8dd5

commit r15-1077-g366d45c8d4911dc7874d2e64cf2583c0133b8dd5
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Jun 6 19:18:41 2024 +0200

    testsuite/i386: Add vector sat_sub testcases [PR112600]
    
            PR middle-end/112600
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-2a.c: New test.
            * gcc.target/i386/pr112600-2b.c: New test.
Comment 13 GCC Commits 2024-06-08 10:19:51 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:de05e44b2ad9638d04173393b1eae3c38b2c3864

commit r15-1113-gde05e44b2ad9638d04173393b1eae3c38b2c3864
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Sat Jun 8 12:17:11 2024 +0200

    i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]
    
    The following testcase:
    
    unsigned
    add_sat(unsigned x, unsigned y)
    {
        unsigned z;
        return __builtin_add_overflow(x, y, &z) ? -1u : z;
    }
    
    currently compiles (-O2) to:
    
    add_sat:
            addl    %esi, %edi
            jc      .L3
            movl    %edi, %eax
            ret
    .L3:
            orl     $-1, %eax
            ret
    
    We can expand through the usadd{m}3 optab to use the carry flag from the
    addition and generate branchless code using the SBB instruction,
    implementing:
    
        unsigned res = x + y;
        res |= -(res < x);
    
    add_sat:
            addl    %esi, %edi
            sbbl    %eax, %eax
            orl     %edi, %eax
            ret
    
            PR target/112600
    
    gcc/ChangeLog:
    
            * config/i386/i386.md (usadd<mode>3): New expander.
            (x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-a.c: New test.
Comment 14 GCC Commits 2024-06-09 10:10:42 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01

commit r15-1122-g8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Sun Jun 9 12:09:13 2024 +0200

    i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]
    
    The following testcase:
    
    unsigned
    sub_sat (unsigned x, unsigned y)
    {
      unsigned res;
      res = x - y;
      res &= -(x >= y);
      return res;
    }
    
    currently compiles (-O2) to:
    
    sub_sat:
            movl    %edi, %edx
            xorl    %eax, %eax
            subl    %esi, %edx
            cmpl    %esi, %edi
            setnb   %al
            negl    %eax
            andl    %edx, %eax
            ret
    
    We can expand through the ussub{m}3 optab to use the carry flag from the
    subtraction and generate code using the SBB instruction, implementing:
    
        unsigned res = x - y;
        res &= ~(-(x < y));
    
    sub_sat:
            subl    %esi, %edi
            sbbl    %eax, %eax
            notl    %eax
            andl    %edi, %eax
            ret
    
            PR target/112600
    
    gcc/ChangeLog:
    
            * config/i386/i386.md (ussub<mode>3): New expander.
            (sub<mode>_3): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-b.c: New test.
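
For reference, the mask used in this expansion is the same value as the one in the source pattern: for unsigned x and y, -(x >= y) and ~(-(x < y)) are both all-ones exactly when x >= y, and zero otherwise. A minimal standalone check of that identity:

/* Check that the two mask forms agree: -(x >= y) and ~(-(x < y)) are both
   ~0u when x >= y and 0 otherwise.  */
#include <assert.h>
#include <limits.h>

int
main (void)
{
  unsigned pairs[][2] = { {0, 0}, {1, 2}, {2, 1}, {UINT_MAX, 0}, {0, UINT_MAX} };
  for (unsigned i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    {
      unsigned x = pairs[i][0], y = pairs[i][1];
      assert ((unsigned) -(x >= y) == ~(unsigned) -(x < y));
    }
  return 0;
}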
Comment 15 GCC Commits 2024-06-11 16:59:31 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:05b95238be648c9cf8af2516930af6a7b637a2b8

commit r15-1183-g05b95238be648c9cf8af2516930af6a7b637a2b8
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Jun 11 16:00:31 2024 +0200

    i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]
    
    For TARGET_CMOV targets, emit an insn sequence involving a conditional move.
    
    .SAT_ADD:
    
            addl    %esi, %edi
            movl    $-1, %eax
            cmovnc  %edi, %eax
            ret
    
    .SAT_SUB:
    
            subl    %esi, %edi
            movl    $0, %eax
            cmovnc  %edi, %eax
            ret
    
            PR target/112600
    
    gcc/ChangeLog:
    
            * config/i386/i386.md (usadd<mode>3): Emit insn sequence
            involving conditional move for TARGET_CMOVE targets.
            (ussub<mode>3): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-a.c: Also scan for cmov.
            * gcc.target/i386/pr112600-b.c: Ditto.
Comment 16 GCC Commits 2024-09-12 18:50:13 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:19d751601d012bbe31512d26f968e75873a408ab

commit r15-3612-g19d751601d012bbe31512d26f968e75873a408ab
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Sep 12 20:34:28 2024 +0200

    i386: Implement SAT_ADD for signed vector integers
    
    Enable V4QI, V2QI and V2HI mode signed saturated arithmetic insn patterns
    and add a couple of testcases to test for PADDSB and PADDSW instructions.
    
            PR target/112600
    
    gcc/ChangeLog:
    
            * config/i386/mmx.md (<sat_plusminus:insn><mode>3): Rename
            from *<sat_plusminus:insn><mode>3.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr112600-3a.c: New test.
            * gcc.target/i386/pr112600-3b.c: New test.