These two implementations of C++26 saturating addition (std::add_sat<unsigned>) have equivalent behaviour:

unsigned
add_sat(unsigned x, unsigned y) noexcept
{
  unsigned z;
  if (!__builtin_add_overflow(x, y, &z))
    return z;
  return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
  unsigned res;
  res = x + y;
  res |= -(res < x);
  return res;
}

For -O3 on x86_64 GCC uses a branch for the first one:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        jc      .L3
        mov     eax, edi
        ret
.L3:
        or      eax, -1
        ret

For the second one we get better, branchless code:

add_sat2(unsigned int, unsigned int):
        add     edi, esi
        sbb     eax, eax
        or      eax, edi
        ret

Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        mov     eax, -1
        cmovae  eax, edi
        ret
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
        adds    w0, w0, w1
        bcs     .L7
        ret
.L7:
        mov     w0, -1
        ret

add_sat2(unsigned int, unsigned int):
        adds    w0, w0, w1
        csinv   w0, w0, wzr, cc
        ret
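For what it's worth, the trick in add_sat2 works because the wrapped sum satisfies res < x exactly when the addition overflowed, so -(res < x) is an all-ones mask on overflow and zero otherwise. A minimal harness (my own sketch, not part of the report; names are made up) that exhaustively confirms the two forms agree at 8-bit width:

#include <cassert>
#include <cstdint>

// 8-bit analogues of the two implementations above.
static uint8_t add_sat_a (uint8_t x, uint8_t y)
{
  uint8_t z;
  return __builtin_add_overflow (x, y, &z) ? UINT8_MAX : z;
}

static uint8_t add_sat_b (uint8_t x, uint8_t y)
{
  uint8_t res = x + y;
  res |= -(uint8_t)(res < x);  // all-ones mask iff the add wrapped
  return res;
}

int main ()
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      assert (add_sat_a (x, y) == add_sat_b (x, y));
}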
For the analogous saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
  unsigned z;
  if (!__builtin_sub_overflow(x, y, &z))
    return z;
  return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
  unsigned res;
  res = x - y;
  res &= -(res <= x);
  return res;
}

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
        sub     edi, esi
        jb      .L3
        mov     eax, edi
        ret
.L3:
        xor     eax, eax
        ret

sub_sat2(unsigned int, unsigned int):
        sub     edi, esi
        mov     eax, 0
        cmovnb  eax, edi
        ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
        subs    w2, w0, w1
        mov     w3, 0
        cmp     w0, w1
        csel    w0, w2, w3, cs
        ret

sub_sat2(unsigned int, unsigned int):
        subs    w0, w0, w1
        csel    w0, w0, wzr, cs
        ret

Clang x86_64 gives the same code for both:

sub_sat(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret

sub_sat2(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
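The subtraction trick is the mirror image: x - y wraps exactly when y > x, and the wrapped result is then greater than x, so -(res <= x) is an all-ones mask precisely when no underflow occurred. An exhaustive 8-bit sketch along the same lines as the addition harness above (again just illustrative):

#include <cassert>
#include <cstdint>

static uint8_t sub_sat_a (uint8_t x, uint8_t y)
{
  uint8_t z;
  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}

static uint8_t sub_sat_b (uint8_t x, uint8_t y)
{
  uint8_t res = x - y;
  res &= -(uint8_t)(res <= x);  // zero mask iff the subtraction wrapped
  return res;
}

int main ()
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      assert (sub_sat_a (x, y) == sub_sat_b (x, y));
}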
Confirmed.
Note we don't have a good middle-end representation for (integer) saturation. Maybe having variants of .ADD_OVERFLOW and friends that take the alternate value (the value to produce when overflow occurs, where the actual result would otherwise be left unspecified) as an additional argument would work. The first implementation would then fold to

<bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR <_8>;
  return _1;

Of course that defers the code-generation problem to RTL expansion, and for canonicalization purposes it would also require pattern-matching

  res = x + y;
  res |= -(res < x);

into the same form. I would expect that some targets implement saturating integer arithmetic (not sure about multiplication or division though).
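At the source level, the proposed semantics (an overflow IFN carrying an alternate value) would amount to something like the following sketch; the helper name add_overflow_alt is hypothetical, purely to illustrate the idea:

// Hypothetical illustration of .ADD_OVERFLOW (x, y, alt): returns the
// sum when no overflow occurs, and the alternate value otherwise.
static inline unsigned
add_overflow_alt (unsigned x, unsigned y, unsigned alt)
{
  unsigned z;
  return __builtin_add_overflow (x, y, &z) ? alt : z;
}

// add_sat (x, y) from comment #0 is then simply:
unsigned add_sat_via_alt (unsigned x, unsigned y)
{
  return add_overflow_alt (x, y, -1u);
}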
Yeah, this is hurting us a lot on vectors as well: https://godbolt.org/z/ecnGadxcG (a sketch of the kind of loops being compared follows below).

The first one isn't vectorizable, and for the second one we generate too-complicated code, since the vec_cond pattern is expanded to something quite involved.

It was too complicated for the intern we had at the time, but I think we should still follow the conclusion of this thread: https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html i.e. we should just add a proper saturating IFN. The only remaining question is whether we should make it optab-backed, or whether we can do something reasonably better for most targets with better fallback code. This PR seems to indicate yes, since the REALPART_EXPR seems to screw things up a bit.
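For reference, the two vector loops being compared are presumably along these lines (my reconstruction of the comparison, not the exact godbolt contents; function names are made up):

// Builtin form: the .ADD_OVERFLOW + branch shape blocks vectorization.
void vec_add_sat1 (unsigned *r, const unsigned *x, const unsigned *y, int n)
{
  for (int i = 0; i < n; i++)
    {
      unsigned z;
      r[i] = __builtin_add_overflow (x[i], y[i], &z) ? -1u : z;
    }
}

// Branchless form: vectorizes, but via a costly VEC_COND expansion.
void vec_add_sat2 (unsigned *r, const unsigned *x, const unsigned *y, int n)
{
  for (int i = 0; i < n; i++)
    {
      unsigned t = x[i] + y[i];
      r[i] = t | -(t < x[i]);
    }
}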
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183

commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:05 2024 +0800

    Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

    This patch adds the middle-end representation for saturating
    addition, i.e. the result of the add is set to the maximum value on
    overflow.  It matches a pattern similar to:

    SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

    Taking uint8_t as an example, we will have:

    * SAT_ADD (1, 254)   => 255
    * SAT_ADD (1, 255)   => 255
    * SAT_ADD (2, 255)   => 255
    * SAT_ADD (255, 255) => 255

    Given the below example for the unsigned scalar integer uint64_t:

    uint64_t sat_add_u64 (uint64_t x, uint64_t y)
    {
      return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
    }

    Before this patch:

    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      long unsigned int _1;
      _Bool _2;
      long unsigned int _3;
      long unsigned int _4;
      uint64_t _7;
      long unsigned int _10;
      __complex__ long unsigned int _11;

      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
      _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
      _1 = REALPART_EXPR <_11>;
      _10 = IMAGPART_EXPR <_11>;
      _2 = _10 != 0;
      _3 = (long unsigned int) _2;
      _4 = -_3;
      _7 = _1 | _4;
      return _7;
      ;;    succ:       EXIT
    }

    After this patch:

    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _7;

      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
      _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
      return _7;
      ;;    succ:       EXIT
    }

    The below tests passed for this patch:
    1. The riscv full regression tests.
    2. The x86 bootstrap tests.
    3. The x86 full regression tests.

        PR target/51492
        PR target/112600

    gcc/ChangeLog:

        * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
        to the return true switch case(s).
        * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
        * match.pd: Add unsigned SAT_ADD match(es).
        * optabs.def (OPTAB_NL): Remove fixed-point limitation for
        us/ssadd.
        * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
        extern func decl generated in match.pd match.
        (match_saturation_arith): New func impl to match the saturation
        arith.
        (math_opts_dom_walker::after_dom_children): Try match saturation
        arith when IOR expr.

Signed-off-by: Pan Li <pan2.li@intel.com>
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010

commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:06 2024 +0800

    Vect: Support new IFN SAT_ADD for unsigned vector int

    For vectorization, we leverage the existing vect pattern recognition
    to find a pattern similar to the scalar one, and let the vectorizer
    perform the rest via the standard name usadd<mode>3 in vector mode.
    The riscv vector backend has the insn "Vector Single-Width Saturating
    Add and Subtract", which can be leveraged when expanding usadd<mode>3
    in vector mode.  For example:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;

      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
    }

    Before this patch:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
      ivtmp_58 = _80 * 8;
      vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
      vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
      vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
      mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
      vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
      .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
      vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
      vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
      vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
      ivtmp_79 = ivtmp_78 - _80;
      ...
    }

    After this patch:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
      ivtmp_46 = _62 * 8;
      vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
      vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
      vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
      .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
      ...
    }

    The below test suites passed for this patch:
    * The riscv full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.

        PR target/51492
        PR target/112600

    gcc/ChangeLog:

        * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
        func decl generated by match.pd match.
        (vect_recog_sat_add_pattern): New func impl to recog the pattern
        for unsigned SAT_ADD.

Signed-off-by: Pan Li <pan2.li@intel.com>
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:b59de4113262f2bee14147eb17eb3592f03d9556

commit r15-634-gb59de4113262f2bee14147eb17eb3592f03d9556
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Fri May 17 09:55:49 2024 +0200

    i386: Rename sat_plusminus expanders to standard names [PR112600]

    Rename the <sse2_avx2>_<insn><mode>3<mask_name> expander to the
    standard ssadd, usadd, sssub and ussub names to enable the
    corresponding optab expansion.  Also add named expanders for MMX
    modes.

        PR middle-end/112600

    gcc/ChangeLog:

        * config/i386/mmx.md (<insn><mode>3): New expander.
        * config/i386/sse.md
        (<sse2_avx2>_<sat_plusminus:insn><mode>3<mask_name>): Rename
        expander to <sat_plusminus:insn><mode>3<mask_name>.
        (<umaxmin:code><mode>3): Update for rename.
        * config/i386/i386-builtin.def: Update for rename.

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-1a.c: New test.
        * gcc.target/i386/pr112600-1b.c: New test.
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:34ed2b4593fa98b613632d0dde30b6ba3e7ecad9

commit r15-642-g34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
Author: Pan Li <pan2.li@intel.com>
Date:   Fri May 17 18:49:46 2024 +0800

    RISC-V: Implement IFN SAT_ADD for both the scalar and vector

    This patch implements SAT_ADD in the riscv backend, as a sample for
    both the scalar and vector cases.  Given the below vector code as an
    example:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;

      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
    }

    Before this patch:

    vec_sat_add_u64:
      ...
      vsetvli    a5,a3,e64,m1,ta,ma
      vle64.v    v0,0(a1)
      vle64.v    v1,0(a2)
      slli       a4,a5,3
      sub        a3,a3,a5
      add        a1,a1,a4
      add        a2,a2,a4
      vadd.vv    v1,v0,v1
      vmsgtu.vv  v0,v0,v1
      vmerge.vim v1,v1,-1,v0
      vse64.v    v1,0(a0)
      ...

    After this patch:

    vec_sat_add_u64:
      ...
      vsetvli    a5,a3,e64,m1,ta,ma
      vle64.v    v1,0(a1)
      vle64.v    v2,0(a2)
      slli       a4,a5,3
      sub        a3,a3,a5
      add        a1,a1,a4
      add        a2,a2,a4
      vsaddu.vv  v1,v1,v2  <=  Vector Single-Width Saturating Add
      vse64.v    v1,0(a0)
      ...

    The below test suites passed for this patch:
    * The riscv full regression tests.
    * The aarch64 full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.

        PR target/51492
        PR target/112600

    gcc/ChangeLog:

        * config/riscv/autovec.md (usadd<mode>3): New pattern expand for
        the unsigned SAT_ADD in vector mode.
        * config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
        to expand usadd<mode>3 pattern.
        (expand_vec_usadd): Ditto but for vector.
        * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
        the vsadd insn.
        (expand_vec_usadd): New func impl to expand usadd<mode>3 for
        vector.
        * config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
        expand usadd<mode>3 for scalar.
        * config/riscv/riscv.md (usadd<mode>3): New pattern expand for
        the unsigned SAT_ADD in scalar mode.
        * config/riscv/vector.md: Allow VLS mode for vsaddu.

    gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
        * gcc.target/riscv/sat_arith.h: New test.
        * gcc.target/riscv/sat_u_add-1.c: New test.
        * gcc.target/riscv/sat_u_add-2.c: New test.
        * gcc.target/riscv/sat_u_add-3.c: New test.
        * gcc.target/riscv/sat_u_add-4.c: New test.
        * gcc.target/riscv/sat_u_add-run-1.c: New test.
        * gcc.target/riscv/sat_u_add-run-2.c: New test.
        * gcc.target/riscv/sat_u_add-run-3.c: New test.
        * gcc.target/riscv/sat_u_add-run-4.c: New test.
        * gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li <pan2.li@intel.com>
Date:   Tue May 28 15:37:44 2024 +0800

    Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

    This patch adds the middle-end representation for saturating
    subtraction, i.e. the result of the subtraction is set to the minimum
    value on underflow.  It matches a pattern similar to:

    SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y))

    For example, for uint8_t we have:

    * SAT_SUB (255, 0)   => 255
    * SAT_SUB (1, 2)     => 0
    * SAT_SUB (254, 255) => 0
    * SAT_SUB (0, 255)   => 0

    Given the below SAT_SUB for uint64_t:

    uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
    {
      return (x - y) & (-(uint64_t)(x >= y));
    }

    Before this patch:

    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      _Bool _1;
      long unsigned int _3;
      uint64_t _6;

      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
      _1 = x_4(D) >= y_5(D);
      _3 = x_4(D) - y_5(D);
      _6 = _1 ? _3 : 0;
      return _6;
      ;;    succ:       EXIT
    }

    After this patch:

    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _6;

      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
      _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
      return _6;
      ;;    succ:       EXIT
    }

    The below tests passed for this patch:
    * The riscv full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.

        PR target/51492
        PR target/112600

    gcc/ChangeLog:

        * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
        * match.pd: Add new match for SAT_SUB.
        * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
        * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
        new decl for generated in match.pd.
        (build_saturation_binary_arith_call): Add new helper function to
        build the gimple call to binary SAT alu.
        (match_saturation_arith): Rename from.
        (match_unsigned_saturation_add): Rename to.
        (match_unsigned_saturation_sub): Add new func to match the
        unsigned sat sub.
        (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
        try when COND_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat<unsigned>) have equivalent behaviour:
>
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
>   unsigned z;
>   if (!__builtin_add_overflow(x, y, &z))
>     return z;
>   return -1u;
> }
[...]
> For -O3 on x86_64 GCC uses a branch for the first one:
>
> add_sat(unsigned int, unsigned int):
>         add     edi, esi
>         jc      .L3
>         mov     eax, edi
>         ret
> .L3:
>         or      eax, -1
>         ret

If-conversion to a conditional move fails here because of the "weird" compare operands, a consequence of the addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:SI (reg:SI 106)
                        (reg:SI 107))
                    (reg:SI 106)))
            (set (reg:SI 104)
                (plus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

The noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40,
    code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8,
    vfalse=0x7fffe9fe49d8, vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0)
    at ../../git/gcc/gcc/ifcvt.cc:1774
1774          return NULL_RTX;
(gdb) list
1766      /* Don't even try if the comparison operands are weird
1767         except that the target supports cbranchcc4.  */
1768      if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769          || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770        {
1771          if (!have_cbranchcc4
1772              || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773              || cmp_b != const0_rtx)
1774            return NULL_RTX;
1775        }
1776
1777      target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode },
1778                                      vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU,
    cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8,
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0)
    at ../../git/gcc/gcc/ifcvt.cc:1774
#1  0x00000000020d995b in noce_try_cmove (if_info=0x7fffffffd750)
    at ../../git/gcc/gcc/ifcvt.cc:1884
#2  0x00000000020dec37 in noce_process_if_block (if_info=0x7fffffffd750)
    at ../../git/gcc/gcc/ifcvt.cc:4149
#3  0x00000000020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
    then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
    at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x00000000020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1)
    at ../../git/gcc/gcc/ifcvt.cc:4921
#5  0x00000000020e3255 in if_convert (after_combine=true)
    at ../../git/gcc/gcc/ifcvt.cc:6068
(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
    (reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails the general_operand check.

Please note that a similar testcase:

unsigned sub_sat (unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

        subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
        movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
        cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
        ret                     # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
            (set (reg:CC 17 flags)
                (compare:CC (reg:SI 106)
                    (reg:SI 107)))
            (set (reg:SI 104)
                (minus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":28:12 416 {*subsi_3}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

So either the addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove should be improved to handle the above "weird" operand form.

Let's ask Jakub.
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:366d45c8d4911dc7874d2e64cf2583c0133b8dd5

commit r15-1077-g366d45c8d4911dc7874d2e64cf2583c0133b8dd5
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Jun 6 19:18:41 2024 +0200

    testsuite/i386: Add vector sat_sub testcases [PR112600]

        PR middle-end/112600

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-2a.c: New test.
        * gcc.target/i386/pr112600-2b.c: New test.
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:de05e44b2ad9638d04173393b1eae3c38b2c3864

commit r15-1113-gde05e44b2ad9638d04173393b1eae3c38b2c3864
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Sat Jun 8 12:17:11 2024 +0200

    i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

    The following testcase:

    unsigned
    add_sat (unsigned x, unsigned y)
    {
      unsigned z;
      return __builtin_add_overflow(x, y, &z) ? -1u : z;
    }

    currently compiles (-O2) to:

    add_sat:
            addl    %esi, %edi
            jc      .L3
            movl    %edi, %eax
            ret
    .L3:
            orl     $-1, %eax
            ret

    We can expand through the usadd{m}3 optab to use the carry flag from
    the addition and generate branchless code using the SBB instruction,
    implementing:

        unsigned res = x + y;
        res |= -(res < x);

    add_sat:
            addl    %esi, %edi
            sbbl    %eax, %eax
            orl     %edi, %eax
            ret

        PR target/112600

    gcc/ChangeLog:

        * config/i386/i386.md (usadd<mode>3): New expander.
        (x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-a.c: New test.
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01

commit r15-1122-g8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Sun Jun 9 12:09:13 2024 +0200

    i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

    The following testcase:

    unsigned
    sub_sat (unsigned x, unsigned y)
    {
      unsigned res;
      res = x - y;
      res &= -(x >= y);
      return res;
    }

    currently compiles (-O2) to:

    sub_sat:
            movl    %edi, %edx
            xorl    %eax, %eax
            subl    %esi, %edx
            cmpl    %esi, %edi
            setnb   %al
            negl    %eax
            andl    %edx, %eax
            ret

    We can expand through the ussub{m}3 optab to use the carry flag from
    the subtraction and generate code using the SBB instruction,
    implementing:

        unsigned res = x - y;
        res &= ~(-(x < y));

    sub_sat:
            subl    %esi, %edi
            sbbl    %eax, %eax
            notl    %eax
            andl    %edi, %eax
            ret

        PR target/112600

    gcc/ChangeLog:

        * config/i386/i386.md (ussub<mode>3): New expander.
        (sub<mode>_3): Ditto.

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-b.c: New test.
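The rewrite from res &= -(x >= y) to res &= ~(-(x < y)) in this commit relies on the identity -(x >= y) == ~(-(x < y)) for the 0/1 results of C comparisons: x >= y is exactly 1 - (x < y), and -(1 - b) == ~(-b) because ~n == -n - 1. A tiny sanity-check sketch, just as an illustration:

#include <cassert>

int main ()
{
  // x, y in {0, 1} exercises both comparison outcomes.
  for (unsigned x = 0; x <= 1; x++)
    for (unsigned y = 0; y <= 1; y++)
      assert (-(unsigned)(x >= y) == ~(-(unsigned)(x < y)));
}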
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:05b95238be648c9cf8af2516930af6a7b637a2b8

commit r15-1183-g05b95238be648c9cf8af2516930af6a7b637a2b8
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Jun 11 16:00:31 2024 +0200

    i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

    For TARGET_CMOV targets, emit an insn sequence involving a conditional
    move.

    .SAT_ADD:

            addl    %esi, %edi
            movl    $-1, %eax
            cmovnc  %edi, %eax
            ret

    .SAT_SUB:

            subl    %esi, %edi
            movl    $0, %eax
            cmovnc  %edi, %eax
            ret

        PR target/112600

    gcc/ChangeLog:

        * config/i386/i386.md (usadd<mode>3): Emit insn sequence involving
        conditional move for TARGET_CMOVE targets.
        (ussub<mode>3): Ditto.

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-a.c: Also scan for cmov.
        * gcc.target/i386/pr112600-b.c: Ditto.
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:19d751601d012bbe31512d26f968e75873a408ab

commit r15-3612-g19d751601d012bbe31512d26f968e75873a408ab
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Sep 12 20:34:28 2024 +0200

    i386: Implement SAT_ADD for signed vector integers

    Enable V4QI, V2QI and V2HI mode signed saturated arithmetic insn
    patterns and add a couple of testcases to test for PADDSB and PADDSW
    instructions.

        PR target/112600

    gcc/ChangeLog:

        * config/i386/mmx.md (<sat_plusminus:insn><mode>3): Rename from
        *<sat_plusminus:insn><mode>3.

    gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr112600-3a.c: New test.
        * gcc.target/i386/pr112600-3b.c: New test.