Created attachment 55805 [details]
bug example

template<typename T>
inline constexpr T addc(T a, T b, bool carryin, bool& carryout) noexcept
{
    T s;
    auto c1 = __builtin_add_overflow(a, b, __builtin_addressof(s));
    auto c2 = __builtin_add_overflow(s, carryin, __builtin_addressof(s));
    carryout = c1 | c2;
    return s;
}

void test_addc(unsigned long long* a, unsigned long long* b, unsigned long long* r)
{
    bool carry{};
    r[0] = addc(a[0], b[0], carry, carry);
    r[1] = addc(a[1], b[1], carry, carry);
}

I copied the example from the GCC documentation:
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
But GCC fails to recognize the pattern, even though the code follows exactly what the documentation describes.

gcc: https://godbolt.org/z/Whaaahn41
clang: https://godbolt.org/z/Ma6rvaYd6

Looks like a bug here.
Just use __int128 addition if all you want is double-word addition (or long long for 32-bit arches)?
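For the two-limb case in the report, the suggestion above can be sketched as follows (a minimal sketch assuming a 64-bit GCC or Clang target where unsigned __int128 is available; the function name is mine):

```cpp
#include <cassert>

// Double-word (128-bit) addition of two 2-limb numbers via unsigned __int128.
// On x86-64 the compiler lowers this to an add/adc pair directly.
void test_addc_u128(const unsigned long long* a,
                    const unsigned long long* b,
                    unsigned long long* r)
{
    unsigned __int128 x = (static_cast<unsigned __int128>(a[1]) << 64) | a[0];
    unsigned __int128 y = (static_cast<unsigned __int128>(b[1]) << 64) | b[0];
    unsigned __int128 s = x + y;  // wraps modulo 2^128, like the addc chain
    r[0] = static_cast<unsigned long long>(s);
    r[1] = static_cast<unsigned long long>(s >> 64);
}
```

This only covers the fixed two-limb case, of course, not arbitrary-length bignum chains.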
Created attachment 55809 [details]
gcc14-pr111209.patch

Anyway, here is a patch that makes it match, but it is getting ugly to avoid making it match prematurely and break other matching.
(In reply to Jakub Jelinek from comment #1)
> Just use __int128 addition if all you want is double-word addition (or long
> long for 32-bit arches)?

Well, I presented this merely as an illustrative example; the length can actually be arbitrary. I took the code directly from the GCC documentation, but it doesn't appear to behave as the documentation asserts:

"
Built-in Function: unsigned int __builtin_addc (unsigned int a, unsigned int b, unsigned int carry_in, unsigned int *carry_out)
Built-in Function: unsigned long int __builtin_addcl (unsigned long int a, unsigned long int b, unsigned long int carry_in, unsigned long int *carry_out)
Built-in Function: unsigned long long int __builtin_addcll (unsigned long long int a, unsigned long long int b, unsigned long long int carry_in, unsigned long long int *carry_out)

These built-in functions are equivalent to:

({ __typeof__ (a) s; \
   __typeof__ (a) c1 = __builtin_add_overflow (a, b, &s); \
   __typeof__ (a) c2 = __builtin_add_overflow (s, carry_in, &s); \
   *(carry_out) = c1 | c2; \
   s; })

i.e. they add 3 unsigned values, set what the last argument points to to 1 if either of the two additions overflowed (otherwise 0) and return the sum of those 3 unsigned values. Note, while all the first 3 arguments can have arbitrary values, better code will be emitted if one of them (preferably the third one) has only values 0 or 1 (i.e. carry-in).
"

Additionally, it's advisable to steer clear of __uint128_t in certain situations: the type is not available on the Microsoft compiler or on 32-bit targets, and compilers do not always optimize it well.
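For reference, the documented expansion can be exercised directly with __builtin_add_overflow (available in both GCC and Clang); this minimal sketch (the function name addc_ull is mine, not part of any library) just restates the quoted equivalence:

```cpp
#include <cassert>

// Mirrors the documented expansion of __builtin_addcll: add three unsigned
// values, and set *carry_out to 1 if either partial addition overflowed.
unsigned long long addc_ull(unsigned long long a, unsigned long long b,
                            unsigned long long carry_in,
                            unsigned long long* carry_out)
{
    unsigned long long s;
    bool c1 = __builtin_add_overflow(a, b, &s);
    bool c2 = __builtin_add_overflow(s, carry_in, &s);
    *carry_out = c1 | c2;
    return s;
}
```

The two overflow flags are OR-ed rather than added because, as the documentation notes, at most one of the two additions can overflow when carry_in is 0 or 1.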
(In reply to cqwrteur from comment #3) > (In reply to Jakub Jelinek from comment #1) > > Just use __int128 addition if all you want is double-word addition (or long > > long for 32-bit arches)? > > Well, I've presented this merely as an illustrative example. The length can > actually be arbitrary. No, it was working with all the other lengths.
(In reply to Jakub Jelinek from comment #4)
> (In reply to cqwrteur from comment #3)
> > (In reply to Jakub Jelinek from comment #1)
> > > Just use __int128 addition if all you want is double-word addition (or long
> > > long for 32-bit arches)?
> >
> > Well, I've presented this merely as an illustrative example. The length can
> > actually be arbitrary.
>
> No, it was working with all the other lengths.

This might come across as unusual, but I frequently engage in manipulations involving the carry flag, like this implementation of 128-bit division (for 32-bit machines and the Microsoft compiler):

auto shift = static_cast<unsigned>(::std::countl_zero(divisorhigh) - ::std::countl_zero(dividendhigh));
divisorhigh = ::fast_io::intrinsics::shiftleft(divisorlow, divisorhigh, shift);
divisorlow <<= shift;
quotientlow = 0;
bool carry;
do
{
    carry = 0;
    dividendlow = intrinsics::subc(dividendlow, divisorlow, carry, carry);
    dividendhigh = intrinsics::subc(dividendhigh, divisorhigh, carry, carry);
    constexpr T zero{};
    T mask{zero - carry};
    T templow{divisorlow & mask}, temphigh{divisorhigh & mask};
    carry = !carry;
    quotientlow = intrinsics::addc(quotientlow, quotientlow, carry, carry);
    carry = 0;
    dividendlow = intrinsics::addc(dividendlow, templow, carry, carry);
    dividendhigh = intrinsics::addc(dividendhigh, temphigh, carry, carry);
    divisorlow = intrinsics::shiftright(divisorlow, divisorhigh, 1u);
    divisorhigh >>= 1u;
} while (shift--);
return {quotientlow, 0, dividendlow, dividendhigh};
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac

commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Aug 29 10:46:01 2023 +0200

    tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

    The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
    middle, which adds/subtracts carry-in (from lower limbs) and computes
    carry-out (to higher limbs).  Before optimizations (unless user writes
    it intentionally that way already), all the steps look the same, but
    optimizations simplify the handling of the least significant limb
    (one which adds/subtracts 0 carry-in) to just a single
    .{ADD,SUB}_OVERFLOW and the handling of the most significant limb
    if the computed carry-out is ignored to normal addition/subtraction
    of multiple operands.

    Now, match_uaddc_usubc has code to turn that least significant
    .{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
    a more significant limb above it is matched into .U{ADD,SUB}C; this
    isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
    functionally equal to .UADDC (x, y, 0) (provided the types of operands
    are the same and result is complex type with that type element), and
    it also has code to match the most significant limb with ignored
    carry-out (in that case one pattern match turns both the penultimate
    limb pair of .{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the
    addition/subtraction of the 4 values (2 carries) into another
    .U{ADD,SUB}C).

    As the following patch shows, what we weren't handling is the case when
    one uses either the __builtin_{add,sub}c builtins or hand written forms
    thereof (either __builtin_*_overflow or even that written by hand) for
    just 2 limbs, where the least significant has 0 carry-in and the most
    significant ignores carry-out.  The following patch matches that, e.g.
    _16 = .ADD_OVERFLOW (_1, _2);
    _17 = REALPART_EXPR <_16>;
    _18 = IMAGPART_EXPR <_16>;
    _15 = _3 + _4;
    _12 = _15 + _18;

    into

    _16 = .UADDC (_1, _2, 0);
    _17 = REALPART_EXPR <_16>;
    _18 = IMAGPART_EXPR <_16>;
    _19 = .UADDC (_3, _4, _18);
    _12 = IMAGPART_EXPR <_19>;

    so that we can emit better code.  As the 2 later comments show, we must
    do that carefully, because the pass walks the IL from first to last stmt
    in a bb and we must avoid pattern matching this way something that
    should be matched on a later instruction differently.

    2023-08-29  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            PR middle-end/111209
            * tree-ssa-math-opts.cc (match_uaddc_usubc): Match also just 2
            limb uaddc/usubc with 0 carry-in on lower limb and ignored
            carry-out on higher limb.  Don't match it though if it could be
            matched later on 4 argument addition/subtraction.
            * gcc.target/i386/pr79173-12.c: New test.
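The newly matched shape corresponds roughly to a plain two-limb version of the reporter's code (a sketch in the spirit of the added test; the exact contents of pr79173-12.c are not shown here, and the function name add2 is mine):

```cpp
#include <cassert>

// Two-limb addition written with __builtin_add_overflow: the low limb has
// zero carry-in and the high limb's carry-out is ignored -- the shape that
// match_uaddc_usubc now rewrites into a .UADDC pair.
void add2(const unsigned long long* a, const unsigned long long* b,
          unsigned long long* r)
{
    unsigned long long s;
    unsigned long long c = __builtin_add_overflow(a[0], b[0], &s);
    r[0] = s;
    r[1] = a[1] + b[1] + c;  // carry-out of this addition is discarded
}
```

With the patch, GCC 14 compiles this to the same add/adc sequence Clang already produced for the original report.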