[Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop

madhur4127 at gmail dot com gcc-bugzilla@gcc.gnu.org
Mon May 4 16:44:37 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945

            Bug ID: 94945
           Summary: Missed optimization: Carry chain not recognized in
                    manually unrolled loop
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Context: Big integer addition using ADC (_addcarry_u64).

See Godbolt link: https://godbolt.org/z/rMxe6W

Example:
Suppose the case of big integer addition:

// pa, pb: pointer to big integer A, B
// n: size of big integer A, B
// pr: pointer to result

void add(const uint64_t * __restrict__ pa, const uint64_t * __restrict__ pb,
uint64_t * __restrict__ pr, unsigned n) {
    unsigned char carry = 0;
    unsigned i;
    for(i = 0; i<n; i += 4) {
        carry = _addcarry_u64(carry, pa[i+0], pb[i+0], &pr[i+0]);
        carry = _addcarry_u64(carry, pa[i+1], pb[i+1], &pr[i+1]);
        carry = _addcarry_u64(carry, pa[i+2], pb[i+2], &pr[i+2]);
        carry = _addcarry_u64(carry, pa[i+3], pb[i+3], &pr[i+3]);
    }
}


Without loop unrolling GCC saves the Carry Flag at the end of the loop and
again sets the saved carry flag in the next iteration. GCC doesn't recognize
the propagation of Carry Flag across loop iterations even when manually
unrolling the loop (while Clang does). GCC saves the carry and triggers it
again in this fashion (2 iterations shown):

        mov     ecx, eax  # i, i
        add     r9b, -1   # carry,
        mov     rdx, QWORD PTR [rdi+rcx*8]        # tmp132, *_6
        adc     rdx, QWORD PTR [rsi+rcx*8]        # tmp132, *_4
        mov     QWORD PTR [r8+rcx*8], rdx #* pr, tmp132
        setc    r9b     #, _48   <--------SAVING CARRY

        lea     ecx, [rax+1]      # tmp134,
        mov     rdx, QWORD PTR [rdi+rcx*8]        # tmp140, *_15
        add     r9b, -1   # _48, <--------SETTING CARRY
        adc     rdx, QWORD PTR [rsi+rcx*8]        # tmp140, *_13
        mov     QWORD PTR [r8+rcx*8], rdx #* pr, tmp140
        setc    r9b     #, _47


Another optimization:
Trigger loop unrolling (without the need to manually unrolling) and propagate
the carry without the need to save/set it in between.

Side Note:
Is this the fastest and optimal way to add two big integers? Considering ASM to
be the last resort?


More information about the Gcc-bugs mailing list