[Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop
madhur4127 at gmail dot com
gcc-bugzilla@gcc.gnu.org
Mon May 4 16:44:37 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945
Bug ID: 94945
Summary: Missed optimization: Carry chain not recognized in
manually unrolled loop
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: madhur4127 at gmail dot com
Target Milestone: ---
Context: Big integer addition using ADC (_addcarry_u64).
See Godbolt link: https://godbolt.org/z/rMxe6W
Example:
Suppose the case of big integer addition:
// pa, pb: pointer to big integer A, B
// n: size of big integer A, B
// pr: pointer to result
void add(const uint64_t * __restrict__ pa, const uint64_t * __restrict__ pb,
uint64_t * __restrict__ pr, unsigned n) {
unsigned char carry = 0;
unsigned i;
for(i = 0; i<n; i += 4) {
carry = _addcarry_u64(carry, pa[i+0], pb[i+0], &pr[i+0]);
carry = _addcarry_u64(carry, pa[i+1], pb[i+1], &pr[i+1]);
carry = _addcarry_u64(carry, pa[i+2], pb[i+2], &pr[i+2]);
carry = _addcarry_u64(carry, pa[i+3], pb[i+3], &pr[i+3]);
}
}
Without loop unrolling GCC saves the Carry Flag at the end of the loop and
again sets the saved carry flag in the next iteration. GCC doesn't recognize
the propagation of Carry Flag across loop iterations even when manually
unrolling the loop (while Clang does). GCC saves the carry and triggers it
again in this fashion (2 iterations shown):
mov ecx, eax # i, i
add r9b, -1 # carry,
mov rdx, QWORD PTR [rdi+rcx*8] # tmp132, *_6
adc rdx, QWORD PTR [rsi+rcx*8] # tmp132, *_4
mov QWORD PTR [r8+rcx*8], rdx #* pr, tmp132
setc r9b #, _48 <--------SAVING CARRY
lea ecx, [rax+1] # tmp134,
mov rdx, QWORD PTR [rdi+rcx*8] # tmp140, *_15
add r9b, -1 # _48, <--------SETTING CARRY
adc rdx, QWORD PTR [rsi+rcx*8] # tmp140, *_13
mov QWORD PTR [r8+rcx*8], rdx #* pr, tmp140
setc r9b #, _47
Another optimization:
Trigger loop unrolling (without the need to manually unrolling) and propagate
the carry without the need to save/set it in between.
Side Note:
Is this the fastest and optimal way to add two big integers? Considering ASM to
be the last resort?
More information about the Gcc-bugs
mailing list