[Bug middle-end/37443] fast 64-bit divide by constant 32-bit
ajrobb at bigfoot dot com
gcc-bugzilla@gcc.gnu.org
Tue Sep 16 00:18:00 GMT 2008
------- Comment #1 from ajrobb at bigfoot dot com 2008-09-16 00:17 -------
I have tested my initial routine with more values and it turns out that the
short-cut of omitting the least significant multiplication is wrong. I added
the fourth multiply to generate the full 128-bit product. I also took the
multiplier used by gcc 4.2.3 on x86_64-linux-gnu (Ubuntu), 4056481920730334085,
and a (28+64)-bit shift to perform the division by 1220703125u (the largest
power of 5 that fits in 32 bits):
uint64_t fast(uint64_t x) {
  static const uint64_t fact = 4056481920730334085ull;
  /* carry out of the low 32x32 partial product */
  uint32_t c = ((x & 0xffffffffu) * (fact & 0xffffffffu)) >> 32;
  uint64_t a = (x & 0xffffffffu) * (fact >> 32) + c;
  /*
   * Split the addition of the two middle 64-bit intermediate results
   * into 2 stages to avoid possible overflow.
   * (not actually an issue with this value of 'fact')
   */
  uint32_t b = ((x >> 32) * (fact & 0xffffffffu) + (a & 0xffffffffu)) >> 32;
  return ((x >> 32) * (fact >> 32) + (a >> 32) + b) >> 28;
}
After hand-coding the assembler on 32-bit x86, the extra multiply increased the
run time from 15s to 17s - still much faster than 'slow' (the 64-bit divide API
call).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443