[Bug c/98479] New: Missed optimization opportunity for unsigned __int128 modulo
dabler at gmail dot com
gcc-bugzilla@gcc.gnu.org
Wed Dec 30 14:03:10 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98479
Bug ID: 98479
Summary: Missed optimization opportunity for unsigned __int128
modulo
Product: gcc
Version: 9.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: dabler at gmail dot com
Target Milestone: ---
I have found that manually calculating the % operator on __int128 is
significantly faster than the built-in compiler operator. I will show you how
to calculate modulo 9, but the method can be used to calculate modulo any other
number.
First, consider the built-in compiler operator:
uint64_t mod9_v1(unsigned __int128 n)
{
return n % 9;
}
Now consider my manual implementation:
uint64_t mod9_v2(unsigned __int128 n)
{
uint64_t r = 0;
r += (uint32_t)(n);
r += (uint32_t)(n >> 32) * (uint64_t)4;
r += (uint32_t)(n >> 64) * (uint64_t)7;
r += (uint32_t)(n >> 96);
return r % 9;
}
Measuring over 100,000,000 random numbers gives the following results:
mod9_v1 | 3.986052 secs
mod9_v2 | 1.814339 secs
GCC 9.3.0 with -march=native -O3 was used on AMD Ryzen Threadripper 2990WX.
Note that 2^32 == 4 (mod 9), 2^64 == 7 (mod 9), etc.
More information about the Gcc-bugs
mailing list