[Bug c/98479] New: Missed optimization opportunity for unsigned __int128 modulo

dabler at gmail dot com gcc-bugzilla@gcc.gnu.org
Wed Dec 30 14:03:10 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98479

            Bug ID: 98479
           Summary: Missed optimization opportunity for unsigned __int128
                    modulo
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dabler at gmail dot com
  Target Milestone: ---

I have found that manually calculating the % operator on __int128 is
significantly faster than the built-in compiler operator. I will show you how
to calculate modulo 9, but the method can be used to calculate modulo any other
number.

First, consider the built-in compiler operator:

uint64_t mod9_v1(unsigned __int128 n)
{
        return n % 9;
}

Now consider my manual implementation:

uint64_t mod9_v2(unsigned __int128 n)
{
        uint64_t r = 0;

        r += (uint32_t)(n);
        r += (uint32_t)(n >> 32) * (uint64_t)4;
        r += (uint32_t)(n >> 64) * (uint64_t)7;
        r += (uint32_t)(n >> 96);

        return r % 9;
}

Measuring over 100,000,000 random numbers gives the following results:

    mod9_v1 | 3.986052 secs
    mod9_v2 | 1.814339 secs

GCC 9.3.0 with -march=native -O3 was used on AMD Ryzen Threadripper 2990WX.

Note that 2^32 == 4 (mod 9), 2^64 == 7 (mod 9), etc.


More information about the Gcc-bugs mailing list