__builtin_popcount

Ian Lance Taylor iant@google.com
Sat May 21 18:06:00 GMT 2011


Zeev Tarantov <zeev.tarantov@gmail.com> writes:

> This computes the population count using an 8-bit look up table by
> iterating over the 8 bytes of the input and summing the looked-up
> values.
> This is the right code for "int popcount(unsigned long x)", not for
> "int popcount (unsigned int x)".
> It performs twice the amount of work needed.

First I should say that for x86_64, if you know that you are using
processors with SSE4.2 or ABM support, you can use -mpopcnt, or an
appropriate -march= option, to direct gcc to use the hardware popcnt
instruction.
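
As a minimal sketch of what that looks like in practice (the helper
names count_bits32 and count_bits64 are hypothetical, chosen only for
this illustration), the builtins are called the same way with or without
the option; only the generated code differs:

```c
/* Build with, for example:  gcc -O2 -mpopcnt -c popcnt.c
 * With -mpopcnt, or a -march= value that implies it, gcc can expand
 * these builtins inline to the popcnt instruction.  Without it, gcc
 * falls back to other code, such as a call into libgcc. */
#include <stdint.h>

/* Hypothetical helper: population count of a 32-bit value. */
int count_bits32(uint32_t x)
{
    return __builtin_popcount(x);      /* argument type: unsigned int */
}

/* Hypothetical helper: population count of a 64-bit value. */
int count_bits64(uint64_t x)
{
    return __builtin_popcountll(x);    /* argument type: unsigned long long */
}
```

Comparing the output of "gcc -O2 -S" with and without -mpopcnt should
show whether the popcnt instruction is being used.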

Other than that, this is in effect a minor optimization bug.  The
underlying reason is that, for simplicity in dealing with the library
support functions, gcc always promotes the argument to the register size
before calling them.  For an unsigned int on x86_64 this promotion is a
zero-extension that costs nothing, and for most library functions it
makes little performance difference whether they operate on a 32-bit or
a 64-bit value.  The __builtin_popcount function is an exception: a
table-based implementation does work proportional to the operand width,
so the 64-bit routine does twice the lookups a 32-bit value needs.
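
To illustrate the difference, here is a rough sketch of a table-based
population count in both widths.  This is not the libgcc source; the
names bit_table, table_popcount32 and table_popcount64 are assumptions
made up for this example.  It only shows the technique described in the
quoted message: one 8-bit table lookup per byte of the operand.

```c
#include <stdint.h>

/* 256-entry table: bit_table[b] is the number of set bits in byte b.
 * The macros expand to the counts for 0..255 at compile time. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const unsigned char bit_table[256] = {
    B6(0), B6(1), B6(1), B6(2)
};

/* 32-bit variant: four table lookups. */
int table_popcount32(uint32_t x)
{
    int n = 0;
    for (int i = 0; i < 4; i++) {
        n += bit_table[x & 0xff];
        x >>= 8;
    }
    return n;
}

/* 64-bit variant: eight table lookups.  When the argument is a
 * zero-extended 32-bit value, the last four lookups always hit
 * bit_table[0] and add nothing to the result. */
int table_popcount64(uint64_t x)
{
    int n = 0;
    for (int i = 0; i < 8; i++) {
        n += bit_table[x & 0xff];
        x >>= 8;
    }
    return n;
}
```

Both functions return the same result for any value that fits in 32
bits; the 64-bit one simply spends half of its iterations on bytes that
are known to be zero.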

Please consider filing a bug report; see http://gcc.gnu.org/bugs/ .

Ian


