This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/36041] New: Speed up builtin_popcountll
- From: "intvnut at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 25 Apr 2008 00:34:19 -0000
- Subject: [Bug c/36041] New: Speed up builtin_popcountll
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
The current __builtin_popcountll (and likely __builtin_popcount) are fairly
slow as compared to a simple, short C version derived from what can be found in
Knuth's recent publications. The following short function is about 3x as fast
as the __builtin version, which runs counter to the idea that __builtin_XXX
provides access to implementations that are exemplars for a given platform.
unsigned int popcount64(unsigned long long x)
{
x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
return (x * 0x0101010101010101ULL) >> 56;
}
This version has the additional benefit that it omits the lookup table that the
current "builtin" version uses.
I measured the above function vs. __builtin_popcountll with a loop like the
following:
t1 = clock();
for (j = 0; j < 1000000; j++)
for (i = 0; i < 1024; i++)
pt = popcount64(data[i]);
t2 = clock();
printf("popcount64 = %d clocks\n", t2 - t1);
...where data[] is a u64 that's preinitialized.
I'll attach the exact source I used, which also includes two other possible
implementations of popcountll.
--
Summary: Speed up builtin_popcountll
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: intvnut at gmail dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041