[Bug middle-end/36041] Speed up builtin_popcountll
gpiez at web dot de
gcc-bugzilla@gcc.gnu.org
Fri Oct 26 15:51:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
Gunther Piez <gpiez at web dot de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |gpiez at web dot de
--- Comment #10 from Gunther Piez <gpiez at web dot de> 2012-10-26 15:51:24 UTC ---
Just noticed the exceptional slowness of the provided __builtin_popcountll(),
even on ARMv5.
I already use the parallel bit count algorithm above whenever a native bit
count instruction (like the SSE popcnt or NEON vcnt) is not present but native
64-bit registers are available.
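For reference, here is a minimal sketch of the classic SWAR ("parallel") bit count on a 64-bit word. It assumes the algorithm referred to above is this standard variant, and the name popcount64_swar is hypothetical. Every step uses 64-bit masks, and the final step uses a 64-bit multiply, which is why it suits targets with native 64-bit registers.

```c
#include <stdint.h>

/* Classic SWAR population count on a 64-bit word (hypothetical name).
   Assumes 64-bit registers and a reasonably fast 64-bit multiply. */
static inline int popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits in it (0..2). */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Add adjacent 2-bit fields into 4-bit fields (0..4). */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Add adjacent 4-bit fields into bytes (0..8). */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* The multiply sums all eight bytes into the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```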
But on a 32-bit architecture like ARM, I figured it made sense to just use
__builtin_popcountll(), because the many 64-bit operations in the algorithm
may be very slow on a pure 32-bit architecture without NEON or similar support.
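One way to avoid 64-bit arithmetic on a pure 32-bit target is to count each 32-bit half separately. This is only a sketch, taken neither from the bug report nor from libgcc. The names popcount32_swar and popcount64_split are hypothetical. Every step uses only 32-bit shifts, masks, adds and a single 32-bit multiply per half.

```c
#include <stdint.h>

/* SWAR popcount on a 32-bit word (hypothetical name): only 32-bit
   shifts, masks, adds and one 32-bit multiply. */
static inline unsigned popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* byte sums  */
    /* Sum the four bytes into the top byte; the cast keeps the
       product truncated to 32 bits before the shift. */
    return (uint32_t)(x * 0x01010101u) >> 24;
}

/* 64-bit popcount built from two independent 32-bit halves
   (hypothetical name). */
static inline unsigned popcount64_split(uint64_t x)
{
    return popcount32_swar((uint32_t)x) + popcount32_swar((uint32_t)(x >> 32));
}
```

Because both halves are independent, the compiler can keep the whole count inline on a 32-bit core instead of emitting a library call. Whether that beats the library routine on a particular ARMv5 part would still need measuring.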
But "optimizing" my code with some macro magic to make it use the library
popcount made the whole program 25% slower, even though only a minor part of
it actually uses popcount.