This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/36041] Speed up builtin_popcountll


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

Gunther Piez <gpiez at web dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gpiez at web dot de

--- Comment #10 from Gunther Piez <gpiez at web dot de> 2012-10-26 15:51:24 UTC ---
I just noticed that the provided __builtin_popcountll() is exceptionally slow
on ARMv5 as well.

I was already using the parallel bit count algorithm above for the case where
no native bit count instruction (like the SSE popcnt or NEON vcnt) is present
but native 64-bit registers are available.
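
For readers without the earlier comments at hand, here is a minimal sketch of the widely known parallel (SWAR) bit count for a 64-bit word. It may differ in detail from the variant given earlier in this bug, and the function name popcount64_parallel is only an illustrative assumption.

```c
#include <stdint.h>

/* Hypothetical helper name: parallel (SWAR) population count of a 64-bit word.
 * Illustrative sketch only; not necessarily the exact variant from this bug. */
static inline unsigned popcount64_parallel(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits it contained. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Add neighbouring 2-bit counts into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Add neighbouring 4-bit counts into 8-bit fields. */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* The multiply sums all eight byte counts into the top byte. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```

Every step here is a 64-bit shift, mask, add or multiply, which is why the cost on a pure 32-bit target is a reasonable concern.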

On a 32-bit architecture like ARM, however, I figured it made sense to just use
__builtin_popcountll(), because the many 64-bit operations in the algorithm may
be very slow on a pure 32-bit architecture without NEON or similar support.

But "optimizing" my code with some macro magic to make it use the library
popcount made the whole program 25% slower, although only a minor part of it
actually uses popcount.
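
The kind of macro selection described above could be sketched roughly as follows. This is an assumption about its shape, not the actual code: the names POPCOUNT64 and popcount64_fallback are hypothetical, and the only feature test used is __POPCNT__, which GCC defines when the x86 popcnt instruction is enabled. In line with the measurement above, the builtin is chosen only when a native instruction is known to be available; everything else gets the open-coded parallel count instead of the library call.

```c
#include <stdint.h>

/* Hypothetical fallback: open-coded parallel (SWAR) bit count. */
static inline unsigned popcount64_fallback(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

#if defined(__GNUC__) && defined(__POPCNT__)
/* Native popcnt enabled: the builtin should expand to the instruction. */
#define POPCOUNT64(x) ((unsigned)__builtin_popcountll(x))
#else
/* No known native instruction: avoid the library routine. */
#define POPCOUNT64(x) popcount64_fallback((uint64_t)(x))
#endif
```

Whether the fallback actually beats the library routine on a given 32-bit target is something to measure; the 25% figure above is for one program on ARMv5.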

