[Bug libstdc++/97759] New: Could std::has_single_bit implementation be faster?

Mon Nov 9 00:01:25 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759

            Bug ID: 97759
           Summary: Could std::has_single_bit implementation be faster?
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc-bugs at marehr dot dialup.fu-berlin.de
  Target Milestone: ---

Hello gcc-team,

we are thrilled that C++20 offers some efficient bit implementation and that we
could exchange some of our own implementation with the standardized ones,
making the code more accessible.

I replaced our implementation and noticed that `std::has_single_bit` was slower
than what we had before by around 30%. (The other functions matched our
timings.)

Additionally, we have a (micro-)benchmark that compares the standard arithmetic
bit trick
(https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2) with
the implementation where popcount == 1. We decided to use the arithmetic
version, because we measured that it was faster than popcount on our machines
(mostly intel processors).

Interestingly, it seems that the popcount benchmark matches the
std::has_single_bit time-wise, so I guess that std::has_single_bit is
implemented via popcount.

Those timings could be reproduced at an unknown location
https://quick-bench.com/q/Y28keu_mSh25WwhO05T4SKrbHpk

I don't know how to fix this, but I would expect that the optimizer would
recognize popcount=1 and knows that there is a more efficient version. Or
change the implementation to arithmetic, where again the optimizer could decide
to replace that by a popcount if that is more efficient on some architecture?

Thank you!