[Bug libstdc++/98226] New: Slow std::countr_one

zaikin.icc at gmail dot com gcc-bugzilla@gcc.gnu.org
Thu Dec 10 16:51:20 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98226

            Bug ID: 98226
           Summary: Slow std::countr_one
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zaikin.icc at gmail dot com
  Target Milestone: ---

The C++20 function std::countr_one() is slow. For a given x (say, an unsigned
int), it calls std::countr_zero(~x), which in turn calls __builtin_ctz(~x).
Calling __builtin_ctz(~x) directly from std::countr_one() would improve
performance. The test case below consists of three source files that compute
the same result but run at very different speeds.
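
For illustration, the call chain described above can be sketched roughly as
follows. This is not the actual libstdc++ source: the names sketch_countr_zero
and sketch_countr_one are hypothetical, and only unsigned int is handled. The
zero check is there because __builtin_ctz(0) is undefined, whereas
std::countr_zero(0) must return the number of bits in the type.

```cpp
#include <limits>

// Hypothetical stand-in for std::countr_zero on unsigned int.
constexpr int sketch_countr_zero(unsigned x) noexcept
{
  // __builtin_ctz(0) is undefined, so a zero input is handled separately.
  if (x == 0)
    return std::numeric_limits<unsigned>::digits;
  return __builtin_ctz(x);
}

// Hypothetical stand-in for std::countr_one: it only forwards the
// complemented value, which adds one more call layer when nothing is inlined.
constexpr int sketch_countr_one(unsigned x) noexcept
{
  return sketch_countr_zero(~x);
}

// 0b0111 has three trailing one bits.
static_assert(sketch_countr_one(0b0111u) == 3, "three trailing ones");
```

Each layer is a separate function, so any call that the compiler does not
inline adds overhead on top of the single builtin.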

Test case:
---
test1.cpp:
#include <bit>

int main()
{
  unsigned j;
  for (unsigned i=0; i<(1 << 30); i++) {
    j = std::countr_one(i);
  }
}
---
g++ -std=c++20 ./test1.cpp -o test1


test2.cpp:
#include <bit>

int main()
{
  unsigned j;
  for (unsigned i=0; i<(1 << 30); i++) {
    j = std::countr_zero(~i);
  }
}
---
g++ -std=c++20 ./test2.cpp -o test2


test3.cpp:
#include <bit>

int main()
{
  unsigned j;
  for (unsigned i=0; i<(1 << 30); i++) {
    j = __builtin_ctz(~i);
  }
}
---
g++ -std=c++20 ./test3.cpp -o test3

The user time reported by time for each binary:
time ./test1
5.266s
time ./test2
3.028s
time ./test3
0.741s

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
10.2.0-5ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-10-WJNXnb/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-WJNXnb/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (Ubuntu 10.2.0-5ubuntu1~20.04)

