GCC is generating an unnecessary and redundant xor eax, eax in the following code:

#include <bit>
#include <stdint.h>

uint32_t pop(uint32_t n) {
    return std::popcount(n);
}

Generated assembly:

pop(unsigned int):
        xor     eax, eax
        popcnt  eax, edi
        ret

https://godbolt.org/z/81o1Y6T5x
This happens with __builtin_popcount as well, not just std::popcount. This appears to have started in GCC 4.9.2. https://godbolt.org/z/4dGWvT5zr
Seems to affect all
This is by design.

/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
   for bit-manipulation instructions.  */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
          m_SANDYBRIDGE | m_CORE_AVX2 | m_GENERIC)
That is neither unnecessary nor redundant; it is fully intentional. See e.g. PR62011.
See PR 62011 for more details.
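For anyone reading along, here is a minimal sketch of the kind of loop where the false dependency matters. On the affected Intel cores, popcnt behaves as if it also reads its destination register, so without the preceding xor each popcnt may have to wait for whatever last wrote that register, even though the counts are logically independent. The helper name count_bits and the buffer layout are hypothetical and not taken from this report; __builtin_popcount is used only because it was mentioned above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the report): sums the set bits of a buffer.
// Each popcount depends only on data[i], so the only true loop-carried
// dependency is the addition into `total`. A false dependency on the popcnt
// destination register would add a second, unintended chain.
std::uint64_t count_bits(const std::uint32_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // __builtin_popcount returns int; widen before accumulating.
        total += static_cast<std::uint64_t>(__builtin_popcount(data[i]));
    }
    return total;
}
```

Building this with -O2 -mpopcnt and comparing the assembly across -mtune settings should show whether the zeroing xor is emitted. The exact output depends on the GCC version and tuning target.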
Ah thank you @Andrew Pinski @Jakub Jelinek
Does the false dependency still apply to modern CPUs?
(In reply to Jeremy R. from comment #7)
> Does the false dependency still apply to modern CPUs?

How modern is modern? Skylake fixed this for lzcnt and tzcnt. Cannon Lake (and Ice Lake) fixed this for popcnt.
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf

This is from https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-counter-with-64-bit-introduces-crazy-performance-deviati/54429148#54429148
Thank you for the resources and for your insight; it's much appreciated. Is there interest in updating the intentional false-dependency logic so that it does not fire for architectures newer than Cannon Lake?