Bug 101821 - Redundant xor eax eax
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 12.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Blocks: 101822
Reported: 2021-08-08 22:45 UTC by Jeremy R.
Modified: 2023-09-21 13:52 UTC

Description Jeremy R. 2021-08-08 22:45:27 UTC
GCC is generating an unnecessary and redundant xor eax, eax for the following code:

#include <bit>
#include <stdint.h>
uint32_t pop(uint32_t n) {
    return std::popcount(n);
}


pop(unsigned int):
        xor     eax, eax
        popcnt  eax, edi
        ret

https://godbolt.org/z/81o1Y6T5x
Comment 1 Jeremy R. 2021-08-08 22:47:41 UTC
This happens with __builtin_popcount as well, not just std::popcount. This appears to have started in GCC 4.9.2. https://godbolt.org/z/4dGWvT5zr
Comment 2 Jeremy R. 2021-08-08 22:53:08 UTC
Seems to affect all
Comment 3 Andrew Pinski 2021-08-08 22:53:59 UTC
This is by design.

/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
   for bit-manipulation instructions.  */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
          m_SANDYBRIDGE | m_CORE_AVX2 | m_GENERIC)
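
To illustrate why the tuning exists, here is a minimal sketch of a loop where it matters. On the affected Intel cores, popcnt is treated as depending on the old value of its destination register, so back-to-back popcnt instructions that write the same register can serialize even though their inputs are independent. Zeroing the destination first (the xor) breaks that chain. The function name count_bits is hypothetical and not from this report; __builtin_popcount is the GCC builtin mentioned in comment 1.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only: sums the set bits of an array.
// Each iteration's popcount is logically independent of the previous one,
// which is exactly the case where a false output dependency hurts throughput.
std::uint64_t count_bits(const std::uint32_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // __builtin_popcount returns int; widen before accumulating.
        total += static_cast<std::uint64_t>(__builtin_popcount(data[i]));
    }
    return total;
}
```

To compare code generation with and without the xor, GCC's x86 -mtune-ctrl= developer option accepts tune names like the one quoted above, so something like -mtune-ctrl=^avoid_false_dep_for_bmi should disable it. Treat that exact spelling as an assumption to check against your GCC version.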
Comment 4 Jakub Jelinek 2021-08-08 22:55:59 UTC
That is neither unnecessary nor redundant, but fully intentional.
See e.g. PR62011.
Comment 5 Andrew Pinski 2021-08-08 22:56:21 UTC
See PR 62011 for more details.
Comment 6 Jeremy R. 2021-08-08 23:01:09 UTC
Ah thank you @Andrew Pinski @Jakub Jelinek
Comment 7 Jeremy R. 2021-08-08 23:05:28 UTC
Does the false dependency still apply to modern CPUs?
Comment 8 Andrew Pinski 2021-08-08 23:12:20 UTC
(In reply to Jeremy R. from comment #7)
> Does the false dependency still apply to modern CPUs?
How modern is modern?

Skylake fixed this for lzcnt and tzcnt.
Cannon Lake (and Ice Lake) fixed this for popcnt.

https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf

This information is from https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-counter-with-64-bit-introduces-crazy-performance-deviati/54429148#54429148
Comment 9 Jeremy R. 2021-08-08 23:21:49 UTC
Thank you for the resources and for your insight; it's much appreciated.
Is there interest in updating the intentional false-dependency logic so that it does not fire for Cannon Lake and newer architectures?
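
As a rough illustration of the condition being asked about, the sketch below encodes only what comment 8 states: lzcnt and tzcnt were fixed in Skylake, and popcnt in Cannon Lake and Ice Lake. All names here (Uarch, Insn, needs_dep_break) are hypothetical. This is not GCC's tuning code, and the list of microarchitectures is deliberately incomplete.

```cpp
// Illustrative only: hypothetical names, not GCC's implementation.
// Enumerators are ordered oldest to newest so they can be compared.
enum class Uarch { Haswell, Skylake, CannonLake, IceLake };
enum class Insn { Popcnt, Lzcnt, Tzcnt };

// Returns true if, per comment 8, the instruction still has a false
// dependency on its destination register on the given microarchitecture.
bool needs_dep_break(Insn insn, Uarch uarch) {
    switch (insn) {
    case Insn::Popcnt:
        return uarch < Uarch::CannonLake;  // fixed in Cannon Lake / Ice Lake
    case Insn::Lzcnt:
    case Insn::Tzcnt:
        return uarch < Uarch::Skylake;     // fixed in Skylake
    }
    return true;  // conservative default for unexpected values
}
```

In GCC itself this kind of decision is expressed through the per-processor masks on tuning entries such as X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, quoted in comment 3, rather than through a function like this.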