This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/82298] New: x86 BMI: no peephole for BZHI
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 22 Sep 2017 16:30:21 +0000
- Subject: [Bug target/82298] New: x86 BMI: no peephole for BZHI
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82298
Bug ID: 82298
Summary: x86 BMI: no peephole for BZHI
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
gcc never seems to emit BZHI on its own.
// exact BZHI behaviour for all inputs (with no C UB)
unsigned bzhi_exact(unsigned x, unsigned c) {
  c &= 0xff;
  if (c <= 31) {
    x &= ((1U << c) - 1);
    // 1ULL defeats clang's peephole, but is a convenient way to avoid UB for
    // count=32.
  }
  return x;
}
// https://godbolt.org/g/tZKnV3
unsigned long bzhi_l(unsigned long x, unsigned c) {
  return x & ((1UL << c) - 1);
}
Out-of-range shift UB allows peepholing to BZHI for the simpler case, so these
(respectively) should compile to
        bzhil   %esi, %edi, %edi
        bzhiq   %rsi, %rdi, %rax
But we actually get (gcc8 -O3 -march=haswell (-mbmi2))
        movq    $-1, %rax
        shlx    %rsi, %rax, %rdx
        andn    %rdi, %rdx, %rax
        ret
Or that with a test&branch for bzhi_exact. Clang succeeds at peepholing BZHI
here, but it still does the &0xff and the test&branch to skip BZHI when it
would do nothing. It's easy to imagine cases where the source would use a
conditional to avoid UB when it wants to leave x unmodified for c==32, and the
range is 1 to 32:
unsigned bzhi_1_to_32(unsigned x, unsigned c) {
  if (c != 32)
    x &= ((1U << c) - 1);
  return x;
}
BZHI is defined to saturate the index to OperandSize, so it copies src1
unmodified when the low 8 bits of src2 are >= 32 or >= 64. (See the Operation
section of http://felixcloutier.com/x86/BZHI.html. The text description is
wrong, claiming it saturates to OperandSize-1, which would zero the high bit.)
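As a rough illustration of the saturating behaviour described above, here is a minimal C sketch of the 32-bit form, based on the Operation section. The name bzhi32_model is hypothetical (it is not an intrinsic or a gcc builtin), and the sketch only models the destination value, not the flags:

```c
#include <stdint.h>

/* Hypothetical software model of 32-bit BZHI (destination value only). */
static uint32_t bzhi32_model(uint32_t src, uint32_t index)
{
    uint32_t n = index & 0xff;      /* only the low 8 bits of the index are used */
    if (n >= 32)
        return src;                 /* index saturates to OperandSize: copy src unmodified */
    return src & ((1U << n) - 1);   /* n is 0..31 here, so the shift is well defined */
}
```

Under this model an index of 32 (or anything whose low byte is >= 32) leaves the source unchanged, which is what bzhi_exact and bzhi_1_to_32 ask for.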
Other ways to express it (which clang fails to peephole to BZHI, like gcc):
unsigned bzhi2(unsigned x, unsigned c) {
  // c &= 0xff;
  // if(c < 32) {
  x &= (0xFFFFFFFFUL >> (32-c));
  // }
  return x;
}
unsigned bzhi3(unsigned long x, unsigned c) {
  // c &= 0xff;
  return x & ~(-1U << c);
}
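For counts 0 to 31 these spellings should all produce the same low-bit mask, which is why one peephole could in principle cover them. A small sketch for checking that, assuming 32-bit unsigned int; the helper names are hypothetical and the right-shift form uses a 64-bit constant so that the shift by 32 at c == 0 stays defined regardless of the width of unsigned long:

```c
#include <stdint.h>

/* Hypothetical helpers: three ways to build a mask of the low c bits.
   All of them assume c is in the range 0..31. */
static uint32_t mask_sub(unsigned c) { return (1U << c) - 1; }

static uint32_t mask_shr(unsigned c)
{
    /* 64-bit constant, so shifting right by 32 (c == 0) is not UB */
    return (uint32_t)(UINT64_C(0xFFFFFFFF) >> (32 - c));
}

static uint32_t mask_not(unsigned c) { return ~(UINT32_C(0xFFFFFFFF) << c); }

/* Returns nonzero when all three forms agree for this count. */
static int masks_agree(unsigned c)
{
    return mask_sub(c) == mask_shr(c) && mask_sub(c) == mask_not(c);
}
```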
Related: pr65871 suggested this, but was really about taking advantage of flags
set by __builtin_ia32_bzhi_si so it is correctly closed. pr66872 suggested
transforming x & ((1 << t) - 1); to x & ~(-1 << t); to enable ANDN. Compiling
both to BZHI when BMI2 is available was mentioned, but wasn't the main subject of
that bug either.