This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/82298] New: x86 BMI: no peephole for BZHI


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82298

            Bug ID: 82298
           Summary: x86 BMI: no peephole for BZHI
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

gcc never seems to emit BZHI on its own.

// exact BZHI behaviour for all inputs (with no C UB)
unsigned bzhi_exact(unsigned x, unsigned c) {
    c &= 0xff;
    if (c <= 31) {
      x &= ((1U << c) - 1);
      // 1ULL defeats clang's peephole, but is a convenient way to avoid UB
      // for count=32.
    }
    return x;
}
// https://godbolt.org/g/tZKnV3

unsigned long bzhi_l(unsigned long x, unsigned c) {
    return x & ((1UL << c) - 1);
}

Out-of-range shift UB allows peepholing to BZHI for the simpler case, so these
(respectively) should compile to

        bzhil   %esi, %edi, %edi
        bzhiq   %rsi, %rdi, %rax

But we actually get (gcc8 -O3 -march=haswell (-mbmi2))

        movq    $-1, %rax
        shlx    %rsi, %rax, %rdx
        andn    %rdi, %rdx, %rax
        ret

Or that with a test&branch for bzhi_exact.  Clang succeeds at peepholing BZHI
here, but it still does the &0xff and the test&branch to skip BZHI when it
would do nothing.  It's easy to imagine cases where the source would use a
conditional to avoid UB when it wants to leave x unmodified for c==32, and the
range is 1 to 32:

unsigned bzhi_1_to_32(unsigned x, unsigned c) {
    if (c != 32)
        x &= ((1U << c) - 1);
    return x;
}


BZHI is defined to saturate the index to OperandSize, so it copies src1
unmodified when the low 8 bits of src2 are >= 32 (32-bit operand size) or
>= 64 (64-bit operand size).  (See the Operation section of
http://felixcloutier.com/x86/BZHI.html.  The text description there is wrong:
it claims the index saturates to OperandSize-1, which would zero the high bit.)
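
To make the saturation rule concrete, here is a minimal software sketch of the
32-bit case, assuming the Operation-section behaviour described above.  The
names bzhi32_model, bzhi_1_to_32_ref and check_bzhi_model are hypothetical
helpers for illustration, not GCC or Intel identifiers, and the flag results
of the real instruction are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of 32-bit BZHI as described in the Operation section:
   the index is the low 8 bits of src2; an index >= 32 copies src1. */
static uint32_t bzhi32_model(uint32_t src1, uint32_t src2)
{
    uint32_t n = src2 & 0xff;               /* only bits 7:0 are used */
    if (n >= 32)
        return src1;                        /* saturated: nothing is cleared */
    return src1 & (((uint32_t)1 << n) - 1); /* n in 0..31, so the shift is defined */
}

/* Restatement of the conditional source form from this report (c in 0..32). */
static uint32_t bzhi_1_to_32_ref(uint32_t x, uint32_t c)
{
    if (c != 32)
        x &= ((uint32_t)1 << c) - 1;
    return x;
}

/* Hypothetical self-check: the two agree wherever the C source has no UB. */
static void check_bzhi_model(void)
{
    for (uint32_t c = 0; c <= 32; c++)
        assert(bzhi32_model(0xDEADBEEFu, c) == bzhi_1_to_32_ref(0xDEADBEEFu, c));
}
```

If the model is right, the c != 32 test in the source is redundant once the
masking is done by a single BZHI, which is the peephole being asked for.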

Other ways to express it (which clang fails to peephole to BZHI, like gcc):

unsigned bzhi2(unsigned x, unsigned c) {
    //  c &= 0xff;
    //  if(c < 32) {
      x &= (0xFFFFFFFFUL >> (32-c));
    //  }
    return x;
}

unsigned bzhi3(unsigned long x, unsigned c) {
    // c &= 0xff;
    return x & ~(-1U << c);
}
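
As a sanity check that these spellings really describe the same mask, the
sketch below compares the three forms for every count from 0 to 31.  It uses
fixed-width types rather than the unsigned / unsigned long of the functions
above, so that the right-shift form stays defined at c == 0 regardless of the
width of long; mask_sub, mask_shr, mask_not and check_mask_forms are
hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x & ((1 << c) - 1): defined for c in 0..31 */
static uint32_t mask_sub(uint32_t x, unsigned c)
{
    return x & (((uint32_t)1 << c) - 1);
}

/* Right-shift form, as in bzhi2.  The 64-bit constant keeps the
   shift by 32 (c == 0) well defined. */
static uint32_t mask_shr(uint32_t x, unsigned c)
{
    return x & (uint32_t)(UINT64_C(0xFFFFFFFF) >> (32 - c));
}

/* x & ~(-1 << c), as in bzhi3: defined for c in 0..31 */
static uint32_t mask_not(uint32_t x, unsigned c)
{
    return x & ~(~(uint32_t)0 << c);
}

/* Assert that all three forms agree for every in-range count. */
static void check_mask_forms(uint32_t x)
{
    for (unsigned c = 0; c <= 31; c++) {
        assert(mask_sub(x, c) == mask_shr(x, c));
        assert(mask_sub(x, c) == mask_not(x, c));
    }
}
```

Since the forms agree over the whole defined range, a peephole that
recognizes one of them could in principle canonicalize the others to it
before matching BZHI.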



Related: pr65871 suggested this, but was really about taking advantage of flags
set by __builtin_ia32_bzhi_si so it is correctly closed.  pr66872 suggested
transforming x & ((1 << t) - 1); to x & ~(-1 << t); to enable ANDN.  Compiling
both to BZHI when BMI2 is available was mentioned, but was not the main subject
of that bug either.
