[Bug target/82259] New: missed optimization: use LEA to add 1 to flip the low bit when copying before AND with 1

peter at cordes dot ca gcc-bugzilla@gcc.gnu.org
Tue Sep 19 16:27:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82259

            Bug ID: 82259
           Summary: missed optimization: use LEA to add 1 to flip the low
                    bit when copying before AND with 1
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

bool bt_signed(int x, unsigned bit) {
        bit = 13;
        return !(x & (1<<bit));
}
// https://godbolt.org/g/rzdtzm
        movl    %edi, %eax
        sarl    $13, %eax
        notl    %eax
        andl    $1, %eax
        ret

This is pretty good, but we could do better by using addition instead of a
separate NOT.  (XOR is add-without-carry, and only the low bit survives the
AND, so NOT, XOR $1 and adding 1 all give the same result here: adding 1 will
always flip the low bit.)

        sarl    $13, %edi
        leal    1(%rdi), %eax   # in 64-bit mode, 1(%rdi) avoids the address-size prefix that 1(%edi) needs
        andl    $1, %eax
        ret
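
The equivalence behind the LEA version can be checked in C.  This is only a
sketch: the helper names are made up for illustration, and it works on the
already-shifted value as a uint32_t so that no signed overflow or
implementation-defined right shift is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors NOT + AND $1: invert every bit, keep only the low one. */
static bool low_bit_via_not(uint32_t shifted) {
    return (~shifted) & 1u;
}

/* Mirrors LEA 1(reg) + AND $1: the carry out of bit 0 is discarded by the AND. */
static bool low_bit_via_add(uint32_t shifted) {
    return (shifted + 1u) & 1u;
}

/* Hypothetical checker: compares both forms on about a million scattered inputs. */
static void check_low_bit_flip(void) {
    for (uint32_t i = 0; i < (1u << 20); i++) {
        uint32_t v = i * 2654435761u;   /* multiplicative scatter, wraps mod 2^32 */
        assert(low_bit_via_not(v) == low_bit_via_add(v));
    }
    /* Edge case: the addition wraps around to zero. */
    assert(low_bit_via_not(UINT32_MAX) == low_bit_via_add(UINT32_MAX));
}
```

Calling check_low_bit_flip() should finish without tripping an assert.  The
argument only depends on bit 0, so the sampling is a sanity check rather than
a proof.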

If partial-register writes aren't a problem, this will be even better on most
CPUs:

        bt      $13, %edi
        setnc   %al             # BT puts the selected bit in CF and leaves ZF unmodified
        ret

related: bug 47769 about missed BTR peepholes.  That probably covers the missed
BT.

But *this* bug is about the LEA+AND vs. MOV+NOT+AND optimization.  This might
be relevant for other 2-operand ISAs with mostly destructive instructions, like
ARM Thumb.


Related:

bool bt_unsigned(unsigned x, unsigned bit) {
        //bit = 13;
        return !(x & (1<<bit));  // 1U avoids test/set
}

        movl    %esi, %ecx
        movl    $1, %eax
        sall    %cl, %eax
        testl   %edi, %eax
        sete    %al
        ret

This is weird.  The code generated with  1U << bit  is like the bt_signed code
above and has identical results, so gcc should emit whatever is optimal for
both cases.  There are similar differences on ARM32.

(With a fixed count, it just makes the difference between NOT vs. XOR $1.)
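
To back up the claim that both spellings give identical results, here is a
small C sketch.  The function names are hypothetical, and the check stops at
bit 30 because 1 << 31 overflows a 32-bit int, which is undefined in C (this
assumes int and unsigned are 32 bits wide, as on the targets in this report).

```c
#include <assert.h>
#include <stdbool.h>

/* Source form from the report: the constant 1 is a signed int. */
static bool bt_with_signed_one(unsigned x, unsigned bit) {
    return !(x & (1 << bit));
}

/* Same test with an unsigned constant. */
static bool bt_with_unsigned_one(unsigned x, unsigned bit) {
    return !(x & (1U << bit));
}

/* Hypothetical checker: compares both forms for every defined count 0..30. */
static void check_signed_vs_unsigned_one(void) {
    const unsigned samples[] = {0u, 1u, 0x2000u, 0x7FFFFFFFu, 0xFFFFFFFFu, 0xDEADBEEFu};
    for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; s++)
        for (unsigned bit = 0; bit <= 30; bit++)
            assert(bt_with_signed_one(samples[s], bit) ==
                   bt_with_unsigned_one(samples[s], bit));
}
```

Since the two functions agree wherever both are defined, the compiler is free
to pick the same instruction sequence for either.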

If we're going to use setcc, it's definitely *much* better to use  bt  instead
of a variable-count shift + test.

        bt      %esi, %edi
        setnc   %al             # BT puts the selected bit in CF and leaves ZF unmodified
        ret
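
For reference, what the register-count bit test is meant to compute can be
modelled in C.  This is a sketch under stated assumptions, not compiler
output: the names are hypothetical and unsigned is assumed to be 32 bits.  BT
copies the selected bit into the carry flag, and with a register destination
the bit index is taken modulo 32.

```c
#include <assert.h>
#include <stdbool.h>

/* C source form from the report, using 1U so every count 0..31 is defined. */
static bool bit_clear_shift_test(unsigned x, unsigned bit) {
    return !(x & (1U << bit));
}

/* Hypothetical model of BT reg,reg followed by a set-if-carry-clear. */
static bool bit_clear_bt_model(unsigned x, unsigned bit) {
    unsigned carry = (x >> (bit & 31u)) & 1u;   /* index is masked to 0..31 */
    return !carry;
}

/* Hypothetical checker: both forms agree for every count 0..31. */
static void check_bt_model(void) {
    const unsigned samples[] = {0u, 1u, 0x2000u, 0x80000000u, 0xFFFFFFFFu, 0xDEADBEEFu};
    for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; s++)
        for (unsigned bit = 0; bit < 32; bit++)
            assert(bit_clear_shift_test(samples[s], bit) ==
                   bit_clear_bt_model(samples[s], bit));
}
```

Counts of 32 or more are undefined in the C source, so the modulo-32 behaviour
of BT never has to match anything there.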

