[Bug tree-optimization/83377] New: Missed optimization (x86): Bit operations should be converted to arithmetic

matthew at wil dot cx gcc-bugzilla@gcc.gnu.org
Mon Dec 11 16:49:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83377

            Bug ID: 83377
           Summary: Missed optimization (x86): Bit operations should be
                    converted to arithmetic
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthew at wil dot cx
  Target Milestone: ---

GCC fails to optimise

if (x & 2) y = (x &~ 2)

into

if (x & 2) y = (x - 2)

in cases where that would be advantageous.
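
The rewrite is only valid under the guard: when bit 1 is known to be set,
clearing it removes exactly 2 from the value, so the AND and the SUB give the
same result.  A minimal sketch of that identity follows; the helper names
clear_by_and, clear_by_sub and check_guarded_equivalence are hypothetical and
not part of GCC or of the test case in this report.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */
static unsigned long clear_by_and(unsigned long x) { return x & ~2UL; }
static unsigned long clear_by_sub(unsigned long x) { return x - 2; }

/* Check a small range of values: whenever bit 1 is set,
 * both forms must produce the same result. */
static void check_guarded_equivalence(void)
{
        unsigned long x;

        for (x = 0; x < 1024; x++) {
                if (x & 2)
                        assert(clear_by_and(x) == clear_by_sub(x));
        }
}
```

Without the guard the two forms differ (for x = 4 the AND leaves 4 and the SUB
gives 2), so the transformation depends on the condition being known at the
point of the AND.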

Here's one example, which I'm sure will upset the C pedants, but the pattern is
relatively common in codebases which tag pointers by type using the low bits.

int a(void *);
struct s { struct s *s; };
int b(void *x) {
        void *p = x;
        if ((unsigned long)x & 2)
                p = ((struct s *)((unsigned long)x & ~2UL))->s;
        return a(p);
}
int c(void *x) {
        void *p = x;
        if ((unsigned long)x & 2)
                p = ((struct s *)((unsigned long)x - 2))->s;
        return a(p);
}

On x86, the difference in the assembly output is clear; function c is one
instruction and three bytes smaller than function b:

0000000000000000 <b>:
   0:   40 f6 c7 02             test   $0x2,%dil
   4:   74 07                   je     d <b+0xd>
   6:   48 83 e7 fd             and    $0xfffffffffffffffd,%rdi
   a:   48 8b 3f                mov    (%rdi),%rdi
   d:   e9 00 00 00 00          jmpq   12 <b+0x12>
  12:   0f 1f 40 00             nopl   0x0(%rax)
  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  1d:   00 00 00 

0000000000000020 <c>:
  20:   40 f6 c7 02             test   $0x2,%dil
  24:   74 04                   je     2a <c+0xa>
  26:   48 8b 7f fe             mov    -0x2(%rdi),%rdi
  2a:   e9 00 00 00 00          jmpq   2f <c+0xf>

This is true with both -O3 and -O2.  I have other functions where the savings
are greater (two insns and seven bytes), but the root cause is the same: a
failure to optimise an AND into a SUB (which can then be fused with a load).

I'm filing this one under tree-optimisation rather than RTL, because I think
it's a common feature in CPUs to have load (reg + offset), and relatively
uncommon (in fact I don't know of one) to have load (reg & mask).  I'm sure
some CPUs don't even have (reg + offset) addressing modes, but they wouldn't be
harmed by such an optimisation.

