[Bug c/92716] New: -Os doesn't inline byteswap function even though it's a single instruction

jwerner at chromium dot org gcc-bugzilla@gcc.gnu.org
Thu Nov 28 20:29:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92716

            Bug ID: 92716
           Summary: -Os doesn't inline byteswap function even though it's
                    a single instruction
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jwerner at chromium dot org
  Target Milestone: ---

I compiled the following test code for both x86_64 and aarch64 on gcc 8.3.0:

static inline unsigned int byteswap(unsigned int x)
{
        return (((x >> 24) & 0xff) << 0) |
               (((x >> 16) & 0xff) << 8) |
               (((x >> 8) & 0xff) << 16) |
               (((x >> 0) & 0xff) << 24);
}

unsigned int test(unsigned int a, unsigned int b, unsigned int c) {             
        return byteswap(a) + byteswap(b) + byteswap(c);                         
}
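As a quick sanity check, the shift-and-mask expression above can be compared against GCC's __builtin_bswap32. If they agree, the expression is a plain 32-bit byte reversal, which is why GCC can collapse it into one instruction. This is only a sketch: the uint32_t types and the helper check_against_builtin are my own additions for illustration and are not part of the original test case.

```c
#include <assert.h>
#include <stdint.h>

/* The report's shift-and-mask byte swap, with uint32_t in place of
 * unsigned int so the 32-bit width is explicit. */
static inline uint32_t byteswap(uint32_t x)
{
        return (((x >> 24) & 0xff) << 0) |
               (((x >> 16) & 0xff) << 8) |
               (((x >> 8) & 0xff) << 16) |
               (((x >> 0) & 0xff) << 24);
}

/* Hypothetical helper: fails an assert() if the manual version and
 * GCC's __builtin_bswap32 disagree for the given input. */
static void check_against_builtin(uint32_t x)
{
        assert(byteswap(x) == __builtin_bswap32(x));
}
```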

On x86_64 I get:

0000000000000000 <byteswap> (File Offset: 0x40):
   0:   89 f8                   mov    %edi,%eax
   2:   0f c8                   bswap  %eax
   4:   c3                      retq   

0000000000000005 <test> (File Offset: 0x45):
   5:   e8 f6 ff ff ff          callq  0 <byteswap> (File Offset: 0x40)
   a:   89 f7                   mov    %esi,%edi
   c:   89 c1                   mov    %eax,%ecx
   e:   e8 ed ff ff ff          callq  0 <byteswap> (File Offset: 0x40)
  13:   89 d7                   mov    %edx,%edi
  15:   01 c1                   add    %eax,%ecx
  17:   e8 e4 ff ff ff          callq  0 <byteswap> (File Offset: 0x40)
  1c:   01 c8                   add    %ecx,%eax
  1e:   c3                      retq   

And on aarch64 I get:

0000000000000000 <byteswap> (File Offset: 0x40):
   0:   5ac00800        rev     w0, w0
   4:   d65f03c0        ret

0000000000000008 <test> (File Offset: 0x48):
   8:   a9bf7bfd        stp     x29, x30, [sp,#-16]!
   c:   910003fd        mov     x29, sp
  10:   97fffffc        bl      0 <byteswap> (File Offset: 0x40)
  14:   2a0003e3        mov     w3, w0
  18:   2a0103e0        mov     w0, w1
  1c:   97fffff9        bl      0 <byteswap> (File Offset: 0x40)
  20:   0b000063        add     w3, w3, w0
  24:   2a0203e0        mov     w0, w2
  28:   97fffff6        bl      0 <byteswap> (File Offset: 0x40)
  2c:   0b000060        add     w0, w3, w0
  30:   a8c17bfd        ldp     x29, x30, [sp],#16
  34:   d65f03c0        ret

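So the good news is that GCC recognized this code as a byteswap function that
can be implemented with a single instruction on both of these platforms. The
bad news is that it then doesn't seem to realize that inlining this single
instruction produces smaller code than wrapping it in a function and calling
it, no matter how many call sites there are. On x86_64, bswap is 2 bytes
(0f c8) while each callq is 5 bytes. On aarch64, rev and bl are both 4 bytes,
but the calls also force test() to save and restore x29/x30 (the stp, mov and
ldp above). In both cases the out-of-line copy of byteswap() would also no
longer be needed. If I instead compile with -O2, the function is inlined as
expected. (I also tried clang 8.0.1, which inlines it correctly even with
-Os.)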
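Until -Os handles this case on its own, one possible workaround is to mark the helper with GCC's always_inline attribute. The GCC manual documents that this attribute inlines a function declared inline regardless of the restrictions that would otherwise apply. This is my assumption of a workaround; I have not checked the resulting code size on 8.3.0. The sketch below keeps the report's test case and changes only the attribute and the uint32_t types.

```c
#include <stdint.h>

/* always_inline asks GCC to inline this function regardless of the
 * -Os size heuristics; the body is unchanged from the report. */
static inline __attribute__((always_inline)) uint32_t byteswap(uint32_t x)
{
        return (((x >> 24) & 0xff) << 0) |
               (((x >> 16) & 0xff) << 8) |
               (((x >> 8) & 0xff) << 16) |
               (((x >> 0) & 0xff) << 24);
}

uint32_t test(uint32_t a, uint32_t b, uint32_t c)
{
        return byteswap(a) + byteswap(b) + byteswap(c);
}
```

This only works around the problem for one annotated function. The underlying issue is that the -Os inlining heuristic should reach the same conclusion without the attribute.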

