[Bug c/92716] New: -Os doesn't inline byteswap function even though it's a single instruction
jwerner at chromium dot org
gcc-bugzilla@gcc.gnu.org
Thu Nov 28 20:29:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92716
Bug ID: 92716
Summary: -Os doesn't inline byteswap function even though it's a single instruction
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jwerner at chromium dot org
Target Milestone: ---
I compiled the following test code with -Os for both x86_64 and aarch64 using gcc 8.3.0:
static inline unsigned int byteswap(unsigned int x)
{
    return (((x >> 24) & 0xff) << 0) |
           (((x >> 16) & 0xff) << 8) |
           (((x >> 8) & 0xff) << 16) |
           (((x >> 0) & 0xff) << 24);
}

unsigned int test(unsigned int a, unsigned int b, unsigned int c)
{
    return byteswap(a) + byteswap(b) + byteswap(c);
}
On x86_64 I get:
0000000000000000 <byteswap> (File Offset: 0x40):
0: 89 f8 mov %edi,%eax
2: 0f c8 bswap %eax
4: c3 retq
0000000000000005 <test> (File Offset: 0x45):
5: e8 f6 ff ff ff callq 0 <byteswap> (File Offset: 0x40)
a: 89 f7 mov %esi,%edi
c: 89 c1 mov %eax,%ecx
e: e8 ed ff ff ff callq 0 <byteswap> (File Offset: 0x40)
13: 89 d7 mov %edx,%edi
15: 01 c1 add %eax,%ecx
17: e8 e4 ff ff ff callq 0 <byteswap> (File Offset: 0x40)
1c: 01 c8 add %ecx,%eax
1e: c3 retq
And on aarch64 I get:
0000000000000000 <byteswap> (File Offset: 0x40):
0: 5ac00800 rev w0, w0
4: d65f03c0 ret
0000000000000008 <test> (File Offset: 0x48):
8: a9bf7bfd stp x29, x30, [sp,#-16]!
c: 910003fd mov x29, sp
10: 97fffffc bl 0 <byteswap> (File Offset: 0x40)
14: 2a0003e3 mov w3, w0
18: 2a0103e0 mov w0, w1
1c: 97fffff9 bl 0 <byteswap> (File Offset: 0x40)
20: 0b000063 add w3, w3, w0
24: 2a0203e0 mov w0, w2
28: 97fffff6 bl 0 <byteswap> (File Offset: 0x40)
2c: 0b000060 add w0, w3, w0
30: a8c17bfd ldp x29, x30, [sp],#16
34: d65f03c0 ret
So the good news is that GCC recognized this code as a byteswap function that
can be implemented with a single instruction on both of these platforms. The
bad news is that it then doesn't seem to realize that inlining this single
instruction produces smaller code than keeping it in a separate function and
calling it. On x86_64, for example, each callq is 5 bytes while bswap itself is
only 2, and on aarch64 the calls also force test() to save and restore x29/x30.
This holds no matter how many call sites there are. If I instead compile with
-O2, the function is inlined as expected. (I also tried clang 8.0.1, which
manages to inline correctly even with -Os.)
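Until the inliner heuristic is fixed, one possible workaround is to force inlining explicitly. The sketch below is an assumption on my part, not something tested in this report: it marks byteswap with GCC's always_inline attribute so that each call in test() should become a single bswap/rev at -Os as well. It also includes a small check that the shift-and-mask expression matches GCC's __builtin_bswap32, which is why lowering it to one instruction is valid. check_byteswap is a hypothetical helper added for illustration, and the code assumes a 32-bit unsigned int, as on x86_64 and aarch64.

```c
#include <assert.h>

/* Workaround sketch (assumption): always_inline asks GCC to inline
 * this function regardless of the -Os size heuristics. Assumes
 * unsigned int is 32 bits wide. */
static inline __attribute__((always_inline))
unsigned int byteswap(unsigned int x)
{
    return (((x >> 24) & 0xff) << 0) |
           (((x >> 16) & 0xff) << 8) |
           (((x >> 8) & 0xff) << 16) |
           (((x >> 0) & 0xff) << 24);
}

unsigned int test(unsigned int a, unsigned int b, unsigned int c)
{
    return byteswap(a) + byteswap(b) + byteswap(c);
}

/* Hypothetical sanity check: the manual expression must agree with
 * GCC's byte-reversal builtin for the bswap/rev lowering to be valid. */
static void check_byteswap(void)
{
    assert(byteswap(0x12345678u) == 0x78563412u);
    assert(byteswap(0x12345678u) == __builtin_bswap32(0x12345678u));
}
```

Whether this actually yields inline bswap/rev instructions at -Os should be confirmed by inspecting the disassembly, as in the listings above.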