This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug c/77491] New: Suboptimal code produced with unnecessary moving of values on/off stack
- From: "dhowells at redhat dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 05 Sep 2016 16:00:26 +0000
- Subject: [Bug c/77491] New: Suboptimal code produced with unnecessary moving of values on/off stack
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77491
Bug ID: 77491
Summary: Suboptimal code produced with unnecessary moving of values on/off stack
Product: gcc
Version: 6.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: dhowells at redhat dot com
Target Milestone: ---
Created attachment 39567
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39567&action=edit
Test source
The attached program produces unnecessary instructions that move values on and
off the stack. Compiling with Fedora 24 gcc-6.1.1-3 20160621 and gcc -Os, I see
the following for the first function:
0000000000000000 <jump>:
0: 9c pushfq
1: 59 pop %rcx
2: fa cli
3: 8b 07 mov (%rdi),%eax
5: 89 44 24 fc mov %eax,-0x4(%rsp)
9: 8b 54 24 fc mov -0x4(%rsp),%edx
d: 83 fa 17 cmp $0x17,%edx
10: 0f 94 c0 sete %al
13: 75 06 jne 1b <jump+0x1b>
15: c7 07 2b 00 00 00 movl $0x2b,(%rdi)
1b: 51 push %rcx
1c: 9d popfq
1d: 8b 54 24 fc mov -0x4(%rsp),%edx
21: 89 16 mov %edx,(%rsi)
23: c3 retq
The instruction at 9 is unnecessary: either the value in EAX could be copied
directly to EDX, or the comparison at d could be made against EAX.
The instructions at 5, 1d and 21 could be combined so that the result is stored
directly to (%rsi) rather than being shuffled on and off the stack.
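For readers without the attachment, the behaviour of `jump` as it can be read from the listing above is roughly the following. This is a hypothetical reconstruction, not the attached test source: the names `jump_sketch`, `p` and `old_out` are made up, and the pushfq/cli ... push/popfq interrupt bracket is left out so that the sketch can run in user space.

```c
#include <stdbool.h>

/* Hypothetical reconstruction of jump(), inferred from the x86-64 listing.
 * Not the attached source; the interrupt save/disable/restore around the
 * body is omitted. */
bool jump_sketch(int *p, int *old_out)
{
    int old = *p;                 /* mov (%rdi),%eax */
    bool swapped = (old == 0x17); /* cmp $0x17; sete %al */

    if (swapped)
        *p = 0x2b;                /* movl $0x2b,(%rdi) */

    *old_out = old;               /* final store to (%rsi) */
    return swapped;
}
```

Nothing in this logic needs a stack slot: `old` can stay in one register from the load through to the store to `*old_out`, which is the point of the complaint about the round trips through -0x4(%rsp).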
Looking at the second function:
0000000000000024 <jump2>:
24: 9c pushfq
25: 58 pop %rax
26: fa cli
27: 8b 17 mov (%rdi),%edx
29: 89 54 24 fc mov %edx,-0x4(%rsp)
2d: 8b 54 24 fc mov -0x4(%rsp),%edx
31: 83 fa 17 cmp $0x17,%edx
34: 75 06 jne 3c <jump2+0x18>
36: c7 07 2b 00 00 00 movl $0x2b,(%rdi)
3c: 50 push %rax
3d: 9d popfq
3e: 8b 44 24 fc mov -0x4(%rsp),%eax
42: 89 44 24 f8 mov %eax,-0x8(%rsp)
46: 8b 44 24 f8 mov -0x8(%rsp),%eax
4a: c3 retq
It would be better if the flags were stashed in RCX rather than RAX, as happens
in the first function. The value could then be loaded straight into EAX, the
return register, by instruction 27, and the comparison at 31 could be made
against EAX directly. Instructions 29, 2d, 3e, 42 and 46 are all redundant.
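Similarly, `jump2` appears to perform the same test-and-set but return the old value instead of storing it through a pointer. A hypothetical sketch under the same assumptions (`jump2_sketch` is a made-up name, and the interrupt bracket is again omitted):

```c
/* Hypothetical reconstruction of jump2(), inferred from the listing.
 * The old value is the return value, so it never needs to leave %eax. */
int jump2_sketch(int *p)
{
    int old = *p;      /* mov (%rdi),%edx at 27; could target %eax */

    if (old == 0x17)   /* cmp $0x17,%edx; jne */
        *p = 0x2b;     /* movl $0x2b,(%rdi) */

    return old;        /* no -0x4/-0x8(%rsp) slots required */
}
```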
Changing the #if in the code to disable the inline asm does not improve either
function by much. Doing so also allows the code to be built for aarch64, which
likewise shows unnecessary stack shuffling:
0000000000000000 <jump>:
0: d10043ff sub sp, sp, #0x10
4: b9400002 ldr w2, [x0]
8: b9000fe2 str w2, [sp,#12]
c: b9400fe2 ldr w2, [sp,#12]
10: 71005c5f cmp w2, #0x17
14: 1a9f17e3 cset w3, eq
18: 54000061 b.ne 24 <jump+0x24>
1c: 52800562 mov w2, #0x2b // #43
20: b9000002 str w2, [x0]
24: b9400fe0 ldr w0, [sp,#12]
28: b9000020 str w0, [x1]
2c: 2a0303e0 mov w0, w3
30: 910043ff add sp, sp, #0x10
34: d65f03c0 ret
0000000000000038 <jump2>:
38: d10043ff sub sp, sp, #0x10
3c: b9400001 ldr w1, [x0]
40: b9000fe1 str w1, [sp,#12]
44: b9400fe1 ldr w1, [sp,#12]
48: 71005c3f cmp w1, #0x17
4c: 54000061 b.ne 58 <jump2+0x20>
50: 52800561 mov w1, #0x2b // #43
54: b9000001 str w1, [x0]
58: b9400fe0 ldr w0, [sp,#12]
5c: b9000be0 str w0, [sp,#8]
60: b9400be0 ldr w0, [sp,#8]
64: 910043ff add sp, sp, #0x10
68: d65f03c0 ret
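The attached source is not reproduced in this message, so the following is only a guess at the shape of the #if mentioned above: an x86-64 inline-asm branch mirroring the pushfq/pop/cli and push/popfq sequences in the listing, and a plain-C fallback that builds anywhere, including aarch64. `USE_IRQ_ASM`, `irq_save`, `irq_restore` and `jump_with_irqs` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical sketch of an #if selecting inline asm or plain C.
 * USE_IRQ_ASM is a made-up macro. cli is privileged, and the push/pop
 * inside the asm write below %rsp, so the asm branch is only meant for
 * kernel-style code built with -mno-red-zone. */
#if defined(__x86_64__) && defined(USE_IRQ_ASM)
static inline unsigned long irq_save(void)
{
    unsigned long flags;
    /* pushfq; pop %reg; cli, as at offsets 0-2 of jump() */
    __asm__ volatile("pushfq\n\tpop %0\n\tcli" : "=r"(flags) : : "memory");
    return flags;
}

static inline void irq_restore(unsigned long flags)
{
    /* push %reg; popfq, as at offsets 1b-1c of jump() */
    __asm__ volatile("push %0\n\tpopfq" : : "r"(flags) : "memory", "cc");
}
#else
/* Portable fallback: no interrupt control at all. */
static inline unsigned long irq_save(void) { return 0; }
static inline void irq_restore(unsigned long flags) { (void)flags; }
#endif

bool jump_with_irqs(int *p, int *old_out)
{
    unsigned long flags = irq_save();
    int old = *p;
    bool swapped = (old == 0x17);

    if (swapped)
        *p = 0x2b;
    irq_restore(flags);

    *old_out = old;
    return swapped;
}
```

With the fallback branch selected, only the load, compare, conditional store and final store remain, which is the case the aarch64 listings above correspond to.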