[Bug rtl-optimization/70164] New: Code/performance regression due to poor register allocation on Cortex-M0
andre.simoesdiasvieira at arm dot com
gcc-bugzilla@gcc.gnu.org
Thu Mar 10 10:58:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164
Bug ID: 70164
Summary: Code/performance regression due to poor register allocation on Cortex-M0
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---
Created attachment 37920
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37920&action=edit
current ira dump
After a quick investigation of the testcase
gcc/testsuite/gcc.target/arm/pr45701-1.c for Cortex-M0 on trunk, I found that
the test case fails because of a change in register allocation after revision
r226901.
Before that revision, the register allocator chose to load the global
'hist_verify' into r6, representing 'old_verify', prior to the call to
pre_process_line. r6 is callee-saved, so the value survives the call, and
old_verify is used after it. With the patch, the allocator instead loads the
value into r3, a caller-saved register, which means it has to be spilled
before the function call and reloaded afterwards.
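For readers without the testsuite at hand, the pattern at issue can be sketched in C roughly as follows. This is a reconstruction from the assembly below, not a copy of pr45701-1.c, so the real testcase may differ in detail. The name 'extra' is a placeholder for the second global, which the report does not name, and the helper bodies are host-side stand-ins added only so the sketch compiles and links on its own.

```c
#include <stdlib.h>
#include <string.h>

/* 'hist_verify' is named in the report.  'extra' is a placeholder for the
   second global that the assembly loads; its real name is not shown.  */
int hist_verify;
int extra;

/* Stand-in helpers (assumptions) so the sketch is self-contained.  */
char *pre_process_line (char *line) { return line; }
int str_len (char *s) { return (int) strlen (s); }
char *x_malloc (int n) { return malloc ((size_t) n); }
char *str_cpy (char *dst, char *src) { return strcpy (dst, src); }

char *
history_expand_line_internal (char *line)
{
  char *new_line;
  int old_verify;

  old_verify = hist_verify;             /* loaded before the call */
  hist_verify = 0;
  new_line = pre_process_line (line);   /* the call may clobber r0-r3 */
  hist_verify = old_verify + extra;     /* old_verify is live across the call */

  /* If the line came back unchanged, return a fresh copy of it.  */
  if (new_line == line)
    new_line = str_cpy (x_malloc (1 + str_len (line)), line);
  return new_line;
}
```

Because old_verify is live across the call to pre_process_line, it needs either a callee-saved register or a stack slot. The two listings below differ in exactly that choice.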
Before patch:
history_expand_line_internal:
        push {r3, r4, r5, r6, r7, lr}
        ldr r3, .L5
        ldr r5, .L5+4
        ldr r4, [r3]
        movs r3, #0
        ldr r6, [r5] ; <--- load of 'hist_verify' onto r6
        movs r7, r0
        str r3, [r5]
        bl pre_process_line
        adds r6, r4, r6
        str r6, [r5]
        movs r4, r0
        cmp r7, r0
        bne .L2
        bl str_len
        adds r0, r0, #1
        bl x_malloc
        movs r1, r4
        bl str_cpy
        movs r4, r0
.L2:
        movs r0, r4
        @ sp needed
        pop {r3, r4, r5, r6, r7, pc}
Current:
history_expand_line_internal:
        push {r0, r1, r2, r4, r5, r6, r7, lr}
        ldr r3, .L3
        ldr r5, .L3+4
        ldr r6, [r3]
        ldr r3, [r5] ; <--- load of 'hist_verify' onto r3
        movs r7, r0
        str r3, [sp, #4] ; <--- Spill
        movs r3, #0
        str r3, [r5]
        bl pre_process_line
        ldr r3, [sp, #4] ; <--- Reload
        movs r4, r0
        adds r6, r6, r3
        str r6, [r5]
        cmp r7, r0
        bne .L1
        bl str_len
        adds r0, r0, #1
        bl x_malloc
        movs r1, r4
        bl str_cpy
        movs r4, r0
.L1:
        movs r0, r4
        @ sp needed
        pop {r1, r2, r3, r4, r5, r6, r7, pc}
I have also attached the IRA and reload dumps for both the pre-patch and the
current compiler. In the current reload dump, insn 9 is the load into r3 and
insn 62 is the spill. In the pre-patch IRA/reload dumps, the load is insn 10.
I am not familiar with RA in GCC, so I'm not entirely sure what code to blame
for this sub-optimal allocation. Any comments or pointers would be most
welcome.