[Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
steven at gcc dot gnu dot org
gcc-bugzilla@gcc.gnu.org
Mon Feb 8 10:47:00 GMT 2010
------- Comment #3 from steven at gcc dot gnu dot org 2010-02-08 10:47 -------
Trunk today produces this (with -dAP hacked to print slim RTL):
.file "t.c"
.text
.align 2
.global longfunc
.type longfunc, %function
longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
@ basic block 2
@ 8 ip:SI=r2:SI*r1:SI
@ REG_DEAD: r1:SI
mul ip, r2, r1 @ 8 *arm_mulsi3/2 [length = 4]
@ 35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
@ REG_DEAD: r5:SI
@ REG_DEAD: r4:SI
@ REG_FRAME_RELATED_EXPR: sequence
stmfd sp!, {r4, r5} @ 35 *push_multi [length = 4]
@ 9 r1:SI=r0:SI*r3:SI+ip:SI
@ REG_DEAD: ip:SI
@ REG_DEAD: r3:SI
@ REG_DEAD: r0:SI
mla r1, r0, r3, ip @ 9 *mulsi3addsi/2 [length = 4]
@ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@ REG_DEAD: r2:SI
umull r4, r5, r2, r0 @ 10 *umulsidi3_nov6 [length = 4]
@ 11 r1:SI=r1:SI+r5:SI
@ REG_DEAD: r5:SI
add r1, r1, r5 @ 11 *arm_addsi3/1 [length = 4]
@ 12 r5:SI=r1:SI
mov r5, r1 @ 12 *arm_movsi_insn/1 [length = 4]
@ 31 r0:SI=r4:SI
mov r0, r4 @ 31 *arm_movsi_insn/1 [length = 4]
@ 38 unspec/v{return;}
ldmfd sp!, {r4, r5}
bx lr
.size longfunc, .-longfunc
.ident "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"
Questions for those who know ARM:
* What is the purpose of insn 12 here? It looks like dead code to me, since r5
is restored in insn 38 (although, not knowing ARM very well, I may be wrong).
* After combine we have these two insns:
9 r138:SI=r142:SI*r3:SI+r139:SI
REG_DEAD: r3:SI
REG_DEAD: r139:SI
10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
REG_DEAD: r144:SI
REG_DEAD: r142:SI
which translate to the mla insn and to the umull insn that uses r4 and r5:
@ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@ REG_DEAD: r2:SI
umull r4, r5, r2, r0 @ 10 *umulsidi3_nov6 [length = 4]
@ 9 r1:SI=r0:SI*r3:SI+ip:SI
@ REG_DEAD: ip:SI
@ REG_DEAD: r3:SI
@ REG_DEAD: r0:SI
mla r1, r0, r3, ip @ 9 *mulsi3addsi/2 [length = 4]
Note how the sched1 pass has switched the two insns around. The register
allocator now decides to use two new registers here, because r0 and r3 are both
live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
available in insn 10 -- but this is too late.
The question for the ARM maintainers now is: why does sched1 want to swap insns
9 and 10, when sched2 wants to swap them back again?
(Note, btw, how wrong the REG_DEAD notes are: r0 dies in insn 9 and is used in
insn 10, because the sched2 pass fails to update the notes when it moves insn 9
before insn 10. But that's a separate issue...)
* If I compile with -fno-schedule-insns, I still don't get the optimal code:
mul ip, r2, r1
str r4, [sp, #-4]!
mla r1, r0, r3, ip
umull r3, r4, r2, r0
add r1, r1, r4
mov r4, r1
mov r0, r3
ldmfd sp!, {r4}
bx lr
This time the compiler chooses to use r3:DI in the umull, instead of r2:DI (that
is, r2 and r3). I am guessing this may be a target REG_ALLOC_ORDER issue, where
r3 comes before r2. That's another thing for a target maintainer to look into.
If IRA would select r2:DI, you would also lose the save/restore of r4 and get
the perfect code of comment #2.
So two issues:
1. Why does the sched1 pass schedule insn 10 before insn 9?
2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575