This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/49473] [arm] poor scheduling of loads
- From: "ramana at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 20 Jul 2011 16:00:37 +0000
- Subject: [Bug target/49473] [arm] poor scheduling of loads
- Auto-submitted: auto-generated
- References: <bug-49473-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.07.20 15:59:59
                 CC|                            |ramana at gcc dot gnu.org
     Ever Confirmed|0                           |1
--- Comment #2 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2011-07-20 15:59:59 UTC ---
> - the add at .LPIC0 will stall for two cycles because the preceding load has a
> result latency of three. The two subsequent MOVs could have been scheduled in
> these slots since they don't have any data dependency on the ADD;
This looks like it may be related to the latency of the call instruction, at
least for the LPIC0 case. The scheduler thinks that r0 isn't ready until
cycle 34 or so, and hence the compiler can't hoist the mov r5, r0 above the
add r4, pc, r4.
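For illustration, the two-cycle stall described in the quoted text can be
reproduced with a toy single-issue, in-order model. This is only a sketch and
not how GCC's scheduler works: the load's operand, the second mov, and every
latency other than the load's three-cycle result latency are assumptions, and
the model ignores the call latency on r0 discussed above.

```python
# Toy single-issue, in-order pipeline model (illustrative only).
# Each instruction is (text, destination, sources, result latency in cycles).
def count_stalls(instrs):
    ready = {}   # register -> first cycle at which its value can be consumed
    cycle = 0    # next free issue slot
    stalls = 0
    for _text, dest, srcs, latency in instrs:
        # An instruction issues once its slot is free and all sources are ready.
        issue = max([cycle] + [ready.get(reg, 0) for reg in srcs])
        stalls += issue - cycle
        ready[dest] = issue + latency
        cycle = issue + 1
    return stalls

# Order as reported: the add directly follows the load it depends on.
# The load operand and the second mov are hypothetical placeholders.
ORIGINAL = [
    ("ldr r4, <pic offset>", "r4", [],           3),
    ("add r4, pc, r4",       "r4", ["pc", "r4"], 1),
    ("mov r5, r0",           "r5", ["r0"],       1),
    ("mov r6, r1",           "r6", ["r1"],       1),
]

# Order the reporter asked for: the independent movs fill the load's shadow.
HOISTED = [ORIGINAL[0], ORIGINAL[2], ORIGINAL[3], ORIGINAL[1]]

print(count_stalls(ORIGINAL), count_stalls(HOISTED))
```

In this model the hoisted order is only legal if r0 is already available when
the mov issues, which is exactly what the scheduler does not believe in the
LPIC0 case.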
The case around LPIC1 doesn't seem to show up in a recent build of trunk that
I have:
.L5:
        ldr     r1, .L7+24              @ 135 pic_load_addr_32bit [length = 4]
        add     r2, r5, #32768          @ 25 *arm_addsi3/1 [length = 4]
        mov     r0, r7                  @ 27 *arm_movsi_insn/1 [length = 4]
.LPIC1:
        add     r1, pc, r1              @ 28 pic_add_dot_plus_eight [length = 4]
        add     r2, r2, #180            @ 29 *arm_addsi3/1 [length = 4]
        bl      gst_structure_get_int(PLT)      @ 30 *call_value_symbol
This is what I see with a more recent version of trunk, and it looks better
than what was shown in the report.
We need to dig further into the 1136 TRM for the other comments in this report.
Ramana