[lra] patch to solve most scalability problems for LRA
Steven Bosscher
stevenb.gcc@gmail.com
Wed Oct 10 14:57:00 GMT 2012
On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
> The following patch solves most of LRA's scalability problems.
>
> It switches on simpler algorithms in LRA. First, it switches off
> trying to reassign hard registers to spilled pseudos (in such huge
> functions they usually have long live ranges, so the chance of
> assigning them a hard register is very small, while trying to
> reassign them is too expensive), inheritance, live range splitting,
> and memory coalescing optimizations. Rematerialization seems too
> important for performance, so I don't switch it off. As splitting is
> also necessary for generating caller-save code, I switch off
> caller-saves in IRA and force IRA to do non-regional RA.
Hi Vlad,
I've revisited this patch now that parts of the scalability issues
have been resolved. Something funny happened with our
soon-to-be-legendary PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
  integrated RA           : 145.26 (18%)
  LRA non-specific        :  46.94 ( 6%)
  LRA virtuals elimination:  51.56 ( 6%)
  LRA reload inheritance  :   0.03 ( 0%)
  LRA create live ranges  :  46.67 ( 6%)
  LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:
  integrated RA           : 111.19 (15%) usr
  LRA non-specific        :  21.16 ( 3%) usr
  LRA virtuals elimination:   0.65 ( 0%) usr
  LRA reload inheritance  :   0.01 ( 0%) usr
  LRA create live ranges  :  56.33 ( 8%) usr
  LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
  integrated RA           :  89.43 (11%) usr
  LRA non-specific        :  21.43 ( 3%) usr
  LRA virtuals elimination:   0.61 ( 0%) usr
  LRA reload inheritance  :   6.10 ( 1%) usr
  LRA create live ranges  :  88.64 (11%) usr
  LRA hard reg assignment :  45.17 ( 6%) usr
  LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch (89.43s vs.
111.19s). The cost comes back in "LRA hard reg assignment" and "LRA
create live ranges", where I assume the latter is a consequence of
running lra_create_live_ranges a few more times to provide live
ranges for the hard-reg assignment phase.
Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...
Ciao!
Steven