On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
The following patch solves most of the LRA scalability problems.
It switches on simpler algorithms in LRA. First, it switches off
trying to reassign hard registers to spilled pseudos (for such huge
functions they usually have long live ranges -- so the possibility of
assigning them anything is very small, while trying to reassign them
hard registers is too expensive), inheritance, live range splitting,
and memory coalescing optimizations. It seems that rematerialization
is too important for performance -- so I don't switch it off. As
splitting is also necessary for generation of caller-saves code, I
switch off caller-saves in IRA and force IRA to do non-regional RA.
Hi Vlad,
I've revisited this patch now that parts of the scalability issues
have been resolved. Something funny happened for our
soon-to-be-legendary PR54146 test case...
lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
integrated RA            : 145.26 (18%)
LRA non-specific         :  46.94 ( 6%)
LRA virtuals elimination :  51.56 ( 6%)
LRA reload inheritance   :   0.03 ( 0%)
LRA create live ranges   :  46.67 ( 6%)
LRA hard reg assignment  :   0.55 ( 0%)
lra-branch today + ira-speedup-1.diff:
integrated RA            : 111.19 (15%) usr
LRA non-specific         :  21.16 ( 3%) usr
LRA virtuals elimination :   0.65 ( 0%) usr
LRA reload inheritance   :   0.01 ( 0%) usr
LRA create live ranges   :  56.33 ( 8%) usr
LRA hard reg assignment  :   0.58 ( 0%) usr
lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
integrated RA            :  89.43 (11%) usr
LRA non-specific         :  21.43 ( 3%) usr
LRA virtuals elimination :   0.61 ( 0%) usr
LRA reload inheritance   :   6.10 ( 1%) usr
LRA create live ranges   :  88.64 (11%) usr
LRA hard reg assignment  :  45.17 ( 6%) usr
LRA coalesce pseudo regs :   2.24 ( 0%) usr
Note how IRA is *faster* with lra_simple_p removed (89.43 vs. 111.19).
The cost comes back in "LRA hard reg assignment" (0.58 -> 45.17) and
"LRA create live ranges" (56.33 -> 88.64). I assume the latter is a
consequence of running lra_create_live_ranges a few more times on
behalf of the hard-reg assignment phase.
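
For what it's worth, here is a quick sketch of the arithmetic behind that
comparison. The numbers are copied from the two "today" reports above; the
script and its names are mine, and treating a phase that is missing from a
report as 0.00 is an assumption.

```python
# User-time seconds from "lra-branch today + ira-speedup-1.diff".
with_simple_p = {
    "integrated RA": 111.19,
    "LRA non-specific": 21.16,
    "LRA virtuals elimination": 0.65,
    "LRA reload inheritance": 0.01,
    "LRA create live ranges": 56.33,
    "LRA hard reg assignment": 0.58,
}

# Same tree with rm-lra_simple_p.diff applied on top.
without_simple_p = {
    "integrated RA": 89.43,
    "LRA non-specific": 21.43,
    "LRA virtuals elimination": 0.61,
    "LRA reload inheritance": 6.10,
    "LRA create live ranges": 88.64,
    "LRA hard reg assignment": 45.17,
    "LRA coalesce pseudo regs": 2.24,
}

# Per-phase change when lra_simple_p is removed.
# Assumption: a phase absent from the first report cost 0.00 there.
delta = {
    phase: secs - with_simple_p.get(phase, 0.0)
    for phase, secs in without_simple_p.items()
}

# Net change over the listed IRA + LRA phases only (not the whole compile).
net = sum(delta.values())

for phase, change in delta.items():
    print(f"{phase:<25}: {change:+7.2f}")
print(f"{'net (listed phases)':<25}: {net:+7.2f}")
```

So over the phases listed, IRA gets about 21.8 faster but the total is
about 63.7 slower without lra_simple_p, almost all of it in hard reg
assignment and live range creation.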
Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...