[lra] patch to solve most scalability problems for LRA

Wed Oct 10 14:57:00 GMT 2012

On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
>   The following patch solves most of the LRA scalability problems.
>
>   It switches on simpler algorithms in LRA.  First, it switches off
> trying to reassign hard registers to spilled pseudos (for such huge
> functions they usually have long live ranges -- so the chance of
> assigning them anything is very small, while trying to reassign them
> hard registers is too expensive), inheritance, live range splitting,
> and memory coalescing optimizations.  It seems that rematerialization
> is too important for performance -- so I don't switch it off.  As
> splitting is also necessary for generation of caller saves code, I
> switch off caller-saves in IRA and force IRA to do non-regional RA.

Hi Vlad,

I've revisited this patch now that some of the scalability issues
have been resolved. Something funny happened for our
soon-to-be-legendary PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
 integrated RA           : 145.26 (18%)
 LRA non-specific        :  46.94 ( 6%)
 LRA virtuals elimination:  51.56 ( 6%)
 LRA reload inheritance  :   0.03 ( 0%)
 LRA create live ranges  :  46.67 ( 6%)
 LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:
 integrated RA           : 111.19 (15%) usr
 LRA non-specific        :  21.16 ( 3%) usr
 LRA virtuals elimination:   0.65 ( 0%) usr
 LRA reload inheritance  :   0.01 ( 0%) usr
 LRA create live ranges  :  56.33 ( 8%) usr
 LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
 integrated RA           :  89.43 (11%) usr
 LRA non-specific        :  21.43 ( 3%) usr
 LRA virtuals elimination:   0.61 ( 0%) usr
 LRA reload inheritance  :   6.10 ( 1%) usr
 LRA create live ranges  :  88.64 (11%) usr
 LRA hard reg assignment :  45.17 ( 6%) usr
 LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch. The cost
comes back in "LRA hard reg assignment" and "LRA create live ranges".
I assume the latter is a consequence of running
lra_create_live_ranges a few more times on behalf of the hard-reg
assignment phase.
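
To put rough numbers on that shift, the per-phase differences between
the last two runs can be tallied with a small script. This is only an
illustrative sketch: the dictionary and function names are made up
here, the figures are copied from the two listings above, and a phase
missing from a listing (the coalesce line in the run that keeps
lra_simple_p) is assumed to have cost 0.00 s.

```python
# Timings in seconds (usr), copied from the two listings above.
# Run A: lra-branch today + ira-speedup-1.diff
WITH_SIMPLE_P = {
    "integrated RA": 111.19,
    "LRA non-specific": 21.16,
    "LRA virtuals elimination": 0.65,
    "LRA reload inheritance": 0.01,
    "LRA create live ranges": 56.33,
    "LRA hard reg assignment": 0.58,
}

# Run B: run A + rm-lra_simple_p.diff
WITHOUT_SIMPLE_P = {
    "integrated RA": 89.43,
    "LRA non-specific": 21.43,
    "LRA virtuals elimination": 0.61,
    "LRA reload inheritance": 6.10,
    "LRA create live ranges": 88.64,
    "LRA hard reg assignment": 45.17,
    "LRA coalesce pseudo regs": 2.24,
}


def delta_per_phase(before, after):
    """Return after - before for every phase seen in either run.

    Assumption: a phase absent from one listing took 0.00 s there.
    """
    phases = list(before) + [p for p in after if p not in before]
    return {
        p: round(after.get(p, 0.0) - before.get(p, 0.0), 2) for p in phases
    }


deltas = delta_per_phase(WITH_SIMPLE_P, WITHOUT_SIMPLE_P)
ira_delta = deltas["integrated RA"]
lra_delta = round(sum(d for p, d in deltas.items() if p.startswith("LRA")), 2)
net = round(ira_delta + lra_delta, 2)

for phase, d in deltas.items():
    print(f"{phase:<26}{d:+8.2f}")
print(f"{'net (listed phases only)':<26}{net:+8.2f}")
```

Over the listed phases only, IRA gets about 21.8 s faster while the
LRA phases together get about 85.5 s slower, so removing lra_simple_p
costs roughly 63.7 s net on this test case.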

Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...

Ciao!
Steven