This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [lra] patch to solve most scalability problems for LRA


On 12-10-10 10:53 AM, Steven Bosscher wrote:
On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
The following patch solves most of the LRA scalability problems.

   It switches on simpler algorithms in LRA.  First, it switches off
trying to reassign hard registers to spilled pseudos (for such huge
functions they usually have long live ranges, so the chance of
assigning them anything is very small, while trying to reassign them
hard registers is too expensive), inheritance, live range splitting,
and memory coalescing optimizations.  It seems that rematerialization is
too important for performance, so I don't switch it off.  As splitting is
also necessary for generation of caller-saves code, I switch off
caller-saves in IRA and force IRA to do non-regional RA.

Hi Vlad,

I've revisited this patch now that parts of the scalability issues
have been resolved. Something funny happened for our
soon-to-be-legendary PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
  integrated RA           : 145.26 (18%)
  LRA non-specific        :  46.94 ( 6%)
  LRA virtuals elimination:  51.56 ( 6%)
  LRA reload inheritance  :   0.03 ( 0%)
  LRA create live ranges  :  46.67 ( 6%)
  LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:
  integrated RA           : 111.19 (15%) usr
  LRA non-specific        :  21.16 ( 3%) usr
  LRA virtuals elimination:   0.65 ( 0%) usr
  LRA reload inheritance  :   0.01 ( 0%) usr
  LRA create live ranges  :  56.33 ( 8%) usr
  LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
  integrated RA           :  89.43 (11%) usr
  LRA non-specific        :  21.43 ( 3%) usr
  LRA virtuals elimination:   0.61 ( 0%) usr
  LRA reload inheritance  :   6.10 ( 1%) usr
  LRA create live ranges  :  88.64 (11%) usr
  LRA hard reg assignment :  45.17 ( 6%) usr
  LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch. The cost
comes back in "LRA hard reg assignment" and "LRA create live ranges"
where I assume the latter is a consequence of running
lra_create_live_ranges a few more times to work for the hard-reg
assignment phase.

Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...

I have no idea.

  I cannot confirm it on an Intel Core i7 machine.  Here are my timings.
Removing lra_simple_p gives the worst compilation time, but the best
code size.

  It is also interesting that your IRA range patch results in
different code generation (I cannot explain that either for now). I saw
the same on a small test (blackjack playing and betting strategy).

  Another interesting thing is that the IRA times are essentially the
same with and without simplified allocation for LRA (36.03 vs. 37.87
usr).

--- branch this morning
  integrated RA           :  48.41 (13%) usr   0.25 ( 3%) sys  48.72 (13%) wall  223608 kB (19%) ggc
  LRA non-specific        :  14.47 ( 4%) usr   0.15 ( 2%) sys  14.57 ( 4%) wall   41443 kB ( 4%) ggc
  LRA virtuals elimination:   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall   36037 kB ( 3%) ggc
  LRA reload inheritance  :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1209 kB ( 0%) ggc
  LRA create live ranges  :  17.37 ( 5%) usr   0.21 ( 3%) sys  17.56 ( 5%) wall    5182 kB ( 0%) ggc
  LRA hard reg assignment :   1.77 ( 0%) usr   0.02 ( 0%) sys   1.76 ( 0%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
real=377.25 user=367.58 system=8.36 share=99%% maxrss=33540720 ins=280 outs=92544 mfaults=4448012 waits=17
   text    data     bss     dec     hex filename
6395340      16     607 6395963  61983b s.o
--- branch this morning + ira range patch
  integrated RA           :  36.03 (10%) usr   0.03 ( 0%) sys  36.20 (10%) wall  223608 kB (19%) ggc
  LRA non-specific        :  14.57 ( 4%) usr   0.14 ( 2%) sys  14.89 ( 4%) wall   41453 kB ( 4%) ggc
  LRA virtuals elimination:   0.36 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall   36040 kB ( 3%) ggc
  LRA reload inheritance  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1210 kB ( 0%) ggc
  LRA create live ranges  :  17.36 ( 5%) usr   0.21 ( 3%) sys  17.53 ( 5%) wall    5184 kB ( 0%) ggc
  LRA hard reg assignment :   1.78 ( 1%) usr   0.02 ( 0%) sys   1.79 ( 0%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  TOTAL                   : 351.82             7.50           360.52            1149460 kB
real=362.68 user=353.65 system=7.84 share=99%% maxrss=33540432 ins=224 outs=92544 mfaults=4073281 waits=17
   text    data     bss     dec     hex filename
6395424      16     607 6396047  61988f s.o
--- branch this morning + ira range patch + removing lra_simple_p
  integrated RA           :  37.87 ( 9%) usr   0.14 ( 2%) sys  38.30 ( 9%) wall  744114 kB (45%) ggc
  LRA non-specific        :  13.52 ( 3%) usr   0.05 ( 1%) sys  13.60 ( 3%) wall   39171 kB ( 2%) ggc
  LRA virtuals elimination:   0.38 ( 0%) usr   0.01 ( 0%) sys   0.40 ( 0%) wall   33096 kB ( 2%) ggc
  LRA reload inheritance  :   3.31 ( 1%) usr   0.00 ( 0%) sys   3.36 ( 1%) wall    5217 kB ( 0%) ggc
  LRA create live ranges  :  39.75 (10%) usr   0.42 ( 5%) sys  40.53 (10%) wall    5694 kB ( 0%) ggc
  LRA hard reg assignment :  31.87 ( 8%) usr   0.03 ( 0%) sys  31.94 ( 8%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   1.14 ( 0%) usr   0.00 ( 0%) sys   1.15 ( 0%) wall       0 kB ( 0%) ggc
real=424.69 user=414.47 system=8.06 share=99%% maxrss=36546048 ins=34992 outs=91528 mfaults=4253004 waits=175
   text    data     bss     dec     hex filename
6278007      16     607 6278630  5fcde6 s.o
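
For anyone who wants to compare such reports mechanically, here is a minimal sketch in Python. It is not part of any patch in this thread. The helper names (`parse_usr_times`, `usr_deltas`) are made up for illustration, and the regular expression assumes the "phase : seconds (percent) usr" line layout shown above. The input is the last two reports from this message (range patch with and without lra_simple_p).

```python
import re

# Matches one phase line of the timing report, e.g.
# "LRA create live ranges : 17.36 ( 5%) usr 0.21 ( 3%) sys ..."
# Assumption: the phase name contains no colon and "usr" is the first column.
LINE_RE = re.compile(
    r"^\s*(?P<phase>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)

def parse_usr_times(report):
    """Return a dict mapping phase name -> user time in seconds."""
    times = {}
    for line in report.splitlines():
        m = LINE_RE.match(line)
        if m:
            times[m.group("phase")] = float(m.group("usr"))
    return times

def usr_deltas(before, after):
    """User-time change (after - before) for phases present in both reports."""
    return {p: round(after[p] - before[p], 2) for p in before if p in after}

# Copied from the reports above: "+ ira range patch"
WITH_SIMPLE_P = """
integrated RA : 36.03 (10%) usr 0.03 ( 0%) sys 36.20 (10%) wall
LRA non-specific : 14.57 ( 4%) usr 0.14 ( 2%) sys 14.89 ( 4%) wall
LRA virtuals elimination: 0.36 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
LRA reload inheritance : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
LRA create live ranges : 17.36 ( 5%) usr 0.21 ( 3%) sys 17.53 ( 5%) wall
LRA hard reg assignment : 1.78 ( 1%) usr 0.02 ( 0%) sys 1.79 ( 0%) wall
LRA coalesce pseudo regs: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
"""

# Copied from the reports above: "+ ira range patch + removing lra_simple_p"
WITHOUT_SIMPLE_P = """
integrated RA : 37.87 ( 9%) usr 0.14 ( 2%) sys 38.30 ( 9%) wall
LRA non-specific : 13.52 ( 3%) usr 0.05 ( 1%) sys 13.60 ( 3%) wall
LRA virtuals elimination: 0.38 ( 0%) usr 0.01 ( 0%) sys 0.40 ( 0%) wall
LRA reload inheritance : 3.31 ( 1%) usr 0.00 ( 0%) sys 3.36 ( 1%) wall
LRA create live ranges : 39.75 (10%) usr 0.42 ( 5%) sys 40.53 (10%) wall
LRA hard reg assignment : 31.87 ( 8%) usr 0.03 ( 0%) sys 31.94 ( 8%) wall
LRA coalesce pseudo regs: 1.14 ( 0%) usr 0.00 ( 0%) sys 1.15 ( 0%) wall
"""

deltas = usr_deltas(parse_usr_times(WITH_SIMPLE_P),
                    parse_usr_times(WITHOUT_SIMPLE_P))

# Print the phases with the largest user-time change first.
for phase, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{phase:<26} {delta:+7.2f} s usr")
```

On these numbers the two largest changes are "LRA hard reg assignment" and "LRA create live ranges", while "integrated RA" barely moves.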


