This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFC: LRA for x86/x86-64 [0/9]


On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> To put it in another perspective, here are my timings of trunk vs lra
> (both checkouts done today):
>
> trunk:
>  integrated RA           : 181.68 (24%) usr   1.68 (11%) sys 183.43 (24%) wall  643564 kB (20%) ggc
>  reload                  :  11.00 ( 1%) usr   0.18 ( 1%) sys  11.17 ( 1%) wall   32394 kB ( 1%) ggc
>  TOTAL                 : 741.64            14.76           756.41      3216164 kB
>
> lra branch:
>  integrated RA           : 174.65 (16%) usr   1.33 ( 8%) sys 176.33 (16%) wall  643560 kB (20%) ggc
>  reload                  : 399.69 (36%) usr   2.48 (15%) sys 402.69 (36%) wall   41852 kB ( 1%) ggc
>  TOTAL                 :1102.06            16.05          1120.83      3231738 kB
>
> That's a 49% slowdown. The difference is completely accounted for by
> the timing difference between reload and LRA.
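Both claims in the quoted paragraph can be checked with a little arithmetic on the user-time column. A minimal sketch, with the figures hard-coded from the quoted excerpts; it assumes, as the quote implies, that the `reload` row on the lra branch is where LRA's time is charged, and the dictionary layout is purely illustrative:

```python
# User-time seconds copied from the quoted -ftime-report excerpts.
trunk = {"integrated RA": 181.68, "reload": 11.00, "TOTAL": 741.64}
# Assumption: on the lra branch, LRA's time is reported under the "reload" row.
lra_branch = {"integrated RA": 174.65, "reload": 399.69, "TOTAL": 1102.06}

# Relative slowdown of the whole compilation.
slowdown_pct = (lra_branch["TOTAL"] / trunk["TOTAL"] - 1) * 100

# Per-phase change in user time (lra branch minus trunk).
total_delta = lra_branch["TOTAL"] - trunk["TOTAL"]
reload_delta = lra_branch["reload"] - trunk["reload"]  # reload on trunk vs. LRA
ira_delta = lra_branch["integrated RA"] - trunk["integrated RA"]
other_delta = total_delta - reload_delta - ira_delta  # every pass not listed

print(f"slowdown: {slowdown_pct:.1f}%")
print(f"total {total_delta:+.2f}s = reload/LRA {reload_delta:+.2f}s"
      f" + IRA {ira_delta:+.2f}s + other {other_delta:+.2f}s")
```

This gives a 48.6% user-time slowdown, which rounds to the quoted 49%. The reload/LRA delta (+388.69 s) is larger than the total delta (+360.42 s), so all other passes together actually got slightly faster, consistent with the slowdown being entirely due to LRA.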

With Vlad's patch to switch off expensive LRA parts for extreme
functions ([lra revision 192093]), the numbers are:

 integrated RA           : 154.27 (17%) usr   1.27 ( 8%) sys 155.64 (17%) wall  131534 kB ( 5%) ggc
 LRA non-specific        :  69.67 ( 8%) usr   0.79 ( 5%) sys  70.40 ( 8%) wall   18805 kB ( 1%) ggc
 LRA virtuals elimination:  55.53 ( 6%) usr   0.00 ( 0%) sys  55.49 ( 6%) wall   20465 kB ( 1%) ggc
 LRA reload inheritance  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      57 kB ( 0%) ggc
 LRA create live ranges  :  80.46 ( 4%) usr   1.05 ( 6%) sys  81.49 ( 4%) wall    2459 kB ( 0%) ggc
 LRA hard reg assignment :   1.78 ( 0%) usr   0.05 ( 0%) sys   1.85 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   6.38 ( 1%) usr   0.13 ( 1%) sys   6.51 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 : 917.42            16.35           933.78      2720151 kB
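The rows above appear to be GCC `-ftime-report` output: a phase name, then user, system and wall seconds (each with a percentage of the total), then GC-allocated memory. With LRA now split into several timed sub-phases, it helps to total them. A minimal sketch that assumes exactly the row layout shown here (the layout may differ in other GCC versions); `parse_time_report` is a hypothetical helper, not part of GCC:

```python
import re

# Rows copied from the r192093 report, with the mail line-wrapping undone.
REPORT = """\
 integrated RA           : 154.27 (17%) usr   1.27 ( 8%) sys 155.64 (17%) wall  131534 kB ( 5%) ggc
 LRA non-specific        :  69.67 ( 8%) usr   0.79 ( 5%) sys  70.40 ( 8%) wall   18805 kB ( 1%) ggc
 LRA virtuals elimination:  55.53 ( 6%) usr   0.00 ( 0%) sys  55.49 ( 6%) wall   20465 kB ( 1%) ggc
 LRA reload inheritance  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      57 kB ( 0%) ggc
 LRA create live ranges  :  80.46 ( 4%) usr   1.05 ( 6%) sys  81.49 ( 4%) wall    2459 kB ( 0%) ggc
 LRA hard reg assignment :   1.78 ( 0%) usr   0.05 ( 0%) sys   1.85 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   6.38 ( 1%) usr   0.13 ( 1%) sys   6.51 ( 1%) wall       0 kB ( 0%) ggc
"""

# Assumed row layout: "<phase> : <usr> (<pct>) usr <sys> (<pct>) sys <wall> (<pct>) wall ..."
ROW = re.compile(
    r"^\s*(?P<phase>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
)


def parse_time_report(text):
    """Map phase name -> (usr, sys, wall) seconds for every row that matches."""
    phases = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            phases[m["phase"]] = (float(m["usr"]), float(m["sys"]), float(m["wall"]))
    return phases


phases = parse_time_report(REPORT)
lra_rows = [t for name, t in phases.items() if name.startswith("LRA ")]
lra_usr = sum(t[0] for t in lra_rows)
lra_wall = sum(t[2] for t in lra_rows)
print(f"LRA sub-phases: {lra_usr:.2f}s usr, {lra_wall:.2f}s wall")
```

For the rows shown, the five LRA sub-phases add up to 207.50 s of user time and 209.25 s of wall time.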

Recalling trunk total time (r191835):

>  TOTAL                 : 741.64            14.76           756.41

the slowdown due to LRA is down from 49% to 23%, and there is still room
for improvement (even without crippling LRA further). Code size with the
expensive LRA parts switched off is still better than on trunk:
$ size slow.o*
   text    data     bss     dec     hex filename
3499938       8     583 3500529  3569f1 slow.o.00_trunk_r191835
3386117       8     583 3386708  33ad54 slow.o.01_lra_r191626
3439755       8     583 3440346  347eda slow.o.02_lra_r192093
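The `size` output above is in the default Berkeley format, where `dec` is the sum of `text`, `data` and `bss`, and `hex` is that same sum in hexadecimal. A short sketch that parses it, checks those totals, and expresses each build's text segment relative to trunk; it assumes exactly the six-column layout shown, and `parse_size` is a hypothetical helper name:

```python
# Berkeley-format `size` output copied from above.
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
3499938       8     583 3500529  3569f1 slow.o.00_trunk_r191835
3386117       8     583 3386708  33ad54 slow.o.01_lra_r191626
3439755       8     583 3440346  347eda slow.o.02_lra_r192093
"""


def parse_size(output):
    """Return {filename: (text, data, bss)}, checking the dec/hex totals."""
    rows = {}
    for line in output.splitlines()[1:]:  # skip the header line
        text, data, bss, dec, hexsum, name = line.split()
        text, data, bss = int(text), int(data), int(bss)
        # dec must equal text + data + bss, and hex is the same value in base 16.
        if int(dec) != text + data + bss or int(hexsum, 16) != int(dec):
            raise ValueError(f"inconsistent totals for {name}")
        rows[name] = (text, data, bss)
    return rows


sizes = parse_size(SIZE_OUTPUT)
baseline = sizes["slow.o.00_trunk_r191835"][0]
for name, (text, _, _) in sizes.items():
    change = (text - baseline) / baseline * 100
    print(f"{name}: text {text} bytes ({change:+.2f}% vs trunk)")
```

Relative to trunk, the text segment is 3.25% smaller with r191626 and 1.72% smaller with r192093.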

The lra branch outperforms trunk on everything else I've thrown at it,
at least in terms of compile time and code size, and also, e.g., in
run time on the Polyhedron Fortran benchmarks.

Ciao!
Steven

