This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFC: LRA for x86/x86-64 [0/9]
- From: Steven Bosscher <stevenb dot gcc at gmail dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 4 Oct 2012 20:43:26 +0200
- Subject: Re: RFC: LRA for x86/x86-64 [0/9]
- References: <5064D9AD.5070504@redhat.com> <CABu31nPaws_d+QoBdO_PxSXjHsZ8Kb10aOM_9GeWQKhoyG6_mA@mail.gmail.com> <5065C066.4040600@redhat.com> <5066486B.70205@redhat.com> <CABu31nP-19ZPkW6twoWZbPqThsY6zhyvhiW445Co+Gr_CykL=A@mail.gmail.com>
On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> To put it in another perspective, here are my timings of trunk vs lra
> (both checkouts done today):
>
> trunk:
> integrated RA : 181.68 (24%) usr  1.68 (11%) sys 183.43 (24%) wall  643564 kB (20%) ggc
> reload        :  11.00 ( 1%) usr  0.18 ( 1%) sys  11.17 ( 1%) wall   32394 kB ( 1%) ggc
> TOTAL         : 741.64           14.76           756.41            3216164 kB
>
> lra branch:
> integrated RA :  174.65 (16%) usr  1.33 ( 8%) sys  176.33 (16%) wall  643560 kB (20%) ggc
> reload        :  399.69 (36%) usr  2.48 (15%) sys  402.69 (36%) wall   41852 kB ( 1%) ggc
> TOTAL         : 1102.06           16.05           1120.83            3231738 kB
>
> That's a 49% slowdown. The difference is completely accounted for by
> the timing difference between reload and LRA.
With Vlad's patch to switch off expensive LRA parts for extreme
functions ([lra revision 192093]), the numbers are:
integrated RA            : 154.27 (17%) usr  1.27 ( 8%) sys 155.64 (17%) wall  131534 kB ( 5%) ggc
LRA non-specific         :  69.67 ( 8%) usr  0.79 ( 5%) sys  70.40 ( 8%) wall   18805 kB ( 1%) ggc
LRA virtuals elimination :  55.53 ( 6%) usr  0.00 ( 0%) sys  55.49 ( 6%) wall   20465 kB ( 1%) ggc
LRA reload inheritance   :   0.06 ( 0%) usr  0.00 ( 0%) sys   0.02 ( 0%) wall      57 kB ( 0%) ggc
LRA create live ranges   :  80.46 ( 4%) usr  1.05 ( 6%) sys  81.49 ( 4%) wall    2459 kB ( 0%) ggc
LRA hard reg assignment  :   1.78 ( 0%) usr  0.05 ( 0%) sys   1.85 ( 0%) wall       0 kB ( 0%) ggc
reload                   :   6.38 ( 1%) usr  0.13 ( 1%) sys   6.51 ( 1%) wall       0 kB ( 0%) ggc
TOTAL                    : 917.42           16.35           933.78            2720151 kB
Recalling trunk total time (r191835):
> TOTAL : 741.64 14.76 756.41
the slowdown due to LRA is down from 49% to 23%, with still room for
improvement (even without crippling LRA further). Size with the
expensive LRA parts switched off is still better than trunk:
$ size slow.o*
   text    data     bss     dec     hex filename
3499938       8     583 3500529  3569f1 slow.o.00_trunk_r191835
3386117       8     583 3386708  33ad54 slow.o.01_lra_r191626
3439755       8     583 3440346  347eda slow.o.02_lra_r192093
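
For anyone who wants to re-derive the percentages from the TOTAL rows and the text column above, here is a minimal sketch. The helper name `pct_change` and the variable names are made up for illustration; the numbers are copied from the reports in this mail. Depending on whether user or wall-clock time is used, the slowdown comes out at roughly 48-49% before the patch and roughly 23-24% after it.

```python
def pct_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# TOTAL rows from the time reports: (usr seconds, wall seconds).
trunk = (741.64, 756.41)         # trunk r191835
lra_before = (1102.06, 1120.83)  # lra branch, first measurement
lra_after = (917.42, 933.78)     # lra branch r192093

# "text" column of the `size slow.o*` output, in bytes.
text_trunk = 3499938
text_lra_r191626 = 3386117
text_lra_r192093 = 3439755

if __name__ == "__main__":
    for label, (usr, wall) in (("before", lra_before), ("after", lra_after)):
        print(f"LRA {label} patch: "
              f"usr {pct_change(trunk[0], usr):+.1f}%, "
              f"wall {pct_change(trunk[1], wall):+.1f}%")
    for label, text in (("r191626", text_lra_r191626),
                        ("r192093", text_lra_r192093)):
        print(f"text size, lra {label} vs trunk: "
              f"{pct_change(text_trunk, text):+.2f}%")
```

A negative percentage for the text size means the LRA object file is smaller than the trunk one.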
The lra-branch outperforms trunk on everything else I've thrown at it,
in terms of compile time and code size at least, and also e.g. on
Fortran polyhedron runtime.
Ciao!
Steven