This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFC: LRA for x86/x86-64 [0/9]


On 12-09-28 11:21 AM, Vladimir Makarov wrote:
On 12-09-28 4:21 AM, Steven Bosscher wrote:
On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
   Any comments and proposals are appreciated.  Even if GCC community
decides that it is too late to submit it to gcc4.8, the earlier reviews
are always useful.
I would like to see some benchmark numbers, both for code quality and
compile time impact for the most notorious compile time hog PRs for
large routines where IRA performs poorly (e.g. PR54146, PR26854).




Getting the data for PR26854 will take a lot of time. I will let you know when I have them.

Regarding compilation time: at the GCC Cauldron I reported compilation times for -O2/-O3, and Richard asked me about -O0. I did not have an answer at that time. I have since checked the compilation time for all_cp2k_fortran.f90 (500K lines of Fortran). At -O0 the compilation time (user and real time) was the same, with no visible difference, for GCC with reload and for GCC with LRA.

When I started the LRA project, my major design decision was to reflect LRA decisions in RTL as much as possible. This simplifies LRA and makes it easier to maintain, and it is quite different from the reload design. I realized at that time that LRA would be slower than reload because of this decision: reload works on a specialized, very fast representation and, roughly speaking, changes RTL only once, at the end of its work, when it decides that it can generate correct RTL from that representation, while LRA takes most of its information from RTL (a slightly simplified picture) and changes RTL many times during its work.

It was a surprise to me that, after some hard work, I managed to get the same GCC speed as with reload (or even 2-3% faster for all_cp2k_fortran.f90 on x86). But if you check LRA with valgrind --tool=lackey, you will see that LRA still executes more insns than reload, as I had guessed. I think the same or better speed of LRA is achieved by better data and code locality, and by smaller code size, which translates into faster work of the subsequent passes.


Here are the numbers for PR26854. I used GCC with -O2 on a Core i7-2600
(3.4GHz) with 8GB memory.

The results are strange to me.  Although the user time is less for
reload, the real time is less for LRA.  I checked it several times, and
the results are always the same, with insignificant noise.  Also, the
difference between reload and LRA times reported by -ftime-report is
sometimes less than the difference in whole compile times.

The code size with LRA is always smaller (up to about 19% of text size,
for the 32-bit PR26854 build).  The data and code locality is better
with LRA (according to the reported major and minor page faults).

I have attached logs with more details.

----------------------------------32-bit------------------------------------
Reload:
581.85user 29.91system 27:15.18elapsed 37%CPU (0avgtext+0avgdata 7730628maxresident)k
17558136inputs+21888outputs (307229major+8039646minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
2102836  213032    1568 2317436  235c7c rall.o
LRA:
629.67user 24.16system 24:31.08elapsed 44%CPU (0avgtext+0avgdata 7739172maxresident)k
13219464inputs+21432outputs (230546major+6874087minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
1707980  213032    1568 1922580  1d5614 lall.o


----------------------------------64-bit:-----------------------------------
Reload:
503.26user 36.54system 30:16.62elapsed 29%CPU (0avgtext+0avgdata 7755104maxresident)k
23111632inputs+19824outputs (399464major+8804300minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
1595546  423792    3040 2022378  1edbea rall.o
LRA:
598.70user 30.90system 27:26.92elapsed 38%CPU (0avgtext+0avgdata 7749424maxresident)k
19382904inputs+19752outputs (333306major+7343974minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
1581178  423792    3040 2008010  1ea3ca lall.o


Here are the numbers for PR54146 on the same machine with -O1, for
64-bit only (the compiler reports an error for -m32).

Reload:
350.40user 21.59system 17:09.75elapsed 36%CPU (0avgtext+0avgdata 7289044maxresident)k
8825824inputs+92800outputs (167226major+4631676minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
6556628      16     607 6557251  640e43 rs.o
LRA:
468.29user 21.35system 15:47.76elapsed 51%CPU (0avgtext+0avgdata 7728200maxresident)k
7407936inputs+91552outputs (140025major+5271594minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
6277934      16     607 6278557  5fcd9d ls.o
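
As a rough aid for reading the logs above, the relative differences between reload and LRA can be computed with a small script.  This is only a sketch: the numbers are copied by hand from the time and size output above, and the names RESULTS, elapsed_seconds, pct_change and compare are made up for illustration; they are not part of GCC or of the benchmark setup.

```python
# Numbers copied by hand from the `time` and `size` logs above.
# Each row: (user seconds, elapsed "MM:SS.ss", text bytes, major page faults).
# All names in this script are illustrative, not part of GCC.
RESULTS = {
    "PR26854 32-bit": {
        "reload": (581.85, "27:15.18", 2102836, 307229),
        "lra":    (629.67, "24:31.08", 1707980, 230546),
    },
    "PR26854 64-bit": {
        "reload": (503.26, "30:16.62", 1595546, 399464),
        "lra":    (598.70, "27:26.92", 1581178, 333306),
    },
    "PR54146 64-bit": {
        "reload": (350.40, "17:09.75", 6556628, 167226),
        "lra":    (468.29, "15:47.76", 6277934, 140025),
    },
}


def elapsed_seconds(stamp):
    """Convert an elapsed time of the form MM:SS.ss to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_change(old, new):
    """Relative change from old to new, in percent (negative means smaller)."""
    return 100.0 * (new - old) / old


def compare(reload_row, lra_row):
    """Percent change of each LRA figure relative to the reload figure."""
    r_user, r_elapsed, r_text, r_major = reload_row
    l_user, l_elapsed, l_text, l_major = lra_row
    return {
        "user": pct_change(r_user, l_user),
        "elapsed": pct_change(elapsed_seconds(r_elapsed),
                              elapsed_seconds(l_elapsed)),
        "text": pct_change(r_text, l_text),
        "major_faults": pct_change(r_major, l_major),
    }


for name, rows in RESULTS.items():
    delta = compare(rows["reload"], rows["lra"])
    print(name)
    for key, value in delta.items():
        # A negative value means the LRA figure is smaller than the reload one.
        print("  %-12s %+6.1f%%" % (key, value))
```

A positive user-time change together with a negative elapsed-time change in the output corresponds to the observation above that reload wins on user time while LRA wins on real time.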


Attachment: ALL
Description: Text document

