This is the mail archive of the mailing list for the GCC project.


Re: RFC: LRA for x86/x86-64 [0/9]

On 12-09-28 4:21 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov <> wrote:
>>    Any comments and proposals are appreciated.  Even if GCC community
>> decides that it is too late to submit it to gcc4.8, the earlier reviews
>> are always useful.
> I would like to see some benchmark numbers, both for code quality and
> compile time impact for the most notorious compile time hog PRs for
> large routines where IRA performs poorly (e.g. PR54146, PR26854).

I should look at this, Steven. Unfortunately, the compiler @ trunk (without my patch) crashes on PR54156:

../../../trunk2/ In function ‘void check_() [with NT = CGAL::Gmpfi; int s = 3]’:
../../../trunk2/ internal compiler error: Segmentation fault
void check_(){
0x888adf crash_signal
0x8f4718 gimple_code
0x8f4718 gimple_nop_p
0x8f4718 walk_aliased_vdefs_1
0x8f50ed walk_aliased_vdefs(ao_ref_s*, tree_node*, bool (*)(ao_ref_s*, tree_node*, void*), void*, bitmap_head_def**)
0x9018b5 propagate_necessity
0x9027b3 perform_tree_ssa_dce
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <> for instructions.

Getting the data for PR26854 will take a lot of time, so I will inform you when I have them.

Regarding compilation time: I reported the compilation time at GCC Cauldron for -O2/-O3, and Richard asked me about -O0. I did not have the answer at that time. I have since checked the compilation time for all_cp2k_fortran.f90 (500K lines of Fortran). At -O0, the compilation time (user and real time) was the same (no visible differences) for GCC with reload and for GCC with LRA.

When I started the LRA project, my major design decision was to reflect LRA decisions in RTL as much as possible. This simplifies LRA and makes it easier to maintain, and it is quite different from the reload design. I realized at that time that LRA would be slower than reload because of this decision: reload works on a specialized, very fast representation and, roughly speaking, changes RTL only once, at the end of its work, when it decides that it can generate correct RTL from the representation. LRA, by contrast, takes most of its information from RTL (a slightly simplified picture) and changes RTL many times during its work.

It was a surprise to me that, after some hard work, I managed to achieve the same GCC speed as with reload (or even 2-3% faster for all_cp2k_fortran.f90 on x86). But if you check LRA through valgrind --tool=lackey, you will see that LRA still executes more insns than reload, as I had guessed. I think that the same or better speed of LRA is achieved by better data and code locality, and by smaller code size, which translates into faster work by the subsequent passes.
