This is the mail archive of the
mailing list for the GCC project.
Re: RFC: LRA for x86/x86-64 [0/9]
On 12-09-28 4:21 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 12:56 AM, Vladimir Makarov <firstname.lastname@example.org> wrote:
>> Any comments and proposals are appreciated. Even if GCC community
>> decides that it is too late to submit it to gcc4.8, the earlier reviews
>> are always useful.
> I would like to see some benchmark numbers, both for code quality and
> compile time impact for the most notorious compile time hog PRs for
> large routines where IRA performs poorly (e.g. PR54146, PR26854).
I should look at this, Steven. Unfortunately, the compiler @ trunk
(without my patch) crashes on PR54156:
../../../trunk2/slow.cc: In function ‘void check_() [with NT =
CGAL::Gmpfi; int s = 3]’:
../../../trunk2/slow.cc:95489:6: internal compiler error: Segmentation fault
0x8f50ed walk_aliased_vdefs(ao_ref_s*, tree_node*, bool (*)(ao_ref_s*,
tree_node*, void*), void*, bitmap_head_def**)
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
PR26854 will take a lot of time to get the data. So I inform you when I
Regarding compilation time: I reported the compilation time at the GCC
Cauldron for -O2/-O3, and Richard asked me about -O0. I did not have the
answer at that time. I have since checked the compilation time for
all_cp2k_fortran.f90 (500K lines of Fortran). At -O0 the compilation time
(usr and real time) was the same (no visible differences) for GCC with
reload and for GCC with LRA.
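A usr/real comparison like the one above could be scripted roughly as follows. This is only a minimal sketch, not the procedure actually used for the numbers reported here: the helper names (`time_command`, `relative_change`, `compare`) and the two driver names in the commented usage are hypothetical, and it assumes a Unix-like system where `os.times()` reports the CPU time of child processes.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (user_seconds, real_seconds) for the child."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    # children_user accumulates user CPU time of waited-for child processes.
    return after.children_user - before.children_user, real


def relative_change(baseline, candidate):
    """Return (candidate - baseline) / baseline; negative means faster."""
    if baseline == 0:
        return float("nan")  # too short to measure
    return (candidate - baseline) / baseline


def compare(baseline_cmd, candidate_cmd):
    """Time both commands once and report the relative usr/real change."""
    base_user, base_real = time_command(baseline_cmd)
    cand_user, cand_real = time_command(candidate_cmd)
    return {
        "user": relative_change(base_user, cand_user),
        "real": relative_change(base_real, cand_real),
    }


# Hypothetical usage: the two driver names below are assumptions and stand
# for a compiler built with reload and one built with LRA.
#
#   src = "all_cp2k_fortran.f90"
#   print(compare(["gfortran-reload", "-O0", "-c", src],
#                 ["gfortran-lra", "-O0", "-c", src]))
```

A single run per compiler is noisy, so in practice one would repeat each measurement several times and compare the minimum or median.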
When I started the LRA project, my major design decision was to reflect LRA
decisions in RTL as much as possible. This simplifies LRA and makes it
easy to maintain, and it is quite different from the reload design. I
realized at that time that LRA would be slower than reload because of this
decision: reload works on a specialized, very fast representation and,
roughly speaking, changes RTL only once, at the end of its work, when it
decides that it can generate correct RTL from the representation, while
LRA takes most of its info from RTL (a bit simplified picture) and changes
RTL many times during its work.
It was a surprise for me that, after some hard work, I managed to get the
same GCC speed as with reload (or even 2-3% faster for
all_cp2k_fortran.f90 on x86). But if you check LRA with
valgrind --tool=lackey, you will see that LRA still, as I guessed before,
executes more insns than reload. I think that the same or better speed of
LRA is achieved by better data and code locality, and by smaller code
size, which translates into faster work of the subsequent passes.