This is the mail archive of the mailing list for the GCC project.
RE: New rematerialization sub-pass in LRA
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: <vmakarov at redhat dot com>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 13 Oct 2014 17:24:04 +0100
- Subject: RE: New rematerialization sub-pass in LRA
- References: <5437F4EC dot 2070809 at redhat dot com> <543BC697 dot 4010207 at arm dot com>
> Here is a new rematerialization sub-pass of LRA.
> I've tested and benchmarked the sub-pass on x86-64 and ARM. The
> sub-pass generates slightly smaller code on average on both
> architectures (although the improvement is not significant), and adds < 0.4%
> additional compilation time at -O2 for a release build of GCC (according to the user
> time for compiling a 500K-line Fortran program and the valgrind lackey count of
> instructions executed while compiling combine.i) and about 0.7% at -O0. As for
> performance, the best result I found is a 1% SPECFP2000 improvement on
> ARM Exynos 5410 (973 vs 963), but on Intel Haswell the performance
> results are practically the same (Haswell has a very good,
> sophisticated memory sub-system).
I ran SPEC2k on AArch64, and EON fails to run correctly with -fno-caller-saves
-mcpu=cortex-a57 -fomit-frame-pointer -Ofast. I'm not sure whether this is
AArch64-specific, but previously, non-optimal register allocation choices triggered
a latent bug in ree, the redundant extension elimination pass. (It's unclear why GCC
still allocates FP registers in high-pressure integer code, as I set the costs for
int<->FP moves high.)
On SPECINT2k, performance is ~0.5% worse (with a 5.5% regression on perlbmk), while
SPECFP is ~0.2% faster.
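As a rough check on how the perlbmk regression relates to the overall SPECINT2k number, here is a minimal Python sketch. It assumes the overall figure is the geometric mean of per-benchmark ratios over the 12 SPECINT2000 benchmarks and, purely hypothetically, that every benchmark other than perlbmk is unchanged.

```python
import math

def geomean(ratios):
    """Geometric mean of per-benchmark ratios, as used for SPEC overall scores."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-benchmark ratios (new score / old score) for the 12
# SPECINT2000 benchmarks: perlbmk is 5.5% slower, the other 11 are assumed
# unchanged (an assumption for illustration, not measured data).
ratios = [1.0] * 11 + [1.0 - 0.055]

overall = geomean(ratios)
print(f"overall change: {100 * (overall - 1):.2f}%")  # overall change: -0.47%
```

Under these assumptions perlbmk alone would account for most of the ~0.5% drop. If EON is left out because it failed, the same calculation over 11 benchmarks gives about -0.51%.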
Generally I think it is good to have a specific pass for rematerialization.
However, shouldn't this also affect the costs of instructions that can be
cheaply rematerialized? The same applies to the choice of whether to caller-save or spill
(today the caller-save code doesn't care at all about rematerialization, so it
aggressively caller-saves values which could be rematerialized - see e.g.
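To make the caller-save vs. rematerialization point concrete, here is a toy cost model in Python. It is not how GCC's caller-save or LRA code computes costs; the unit costs, the LiveValue summary and the choose() helper are all hypothetical. It only shows that a decision which ignores rematerialization can pick caller-save for a value that would be cheaper to recompute after each call.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unit costs; a real target would take these from its cost tables.
STORE_COST = 1.0
LOAD_COST = 1.0

@dataclass
class LiveValue:
    """Hypothetical summary of a value that is live across calls."""
    uses: int                            # uses of the value in its live range
    calls_crossed: int                   # calls the value is live across
    remat_cost: Optional[float] = None   # cost to recompute once; None if not rematerializable

def choose(v: LiveValue, consider_remat: bool = True) -> str:
    """Pick the cheapest way to keep v available across calls (toy model)."""
    costs = {
        # Spill: one store after the definition, one reload before every use.
        "spill": STORE_COST + LOAD_COST * v.uses,
        # Caller-save: save before and restore after every call crossed.
        "caller-save": (STORE_COST + LOAD_COST) * v.calls_crossed,
    }
    if consider_remat and v.remat_cost is not None:
        # Rematerialize: recompute after every call; no stack store needed.
        costs["rematerialize"] = v.remat_cost * v.calls_crossed
    return min(costs, key=costs.get)

cheap_const = LiveValue(uses=4, calls_crossed=2, remat_cost=1.0)
print(choose(cheap_const, consider_remat=False))  # caller-save (4.0 vs spill 5.0)
print(choose(cheap_const))                        # rematerialize (2.0)
```

The first call mirrors a decision that doesn't look at rematerialization at all; the second shows the cheaper option it misses.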
Also, I am confused by the claim that "memory reads are not profitable to rematerialize".
Surely rematerializing a memory read from constant data or a literal pool is cheaper
than spilling, since you avoid a store to the stack?
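A minimal sketch of the memory traffic in that case, counting accesses for a value first loaded from read-only data. The memory_ops() helper is hypothetical and the model assumes the value is needed again at a fixed number of later points. It ignores any extra instructions needed to form the constant's address. Re-reading is only valid because the location is read-only, so the value cannot change between the original load and the rematerialized one.

```python
def memory_ops(reloads: int, rematerialize: bool) -> dict:
    """Count memory accesses for a value first loaded from read-only data.

    Toy model (assumption): the value is loaded once from a constant/literal
    pool, then needed again at `reloads` later points where no register holds it.
    """
    ops = {"loads": 1, "stores": 0}  # the original literal-pool load
    if rematerialize:
        # Re-read the constant from the read-only pool at each later point.
        ops["loads"] += reloads
    else:
        # Spill: store to a stack slot once, then reload from it each time.
        ops["stores"] += 1
        ops["loads"] += reloads
    return ops

print(memory_ops(3, rematerialize=False))  # {'loads': 4, 'stores': 1}
print(memory_ops(3, rematerialize=True))   # {'loads': 4, 'stores': 0}
```

In this model the number of loads is the same either way, so rematerializing saves the stack store (and the stack slot) outright.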