This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Register Rematerialization In New Register Allocator
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: "Mukta Punjani, Noida" <muktap at noida dot hcltech dot com>
- Cc: gcc at gcc dot gnu dot org, Michael Matz <matz at suse dot de>
- Date: Wed, 02 Jul 2003 13:23:51 -0400
- Subject: Re: Register Rematerialization In New Register Allocator
- References: <E04CF3F88ACBD5119EFE00508BBB21210A3BF6FF@exch-01.noida.hcltech.com>
"Mukta Punjani, Noida" wrote:
>
> Hi,
>
> I would like to discuss the register remat(rematerialization)
> in new register allocator branch.
>
> During register allocation, when a value cannot be kept in a register and
> needs to be spilled, the register allocator should recognize when it is
> cheaper to recompute the value (i.e. to remat it), rather than to store and
> reload it.
> An example of this can be
> o Calculate Frame Pointer(FP) + offset ->store to say r150
> o Spill/restore r150 to/from stack even though it would be cheaper to
> clobber r150 and regenerate it from scratch.
>
> Remat can score over spilling (in terms of aggregate cost of remat/spill
> instructions generated) in a number of cases such as frame pointer/static
> data calculation related spills and also other trivial calculations from
> ever live values in a function.
>
> The new register allocator seem to remat only constant loads. I plan to
> enhance the remat infrastructure as follows -
>
> o Collect remat information during the data flow pass itself (to propagate
> live range information) - (in df.c)
> o The remat v/s spill cost criteria can't be directly applied in the current
> framework (as no hard stack slots are assigned during spilling, so can't
> determine the spill cost). A workaround for remat/spill criteria might be to
> find sure shot cases of better remat/spill and take spill decision
> accordingly.
>
> Example of such a case (sure remat case) could be spilling of a def before
> its first use. Here, spill can be avoided by moving the def being spilled
> closer to its use. Other such cases can be reasoned out and treated
> likewise.
>
> Thoughts and comments awaited.
FYI, about month ago I started to write register pressure relief through
rematerialization optimization.
I've started this work because I believe that this optimization is
important for improving p4 performance. The gap between intel compiler
and gcc for p4 became bigger in comparison with p3. The important
difference between p4 and p3 is that p4 has incredibly fast core (simple
arithmetic insn takes 0.5 cycle) and slower L1 cache (reading/writing
integers to L1 is 2 cycle and much more for floating point data).
Therefore p4 is very sensitive to register allocation. E.g. usage of
-fomit-frame-pointer is always winner for p4 even though it increases
the code size (by using longer insn variants with addressing through
sp).
I've decided to implement the optimization as a separate phase. The
major reason of this was intention to use it with the old and the new
(colour based) register allocation (which I hope sometimes will be
proved working better for x86 than the old one). Another reason of this
is in the algorithm complexity. The colour based register allocation is
complicated itself and adding another complex optimization is not
reasonable with my point of view. Although I should admit the
optimization should work better with register coalescing.
About the algorithm I am implementing. Relief of register pressure is
made by copying insns which are decreasing the register pressure. It
can be considered as opposite pass to redundancy elimination (actually
PRE is one of the biggest creators of high register pressure). There
are similar data equation in the optimization as in PRE.
The algorithm consists of the following steps:
o finding basic blocks with high register pressure
o finding (cheap) insns available in the high register pressure
block and whose register results and operands are not mentioned
in the block and live through the block.
o copying such insns on edges as far from the block as possible and
as consequence decreasing the register pressure in the block.
o removing insns which are dead after insn copying.
The idea of the optimization could be found in Chapter 7:
"Value driven redundancy elimination", Loren T. Simpson, Rice
University Ph.D thesis, April, 1996.
The core of the optimization is practically written. I think it will
be finished and debugged in 2 months and placed on a branch for tuning
and proving (although you never the final results) its necessity for the
main line. It would be interesting finally to compare it with what you
are going to implement.
Vlad