A new gimple pass (LRS: live range shrinking) to reduce register pressure
Xinliang David Li
davidxl@google.com
Tue Dec 30 06:35:00 GMT 2008
Hi, this is a patch that has been waiting to be submitted for a while. The
original implementation was reviewed internally at Google by Daniel B. and
Ian T. a while back, but it has been enhanced considerably since then.
This patch implements a new gcc tree pass (lrs) which includes the
following components:
* An iterative data flow analysis (live use references). The result of
the analysis is used to estimate the register pressure as well as the
impact (cost/benefit) of code motions on the register pressure. The data
flow result can be easily updated under various transformations.
* An upward code motion pass to shrink live ranges.
* A downward code motion pass to perform subtree scheduling (to reduce
overlapping live ranges). Multiple use trees are also scheduled downward
if profitable.
* A forward data flow analysis to compute reaching virtual defs; the
result of this analysis is used for the legality check for downward motion
of statements with virtual uses.
* An expression tree reassociation pass to enable more opportunities for
overlapping live range reduction (this is complementary to the existing
reassociation pass, but with a different objective).
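As a rough illustration of what "overlapping live ranges" means for register pressure, here is a minimal sketch in Python. It is not taken from the patch: the statement encoding (a defined name plus a list of used names) and the max_pressure helper are hypothetical, and it only handles a single straight-line block rather than the iterative analysis over the whole function that the pass performs.

```python
def max_pressure(stmts, live_out):
    """Peak number of simultaneously live names in a straight-line block.

    stmts is a list of (defined_name, [used_names]) pairs (a hypothetical
    encoding, not GIMPLE); live_out is the set of names live after the block.
    """
    live = set(live_out)
    peak = len(live)
    # Walk backward: a definition ends a live range, a use starts one.
    for dest, uses in reversed(stmts):
        live.discard(dest)
        live.update(uses)
        peak = max(peak, len(live))
    return peak


# All loads first: x1, x2 and x3 are live at the same time.
early_loads = [
    ("x1", []), ("x2", []), ("x3", []),
    ("y1", ["x1"]), ("y2", ["x2"]), ("y3", ["x3"]),
    ("s1", ["y1", "y2"]),
    ("s", ["s1", "y3"]),
]

# Same computation with each load placed next to its use.
interleaved = [
    ("x1", []), ("y1", ["x1"]),
    ("x2", []), ("y2", ["x2"]),
    ("s1", ["y1", "y2"]),
    ("x3", []), ("y3", ["x3"]),
    ("s", ["s1", "y3"]),
]

pressure_early = max_pressure(early_loads, {"s"})
pressure_interleaved = max_pressure(interleaved, {"s"})
```

In this toy block the first ordering peaks at three live names and the second at two, which is the kind of difference the code motion passes try to obtain when it is profitable.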
The change is motivated by an application internal to Google. The
changes have been tested on SPEC06 (i686 target, -O3 -ffast-math).
The following is the performance impact (measured on Core 2):
Benchmark        LRS     NO_LRS   Improvement
464.h264ref      20.9    20.0     4.5%
433.milc         9.95    9.80     1.5%
436.cactusADM    6.82    6.64     2.7%
454.calculix     9.20    9.10     1.0%
470.lbm          13.4    13.0     3.0%
The performance changes have been verified to be caused by a reduced
number of spills in the hottest loops.
The compiler bootstraps successfully on i686 with the changes, and
no regressions are seen in the regression tests.
The patch is not small, so the review process may be long. In the
meantime, if you can help out with some performance testing (on platforms
other than i686), that would be very helpful.
In terms of phase ordering, LRS runs right after the second reassociation
pass. Loop recognition could be shared between the two passes to save
some compile time, but I have not found a clean way to do that.
In addition, the register pressure analysis result could probably be
passed down so that subsequent passes do not introduce more
overlapping live ranges or undo the code motion performed by LRS.
Your suggestions are welcome.
Thanks,
David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lrs.patch
Type: text/x-patch
Size: 164482 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20081230/2f543351/attachment.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lrs.cl
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20081230/2f543351/attachment.ksh>