A new gimple pass (LRS: live range shrinking) to reduce register pressure

Tue Dec 30 06:35:00 GMT 2008

Hi, this patch has been waiting to be submitted for a while. The 
original implementation was reviewed internally at Google by Daniel B. 
and Ian T. a while back, but it has been enhanced considerably since then.

This patch implements a new gcc tree pass (lrs) which includes the 
following components:

* An iterative data flow analysis (live use references). The result of 
the analysis is used to estimate the register pressure as well as the 
impact (cost/benefit) of code motions on the register pressure. The data 
flow result can be easily updated under various transformations.
* An upward code motion pass to shrink live ranges.
* A downward code motion pass to perform subtree scheduling (to reduce 
overlapping live ranges). Multiple-use trees are also scheduled downward 
if profitable.
* A forward data flow analysis to compute reaching virtual defs. The 
result of this analysis is used for the legality check on downward motion 
of statements with virtual uses.
* An expression tree reassociation pass to enable more opportunities for 
overlapping live range reduction (this is complementary to the existing 
reassociation pass, but with a different objective).
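
To make the first two components concrete, here is a minimal sketch (in 
Python, not the code from lrs.patch) of an iterative backward liveness 
analysis and a pressure estimate derived from it. All names here 
(block_use_def, liveness, max_pressure, BODY_BEFORE, BODY_AFTER) are 
hypothetical, and the model is an assumption: statements are plain 
(def, uses) pairs, and pressure is simply the largest number of 
simultaneously live names in a block, ignoring register classes and 
virtual operands.

```python
from typing import Dict, List, Optional, Set, Tuple

# A statement is (defined name or None, list of used names).
Stmt = Tuple[Optional[str], List[str]]


def block_use_def(stmts: List[Stmt]) -> Tuple[Set[str], Set[str]]:
    """Return (upward-exposed uses, definitions) for one basic block."""
    use: Set[str] = set()
    defs: Set[str] = set()
    for dest, srcs in stmts:
        use.update(s for s in srcs if s not in defs)
        if dest is not None:
            defs.add(dest)
    return use, defs


def liveness(blocks: Dict[str, List[Stmt]],
             succs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Iterate the backward liveness equations to a fixed point."""
    use_def = {b: block_use_def(stmts) for b, stmts in blocks.items()}
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out: Set[str] = set()
            for s in succs.get(b, []):
                out |= live_in[s]
            use, defs = use_def[b]
            new_in = use | (out - defs)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_out


def max_pressure(stmts: List[Stmt], live_out: Set[str]) -> int:
    """Walk the block backward and return the largest live set seen."""
    live = set(live_out)
    peak = len(live)
    for dest, srcs in reversed(stmts):
        if dest is not None:
            live.discard(dest)
        live.update(srcs)
        peak = max(peak, len(live))
    return peak


# Loop body before motion: all four leaves are defined before any is consumed.
BODY_BEFORE: List[Stmt] = [
    ("x", []), ("y", []), ("z", []), ("w", []),
    ("u", ["x", "y"]), ("v", ["z", "w"]),
    ("res", ["u", "v"]), ("acc", ["acc", "res"]),
]
# After moving "u = x + y" upward, next to the definitions of x and y.
BODY_AFTER: List[Stmt] = [
    ("x", []), ("y", []), ("u", ["x", "y"]), ("z", []), ("w", []),
    ("v", ["z", "w"]), ("res", ["u", "v"]), ("acc", ["acc", "res"]),
]
SUCCS = {"entry": ["body"], "body": ["body", "exit"], "exit": []}


def demo() -> Tuple[int, ...]:
    """Return the peak pressure of the loop body before and after the motion."""
    peaks = []
    for body in (BODY_BEFORE, BODY_AFTER):
        blocks = {"entry": [("acc", [])], "body": body,
                  "exit": [(None, ["acc"])]}
        live_out = liveness(blocks, SUCCS)
        peaks.append(max_pressure(body, live_out["body"]))
    return tuple(peaks)


print(demo())  # (5, 4)
```

In this toy loop, hoisting the add next to the definitions of its 
operands ends the live ranges of x and y before z and w are defined, so 
the peak drops from 5 to 4 live names. Comparing the estimate before and 
after a candidate motion is one simple way to express the cost/benefit 
idea; the actual pass may weigh it differently.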

The change is motivated by an application internal to Google. The 
changes have been tested on SPEC CPU2006 (i686 target, -O3 -ffast-math).

The performance impact, measured on a Core 2 machine, is as follows:

Benchmark       LRS     NO_LRS   Improvement
464.h264ref     20.9    20.0     4.5%
433.milc        9.95    9.80     1.5%
436.cactusADM   6.82    6.64     2.7%
454.calculix    9.20    9.10     1.0%
470.lbm         13.4    13.0     3.0%

The performance changes have been verified to be caused by a reduced 
number of spills in the hottest loops.

The compiler bootstraps successfully on i686 with the changes, and the 
regression tests show no regressions.

The patch is not small, so the review process may be long. In the 
meantime, if you can help out with some performance testing (on platforms 
other than i686), that would be very helpful.

In terms of phase order, LRS runs right after the second reassociation 
pass. Loop recognition could be shared between the two passes to save 
some compile time, but I have not found a clean way to do that. 
Besides, it would probably be useful to pass the register pressure 
analysis result down, so that subsequent passes do not introduce more 
overlapping live ranges or undo the code motion performed by LRS.

Your suggestions are welcome.

Thanks,

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lrs.patch
Type: text/x-patch
Size: 164482 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20081230/2f543351/attachment.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lrs.cl
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20081230/2f543351/attachment.ksh>