A new gimple pass (LRS: live range shrinking) to reduce register pressure
Xinliang David Li
Sat Jan 3 00:59:00 GMT 2009
On Fri, Jan 2, 2009 at 3:37 PM, Steven Bosscher <firstname.lastname@example.org> wrote:
> When you initially posted this new pass, I was quite enthusiastic
> about the idea. It seemed to me that it would be a very useful
Thanks for the confidence, and for the lack of it below :)
> But the more I think about this pass, the more I get the feeling this
> is not the right place to insert these transformations. There was
> some discussion yesterday about this patch, and in particular whether
> this should be a GIMPLE pass or an RTL pass. I think this pass may be
> too x86-centric. I wouldn't normally complain about that ;-) But I
> believe you can get better and more generally useful (i.e. for other
> targets) results if you implement this not as a GIMPLE pass,
> but instead as a late RTL pass.
Well, this is generally the case, but not completely true. There are
two parts to this phase: the analysis and the transformation. The analysis part
should actually be done early in the pipeline -- the result of which
can be used to guide transformations downstream -- as not all
transformations that increase register pressure can be undone easily.
The code motion part in theory needs to be done as late as possible.
However, if later phases are made register-pressure aware (by using the
early estimate), putting it late does not seem that important.
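To make the analysis part concrete, here is a toy sketch of a register
pressure estimate for a single straight-line block. It is not the code in
the patch: the (dest, operands) tuple representation and the name max_live
are made up for illustration, and it ignores register classes, calls and
control flow.

```python
def max_live(block, live_out=()):
    """Estimate register pressure of a straight-line block.

    block:    list of (dest, operands) tuples in program order
              (hypothetical representation, one definition per name).
    live_out: names that are still needed after the block.
    Returns the largest number of values live at any program point.
    """
    live = set(live_out)
    peak = len(live)
    # Walk backwards: a definition ends a live range, a use starts one.
    for dest, operands in reversed(block):
        live.discard(dest)
        live.update(operands)
        peak = max(peak, len(live))
    return peak


# t3 = (a + b) + (c + d), with a, b, c, d live on entry.
block = [
    ("t1", ["a", "b"]),
    ("t2", ["c", "d"]),
    ("t3", ["t1", "t2"]),
]
estimate = max_live(block, live_out=["t3"])
```

An estimate like this, computed early, is the kind of number a later pass
could consult before doing something that extends live ranges.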
> There are many optimizations after this live range shrinking pass that
> extend live ranges, such as fwprop, rtl-PRE, and sched1. None of
> these passes actually matter much for the i686 target that you tested
> on, e.g.:
> * rtl-PRE is completely ineffective for i686 and x86_64 because of all
> the reg:17 clobbers (only lea insns are PRE candidates)
> * sched1 doesn't run at all for the x86* targets
There is also rtl level loop unrolling -- but this one should be moved up.
> Therefore, your testing hasn't measured the effects of passes that
> undo the transformations that LRS would be doing for you, and it is
> impossible to tell how useful your pass is for non-x86 targets. But I
> have a feeling it will be not very useful, except perhaps for some of
> the re-association transformations.
The pre-LRS re-association pass is in fact one of the main contributors to
the register pressure increase, due to its unique (eager) way of code
motion -- which can be corrected by LRS -- and the corrected results are
not likely to be undone later.
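A small illustration of the kind of correction meant here, again as a toy
and not the algorithm in the patch. The "eager" order below (all operands
materialized up front) is only a stand-in for the effect, not a model of
what the re-association pass actually emits. The names pressure and
sink_defs are hypothetical, and the sketch assumes one definition per name
and no side effects.

```python
def pressure(block, live_out=()):
    """Peak number of simultaneously live values (backward scan)."""
    live = set(live_out)
    peak = len(live)
    for dest, operands in reversed(block):
        live.discard(dest)
        live.update(operands)
        peak = max(peak, len(live))
    return peak


def sink_defs(block):
    """Move each definition down to just before its first use."""
    result = list(block)
    # Visit from last to first so that users are already in place.
    for instr in reversed(block):
        dest = instr[0]
        pos = result.index(instr)
        first_use = next(
            (i for i in range(pos + 1, len(result)) if dest in result[i][1]),
            None,
        )
        if first_use is None:
            continue  # no later use in this block, leave it where it is
        result.insert(first_use, instr)  # place right before the first use
        del result[pos]                  # drop the original copy
    return result


# t3 = ((a + b) + c) + d, with every operand produced at the top.
eager = [
    ("a", []), ("b", []), ("c", []), ("d", []),
    ("t1", ["a", "b"]),
    ("t2", ["t1", "c"]),
    ("t3", ["t2", "d"]),
]
shrunk = sink_defs(eager)
before = pressure(eager, live_out=["t3"])
after = pressure(shrunk, live_out=["t3"])
```

In this example the peak drops from four live values to two, simply
because c and d are no longer produced long before they are needed.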
Besides, isn't it a trend in gcc that more and more (machine
independent) passes will be moved upstream (as gimple passes) for obvious
reasons? sched1 can be a problem for LRS (but it can be toned down when
register pressure is high).
> I'd bet a beer or two that you don't measure the kind of improvements
> you find for i686 if you'd benchmark the patch on POWER or ia64 -- and
> not just because those targets simply have more registers, but simply
> because the passes after live range shrinking will undo most (all?) of
> the good work your pass has done.
POWER and ia64 are not good choices -- LRS may not even kick in if
register pressure (relative) is low.
> In summary: My final $0.02 is that this pass should not go into the
> trunk until there is some good evidence that there are transformations
> this live range shrinking patch can do, that we cannot do on RTL. And
> I would also recommend someone investigates how this can be
> implemented as a pre-regalloc pass in RTL.
I knew this would come up (and Daniel B has warned me :) ). I prefer
a gimple pass -- as I would hope this pass to be a placeholder (the
one in this patch is simply a start) for all sorts of register
pressure reduction related cleanups and transformations (including
undos of some of the previous transformations). They can share the
same analysis. Existence of ssa and memSSA makes a gimple pass much
> You are basically running a scheduling problem. Your pass reminded me
> about a paper I read recently: "Minimum Register Instruction
> Scheduling: A New Approach for Dynamic Instruction Issue Processors",
> by R. Govindarajan et al. Maybe you should implement this as a set of
> scheduler heuristics for sched1, and enable sched1 for the x86*
> targets. (This would also solve about a dozen bugzilla bugs about
> sched1 on x86, as a bonus.)
As mentioned above, it is not entirely this (scheduling).