A new gimple pass (LRS: live range shrinking) to reduce register pressure
Xinliang David Li
Thu Jan 1 18:33:00 GMT 2009
On Thu, Jan 1, 2009 at 9:02 AM, Steven Bosscher <firstname.lastname@example.org> wrote:
> Xinliang David Li wrote:
>> * An iterative data flow analysis (live use references). The result
>> of the analysis is used to estimate the register pressure as well
>> as the impact (cost/benefit) of code motions on the register pressure.
>> The data flow result can be easily updated under various transformations.
>> * An upward code motion pass to shrink live ranges
>> * A downward code motion pass to perform subtree scheduling (to reduce
>> overlapping live ranges). Multiple use trees are also scheduled downward
>> if profitable
>> * A forward data flow analysis to compute reaching virtual defs -- the
>> result of this analysis is used for legality check for downward motion
>> of statements with virtual uses
>> * An expression tree reassociation pass to enable more opportunities
>> for overlapping live range reduction (this is complementary to the
>> existing reassociation pass, but with a different objective).
> Are all these analyses/transformations interdependent?
All the transformations share the same data flow analysis result.
> Is there a reason for not just adding the reassociation
> transformations to tree-ssa-reassoc.c?
1) see above -- there is dependency on the data flow result
2) the optimization objectives are different and therefore the tree
operand reordering scheme may be conflict. The one in
tree-ssa-reassoc.c is based on operand ranks in order to expose
opportunities for simplification (folding), factorization, PRE/LICM,
and to reduce dependence height. The one in LRS is to maximizing
freedom for lrs.
>> +sbitmap_a_and_not_b (sbitmap dst, const_sbitmap a, const_sbitmap b)
> See sbitmap_difference().
They are similar -- the latter is more general and has more checks
(unnecessary when sizes are the same).
>> @@ -1095,4 +1155,3 @@ sbitmap_popcount (const_sbitmap a, unsig
>> return count;
> Redundant white space change.
>> +DEFPARAM (PARAM_REG_PRESSURE_MIN_BB,
>> + "reg-pressure-min-bb",
>> + "The parmater to control the behavior of the lrs shrinking"
>> + "phase",
>> + 80, 1, 10000)
> The description is not very useful (for all 3 params). You should also
> add documentation for the new params in doc/invoke.texi.
>> Index: passes.c
>> --- passes.c (revision 142954)
>> +++ passes.c (working copy)
>> @@ -674,6 +674,7 @@ init_optimization_passes (void)
>> NEXT_PASS (pass_cse_reciprocals);
>> NEXT_PASS (pass_convert_to_rsqrt);
>> NEXT_PASS (pass_reassoc);
>> + NEXT_PASS (pass_lrs);
>> NEXT_PASS (pass_vrp);
>> NEXT_PASS (pass_dominator);
>> /* The only const/copy propagation opportunities left after
> If you run the pass here, you still have tracer and forwprop (and some
> more) running after live range shrinking. With profile-guided
> optimization, tracer restructures the CFG completely, so this would
> change the live ranges completely too. Therefore I'd run this pass
> after tracer. And forwprop propagates SSA names, maybe extending live
> ranges, so you probably want to run it after the last fwprop too.
> In fact, I would expect you want to run this as the last pass, maybe
> even after uncprop.
>> +/* The function returns the unique id of the gimple STMT. */
>> +static inline int
>> +get_stmt_idx_in_region (gimple stmt)
>> + return gimple_uid (stmt);
> Don't know what else to say then wtf.
More readable code ? I can make it a macro if it makes you happier :)
>> +is_gimple_assign_rhs_reassociable (enum tree_code rhs_code,
> ISTR we have similar checks in tree-ssa-reassoc.c. Maybe move the
> function there and export it (unless you can move all the reassoc code
> to that file, of course - that'd be the preferred solution ;-)
Yes, I thought about that -- some code factorization is needed.
> I don't see the relation between all the functions in the new
> tree-ssa-lrs.c file. You probably want to group functions and add
> some overviews for the different phases/passes that you implement.
> Bonus if you can split things per-pass into separate files...
Yes, Richard has the same comment.
> How does this new pass (do these new passes?) impact compile time performance?
The fixed overhead is the loop recognition. The rest depends on the
existence of large inner loops (or standlone bbs) in functions which
is usually not so common. I did not see difference in compiler
bootstrap time in my earlier experiment.
More information about the Gcc-patches