[PATCH] Loop unswitching: support gswitch statements.

Richard Biener richard.guenther@gmail.com
Fri Nov 19 10:00:35 GMT 2021


On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 11/11/21 08:15, Richard Biener wrote:
> > So I'd try to do no functional change first, improving the costing and
> > setting up the transform to simply pick up the stmts to "fold" as discovered
> > during analysis (as I hinted you possibly can use gimple_uid to mark
> > the stmts that simplify, IIRC gimple_uid is preserved during copying.
> > gimple_uid would also scale better than gimple_plf in case we do
> > the analysis for all candidates at once).
>
> Thinking about the analysis. Am I correct that we want to properly calculate
> loop size for true and false edge of a potential gcond before the actually unswitching?

Yes.

> We can do that by finding a first gcond candidate, evaluate (symbolic + irange approache)
> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to what we do now
> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these reachable blocks.
> Having that, we can calculate # of insns that will live in true/false loops.

So whatever we do here we should record as "this control stmt folds to
{true,false}" (or {true,unknown},
or in future, "this control stmt will lead to edge {e,unknown}"),
recording the simplification
on the true/false loop version in a way we can apply it after the transform.

> Then we can call tree_unswitch_loop and make the gcond folding as we do in the versioned loops.
>
> Is it a step in good direction? Having that we can then extend it to gswitch statements.

One issue I see is that BB_REACHABLE is there only once but you could use
auto_bb_flag reachable_true, reachable_false to distinguish the
true/false loop version
copies.

So yes, I think that sounds reasonable.  At the point we want to
evaluate different
(first) unswitching opportunities against each other storing this only
as BB flag is
likely to hit limits.  When we want to evaluate multiple levels of
unswitching before
doing any transforms even more so (if there are 3 opportunities there'd be
many cases to be considered when going to level 3 ;)).  I _think_ that a sparse
lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns
do to a DFS walk from the loop header, ignoring not taken edges by
consulting the
lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
so we just have to sum a different set of BBs rather than walking all
stmts again.

That said, the second step would definitely be to choose the "best" opportunity
on the current level.

Richard.

> Cheers,
> Martin


More information about the Gcc-patches mailing list