[PATCH] Loop unswitching: support gswitch statements.

Richard Biener richard.guenther@gmail.com
Tue Nov 23 13:58:02 GMT 2021

On Mon, Nov 22, 2021 at 4:07 PM Martin Liška <mliska@suse.cz> wrote:
> On 11/19/21 11:00, Richard Biener wrote:
> > On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 11/11/21 08:15, Richard Biener wrote:
> >>> So I'd try to do no functional change first, improving the costing and
> >>> setting up the transform to simply pick up the stmts to "fold" as discovered
> >>> during analysis (as I hinted you possibly can use gimple_uid to mark
> >>> the stmts that simplify, IIRC gimple_uid is preserved during copying.
> >>> gimple_uid would also scale better than gimple_plf in case we do
> >>> the analysis for all candidates at once).
> >>
> >> Thinking about the analysis. Am I correct that we want to properly calculate
> >> loop size for true and false edge of a potential gcond before the actually unswitching?
> >
> > Yes.
> >
> >> We can do that by finding a first gcond candidate, evaluate (symbolic + irange approache)
> >> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to what we do now
> >> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these reachable blocks.
> >> Having that, we can calculate # of insns that will live in true/false loops.
> >
> > So whatever we do here we should record as "this control stmt folds to
> > {true,false}" (or {true,unknown},
> > or in future, "this control stmt will lead to edge {e,unknown}"),
> > recording the simplification
> > on the true/false loop version in a way we can apply it after the transform.
> >
> >> Then we can call tree_unswitch_loop and make the gcond folding as we do in the versioned loops.
> >>
> >> Is it a step in good direction? Having that we can then extend it to gswitch statements.
> >
> > One issue I see is that BB_REACHABLE is there only once but you could use
> > auto_bb_flag reachable_true, reachable_false to distinguish the
> > true/false loop version
> > copies.
> >
> > So yes, I think that sounds reasonable.  At the point we want to
> > evaluate different
> > (first) unswitching opportunities against each other storing this only
> > as BB flag is
> > likely to hit limits.  When we want to evaluate multiple levels of
> > unswitching before
> > doing any transforms even more so (if there are 3 opportunities there'd be
> > many cases to be considered when going to level 3 ;)).  I _think_ that a sparse
> > lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns
> > do to a DFS walk from the loop header, ignoring not taken edges by
> > consulting the
> > lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
> > so we just have to sum a different set of BBs rather than walking all
> > stmts again.
> >
> > That said, the second step would definitely be to choose the "best" opportunity
> > on the current level.
> >
> > Richard.
> >
> >> Cheers,
> >> Martin
> Hello.
> I'm sending a new version where I changed:
> 1) all unswitch_predicates are find for a loop
> 2) context sensitive costing happens based on an unswitch_predicate and BB reachability
>     is implemented
> 3) folding happens in recursive invocation once we decide to unswitch
> 4) the patch folds both symbolic gcond predicates and irange provided by ranger
> 5) debug counter was added
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, I tested it
> on SPEC2006 and SPEC2017 with -size=ref.

Meh, diff made a mess out if this ;)  Random comments, I'm walking
myself the optimizations


+  unswitch_predicate *predicate = NULL;
+  if (num > param_max_unswitch_level)
+    {
+      if (dump_file
+         && (dump_flags & TDF_DETAILS))
+       fprintf (dump_file, ";; Not unswitching anymore, hit max level\n");
+      goto exit;
+    }

this looks like we can do this check before find_all_unswitching_predicates?

+  for (auto pred: candidates)
+    {
+      unsigned cost
+       = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);

so this searches for the first candidate that fits in
param_max_unswitch_insns, it doesn't
yet try to find the cheapest / best one.  Please add a comment to say
that.  After we
found one candidate we apply unswitching to such one candidate (and throw the
others away).  I guess that's OK - it's what the old code did - what
you did for this
intermediate step is actually gather all unswitching predicates
upfront.  Hopefully
we'll be able to share some of the work done for the recursive invocations.

+         fprintf (dump_file, ";; Unswitching loop with condition: ");

"on condition"

+         fprintf (dump_file, ";; Not unswitching condition, loop too big "
+                  "(%d insns): ", cost);

"cost too big"?  I assume 'cost' is the number of stmts we'll add,
loop-size - true-eliminated - false-eliminated?

+  for (auto predicate: candidates)
+    delete predicate;

Some refactoring should get rid of the goto ...

+static unsigned
+evaluate_insns (class loop *loop,  basic_block *bbs,
+               unswitch_predicate *candidate, bool true_edge,
+               gimple_ranger *ranger, auto_bb_flag &reachable_flag)
+                 unswitch_predicate *predicate
+                   = tree_may_unswitch_on (bb, loop, ranger);
+                 if (predicate != NULL)
+                   {
+                     tree folded
+                       = simplify_using_entry_checks (predicate, candidate,
+                                                      true_edge);

so just looking at the implementation it seems that if you assign UIDs
to all last_stmt() in a loop you can keep tree_may_unswitch_on ()
in a dense array and you do not have to re-compute it?  UIDs should
survive copying so even for recursive unswitchings the predicates should
be shareable (pointing to the wrong stmt then, but then just do not record
the stmt in there - it should be available from the caller somehow).

But then I wonder how this translates to gswitch handling where the
result is a taken edge / a set of known not taken edges.  I suggest
to try wiring in gcond support in a separate commit on top of this to
see whether some refactoring is needed.

Looking at evaluate_insns it also seems that 'reachable_flag'
is unused - computing the sizes could be done once you process
a block, no?  Previously I suggested to compute estimate_num_insns
for each BB of the original loop once so you can re-use that and
just need to sum up BB costs.

+static bool
+find_all_unswitching_predicates (class loop *loop, basic_block *bbs,
+                                bool true_edge,
+                                unswitch_predicate *parent_predicate,
+                                gimple_ranger *ranger,
+                                vec<unswitch_predicate *> &candidates)

so you re-do the simplification step here, in fact this function seems
to not only find unswitching predicates but apply simplifications
from the previous unswitching step.  (yes, that's what the old code did ...)
The costing step already did all the work though, it should be
possible to keep a vector of stmts to simplify and the simplification
from that analysis time.

You can get at the "other loop copy" stmt via

  last_stmt (get_bb_copy (gimple_bb (cond)))

if you do that before free_original_copy_tables.  I think it would be
nice to have the simplification step explicit before recursing
even if that means some duplicate walking for now
(I do hope we can share the find_all_unswitching_predicates
analysis step)

You are applying param_max_unswitch_insns with

+      unsigned cost
+       = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);

-      cond = simplify_using_entry_checks (loop, cond);
-      stmt = last_stmt (bbs[i]);
-      if (integer_nonzerop (cond))
+      if (cost <= (unsigned)param_max_unswitch_insns)

where cost is the sum of the simplified sizes of both loop copies after
unswitching.  By its very nature this will match for all recursive invocations
(loops only will get smaller) which is why we need param_max_unswitch_level.
The idea was to have one param_max_unswitch_insns budget for each
_original_ loop the recursive invocations will eat from, removing the need
for param_max_unswitch_level.  I'd re-interpret param_max_unswitch_insns
as the number of insns we may add, so with cost calculated as above we'd

    if (cost <= budget)


    budget -= cost;

before the recursion (and pass by reference so that the two recursions
share the budget).

Initialize by

  budget = original_loop_size + param_max_unswitch_insns;

Hope this wasn't too many comments ;)


> Thoughts?
> Thanks,
> Martin

More information about the Gcc-patches mailing list