[PATCH] Loop unswitching: support gswitch statements.

Martin Liška mliska@suse.cz
Tue Nov 23 15:20:24 GMT 2021


On 11/23/21 14:58, Richard Biener wrote:
> On Mon, Nov 22, 2021 at 4:07 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 11/19/21 11:00, Richard Biener wrote:
>>> On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mliska@suse.cz> wrote:
>>>>
>>>> On 11/11/21 08:15, Richard Biener wrote:
>>>>> So I'd try to do no functional change first, improving the costing and
>>>>> setting up the transform to simply pick up the stmts to "fold" as discovered
>>>>> during analysis (as I hinted you possibly can use gimple_uid to mark
>>>>> the stmts that simplify, IIRC gimple_uid is preserved during copying.
>>>>> gimple_uid would also scale better than gimple_plf in case we do
>>>>> the analysis for all candidates at once).
>>>>
>>>> Thinking about the analysis. Am I correct that we want to properly calculate
>>>> loop size for true and false edge of a potential gcond before the actually unswitching?
>>>
>>> Yes.
>>>
>>>> We can do that by finding a first gcond candidate, evaluate (symbolic + irange approache)
>>>> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to what we do now
>>>> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these reachable blocks.
>>>> Having that, we can calculate # of insns that will live in true/false loops.
>>>
>>> So whatever we do here we should record as "this control stmt folds to
>>> {true,false}" (or {true,unknown},
>>> or in future, "this control stmt will lead to edge {e,unknown}"),
>>> recording the simplification
>>> on the true/false loop version in a way we can apply it after the transform.
>>>
>>>> Then we can call tree_unswitch_loop and make the gcond folding as we do in the versioned loops.
>>>>
>>>> Is it a step in good direction? Having that we can then extend it to gswitch statements.
>>>
>>> One issue I see is that BB_REACHABLE is there only once but you could use
>>> auto_bb_flag reachable_true, reachable_false to distinguish the
>>> true/false loop version
>>> copies.
>>>
>>> So yes, I think that sounds reasonable.  At the point we want to
>>> evaluate different
>>> (first) unswitching opportunities against each other storing this only
>>> as BB flag is
>>> likely to hit limits.  When we want to evaluate multiple levels of
>>> unswitching before
>>> doing any transforms even more so (if there are 3 opportunities there'd be
>>> many cases to be considered when going to level 3 ;)).  I _think_ that a sparse
>>> lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns
>>> do to a DFS walk from the loop header, ignoring not taken edges by
>>> consulting the
>>> lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
>>> so we just have to sum a different set of BBs rather than walking all
>>> stmts again.
>>>
>>> That said, the second step would definitely be to choose the "best" opportunity
>>> on the current level.
>>>
>>> Richard.
>>>
>>>> Cheers,
>>>> Martin
>>
>> Hello.
>>
>> I'm sending a new version where I changed:
>> 1) all unswitch_predicates are find for a loop
>> 2) context sensitive costing happens based on an unswitch_predicate and BB reachability
>>      is implemented
>> 3) folding happens in recursive invocation once we decide to unswitch
>> 4) the patch folds both symbolic gcond predicates and irange provided by ranger
>> 5) debug counter was added
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, I tested it
>> on SPEC2006 and SPEC2017 with -size=ref.
> 
> Meh, diff made a mess out if this ;)  Random comments, I'm walking
> myself the optimizations
> flow.

Sure.

> 
> tree_unswitch_single_loop:
> 
> +  unswitch_predicate *predicate = NULL;
> +  if (num > param_max_unswitch_level)
> +    {
> +      if (dump_file
> +         && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, ";; Not unswitching anymore, hit max level\n");
> +      goto exit;
> +    }
> 
> this looks like we can do this check before find_all_unswitching_predicates?

Makes sense.

> 
> +  for (auto pred: candidates)
> +    {
> +      unsigned cost
> +       = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);
> ...
> 
> so this searches for the first candidate that fits in
> param_max_unswitch_insns, it doesn't
> yet try to find the cheapest / best one.  Please add a comment to say
> that.  After we
> found one candidate we apply unswitching to such one candidate (and throw the
> others away).  I guess that's OK - it's what the old code did - what
> you did for this
> intermediate step is actually gather all unswitching predicates
> upfront.  Hopefully
> we'll be able to share some of the work done for the recursive invocations.
> 
> +         fprintf (dump_file, ";; Unswitching loop with condition: ");
> 
> "on condition"
> 
> +         fprintf (dump_file, ";; Not unswitching condition, loop too big "
> +                  "(%d insns): ", cost);
> 
> "cost too big"?  I assume 'cost' is the number of stmts we'll add,
> loop-size - true-eliminated - false-eliminated?

I'm going to adjust this.

> 
> +exit:
> +  for (auto predicate: candidates)
> +    delete predicate;
> 
> Some refactoring should get rid of the goto ...

You don't like it? It seems to me quite logical as one does not have to repeat
a clean up code before each return statement.

> 
> +static unsigned
> +evaluate_insns (class loop *loop,  basic_block *bbs,
> +               unswitch_predicate *candidate, bool true_edge,
> +               gimple_ranger *ranger, auto_bb_flag &reachable_flag)
> +{
> ...
> +                 unswitch_predicate *predicate
> +                   = tree_may_unswitch_on (bb, loop, ranger);
> +                 if (predicate != NULL)
> +                   {
> +                     tree folded
> +                       = simplify_using_entry_checks (predicate, candidate,
> +                                                      true_edge);
> 
> so just looking at the implementation it seems that if you assign UIDs
> to all last_stmt() in a loop you can keep tree_may_unswitch_on ()
> in a dense array and you do not have to re-compute it?  UIDs should
> survive copying so even for recursive unswitchings the predicates should
> be shareable (pointing to the wrong stmt then, but then just do not record
> the stmt in there - it should be available from the caller somehow).

I like the idea, lemme implement it.

> 
> But then I wonder how this translates to gswitch handling where the
> result is a taken edge / a set of known not taken edges.  I suggest
> to try wiring in gcond support in a separate commit on top of this to

s/gcond/gswitch, right?

> see whether some refactoring is needed.

Sure, so for e.g. case 1 ... 5 we would need to create a new unswitch_predicate
with 1 <= index && index <= 5 tree predicate (and the corresponding irange range).
Later once we unswitch on it, we should use a special unreachable_flag that will
be used for marking of dead edges (similarly how we fold gconds to boolean_{false/true}_node.
Does it make sense?

> 
> Looking at evaluate_insns it also seems that 'reachable_flag'
> is unused - computing the sizes could be done once you process

It is used:

	  if (dest->loop_father == loop
	      && !(dest->flags & reachable_flag)
	      && !(e->flags & flags))
	    {
	      worklist.safe_push (dest);
	      dest->flags |= reachable_flag;
	    }

Where's problem?

> a block, no?  Previously I suggested to compute estimate_num_insns
> for each BB of the original loop once so you can re-use that and
> just need to sum up BB costs.

Yep, makes sense.

> 
> 
> +static bool
> +find_all_unswitching_predicates (class loop *loop, basic_block *bbs,
> +                                bool true_edge,
> +                                unswitch_predicate *parent_predicate,
> +                                gimple_ranger *ranger,
> +                                vec<unswitch_predicate *> &candidates)
> +{
> ...
> 
> so you re-do the simplification step here, in fact this function seems
> to not only find unswitching predicates but apply simplifications
> from the previous unswitching step.  (yes, that's what the old code did ...)
> The costing step already did all the work though, it should be
> possible to keep a vector of stmts to simplify and the simplification
> from that analysis time.

Makes sense.

> 
> You can get at the "other loop copy" stmt via
> 
>    last_stmt (get_bb_copy (gimple_bb (cond)))
> 
> if you do that before free_original_copy_tables.  I think it would be
> nice to have the simplification step explicit before recursing
> even if that means some duplicate walking for now
> (I do hope we can share the find_all_unswitching_predicates
> analysis step)

Nods.

> 
> You are applying param_max_unswitch_insns with
> 
> +      unsigned cost
> +       = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);
> 
> -      cond = simplify_using_entry_checks (loop, cond);
> -      stmt = last_stmt (bbs[i]);
> -      if (integer_nonzerop (cond))
> +      if (cost <= (unsigned)param_max_unswitch_insns)
> 
> where cost is the sum of the simplified sizes of both loop copies after
> unswitching.  By its very nature this will match for all recursive invocations
> (loops only will get smaller) which is why we need param_max_unswitch_level.
> The idea was to have one param_max_unswitch_insns budget for each
> _original_ loop the recursive invocations will eat from, removing the need
> for param_max_unswitch_level.  I'd re-interpret param_max_unswitch_insns
> as the number of insns we may add, so with cost calculated as above we'd
> have
> 
>      if (cost <= budget)
> ...
> 
> and
> 
>      budget -= cost;
> 
> before the recursion (and pass by reference so that the two recursions
> share the budget).
> 
> Initialize by
> 
>    budget = original_loop_size + param_max_unswitch_insns;

Ah, ok, lemme change that.

> 
> Hope this wasn't too many comments ;)

It was, but they are all very useful.

Thanks,
Martin

> 
> Thanks,
> Richard.
> 
>> Thoughts?
>> Thanks,
>> Martin
>>



More information about the Gcc-patches mailing list