Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Wed Jul 31 07:25:00 GMT 2019

Thanks for these comments.

Feng

________________________________________
From: Michael Matz <matz@suse.de>
Sent: Tuesday, July 30, 2019 1:59:04 AM
To: Feng Xue OS
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Hello Feng,

first, sorry for the terrible delay in reviewing, but here is one now :)

Generally I do like the idea of the transformation, and the basic building
blocks seem to be sound.  But I dislike it being a separate pass, so
please integrate the code you have written into the existing loop split
pass.  Most building blocks can be used as is, except the main driver.

The existing loop-split code uses loop->aux as binary marker and analyses
and transforms loops in one go, you're doing it separately.  That
separation makes sense for you, so the existing code should be changed to
also do that separately.  Some info for the existing loop-split analysis
needs to be stored in the info struct then as well, which can be done if
you add some fields in yours.  Some splitting-out of code from the
existing main driver is probably needed (basically the parts that
determine eligibility and which cond statement to split).

The two routines that actually split the loops (those that call
loop_version) also have an overlap, so maybe something more can be
commonized between the two (ultimately the way of splitting in both
variants is different, so somewhere they'll do something else, but still
some parts are common).

So, with these general remarks, some more concrete ones about the patch:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.

"to be increased" --> "to be generated" (or "added")

> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.

typo, semi-invariant
I think somewhere in the docs your definition of semi-invariant needs
to be stated in some form (can be short, doesn't need to reproduce the
diagram or such), so don't just replicate the short info from the
params.def file.

> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>          "The maximum number of unswitchings in a single loop.",
>          3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on "
> +       "semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invaraint condition "
> +       "statement to trigger loop split.",

Same typo: "semi-invariant".

> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kind of loop splitting.

kind_s_, plural

> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }

You should mention that 'expr' needs to be pure, i.e. once it
becomes false and the inputs don't change, that it remains false.

> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
...
> +/* Determine when conditional statement never transfers execution to one of its
> +   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +       continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +       continue;

My gut feeling is surprised by this.  So one of the predecessors of
branch_bb dominates it.  Why should that indicate that branch_bb
can be safely removed?

Think about something like this:

   esrc --> cond_bb --> branch_bb
    '-------------------^

(cond_bb is the callers bb of the cond statement in question).  Now esrc
dominates branch_bb but still you can't simply remove it, even if
the cond_bb->branch_bb edge becomes unexecutable.

> +       /* The branch can be reached from opposite branch, or from some
> +         statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)

Please return an edge here, not an edge index (which removes the using of
-1).  I think the name (and comment) should consistently talk about
semi-invariant, not invariant.  For when you really need an edge index
later, you could use "EDGE_SUCC(bb, 0) != edge".  But you probably don't
really need it, e.g. instead of using the gimple pass-local-flag on a
statement you can just as well also store the edge in your info structure.

> +/* Given a conditional statement in LOOP, whose basic block is COND_BB,
> +   suppose its execution only goes through one of its branch, whose index is
> +   specified by BRANCH.  Return TRUE if this statement still executes multiple
> +   times in one iteration of LOOP, in that the statement belongs a nested
> +   unrecognized loop.  */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                       int branch)

With above change in get_cond_invariant_branch, this also should
take an edge, not a bb+edge-index.

> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH) of a conditional statement whose
> +   basic block is COND_BB.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                        int branch)

This should take an edge as well.

> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do no count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +       continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +          gsi_next (&gsi))
> +       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);

Replace the loop by
  estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights);

> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement COND in LOOP.  */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +       return false;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))

Operator should be first on next line, not last on previous line.

> +    return false;
> +
> +  /* Skip conditional statement that is inside a nested unrecognized loop.  */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;

This part (as well as is_cond_in_hidden_loop implementation) had me
confused for a while, because "unrecognized" loops seems strange.  I think
I now know what you try to do here, but I wonder if there's an easier way,
or at least about which situations you stumbled into that made you write
this code.

> +
> +  /* Temporarily keep branch index in conditional statement.  */
> +  gimple_set_plf (cond, GF_PLF_1, branch);

i.e. here, store the edge in your info structure.

> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops.  */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)

So, from here on the code should be integrated into the existing code
of the file (which might need changes as well for this integration to look
good).

That's it for now.

Ciao,
Michael.