[PATCH] Unswitching outer loops.

Yuri Rumyantsev ysrumyan@gmail.com
Mon Oct 5 13:13:00 GMT 2015


Thanks Richard.
I'd like to answer on your last comment related to using of exit edge
argument for edge that skips loop.
Let's consider the following test-case:

#include <stdlib.h>
#define N 32
float *foo(int ustride, int size, float *src)
{
   float *buffer, *p;
   int i, k;

   if (!src)
    return NULL;

   buffer = (float *) malloc(N * size * sizeof(float));

   if(buffer)
      for(i=0, p=buffer; i<N; i++, src+=ustride)
for(k=0; k<size; k++)
 *p++ = src[k];

   return buffer;
}

Before adding new edge we have in post-header bb:
  <bb 9>:
  # _6 = PHI <0B(8), buffer_20(16)>
  return _6;

It is clear that we must preserve function semantic and transform it to
_6 = PHI <0B(12), buffer_19(9), buffer_19(4)>


2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>> hoist invariant guard if loop is empty without guard. Note that this
>> was added to loop unswitching pass with simple modifications - using
>> another loop iterator etc.
>>
>> Bootstrap and regression testing did not show any new failures.
>> What is your opinion?
>
> Overall it looks good.  Some comments below - a few more testcases would
> be nice as well.
>
> +  /* Loop must not be infinite.  */
> +  if (!finite_loop_p (loop))
> +    return false;
>
> why's that?
>
> +  body = get_loop_body_in_dom_order (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      if (body[i]->loop_father != loop)
> +       continue;
> +      if (!empty_bb_without_guard_p (loop, body[i]))
>
> I wonder if there is a better way to iterate over the interesting
> blocks and PHIs
> we need to check for side-effects (and thus we maybe can avoid gathering
> the loop in DOM order).
>
> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
> +       {
> +         if (may_be_used_outside
>
> may_be_used_outside can be hoisted above the loop.  I wonder if we can take
> advantage of loop-closed SSA form here (and the fact we have a single exit
> from the loop).  Iterating over exit dest PHIs and determining whether the
> exit edge DEF is inside the loop part it may not be should be enough.
>
> +  gcc_assert (single_succ_p (pre_header));
>
> that should be always true.
>
> +  gsi_remove (&gsi, false);
> +  bb = guard->dest;
> +  remove_edge (guard);
> +  /* Update dominance for destination of GUARD.  */
> +  if (EDGE_COUNT (bb->preds) == 0)
> +    {
> +      basic_block s_bb;
> +      gcc_assert (single_succ_p (bb));
> +      s_bb = single_succ (bb);
> +      delete_basic_block (bb);
> +      if (single_pred_p (s_bb))
> +       set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>
> all this massaging should be simplified by leaving it to CFG cleanup by
> simply adjusting the CONDs condition to always true/false.  There is
> gimple_cond_make_{true,false} () for this (would be nice to have a variant
> taking a bool).
>
> +  new_edge = make_edge (pre_header, exit->dest, flags);
> +  if (fix_dom_of_exit)
> +    set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
> +  update_stmt (gsi_stmt (gsi));
>
> the update_stmt should be not necessary, it's done by gsi_insert_after already.
>
> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
> +  bb = exit->dest;
> +  for (gphi_iterator gsi = gsi_start_phis (bb);
> +       !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gphi *phi = gsi.phi ();
> +      /* edge_iterator ei; */
> +      tree arg;
> +      if (virtual_operand_p (gimple_phi_result (phi)))
> +       {
> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
> +       }
> +      else
> +       {
> +         /* Use exit edge argument.  */
> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>
> Hum.  How is it ok to use the exit edge argument for the edge that skips
> the loop?  Why can't you always use the pre-header edge value?
> That is, if we have
>
>  for(i=0;i<m;++i)
>    {
>      if (n > 0)
>     {
>      for (;;)
>        {
>        }
>      }
>    }
>   ... = i;
>
> then i is used after the loop and the correct value to use if
> n > 0 is false is '0'.  Maybe this way we can also relax
> what check_exit_phi does?  IMHO the only restriction is
> if sth defined inside the loop before the header check for
> the inner loop is used after the loop.
>
> Thanks,
> Richard.
>
>> Thanks.
>>
>> ChangeLog:
>> 2015-09-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>> "cfghooks.h", add prototypes for introduced new functions.
>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>> on innermost loop check.
>> (tree_unswitch_single_loop): Add all required checks on ability of
>> loop unswitching under zero recursive level guard.
>> (tree_unswitch_outer_loop): New function.
>> (find_loop_guard): Likewise.
>> (empty_bb_without_guard_p): Likewise.
>> (used_outside_loop_p): Likewise.
>> (hoist_guard): Likewise.
>> (check_exit_phi): Likewise.
>>
>>    gcc/testsuite/ChangeLog:
>> * gcc.dg/loop-unswitch-2.c: New test.
>>
>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>> this hoisting in loop header copying.  Integrating it
>>> with LIM would be a better fit eventually.
>>>
>>> Note that we did agree to go forward with your original patch just
>>> making it more "generically" perform outer loop
>>> unswitching.  Did you explore that idea further?
>>>
>>>
>>>
>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Thanks Richard.
>>>>
>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>> consider the following test-case:
>>>> void foo(int *ie, int *je, double *x)
>>>> {
>>>>   int i, j;
>>>>   for (j=0; j<*je; ++j)
>>>>     for (i=0; i<*ie; ++i)
>>>>       x[i+j] = 0.0;
>>>> }
>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>> memref alias analysis since in general x and ie can alias (if their
>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>> non-zero for *ie out of loop using 238565 patch.
>>>>  The second concern is that proposed copy header algorithm changes
>>>> loop structure significantly and it is not accepted by vectorizer
>>>> since latch is not empty (such transformation assumes loop peeling for
>>>> one iteration. So I can propose to implement simple guard hoisting
>>>> without copying header and tail blocks (if it is possible).
>>>>
>>>> I will appreciate you for any advice or help since without such
>>>> hoisting we are not able to perform outer loop vectorization for
>>>> important benchmark.
>>>> and
>>>>
>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>> discovered thta it does not work properly.
>>>>>> So I wonder is it tested patch and it should work?
>>>>>
>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>>
>>>>>> Should it accept for hoisting the following loop nest
>>>>>>   for (i=0; i<n; i++) {
>>>>>>     s = 0;
>>>>>>     for (j=0; j<m; j++)
>>>>>>       s += a[i] * b[j];
>>>>>>     c[i] = s;
>>>>>>   }
>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>
>>>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of course
>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>> comparison with my.
>>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>> vecctorization.
>>>>>>>
>>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>>> is sensible.
>>>>>>>
>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>>>> won't be empty for taken branch.
>>>>>>>
>>>>>>> Yes, it does not perform unswitching but guard hoisting.  Note that this
>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>
>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>>> other pass.
>>>>>>>
>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>> get rid of them?
>>>>>>>
>>>>>>>> What is your opinion?
>>>>>>>
>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>> depth of a loop nest to unswitch).
>>>>>>>
>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>> looks very much too specific.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>>>> nest as
>>>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>>>   ...
>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>
>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>
>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>> {
>>>>>>>>>>   int i, j;
>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>> }
>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>
>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>> unknown state...)
>>>>>>>>> is available that does it more generally?  Also unswitching is quite
>>>>>>>>> expensive compared
>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>
>>>>>>>>> In your patch:
>>>>>>>>>
>>>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>>>> +    nloop->force_vectorize = true;
>>>>>>>>> +  if (loop->safelen != 0)
>>>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>>>
>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  Please just use
>>>>>>>>> copy_loop_info.
>>>>>>>>>
>>>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>>
>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>
>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>>> when we just fed a guard condition to loop_version.  Can't we statically
>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>>
>>>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>>> +     outer-loop.  */
>>>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>>>
>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>>> because it invokes itself recursively on unswitched loops).  But do you really
>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>     acc = 0;
>>>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>>>    }
>>>>>>>>>>>>
>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form
>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>>>> to patch).
>>>>>>>>>>>>
>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>
>>>>>>>>>>> I think this is
>>>>>>>>>>>
>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>
>>>>>>>>>>> as well.  It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>>>> back to that patch.
>>>>>>>>>>>
>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.



More information about the Gcc-patches mailing list