[PATCH] Unswitching outer loops.

Yuri Rumyantsev ysrumyan@gmail.com
Tue Oct 6 11:41:00 GMT 2015


Richard,

Here is an updated patch that addresses almost all of your remarks:
1. Use the ordinary get_loop_body.
2. Delete useless asserts.
3. Use a check on the iterated loop instead of finite_loop_p.
4. Do not update the CFG by hand; adjust the COND's condition to always true/false instead.
5. Add a couple of tests.

ChangeLog:
2015-10-06  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.

   gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: New test.
* gcc.dg/loop-unswitch-3.c: Likewise.
* gcc.dg/loop-unswitch-4.c: Likewise.
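
For readers following the thread, the source-level effect of hoisting the inner-loop guard can be sketched on the foo test-case quoted below. This is a hand-written illustration, not output of the patch: the names copy_rows, copy_rows_hoisted and hoisting_preserves_result are hypothetical, the allocation size is changed slightly, and the pass itself works on GIMPLE rather than on C source. The sketch assumes size >= 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N 32

/* Original shape: the inner-loop entry test (0 < size) is evaluated
   again on every outer iteration.  */
float *copy_rows(int ustride, int size, float *src)
{
    float *buffer, *p;
    int i, k;

    if (!src)
        return NULL;
    /* "+ 1" keeps the request non-zero when size == 0.  This differs
       from the quoted test and is only here so that the sketch does
       not depend on what malloc(0) returns.  */
    buffer = (float *) malloc((N * size + 1) * sizeof(float));
    if (buffer)
        for (i = 0, p = buffer; i < N; i++, src += ustride)
            for (k = 0; k < size; k++)
                *p++ = src[k];
    return buffer;
}

/* Hand-hoisted shape: the invariant guard is tested once, and the
   path that skips the loop nest still returns buffer.  */
float *copy_rows_hoisted(int ustride, int size, float *src)
{
    float *buffer, *p;
    int i, k;

    if (!src)
        return NULL;
    buffer = (float *) malloc((N * size + 1) * sizeof(float));
    if (buffer && size > 0)
        for (i = 0, p = buffer; i < N; i++, src += ustride)
            for (k = 0; k < size; k++)
                *p++ = src[k];
    return buffer;
}

/* Returns 1 when both versions return a buffer with equal contents.  */
int hoisting_preserves_result(int size)
{
    float src[N * 8];
    float *a, *b;
    int i, same;

    assert(size >= 0 && size <= 8);
    for (i = 0; i < N * 8; i++)
        src[i] = (float) i;
    a = copy_rows(size, size, src);
    b = copy_rows_hoisted(size, size, src);
    same = a && b
           && memcmp(a, b, (size_t) N * size * sizeof(float)) == 0;
    free(a);
    free(b);
    return same;
}
```

Calling hoisting_preserves_result(0) exercises the path that skips the loop nest; the skipped path must hand back the allocated buffer rather than NULL, which is the point of the PHI discussion further down.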

2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard.
>> I'd like to respond to your last comment about using the exit edge
>> argument for the edge that skips the loop.
>> Let's consider the following test-case:
>>
>> #include <stdlib.h>
>> #define N 32
>> float *foo(int ustride, int size, float *src)
>> {
>>    float *buffer, *p;
>>    int i, k;
>>
>>    if (!src)
>>     return NULL;
>>
>>    buffer = (float *) malloc(N * size * sizeof(float));
>>
>>    if(buffer)
>>       for(i=0, p=buffer; i<N; i++, src+=ustride)
>>         for(k=0; k<size; k++)
>>           *p++ = src[k];
>>
>>    return buffer;
>> }
>>
>> Before adding the new edge, the post-header bb contains:
>>   <bb 9>:
>>   # _6 = PHI <0B(8), buffer_20(16)>
>>   return _6;
>>
>> It is clear that we must preserve the function's semantics and transform it to
>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>
> Ah, yeah.  I was confusing the loop exit of the inner vs. the outer loop.
>
> Richard.
>
>>
>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> I re-designed outer loop unswitching using the basic idea of the 23855
>>>> patch: hoist the invariant guard if the loop is empty without the guard.
>>>> Note that this was added to the loop unswitching pass with simple
>>>> modifications, such as using a different loop iterator.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>> What is your opinion?
>>>
>>> Overall it looks good.  Some comments below - a few more testcases would
>>> be nice as well.
>>>
>>> +  /* Loop must not be infinite.  */
>>> +  if (!finite_loop_p (loop))
>>> +    return false;
>>>
>>> why's that?
>>>
>>> +  body = get_loop_body_in_dom_order (loop);
>>> +  for (i = 0; i < loop->num_nodes; i++)
>>> +    {
>>> +      if (body[i]->loop_father != loop)
>>> +       continue;
>>> +      if (!empty_bb_without_guard_p (loop, body[i]))
>>>
>>> I wonder if there is a better way to iterate over the interesting
>>> blocks and PHIs
>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>> the loop in DOM order).
>>>
>>> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>> +       {
>>> +         if (may_be_used_outside
>>>
>>> may_be_used_outside can be hoisted above the loop.  I wonder if we can take
>>> advantage of loop-closed SSA form here (and the fact we have a single exit
>>> from the loop).  Iterating over exit dest PHIs and determining whether the
>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>
>>> +  gcc_assert (single_succ_p (pre_header));
>>>
>>> that should be always true.
>>>
>>> +  gsi_remove (&gsi, false);
>>> +  bb = guard->dest;
>>> +  remove_edge (guard);
>>> +  /* Update dominance for destination of GUARD.  */
>>> +  if (EDGE_COUNT (bb->preds) == 0)
>>> +    {
>>> +      basic_block s_bb;
>>> +      gcc_assert (single_succ_p (bb));
>>> +      s_bb = single_succ (bb);
>>> +      delete_basic_block (bb);
>>> +      if (single_pred_p (s_bb))
>>> +       set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>>>
>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>> simply adjusting the CONDs condition to always true/false.  There is
>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>>> taking a bool).
>>>
>>> +  new_edge = make_edge (pre_header, exit->dest, flags);
>>> +  if (fix_dom_of_exit)
>>> +    set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>> +  update_stmt (gsi_stmt (gsi));
>>>
>>> the update_stmt should be not necessary, it's done by gsi_insert_after already.
>>>
>>> +  /* Add NEW_EDGE argument for all phi in post-header block.  */
>>> +  bb = exit->dest;
>>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>>> +    {
>>> +      gphi *phi = gsi.phi ();
>>> +      /* edge_iterator ei; */
>>> +      tree arg;
>>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>>> +       {
>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>> +       }
>>> +      else
>>> +       {
>>> +         /* Use exit edge argument.  */
>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>
>>> Hum.  How is it ok to use the exit edge argument for the edge that skips
>>> the loop?  Why can't you always use the pre-header edge value?
>>> That is, if we have
>>>
>>>  for(i=0;i<m;++i)
>>>    {
>>>      if (n > 0)
>>>        {
>>>          for (;;)
>>>            {
>>>            }
>>>        }
>>>    }
>>>   ... = i;
>>>
>>> then i is used after the loop and the correct value to use if
>>> n > 0 is false is '0'.  Maybe this way we can also relax
>>> what check_exit_phi does?  IMHO the only restriction is
>>> if something defined inside the loop before the header check for
>>> the inner loop is used after the loop.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>>
>>>> ChangeLog:
>>>> 2015-09-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>> on innermost loop check.
>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>> loop unswitching under zero recursive level guard.
>>>> (tree_unswitch_outer_loop): New function.
>>>> (find_loop_guard): Likewise.
>>>> (empty_bb_without_guard_p): Likewise.
>>>> (used_outside_loop_p): Likewise.
>>>> (hoist_guard): Likewise.
>>>> (check_exit_phi): Likewise.
>>>>
>>>>    gcc/testsuite/ChangeLog:
>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>
>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>> this hoisting in loop header copying.  Integrating it
>>>>> with LIM would be a better fit eventually.
>>>>>
>>>>> Note that we did agree to go forward with your original patch just
>>>>> making it more "generically" perform outer loop
>>>>> unswitching.  Did you explore that idea further?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard.
>>>>>>
>>>>>> I found one more issue that cannot be fixed easily. In 23855 you
>>>>>> consider the following test-case:
>>>>>> void foo(int *ie, int *je, double *x)
>>>>>> {
>>>>>>   int i, j;
>>>>>>   for (j=0; j<*je; ++j)
>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>       x[i+j] = 0.0;
>>>>>> }
>>>>>> and proposed hoisting the check on *ie out of the loop. That requires
>>>>>> memref alias analysis, since in general x and ie can alias (if their
>>>>>> types are compatible, e.g. int *ie and int *x). Such analysis is
>>>>>> performed by the pre or lim passes. Without it we cannot hoist the
>>>>>> non-zero test on *ie out of the loop using the 23855 patch.
>>>>>> The second concern is that the proposed copy-header algorithm changes
>>>>>> the loop structure significantly, and the result is not accepted by the
>>>>>> vectorizer since the latch is not empty (such a transformation assumes
>>>>>> loop peeling for one iteration). So I propose implementing simple guard
>>>>>> hoisting without copying header and tail blocks (if that is possible).
>>>>>>
>>>>>> I would appreciate any advice or help, since without such
>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>> an important benchmark.
>>>>>>
>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I started studying, tuning and debugging the patch proposed in 23855 and
>>>>>>>> discovered that it does not work properly.
>>>>>>>> So I wonder whether the patch was tested and is expected to work?
>>>>>>>
>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>>>>
>>>>>>>> Should it accept the following loop nest for hoisting?
>>>>>>>>   for (i=0; i<n; i++) {
>>>>>>>>     s = 0;
>>>>>>>>     for (j=0; j<m; j++)
>>>>>>>>       s += a[i] * b[j];
>>>>>>>>     c[i] = s;
>>>>>>>>   }
>>>>>>>> Note that the i-loop will not be empty if m is equal to 0.
>>>>>>>
>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of course
>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>>>>
>>>>>>> Richard.
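
The distinction drawn in the reply above, between hoisting a guard and unswitching on it, can be sketched at source level as follows. This is a hand-written illustration under the assumption that a, b and c do not alias; the function names nest, nest_unswitched and unswitching_preserves_stores are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <string.h>

/* Original nest from the thread: c[i] is stored even when m == 0,
   so the outer loop is not empty without the inner loop.  */
void nest(int n, int m, const float *a, const float *b, float *c)
{
    int i, j;
    float s;

    for (i = 0; i < n; i++) {
        s = 0;
        for (j = 0; j < m; j++)
            s += a[i] * b[j];
        c[i] = s;
    }
}

/* Hand-unswitched on the invariant test m > 0: two copies of the
   outer loop, one keeping the inner loop and one without it.  */
void nest_unswitched(int n, int m, const float *a, const float *b, float *c)
{
    int i, j;
    float s;

    if (m > 0) {
        for (i = 0; i < n; i++) {
            s = 0;
            for (j = 0; j < m; j++)
                s += a[i] * b[j];
            c[i] = s;
        }
    } else {
        /* The inner loop is gone, but the store must stay.  */
        for (i = 0; i < n; i++)
            c[i] = 0;
    }
}

/* Returns 1 when both versions fill c identically.  */
int unswitching_preserves_stores(int m)
{
    float a[8], b[4], c1[8], c2[8];
    int i;

    assert(m >= 0 && m <= 4);
    for (i = 0; i < 8; i++) {
        a[i] = (float) i;
        c1[i] = c2[i] = -1.0f;
    }
    for (i = 0; i < 4; i++)
        b[i] = (float) (i + 1);
    nest(8, m, a, b, c1);
    nest_unswitched(8, m, a, b, c2);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Simply skipping the whole nest when m == 0 would leave c untouched, which is why plain guard hoisting is not valid for this loop while unswitching is.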
>>>>>>>
>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I studied your updated patch for 23855 and it is more general than
>>>>>>>>>> mine.
>>>>>>>>>> I'd like to propose a compromise: let's consider my patch only
>>>>>>>>>> for force-vectorized outer loops, to allow outer-loop
>>>>>>>>>> vectorization.
>>>>>>>>>
>>>>>>>>> I don't see why we should special-case that if the approach in 23855
>>>>>>>>> is sensible.
>>>>>>>>>
>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>> guards if the loop contains anything besides the inner loop, i.e. if
>>>>>>>>>> it is not empty on the taken branch.
>>>>>>>>>
>>>>>>>>> Yes, it does not perform unswitching but guard hoisting.  Note that this
>>>>>>>>> is originally Zdenek Dvorak's patch.
>>>>>>>>>
>>>>>>>>>> I would also like to answer your last question: CFG cleanup is
>>>>>>>>>> invoked to delete single-argument phi nodes from the tail
>>>>>>>>>> block through substitution, since such phis prevent outer-loop
>>>>>>>>>> vectorization. But it is clear that this transformation can be done
>>>>>>>>>> by another pass.
>>>>>>>>>
>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>> get rid of them?
>>>>>>>>>
>>>>>>>>>> What is your opinion?
>>>>>>>>>
>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>
>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>> looks very much too specific.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I checked that both test-cases from 23855 are successfully unswitched
>>>>>>>>>>>> by the proposed patch. I understand that it does not catch a deeper
>>>>>>>>>>>> loop nest such as
>>>>>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>>>>>   ...
>>>>>>>>>>>> but duplicating the middle loop does not look reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the dump for your second test-case:
>>>>>>>>>>>>
>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>> {
>>>>>>>>>>>>   int i, j;
>>>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>>>> }
>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>
>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>> unknown state...) is available that does it more generally?  Also
>>>>>>>>>>> unswitching is quite expensive compared to "moving" the invariant
>>>>>>>>>>> condition.
>>>>>>>>>>>
>>>>>>>>>>> In your patch:
>>>>>>>>>>>
>>>>>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>>>>>> +    nloop->force_vectorize = true;
>>>>>>>>>>> +  if (loop->safelen != 0)
>>>>>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>>>>>
>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  Please just use
>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>
>>>>>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>>>>
>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>
>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>>>>> when we just fed a guard condition to loop_version.  Can't we statically
>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>>>>
>>>>>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>>>>> +     outer-loop.  */
>>>>>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>>>>>
>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>>>>> because it invokes itself recursively on unswitched loops).  But do you really
>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is a simple transformation which tries to hoist a check on
>>>>>>>>>>>>>> zero trip count for the inner loop out of the outer loop. This is a
>>>>>>>>>>>>>> very restricted transformation since it accepts only outer loops
>>>>>>>>>>>>>> with a very simple cfg, for example:
>>>>>>>>>>>>>>    acc = 0;
>>>>>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that a degenerate outer loop (without the inner loop) will be
>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>> The main goal of this transformation is to convert the outer loop to
>>>>>>>>>>>>>> a form accepted by outer-loop vectorization (such a test-case is also
>>>>>>>>>>>>>> included in the patch).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>
>>>>>>>>>>>>> as well.  It has a patch adding an invariant loop guard hoisting
>>>>>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.
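
As a rough source-level picture of the transformation described in the original 2015-07-10 posting above, the acc example can be unswitched by hand on the invariant test n > 0. This is only a sketch: the function names assign, assign_unswitched and unswitched_assign_matches are hypothetical, and acc and v are made unsigned here (an assumption of the sketch, not part of the posted example) so that the shift is well defined for any m.

```c
#include <assert.h>
#include <string.h>

/* Original nest: the inner-loop entry test (0 < n) is evaluated on
   every outer iteration.  */
void assign(int m, int n, const int *l, unsigned *v)
{
    unsigned acc = 0;
    int i, j;

    for (i = 1; i <= m; i++) {
        for (j = 0; j < n; j++)
            if (l[j] == i) { v[j] = acc; acc++; }
        acc <<= 1;
    }
}

/* Hand-unswitched on n > 0.  */
void assign_unswitched(int m, int n, const int *l, unsigned *v)
{
    unsigned acc = 0;
    int i, j;

    if (n > 0) {
        for (i = 1; i <= m; i++) {
            for (j = 0; j < n; j++)
                if (l[j] == i) { v[j] = acc; acc++; }
            acc <<= 1;
        }
    }
    /* Otherwise the remaining outer loop would only update acc, which
       is not used afterwards, so it can be removed as dead code.  */
}

/* Returns 1 when both versions leave v in the same state.  */
int unswitched_assign_matches(int n)
{
    static const int l[6] = {1, 2, 2, 3, 3, 3};
    unsigned v1[6], v2[6];

    assert(n >= 0 && n <= 6);
    memset(v1, 0xff, sizeof v1);
    memset(v2, 0xff, sizeof v2);
    assign(3, n, l, v1);
    assign_unswitched(3, n, l, v2);
    return memcmp(v1, v2, sizeof v1) == 0;
}
```

With n == 0 neither version writes to v, which matches the remark that the degenerate outer loop disappears entirely.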
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.update
Type: application/octet-stream
Size: 16578 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151006/ca6f8b02/attachment.obj>

