This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Unswitching outer loops.


On Wed, Oct 7, 2015 at 5:26 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I noticed that 'gimple' type was changed and send you updated patch.

Ok.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-10-07 12:53 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Richard,
>>
>> I've fixed adding virtual phi argument and add check on irreducible basic block.
>> New patch is attached.
>>
>> I checked it for bootstrap and regression testing, no new failures.
>>
>> ChangeLog:
>> 2015-10-07  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>> "cfghooks.h", add prototypes for introduced new functions.
>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>> on innermost loop check.
>> (tree_unswitch_single_loop): Add all required checks on ability of
>> loop unswitching under zero recursive level guard.
>> (tree_unswitch_outer_loop): New function.
>> (find_loop_guard): Likewise.
>> (empty_bb_without_guard_p): Likewise.
>> (used_outside_loop_p): Likewise.
>> (get_vop_from_header): Likewise.
>> (hoist_guard): Likewise.
>> (check_exit_phi): Likewise.
>>
>>    gcc/testsuite/ChangeLog:
>> * gcc.dg/loop-unswitch-2.c: New test.
>> * gcc.dg/loop-unswitch-3.c: Likewise.
>> * gcc.dg/loop-unswitch-4.c: Likewise.
>>
>>
>> 2015-10-06 15:21 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is updated patch which reflects almost all your remarks:
>>>> 1. Use ordinary get_loop_body.
>>>> 2. Delete useless asserts.
>>>> 3. Use check on iterated loop instead of finite_loop_p.
>>>> 4. Do not update CFG by adjusting the CONDs condition to always true/false.
>>>> 5. Add couple tests.
>>>
>>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>>> +  bb = exit->dest;
>>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>>> +    {
>>> +      gphi *phi = gsi.phi ();
>>> +      /* edge_iterator ei; */
>>> +      tree arg;
>>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>>> +       {
>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>
>>> now I know what confused me - here you are looking at a loop exit PHI
>>> but querying with the preheader edge index.  I think you need to walk
>>> the loop header PHIs to find the PHI for the virtual operand and use that
>>> to get the PHI arg from?
>>>
>>> The side-effect / used-outside code is still the same.  What matters
>>> is side-effects outside of the loop-header protected code region, not
>>> blocks excluding the inner loop.  Say,
>>>
>>>   for (;;)
>>>     {
>>>       if (invariant-guard)
>>>         {
>>>            printf ("Blah");
>>>            for (;;)
>>>              ;
>>>         }
>>>     }
>>>
>>> would still ok to be unswitched.  So instead of
>>>
>>> +      if (body[i]->loop_father != loop)
>>> +       continue;
>>>
>>> it would be
>>>
>>>        if (dominated_by_p (CDI_DOMINATORS, body[i], header)
>>>            && !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest))
>>>
>>> with the obvious improvement to the patch to not only consider header checks
>>> in the outer loop header but in the pre-header block of the inner loop.
>>>
>>> And I still think you should walk the exit PHIs args to see whether they
>>> are defined in the non-guarded region of the outer loop instead of walking
>>> all uses of all defs.
>>>
>>> Note that I think you miss endless loops as side-effects if that endless
>>> loop occurs through a irreducible region (thus not reflected in the
>>> loop tree).  Thus you should reject BB_IRREDUCIBLE_LOOP blocks
>>> in the non-guarded region as well.
>>>
>>> It seems to me that protecting adjacent loops with a single guard is
>>> also eligible for hoisting thus the restriction on loop->inner->next
>>> should become a restriction on no loops (or irreducible regions)
>>> in the non-guarded region.
>>>
>>> Most things can be improved as followup, but at least the
>>> virtual PHI arg thing needs to be sorted out.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> ChangeLog:
>>>> 2015-10-06  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>> on innermost loop check.
>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>> loop unswitching under zero recursive level guard.
>>>> (tree_unswitch_outer_loop): New function.
>>>> (find_loop_guard): Likewise.
>>>> (empty_bb_without_guard_p): Likewise.
>>>> (used_outside_loop_p): Likewise.
>>>> (hoist_guard): Likewise.
>>>> (check_exit_phi): Likewise.
>>>>
>>>>    gcc/testsuite/ChangeLog:
>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>> * gcc.dg/loop-unswitch-3.c: Likewise.
>>>> * gcc.dg/loop-unswitch-4.c: Likewise.
>>>>
>>>> 2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard.
>>>>>> I'd like to answer on your last comment related to using of exit edge
>>>>>> argument for edge that skips loop.
>>>>>> Let's consider the following test-case:
>>>>>>
>>>>>> #include <stdlib.h>
>>>>>> #define N 32
>>>>>> float *foo(int ustride, int size, float *src)
>>>>>> {
>>>>>>    float *buffer, *p;
>>>>>>    int i, k;
>>>>>>
>>>>>>    if (!src)
>>>>>>     return NULL;
>>>>>>
>>>>>>    buffer = (float *) malloc(N * size * sizeof(float));
>>>>>>
>>>>>>    if(buffer)
>>>>>>       for(i=0, p=buffer; i<N; i++, src+=ustride)
>>>>>> for(k=0; k<size; k++)
>>>>>>  *p++ = src[k];
>>>>>>
>>>>>>    return buffer;
>>>>>> }
>>>>>>
>>>>>> Before adding new edge we have in post-header bb:
>>>>>>   <bb 9>:
>>>>>>   # _6 = PHI <0B(8), buffer_20(16)>
>>>>>>   return _6;
>>>>>>
>>>>>> It is clear that we must preserve function semantic and transform it to
>>>>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>>>>>
>>>>> Ah, yeah.  I was confusing the loop exit of the inner vs. the outer loop.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>>>>>>> hoist invariant guard if loop is empty without guard. Note that this
>>>>>>>> was added to loop unswitching pass with simple modifications - using
>>>>>>>> another loop iterator etc.
>>>>>>>>
>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>> What is your opinion?
>>>>>>>
>>>>>>> Overall it looks good.  Some comments below - a few more testcases would
>>>>>>> be nice as well.
>>>>>>>
>>>>>>> +  /* Loop must not be infinite.  */
>>>>>>> +  if (!finite_loop_p (loop))
>>>>>>> +    return false;
>>>>>>>
>>>>>>> why's that?
>>>>>>>
>>>>>>> +  body = get_loop_body_in_dom_order (loop);
>>>>>>> +  for (i = 0; i < loop->num_nodes; i++)
>>>>>>> +    {
>>>>>>> +      if (body[i]->loop_father != loop)
>>>>>>> +       continue;
>>>>>>> +      if (!empty_bb_without_guard_p (loop, body[i]))
>>>>>>>
>>>>>>> I wonder if there is a better way to iterate over the interesting
>>>>>>> blocks and PHIs
>>>>>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>>>>>> the loop in DOM order).
>>>>>>>
>>>>>>> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>>>>>> +       {
>>>>>>> +         if (may_be_used_outside
>>>>>>>
>>>>>>> may_be_used_outside can be hoisted above the loop.  I wonder if we can take
>>>>>>> advantage of loop-closed SSA form here (and the fact we have a single exit
>>>>>>> from the loop).  Iterating over exit dest PHIs and determining whether the
>>>>>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>>>>>
>>>>>>> +  gcc_assert (single_succ_p (pre_header));
>>>>>>>
>>>>>>> that should be always true.
>>>>>>>
>>>>>>> +  gsi_remove (&gsi, false);
>>>>>>> +  bb = guard->dest;
>>>>>>> +  remove_edge (guard);
>>>>>>> +  /* Update dominance for destination of GUARD.  */
>>>>>>> +  if (EDGE_COUNT (bb->preds) == 0)
>>>>>>> +    {
>>>>>>> +      basic_block s_bb;
>>>>>>> +      gcc_assert (single_succ_p (bb));
>>>>>>> +      s_bb = single_succ (bb);
>>>>>>> +      delete_basic_block (bb);
>>>>>>> +      if (single_pred_p (s_bb))
>>>>>>> +       set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>>>>>>>
>>>>>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>>>>>> simply adjusting the CONDs condition to always true/false.  There is
>>>>>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>>>>>>> taking a bool).
>>>>>>>
>>>>>>> +  new_edge = make_edge (pre_header, exit->dest, flags);
>>>>>>> +  if (fix_dom_of_exit)
>>>>>>> +    set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>>>>>> +  update_stmt (gsi_stmt (gsi));
>>>>>>>
>>>>>>> the update_stmt should be not necessary, it's done by gsi_insert_after already.
>>>>>>>
>>>>>>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>>>>>>> +  bb = exit->dest;
>>>>>>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>>>>>>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>> +    {
>>>>>>> +      gphi *phi = gsi.phi ();
>>>>>>> +      /* edge_iterator ei; */
>>>>>>> +      tree arg;
>>>>>>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>>>>>>> +       {
>>>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>>>> +       }
>>>>>>> +      else
>>>>>>> +       {
>>>>>>> +         /* Use exit edge argument.  */
>>>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>>>>
>>>>>>> Hum.  How is it ok to use the exit edge argument for the edge that skips
>>>>>>> the loop?  Why can't you always use the pre-header edge value?
>>>>>>> That is, if we have
>>>>>>>
>>>>>>>  for(i=0;i<m;++i)
>>>>>>>    {
>>>>>>>      if (n > 0)
>>>>>>>     {
>>>>>>>      for (;;)
>>>>>>>        {
>>>>>>>        }
>>>>>>>      }
>>>>>>>    }
>>>>>>>   ... = i;
>>>>>>>
>>>>>>> then i is used after the loop and the correct value to use if
>>>>>>> n > 0 is false is '0'.  Maybe this way we can also relax
>>>>>>> what check_exit_phi does?  IMHO the only restriction is
>>>>>>> if sth defined inside the loop before the header check for
>>>>>>> the inner loop is used after the loop.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>> 2015-09-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>>>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>>>>>> on innermost loop check.
>>>>>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>>>>>> loop unswitching under zero recursive level guard.
>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>> (find_loop_guard): Likewise.
>>>>>>>> (empty_bb_without_guard_p): Likewise.
>>>>>>>> (used_outside_loop_p): Likewise.
>>>>>>>> (hoist_guard): Likewise.
>>>>>>>> (check_exit_phi): Likewise.
>>>>>>>>
>>>>>>>>    gcc/testsuite/ChangeLog:
>>>>>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>>>>>
>>>>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>>>>>> this hoisting in loop header copying.  Integrating it
>>>>>>>>> with LIM would be a better fit eventually.
>>>>>>>>>
>>>>>>>>> Note that we did agree to go forward with your original patch just
>>>>>>>>> making it more "generically" perform outer loop
>>>>>>>>> unswitching.  Did you explore that idea further?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Thanks Richard.
>>>>>>>>>>
>>>>>>>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>>>>>>>> consider the following test-case:
>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>> {
>>>>>>>>>>   int i, j;
>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>> }
>>>>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>>>>>>>> memref alias analysis since in general x and ie can alias (if their
>>>>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>>>>>>>> non-zero for *ie out of loop using 238565 patch.
>>>>>>>>>>  The second concern is that proposed copy header algorithm changes
>>>>>>>>>> loop structure significantly and it is not accepted by vectorizer
>>>>>>>>>> since latch is not empty (such transformation assumes loop peeling for
>>>>>>>>>> one iteration. So I can propose to implement simple guard hoisting
>>>>>>>>>> without copying header and tail blocks (if it is possible).
>>>>>>>>>>
>>>>>>>>>> I will appreciate you for any advice or help since without such
>>>>>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>>>>>> important benchmark.
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>>>>>>>> discovered thta it does not work properly.
>>>>>>>>>>>> So I wonder is it tested patch and it should work?
>>>>>>>>>>>
>>>>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>>>>>>>>
>>>>>>>>>>>> Should it accept for hoisting the following loop nest
>>>>>>>>>>>>   for (i=0; i<n; i++) {
>>>>>>>>>>>>     s = 0;
>>>>>>>>>>>>     for (j=0; j<m; j++)
>>>>>>>>>>>>       s += a[i] * b[j];
>>>>>>>>>>>>     c[i] = s;
>>>>>>>>>>>>   }
>>>>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>>>>>>>
>>>>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of course
>>>>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>>>>>>>> comparison with my.
>>>>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>>>>>>>> vecctorization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>>>>>>>>> is sensible.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>>>>>>>>>> won't be empty for taken branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting.  Note that this
>>>>>>>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>>>>>>>>> other pass.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>>>>>> get rid of them?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What is your opinion?
>>>>>>>>>>>>>
>>>>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>>>>>> looks very much too specific.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>>>>>>>>>> nest as
>>>>>>>>>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   int i, j;
>>>>>>>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>>>>>> unknown state...)
>>>>>>>>>>>>>>> is available that does it more generally?  Also unswitching is quite
>>>>>>>>>>>>>>> expensive compared
>>>>>>>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In your patch:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>>>>>>>>>> +    nloop->force_vectorize = true;
>>>>>>>>>>>>>>> +  if (loop->safelen != 0)
>>>>>>>>>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  Please just use
>>>>>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>>>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>>>>>>>>> when we just fed a guard condition to loop_version.  Can't we statically
>>>>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>>>>>>>>> +     outer-loop.  */
>>>>>>>>>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>>>>>>>>> because it invokes itself recursively on unswitched loops).  But do you really
>>>>>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>>>>>>>     acc = 0;
>>>>>>>>>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form
>>>>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>>>>>>>>>> to patch).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> as well.  It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]