[PATCH] Unswitching outer loops.
Yuri Rumyantsev
ysrumyan@gmail.com
Tue Oct 6 11:41:00 GMT 2015
Richard,
Here is updated patch which reflects almost all your remarks:
1. Use ordinary get_loop_body.
2. Delete useless asserts.
3. Use check on iterated loop instead of finite_loop_p.
4. Do not update CFG by adjusting the CONDs condition to always true/false.
5. Add couple tests.
ChangeLog:
2015-10-06 Yuri Rumyantsev <ysrumyan@gmail.com>
* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: New test.
* gcc.dg/loop-unswitch-3.c: Likewise.
* gcc.dg/loop-unswitch-4.c: Likewise.
2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard.
>> I'd like to answer on your last comment related to using of exit edge
>> argument for edge that skips loop.
>> Let's consider the following test-case:
>>
>> #include <stdlib.h>
>> #define N 32
>> float *foo(int ustride, int size, float *src)
>> {
>> float *buffer, *p;
>> int i, k;
>>
>> if (!src)
>> return NULL;
>>
>> buffer = (float *) malloc(N * size * sizeof(float));
>>
>> if(buffer)
>> for(i=0, p=buffer; i<N; i++, src+=ustride)
>> for(k=0; k<size; k++)
>> *p++ = src[k];
>>
>> return buffer;
>> }
>>
>> Before adding new edge we have in post-header bb:
>> <bb 9>:
>> # _6 = PHI <0B(8), buffer_20(16)>
>> return _6;
>>
>> It is clear that we must preserve function semantic and transform it to
>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>
> Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop.
>
> Richard.
>
>>
>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>>> hoist invariant guard if loop is empty without guard. Note that this
>>>> was added to loop unswitching pass with simple modifications - using
>>>> another loop iterator etc.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>> What is your opinion?
>>>
>>> Overall it looks good. Some comments below - a few more testcases would
>>> be nice as well.
>>>
>>> + /* Loop must not be infinite. */
>>> + if (!finite_loop_p (loop))
>>> + return false;
>>>
>>> why's that?
>>>
>>> + body = get_loop_body_in_dom_order (loop);
>>> + for (i = 0; i < loop->num_nodes; i++)
>>> + {
>>> + if (body[i]->loop_father != loop)
>>> + continue;
>>> + if (!empty_bb_without_guard_p (loop, body[i]))
>>>
>>> I wonder if there is a better way to iterate over the interesting
>>> blocks and PHIs
>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>> the loop in DOM order).
>>>
>>> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>> + {
>>> + if (may_be_used_outside
>>>
>>> may_be_used_outside can be hoisted above the loop. I wonder if we can take
>>> advantage of loop-closed SSA form here (and the fact we have a single exit
>>> from the loop). Iterating over exit dest PHIs and determining whether the
>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>
>>> + gcc_assert (single_succ_p (pre_header));
>>>
>>> that should be always true.
>>>
>>> + gsi_remove (&gsi, false);
>>> + bb = guard->dest;
>>> + remove_edge (guard);
>>> + /* Update dominance for destination of GUARD. */
>>> + if (EDGE_COUNT (bb->preds) == 0)
>>> + {
>>> + basic_block s_bb;
>>> + gcc_assert (single_succ_p (bb));
>>> + s_bb = single_succ (bb);
>>> + delete_basic_block (bb);
>>> + if (single_pred_p (s_bb))
>>> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>>>
>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>> simply adjusting the CONDs condition to always true/false. There is
>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>>> taking a bool).
>>>
>>> + new_edge = make_edge (pre_header, exit->dest, flags);
>>> + if (fix_dom_of_exit)
>>> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>> + update_stmt (gsi_stmt (gsi));
>>>
>>> the update_stmt should be not necessary, it's done by gsi_insert_after already.
>>>
>>> + /* Add NEW_ADGE argument for all phi in post-header block. */
>>> + bb = exit->dest;
>>> + for (gphi_iterator gsi = gsi_start_phis (bb);
>>> + !gsi_end_p (gsi); gsi_next (&gsi))
>>> + {
>>> + gphi *phi = gsi.phi ();
>>> + /* edge_iterator ei; */
>>> + tree arg;
>>> + if (virtual_operand_p (gimple_phi_result (phi)))
>>> + {
>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>> + }
>>> + else
>>> + {
>>> + /* Use exit edge argument. */
>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>
>>> Hum. How is it ok to use the exit edge argument for the edge that skips
>>> the loop? Why can't you always use the pre-header edge value?
>>> That is, if we have
>>>
>>> for(i=0;i<m;++i)
>>> {
>>> if (n > 0)
>>> {
>>> for (;;)
>>> {
>>> }
>>> }
>>> }
>>> ... = i;
>>>
>>> then i is used after the loop and the correct value to use if
>>> n > 0 is false is '0'. Maybe this way we can also relax
>>> what check_exit_phi does? IMHO the only restriction is
>>> if sth defined inside the loop before the header check for
>>> the inner loop is used after the loop.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>>
>>>> ChangeLog:
>>>> 2015-09-30 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>
>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>> on innermost loop check.
>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>> loop unswitching under zero recursive level guard.
>>>> (tree_unswitch_outer_loop): New function.
>>>> (find_loop_guard): Likewise.
>>>> (empty_bb_without_guard_p): Likewise.
>>>> (used_outside_loop_p): Likewise.
>>>> (hoist_guard): Likewise.
>>>> (check_exit_phi): Likewise.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>
>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>> this hoisting in loop header copying. Integrating it
>>>>> with LIM would be a better fit eventually.
>>>>>
>>>>> Note that we did agree to go forward with your original patch just
>>>>> making it more "generically" perform outer loop
>>>>> unswitching. Did you explore that idea further?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard.
>>>>>>
>>>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>>>> consider the following test-case:
>>>>>> void foo(int *ie, int *je, double *x)
>>>>>> {
>>>>>> int i, j;
>>>>>> for (j=0; j<*je; ++j)
>>>>>> for (i=0; i<*ie; ++i)
>>>>>> x[i+j] = 0.0;
>>>>>> }
>>>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>>>> memref alias analysis since in general x and ie can alias (if their
>>>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>>>> non-zero for *ie out of loop using 238565 patch.
>>>>>> The second concern is that proposed copy header algorithm changes
>>>>>> loop structure significantly and it is not accepted by vectorizer
>>>>>> since latch is not empty (such transformation assumes loop peeling for
>>>>>> one iteration. So I can propose to implement simple guard hoisting
>>>>>> without copying header and tail blocks (if it is possible).
>>>>>>
>>>>>> I will appreciate you for any advice or help since without such
>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>> important benchmark.
>>>>>> and
>>>>>>
>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>>>> discovered thta it does not work properly.
>>>>>>>> So I wonder is it tested patch and it should work?
>>>>>>>
>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>>>>
>>>>>>>> Should it accept for hoisting the following loop nest
>>>>>>>> for (i=0; i<n; i++) {
>>>>>>>> s = 0;
>>>>>>>> for (j=0; j<m; j++)
>>>>>>>> s += a[i] * b[j];
>>>>>>>> c[i] = s;
>>>>>>>> }
>>>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>>>
>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course
>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>>>> comparison with my.
>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>>>> vecctorization.
>>>>>>>>>
>>>>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>>>>> is sensible.
>>>>>>>>>
>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>>>>>> won't be empty for taken branch.
>>>>>>>>>
>>>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this
>>>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>>>
>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>>>>> other pass.
>>>>>>>>>
>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>> get rid of them?
>>>>>>>>>
>>>>>>>>>> What is your opinion?
>>>>>>>>>
>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>> your pattern-matching approach. I see nothing that should prevent
>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>
>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>> looks very much too specific.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>>>>>> nest as
>>>>>>>>>>>> for (i=0; i<10; i++)
>>>>>>>>>>>> for (j=0;j<n;j++)
>>>>>>>>>>>> for (k=0;k<20;k++)
>>>>>>>>>>>> ...
>>>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>>>
>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>> {
>>>>>>>>>>>> int i, j;
>>>>>>>>>>>> for (j=0; j<*je; ++j)
>>>>>>>>>>>> for (i=0; i<*ie; ++i)
>>>>>>>>>>>> x[i+j] = 0.0;
>>>>>>>>>>>> }
>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>
>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>> unknown state...)
>>>>>>>>>>> is available that does it more generally? Also unswitching is quite
>>>>>>>>>>> expensive compared
>>>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>>>
>>>>>>>>>>> In your patch:
>>>>>>>>>>>
>>>>>>>>>>> + if (!nloop->force_vectorize)
>>>>>>>>>>> + nloop->force_vectorize = true;
>>>>>>>>>>> + if (loop->safelen != 0)
>>>>>>>>>>> + nloop->safelen = loop->safelen;
>>>>>>>>>>>
>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use
>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>
>>>>>>>>>>> + if (integer_nonzerop (cond_new))
>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>>>>> + else if (integer_zerop (cond_new))
>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>>>>
>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>
>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically
>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>>>>
>>>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>>>>> + outer-loop. */
>>>>>>>>>>> + cleanup_tree_cfg ();
>>>>>>>>>>>
>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really
>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>>> acc = 0;
>>>>>>>>>>>>>> for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>> for (j = 0; j < n; j++)
>>>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>> acc <<= 1;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form
>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>>>>>> to patch).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>
>>>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to
>>>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come
>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.update
Type: application/octet-stream
Size: 16578 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20151006/ca6f8b02/attachment.obj>
More information about the Gcc-patches
mailing list