This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Unswitching outer loops.
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Yuri Rumyantsev <ysrumyan at gmail dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, Igor Zamyatin <izamyatin at gmail dot com>
- Date: Tue, 6 Oct 2015 14:21:03 +0200
- Subject: Re: [PATCH] Unswitching outer loops.
- Authentication-results: sourceware.org; auth=none
- References: <CAEoMCqRi+=wqTGyJaYa+vQiYcxTz8SQGCk1KDjHTX9Ws2+sp6w at mail dot gmail dot com> <CAFiYyc3P5Pzy+dbJp1_j6wo8NKJFQGJJP_tntt5vX74xwXo0cw at mail dot gmail dot com> <CAEoMCqTX7Jw+uLy9MzrsygBKNjLKk2hfZKrCV7exD9kCD1SWAg at mail dot gmail dot com> <CAFiYyc30yxVbLVYCLX2bEszOP7J00+FZWL8DVvw+Eb7-T63epw at mail dot gmail dot com> <CAEoMCqTBc5ZJ7OeOGKzt_VZiRU_aJbF39QmYNxF4F8eYJ415PA at mail dot gmail dot com> <CAFiYyc1_ejfLt8ef7JLe3Vwrs7WoTLHR2nD8JxkdNoXWJ_2miQ at mail dot gmail dot com> <CAEoMCqSorkh1WmFtVB_huC2hbcVr8uc1EYaRaNVe1g+5hVuzPw at mail dot gmail dot com> <CAFiYyc1nCCyF-4BH2hPWkKpmXnaQFQ34RMM5TTuHjZxZ25crrA at mail dot gmail dot com> <CAEoMCqSRsER9ZGgnX9eJgZJyN4EwkpxzWWk1FHRxWNiEW0HVCg at mail dot gmail dot com> <CAFiYyc2O9i690A0LZ0+jEOP8nkyz8Btc0YAb469aMgnRaVsmsQ at mail dot gmail dot com> <CAEoMCqQzyKtyT5N_LCsO3w9W65Tz4RcWXFp52ek=Z3r06AFdJQ at mail dot gmail dot com> <CAFiYyc1RS-Yjr=J__J3u5=nmbD+d9vxJgc198um8ZA_EHn-v2g at mail dot gmail dot com> <CAEoMCqRbAtXebLQMghQvUE8Sc2YiYH3KYEmhb_OcrCR47NBZoQ at mail dot gmail dot com> <CAFiYyc02z_P4YvuVY=vDifeRdOrFfszF6NLzHydYxn=DNnmG9w at mail dot gmail dot com> <CAEoMCqS5icR=0DA=vbYOVwZ633e46bXuhh-VuZT+QLpVw67JsA at mail dot gmail dot com>
On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Here is updated patch which reflects almost all your remarks:
> 1. Use ordinary get_loop_body.
> 2. Delete useless asserts.
> 3. Use check on iterated loop instead of finite_loop_p.
> 4. Do not update CFG by adjusting the CONDs condition to always true/false.
> 5. Add couple tests.
+ /* Add NEW_ADGE argument for all phi in post-header block. */
+ bb = exit->dest;
+ for (gphi_iterator gsi = gsi_start_phis (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gphi *phi = gsi.phi ();
+ /* edge_iterator ei; */
+ tree arg;
+ if (virtual_operand_p (gimple_phi_result (phi)))
+ {
+ arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
+ add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
now I know what confused me - here you are looking at a loop exit PHI
but querying with the preheader edge index. I think you need to walk
the loop header PHIs to find the PHI for the virtual operand and use that
to get the PHI arg from?
The side-effect / used-outside code is still the same. What matters
is side-effects outside of the loop-header protected code region, not
blocks excluding the inner loop. Say,
for (;;)
{
if (invariant-guard)
{
printf ("Blah");
for (;;)
;
}
}
would still ok to be unswitched. So instead of
+ if (body[i]->loop_father != loop)
+ continue;
it would be
if (dominated_by_p (CDI_DOMINATORS, body[i], header)
&& !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest))
with the obvious improvement to the patch to not only consider header checks
in the outer loop header but in the pre-header block of the inner loop.
And I still think you should walk the exit PHIs args to see whether they
are defined in the non-guarded region of the outer loop instead of walking
all uses of all defs.
Note that I think you miss endless loops as side-effects if that endless
loop occurs through a irreducible region (thus not reflected in the
loop tree). Thus you should reject BB_IRREDUCIBLE_LOOP blocks
in the non-guarded region as well.
It seems to me that protecting adjacent loops with a single guard is
also eligible for hoisting thus the restriction on loop->inner->next
should become a restriction on no loops (or irreducible regions)
in the non-guarded region.
Most things can be improved as followup, but at least the
virtual PHI arg thing needs to be sorted out.
Thanks,
Richard.
> ChangeLog:
> 2015-10-06 Yuri Rumyantsev <ysrumyan@gmail.com>
>
> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
> "cfghooks.h", add prototypes for introduced new functions.
> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
> checks on ability of loop unswitching to tree_unswitch_single_loop;
> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
> on innermost loop check.
> (tree_unswitch_single_loop): Add all required checks on ability of
> loop unswitching under zero recursive level guard.
> (tree_unswitch_outer_loop): New function.
> (find_loop_guard): Likewise.
> (empty_bb_without_guard_p): Likewise.
> (used_outside_loop_p): Likewise.
> (hoist_guard): Likewise.
> (check_exit_phi): Likewise.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/loop-unswitch-2.c: New test.
> * gcc.dg/loop-unswitch-3.c: Likewise.
> * gcc.dg/loop-unswitch-4.c: Likewise.
>
> 2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks Richard.
>>> I'd like to answer on your last comment related to using of exit edge
>>> argument for edge that skips loop.
>>> Let's consider the following test-case:
>>>
>>> #include <stdlib.h>
>>> #define N 32
>>> float *foo(int ustride, int size, float *src)
>>> {
>>> float *buffer, *p;
>>> int i, k;
>>>
>>> if (!src)
>>> return NULL;
>>>
>>> buffer = (float *) malloc(N * size * sizeof(float));
>>>
>>> if(buffer)
>>> for(i=0, p=buffer; i<N; i++, src+=ustride)
>>> for(k=0; k<size; k++)
>>> *p++ = src[k];
>>>
>>> return buffer;
>>> }
>>>
>>> Before adding new edge we have in post-header bb:
>>> <bb 9>:
>>> # _6 = PHI <0B(8), buffer_20(16)>
>>> return _6;
>>>
>>> It is clear that we must preserve function semantic and transform it to
>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>>
>> Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop.
>>
>> Richard.
>>
>>>
>>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>>>> hoist invariant guard if loop is empty without guard. Note that this
>>>>> was added to loop unswitching pass with simple modifications - using
>>>>> another loop iterator etc.
>>>>>
>>>>> Bootstrap and regression testing did not show any new failures.
>>>>> What is your opinion?
>>>>
>>>> Overall it looks good. Some comments below - a few more testcases would
>>>> be nice as well.
>>>>
>>>> + /* Loop must not be infinite. */
>>>> + if (!finite_loop_p (loop))
>>>> + return false;
>>>>
>>>> why's that?
>>>>
>>>> + body = get_loop_body_in_dom_order (loop);
>>>> + for (i = 0; i < loop->num_nodes; i++)
>>>> + {
>>>> + if (body[i]->loop_father != loop)
>>>> + continue;
>>>> + if (!empty_bb_without_guard_p (loop, body[i]))
>>>>
>>>> I wonder if there is a better way to iterate over the interesting
>>>> blocks and PHIs
>>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>>> the loop in DOM order).
>>>>
>>>> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>>> + {
>>>> + if (may_be_used_outside
>>>>
>>>> may_be_used_outside can be hoisted above the loop. I wonder if we can take
>>>> advantage of loop-closed SSA form here (and the fact we have a single exit
>>>> from the loop). Iterating over exit dest PHIs and determining whether the
>>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>>
>>>> + gcc_assert (single_succ_p (pre_header));
>>>>
>>>> that should be always true.
>>>>
>>>> + gsi_remove (&gsi, false);
>>>> + bb = guard->dest;
>>>> + remove_edge (guard);
>>>> + /* Update dominance for destination of GUARD. */
>>>> + if (EDGE_COUNT (bb->preds) == 0)
>>>> + {
>>>> + basic_block s_bb;
>>>> + gcc_assert (single_succ_p (bb));
>>>> + s_bb = single_succ (bb);
>>>> + delete_basic_block (bb);
>>>> + if (single_pred_p (s_bb))
>>>> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>>>>
>>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>>> simply adjusting the CONDs condition to always true/false. There is
>>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>>>> taking a bool).
>>>>
>>>> + new_edge = make_edge (pre_header, exit->dest, flags);
>>>> + if (fix_dom_of_exit)
>>>> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>>> + update_stmt (gsi_stmt (gsi));
>>>>
>>>> the update_stmt should be not necessary, it's done by gsi_insert_after already.
>>>>
>>>> + /* Add NEW_ADGE argument for all phi in post-header block. */
>>>> + bb = exit->dest;
>>>> + for (gphi_iterator gsi = gsi_start_phis (bb);
>>>> + !gsi_end_p (gsi); gsi_next (&gsi))
>>>> + {
>>>> + gphi *phi = gsi.phi ();
>>>> + /* edge_iterator ei; */
>>>> + tree arg;
>>>> + if (virtual_operand_p (gimple_phi_result (phi)))
>>>> + {
>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>> + }
>>>> + else
>>>> + {
>>>> + /* Use exit edge argument. */
>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>
>>>> Hum. How is it ok to use the exit edge argument for the edge that skips
>>>> the loop? Why can't you always use the pre-header edge value?
>>>> That is, if we have
>>>>
>>>> for(i=0;i<m;++i)
>>>> {
>>>> if (n > 0)
>>>> {
>>>> for (;;)
>>>> {
>>>> }
>>>> }
>>>> }
>>>> ... = i;
>>>>
>>>> then i is used after the loop and the correct value to use if
>>>> n > 0 is false is '0'. Maybe this way we can also relax
>>>> what check_exit_phi does? IMHO the only restriction is
>>>> if sth defined inside the loop before the header check for
>>>> the inner loop is used after the loop.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>>
>>>>> ChangeLog:
>>>>> 2015-09-30 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>>> on innermost loop check.
>>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>>> loop unswitching under zero recursive level guard.
>>>>> (tree_unswitch_outer_loop): New function.
>>>>> (find_loop_guard): Likewise.
>>>>> (empty_bb_without_guard_p): Likewise.
>>>>> (used_outside_loop_p): Likewise.
>>>>> (hoist_guard): Likewise.
>>>>> (check_exit_phi): Likewise.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>>
>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>>> this hoisting in loop header copying. Integrating it
>>>>>> with LIM would be a better fit eventually.
>>>>>>
>>>>>> Note that we did agree to go forward with your original patch just
>>>>>> making it more "generically" perform outer loop
>>>>>> unswitching. Did you explore that idea further?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Thanks Richard.
>>>>>>>
>>>>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>>>>> consider the following test-case:
>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>> {
>>>>>>> int i, j;
>>>>>>> for (j=0; j<*je; ++j)
>>>>>>> for (i=0; i<*ie; ++i)
>>>>>>> x[i+j] = 0.0;
>>>>>>> }
>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>>>>> memref alias analysis since in general x and ie can alias (if their
>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>>>>> non-zero for *ie out of loop using 238565 patch.
>>>>>>> The second concern is that proposed copy header algorithm changes
>>>>>>> loop structure significantly and it is not accepted by vectorizer
>>>>>>> since latch is not empty (such transformation assumes loop peeling for
>>>>>>> one iteration. So I can propose to implement simple guard hoisting
>>>>>>> without copying header and tail blocks (if it is possible).
>>>>>>>
>>>>>>> I will appreciate you for any advice or help since without such
>>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>>> important benchmark.
>>>>>>> and
>>>>>>>
>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>>>>> discovered thta it does not work properly.
>>>>>>>>> So I wonder is it tested patch and it should work?
>>>>>>>>
>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>>>>>
>>>>>>>>> Should it accept for hoisting the following loop nest
>>>>>>>>> for (i=0; i<n; i++) {
>>>>>>>>> s = 0;
>>>>>>>>> for (j=0; j<m; j++)
>>>>>>>>> s += a[i] * b[j];
>>>>>>>>> c[i] = s;
>>>>>>>>> }
>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>>>>
>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course
>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>
>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>>>>> comparison with my.
>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>>>>> vecctorization.
>>>>>>>>>>
>>>>>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>>>>>> is sensible.
>>>>>>>>>>
>>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>>>>>>> won't be empty for taken branch.
>>>>>>>>>>
>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this
>>>>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>>>>
>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>>>>>> other pass.
>>>>>>>>>>
>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>>> get rid of them?
>>>>>>>>>>
>>>>>>>>>>> What is your opinion?
>>>>>>>>>>
>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>>> your pattern-matching approach. I see nothing that should prevent
>>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>>
>>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>>> looks very much too specific.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>>>>>>> nest as
>>>>>>>>>>>>> for (i=0; i<10; i++)
>>>>>>>>>>>>> for (j=0;j<n;j++)
>>>>>>>>>>>>> for (k=0;k<20;k++)
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>>>>
>>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>>> {
>>>>>>>>>>>>> int i, j;
>>>>>>>>>>>>> for (j=0; j<*je; ++j)
>>>>>>>>>>>>> for (i=0; i<*ie; ++i)
>>>>>>>>>>>>> x[i+j] = 0.0;
>>>>>>>>>>>>> }
>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>>
>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>>> unknown state...)
>>>>>>>>>>>> is available that does it more generally? Also unswitching is quite
>>>>>>>>>>>> expensive compared
>>>>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>>>>
>>>>>>>>>>>> In your patch:
>>>>>>>>>>>>
>>>>>>>>>>>> + if (!nloop->force_vectorize)
>>>>>>>>>>>> + nloop->force_vectorize = true;
>>>>>>>>>>>> + if (loop->safelen != 0)
>>>>>>>>>>>> + nloop->safelen = loop->safelen;
>>>>>>>>>>>>
>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use
>>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>>
>>>>>>>>>>>> + if (integer_nonzerop (cond_new))
>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>>>>>> + else if (integer_zerop (cond_new))
>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>>>>>
>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>>
>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically
>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>>>>>
>>>>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>>>>>> + outer-loop. */
>>>>>>>>>>>> + cleanup_tree_cfg ();
>>>>>>>>>>>>
>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really
>>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>>>> acc = 0;
>>>>>>>>>>>>>>> for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>>> for (j = 0; j < n; j++)
>>>>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>>> acc <<= 1;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form
>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>>>>>>> to patch).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to
>>>>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come
>>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.