This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix PR77848
Thanks, Richard! I'll follow up with these changes over the next day or
two. Appreciate all the help!
Bill
On Wed, 2016-11-16 at 16:08 +0100, Richard Biener wrote:
> On Tue, Nov 15, 2016 at 9:03 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Hi,
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848 identifies a situation
> > where if-conversion causes degradation when the if-converted loop is not
> > subsequently vectorized. The if-conversion pass does not have a cost
> > model to avoid such degradations. However, it does have a capability to
> > version the if-converted loop, so that the vectorizer can choose the
> > if-converted version if vectorization occurs, or the unmodified version
> > if vectorization does not occur. Currently versioning is only done under
> > special circumstances.
> >
> > This patch does two things: It requires loop versioning whenever loop
> > vectorization is enabled so that such degradations can't occur; and it
> > extends loop versioning to outer loops when such loops are of the right
> > form for outer loop vectorization. The latter is needed to avoid
> > introducing degradations with versioning of inner loops, which disturbs
> > the pattern that outer loop vectorization expects.
> >
> > This is an embarrassingly simple patch, given how much time I spent going
> > down other paths. The most surprising thing is that versioning the outer
> > loop doesn't require any additional handshaking with the vectorizer. It
> > just works. I've verified this on some examples, and we end up with the
> > correct vectorization and with the unused loop nest discarded.
> >
> > The one remaining problem with this bug is that it precludes SLP from
> > seeing if-converted loops to work on. With this patch, if the vectorizer
> > can't vectorize an if-converted loop, the original version survives. We
> > have one test case that fails when that happens, because it expected to
> > do SLP vectorization on the if-converted statements:
> >
> >> FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "basic block vectorized" 1
> >> FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block vectorized" 1
> >
> > Arguably, this shows a deficiency in SLP vectorization, since it won't
> > see if-converted statements in non-loop code in any event. Eventually
> > SLP should learn to handle these kinds of PHI statements itself.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu, with only the
> > specified regression. Is this ok for trunk?
>
> Thanks for working on this. Comments below.
>
> > Thanks,
> > Bill
> >
> >
> > [gcc]
> >
> > 2016-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> >
> > PR tree-optimization/77848
> > * tree-if-conv.c (version_loop_for_if_conversion): When versioning
> > an outer loop, only save basic block aux information for the inner
> > loop.
> > (versionable_outer_loop_p): New function.
> > (tree_if_conversion): Always version a loop when vectorization
> > is enabled; version the outer loop instead of the inner one
> > if the pattern will be recognized for outer-loop vectorization.
> >
> > [gcc/testsuite]
> >
> > 2016-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> >
> > PR tree-optimization/77848
> > * gfortran.dg/vect/pr78848.f: New test.
> >
> >
> > Index: gcc/testsuite/gfortran.dg/vect/pr77848.f
> > ===================================================================
> > --- gcc/testsuite/gfortran.dg/vect/pr77848.f (revision 0)
> > +++ gcc/testsuite/gfortran.dg/vect/pr77848.f (working copy)
> > @@ -0,0 +1,24 @@
> > +! PR 77848: Verify versioning is on when vectorization fails
> > +! { dg-do compile }
> > +! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" }
> > +
> > + subroutine sub(x,a,n,m)
> > + implicit none
> > + real*8 x(*),a(*),atemp
> > + integer i,j,k,m,n
> > + real*8 s,t,u,v
> > + do j=1,m
> > + atemp=0.d0
> > + do i=1,n
> > + if (abs(a(i)).gt.atemp) then
> > + atemp=a(i)
> > + k = i
> > + end if
> > + enddo
> > + call dummy(atemp,k)
> > + enddo
> > + return
> > + end
> > +
> > +! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } }
> > +! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } }
> > Index: gcc/tree-if-conv.c
> > ===================================================================
> > --- gcc/tree-if-conv.c (revision 242412)
> > +++ gcc/tree-if-conv.c (working copy)
> > @@ -2533,6 +2533,7 @@ version_loop_for_if_conversion (struct loop *loop)
> > struct loop *new_loop;
> > gimple *g;
> > gimple_stmt_iterator gsi;
> > + unsigned int save_length;
> >
> > g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
> > build_int_cst (integer_type_node, loop->num),
> > @@ -2540,8 +2541,9 @@ version_loop_for_if_conversion (struct loop *loop)
> > gimple_call_set_lhs (g, cond);
> >
> > /* Save BB->aux around loop_version as that uses the same field. */
> > - void **saved_preds = XALLOCAVEC (void *, loop->num_nodes);
> > - for (unsigned i = 0; i < loop->num_nodes; i++)
> > + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> > + void **saved_preds = XALLOCAVEC (void *, save_length);
> > + for (unsigned i = 0; i < save_length; i++)
> > saved_preds[i] = ifc_bbs[i]->aux;
> >
> > initialize_original_copy_tables ();
> > @@ -2550,7 +2552,7 @@ version_loop_for_if_conversion (struct loop *loop)
> > REG_BR_PROB_BASE, true);
> > free_original_copy_tables ();
> >
> > - for (unsigned i = 0; i < loop->num_nodes; i++)
> > + for (unsigned i = 0; i < save_length; i++)
> > ifc_bbs[i]->aux = saved_preds[i];
> >
> > if (new_loop == NULL)
> > @@ -2565,6 +2567,40 @@ version_loop_for_if_conversion (struct loop *loop)
> > return true;
> > }
> >
> > +/* Return true when LOOP satisfies the follow conditions that will
> > + allow it to be recognized by the vectorizer for outer-loop
> > + vectorization:
> > + - The loop has exactly one inner loop.
> > + - The loop has a single exit.
> > + - The loop header has a single successor, which is the inner
> > + loop header.
> > + - The loop exit block has a single predecessor, which is the
> > + inner loop's exit block. */
> > +
> > +static bool
> > +versionable_outer_loop_p (struct loop *loop)
> > +{
> > + if (!loop->inner
> > + || loop->inner->next
> > + || !single_exit (loop)
> > + || !single_succ_p (loop->header)
> > + || single_succ (loop->header) != loop->inner->header)
> > + return false;
>
> not sure if the single_exit test would cover it but please add
>
> || ! loop_outer (loop)
>
> as a test that 'loop' isn't the root node of the loop tree.
>
> > +
> > + gcc_assert (single_pred_p (loop->latch));
>
> This assert is redundant.
>
> > + basic_block outer_exit = single_pred (loop->latch);
> > + gcc_assert (single_pred_p (loop->inner->latch));
>
> likewise (single_pred performs them).
>
> > + basic_block inner_exit = single_pred (loop->inner->latch);
> > +
> > + if (!single_pred_p (outer_exit) || single_pred (outer_exit) != inner_exit)
> > + return false;
> > +
> > + if (dump_file)
> > + fprintf (dump_file, "Found versionable outer loop\n");
>
> "Found vectorizable outer loop for versioning"
>
> > +
> > + return true;
> > +}
> > +
> > /* Performs splitting of critical edges. Skip splitting and return false
> > if LOOP will not be converted because:
> >
> > @@ -2767,8 +2803,16 @@ tree_if_conversion (struct loop *loop)
> > || loop->dont_vectorize))
> > goto cleanup;
> >
> > - if ((any_pred_load_store || any_complicated_phi)
> > - && !version_loop_for_if_conversion (loop))
> > + /* Since we have no cost model, always version loops if vectorization
> > + is enabled. Either version this loop, or if the pattern is right
> > + for outer-loop vectorization, version the outer loop. In the
> > + latter case we will still if-convert the original inner loop. */
> > + /* FIXME: When SLP vectorization can handle if-conversion on its own,
> > + predicate all of if-conversion on flag_tree_loop_vectorize. */
> > + if ((any_pred_load_store || any_complicated_phi || flag_tree_loop_vectorize)
>
> I'd say given fun->has_force_vectorize_loops this should be
>
> if (flag_tree_loop_if_convert != 1
> > + && !version_loop_for_if_conversion
> > + (versionable_outer_loop_p (loop_outer (loop))
> > + ? loop_outer (loop) : loop))
> > goto cleanup;
>
> and thus always version if the user didnt' specify -ftree-loop-if-convert
> (-ftree-loop-if-convert-stores is dead, I forgot to remove uses).
>
> Can you as a first patch (after fixing the minor things above) commit
> the patch w/o changing the condition under which we version
> (but _do_ version the outer loop if possible?). This should be a strict
> improvement enabling more outer loop vectorization.
>
> The 2nd patch can then fix the PR and change the condition.
>
> Thus, ok with the nits fixed and the condition unchanged.
>
> Can you re-test the 2nd part with my suggested changed predicate?
>
> Thanks,
> Richard.
>
> > /* Now all statements are if-convertible. Combine all the basic
> >
>