This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR77848


Thanks, Richard!  I'll follow up with these changes over the next day or
two.  Appreciate all the help!

Bill

On Wed, 2016-11-16 at 16:08 +0100, Richard Biener wrote:
> On Tue, Nov 15, 2016 at 9:03 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Hi,
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848 identifies a situation
> > where if-conversion causes degradation when the if-converted loop is not
> > subsequently vectorized.  The if-conversion pass does not have a cost
> > model to avoid such degradations.  However, it does have a capability to
> > version the if-converted loop, so that the vectorizer can choose the
> > if-converted version if vectorization occurs, or the unmodified version
> > if vectorization does not occur.  Currently versioning is only done under
> > special circumstances.
> >
> > This patch does two things:  It requires loop versioning whenever loop
> > vectorization is enabled so that such degradations can't occur; and it
> > extends loop versioning to outer loops when such loops are of the right
> > form for outer loop vectorization.  The latter is needed to avoid
> > introducing degradations with versioning of inner loops, which disturbs
> > the pattern that outer loop vectorization expects.
> >
> > This is an embarrassingly simple patch, given how much time I spent going
> > down other paths.  The most surprising thing is that versioning the outer
> > loop doesn't require any additional handshaking with the vectorizer.  It
> > just works.  I've verified this on some examples, and we end up with the
> > correct vectorization and with the unused loop nest discarded.
> >
> > The one remaining problem with this bug is that it precludes SLP from
> > seeing if-converted loops to work on.  With this patch, if the vectorizer
> > can't vectorize an if-converted loop, the original version survives.  We
> > have one test case that fails when that happens, because it expected to
> > do SLP vectorization on the if-converted statements:
> >
> >> FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "basic block vectorized" 1
> >> FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block vectorized" 1
> >
> > Arguably, this shows a deficiency in SLP vectorization, since it won't
> > see if-converted statements in non-loop code in any event.  Eventually
> > SLP should learn to handle these kinds of PHI statements itself.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu, with only the
> > specified regression.  Is this ok for trunk?
> 
> Thanks for working on this.  Comments below.
> 
> > Thanks,
> > Bill
> >
> >
> > [gcc]
> >
> > 2016-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >
> >         PR tree-optimization/77848
> >         * tree-if-conv.c (version_loop_for_if_conversion): When versioning
> >         an outer loop, only save basic block aux information for the inner
> >         loop.
> >         (versionable_outer_loop_p): New function.
> >         (tree_if_conversion): Always version a loop when vectorization
> >         is enabled; version the outer loop instead of the inner one
> >         if the pattern will be recognized for outer-loop vectorization.
> >
> > [gcc/testsuite]
> >
> > 2016-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >
> >         PR tree-optimization/77848
> >         * gfortran.dg/vect/pr78848.f: New test.
> >
> >
> > Index: gcc/testsuite/gfortran.dg/vect/pr77848.f
> > ===================================================================
> > --- gcc/testsuite/gfortran.dg/vect/pr77848.f    (revision 0)
> > +++ gcc/testsuite/gfortran.dg/vect/pr77848.f    (working copy)
> > @@ -0,0 +1,24 @@
> > +! PR 77848: Verify versioning is on when vectorization fails
> > +! { dg-do compile }
> > +! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" }
> > +
> > +      subroutine sub(x,a,n,m)
> > +      implicit none
> > +      real*8 x(*),a(*),atemp
> > +      integer i,j,k,m,n
> > +      real*8 s,t,u,v
> > +      do j=1,m
> > +         atemp=0.d0
> > +         do i=1,n
> > +            if (abs(a(i)).gt.atemp) then
> > +               atemp=a(i)
> > +               k = i
> > +            end if
> > +         enddo
> > +         call dummy(atemp,k)
> > +      enddo
> > +      return
> > +      end
> > +
> > +! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } }
> > +! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } }
> > Index: gcc/tree-if-conv.c
> > ===================================================================
> > --- gcc/tree-if-conv.c  (revision 242412)
> > +++ gcc/tree-if-conv.c  (working copy)
> > @@ -2533,6 +2533,7 @@ version_loop_for_if_conversion (struct loop *loop)
> >    struct loop *new_loop;
> >    gimple *g;
> >    gimple_stmt_iterator gsi;
> > +  unsigned int save_length;
> >
> >    g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
> >                                   build_int_cst (integer_type_node, loop->num),
> > @@ -2540,8 +2541,9 @@ version_loop_for_if_conversion (struct loop *loop)
> >    gimple_call_set_lhs (g, cond);
> >
> >    /* Save BB->aux around loop_version as that uses the same field.  */
> > -  void **saved_preds = XALLOCAVEC (void *, loop->num_nodes);
> > -  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> > +  void **saved_preds = XALLOCAVEC (void *, save_length);
> > +  for (unsigned i = 0; i < save_length; i++)
> >      saved_preds[i] = ifc_bbs[i]->aux;
> >
> >    initialize_original_copy_tables ();
> > @@ -2550,7 +2552,7 @@ version_loop_for_if_conversion (struct loop *loop)
> >                            REG_BR_PROB_BASE, true);
> >    free_original_copy_tables ();
> >
> > -  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +  for (unsigned i = 0; i < save_length; i++)
> >      ifc_bbs[i]->aux = saved_preds[i];
> >
> >    if (new_loop == NULL)
> > @@ -2565,6 +2567,40 @@ version_loop_for_if_conversion (struct loop *loop)
> >    return true;
> >  }
> >
> > +/* Return true when LOOP satisfies the follow conditions that will
> > +   allow it to be recognized by the vectorizer for outer-loop
> > +   vectorization:
> > +    - The loop has exactly one inner loop.
> > +    - The loop has a single exit.
> > +    - The loop header has a single successor, which is the inner
> > +      loop header.
> > +    - The loop exit block has a single predecessor, which is the
> > +      inner loop's exit block.  */
> > +
> > +static bool
> > +versionable_outer_loop_p (struct loop *loop)
> > +{
> > +  if (!loop->inner
> > +      || loop->inner->next
> > +      || !single_exit (loop)
> > +      || !single_succ_p (loop->header)
> > +      || single_succ (loop->header) != loop->inner->header)
> > +    return false;
> 
> not sure if the single_exit test would cover it but please add
> 
>     || ! loop_outer (loop)
> 
> as a test that 'loop' isn't the root node of the loop tree.
> 
> > +
> > +  gcc_assert (single_pred_p (loop->latch));
> 
> This assert is redundant.
> 
> > +  basic_block outer_exit = single_pred (loop->latch);
> > +  gcc_assert (single_pred_p (loop->inner->latch));
> 
> likewise (single_pred performs them).
> 
> > +  basic_block inner_exit = single_pred (loop->inner->latch);
> > +
> > +  if (!single_pred_p (outer_exit) || single_pred (outer_exit) != inner_exit)
> > +    return false;
> > +
> > +  if (dump_file)
> > +    fprintf (dump_file, "Found versionable outer loop\n");
> 
> "Found vectorizable outer loop for versioning"
> 
> > +
> > +  return true;
> > +}
> > +
> >  /* Performs splitting of critical edges.  Skip splitting and return false
> >     if LOOP will not be converted because:
> >
> > @@ -2767,8 +2803,16 @@ tree_if_conversion (struct loop *loop)
> >           || loop->dont_vectorize))
> >      goto cleanup;
> >
> > -  if ((any_pred_load_store || any_complicated_phi)
> > -      && !version_loop_for_if_conversion (loop))
> > +  /* Since we have no cost model, always version loops if vectorization
> > +     is enabled.  Either version this loop, or if the pattern is right
> > +     for outer-loop vectorization, version the outer loop.  In the
> > +     latter case we will still if-convert the original inner loop.  */
> > +  /* FIXME: When SLP vectorization can handle if-conversion on its own,
> > +     predicate all of if-conversion on flag_tree_loop_vectorize.  */
> > +  if ((any_pred_load_store || any_complicated_phi || flag_tree_loop_vectorize)
> 
> I'd say given fun->has_force_vectorize_loops this should be
> 
>        if (flag_tree_loop_if_convert != 1
> > +      && !version_loop_for_if_conversion
> > +      (versionable_outer_loop_p (loop_outer (loop))
> > +       ? loop_outer (loop) : loop))
> >      goto cleanup;
> 
> and thus always version if the user didnt' specify -ftree-loop-if-convert
> (-ftree-loop-if-convert-stores is dead, I forgot to remove uses).
> 
> Can you as a first patch (after fixing the minor things above) commit
> the patch w/o changing the condition under which we version
> (but _do_ version the outer loop if possible?).  This should be a strict
> improvement enabling more outer loop vectorization.
> 
> The 2nd patch can then fix the PR and change the condition.
> 
> Thus, ok with the nits fixed and the condition unchanged.
> 
> Can you re-test the 2nd part with my suggested changed predicate?
> 
> Thanks,
> Richard.
> 
> >    /* Now all statements are if-convertible.  Combine all the basic
> >
> 



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]