This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking


On 17 Jun 10:46, Jeff Law wrote:
> On 06/17/2016 08:33 AM, Ilya Enkovich wrote:
> >>
> >>Hmm, there seems to be a level of indirection I'm missing here.  We're
> >>smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
> >>the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
> >>the original loop to the vectorized epilogue.  What am I missing?  Rather
> >>than smuggling around in the aux field, is there some inherent reason why we
> >>can't just copy the info from the original loop directly into
> >>LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
> >
> >LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
> > - mark this loop as epilogue
> > - get VF of original loop (required for both mask and nomask modes)
> > - get decision about epilogue masking
> >
> >That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
> >vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
> >loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
> >LOOP_VINFO_ORIG_LOOP_INFO.  When loop has no LOOP_VINFO associated I have no
> >place to bind it with the original loop and therefore I use vacant loop->aux
> >for that.  Any other way to bind epilogue with its original loop would work
> >as well.  I just chose loop->aux to avoid new fields and data structures.
> I was starting to draw the conclusion that the smuggling in the aux field
> was for cases when there was no LOOP_VINFO.  But it was rather late at night
> and I didn't follow that idea through the code.  Thanks for clarifying.
> 
> 
> >>
> >>And something just occurred to me -- is there some inherent reason why SLP
> >>doesn't vectorize the epilogue, particularly for the cases where we can
> >>vectorize the epilogue using smaller vectors?  Sorry if you've already
> >>answered this somewhere or it's a dumb question.
> >
> >IIUC this may happen only if we unroll epilogue into a single BB which happens
> >only when epilogue iterations count is known. Right?
> Probably.  The need to make sure the epilogue is unrolled probably makes
> this a non-starter.
> 
> I have a soft spot for SLP as I stumbled on the idea while rewriting a
> presentation in the wee hours of the morning for the next day. Essentially
> it was a "poor man's" vectorizer that could be done for dramatically less
> engineering cost than a traditional vectorizer.  The MIT paper outlining the
> same ideas came out a couple years later...
> 
> 
> >>>+       /* Add new loop to a processing queue.  To make it easier
> >>>+          to match loop and its epilogue vectorization in dumps
> >>>+          put new loop as the next loop to process.  */
> >>>+       if (new_loop)
> >>>+         {
> >>>+           loops.safe_insert (i + 1, new_loop->num);
> >>>+           vect_loops_num = number_of_loops (cfun);
> >>>+         }
> >>>+
> >>
> >>So just to be clear, the only reason to do this is for dumps -- other than
> >>processing the loop before its epilogue, there's no other inherently
> >>necessary ordering of the loops, right?
> >
> >Right, I don't see other reasons to do it.
> Perfect.  Thanks for confirming.
> 
> jeff
> 
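To illustrate the queue ordering discussed above, here is a minimal standalone sketch. It is not GCC code: `process_with_epilogues` and the `transform` callback are hypothetical names, and a `std::vector` stands in for the `auto_vec` queue. It only shows that inserting the new loop at position `i + 1` makes the epilogue the next entry visited.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the vectorizer driver.  LOOP_NUMS plays the
// role of the `loops` queue; TRANSFORM returns the number of a newly
// created epilogue loop, or -1 when there is nothing more to vectorize.
std::vector<int>
process_with_epilogues (std::vector<int> loop_nums,
                        const std::function<int (int)> &transform)
{
  std::vector<int> visited;
  for (std::size_t i = 0; i < loop_nums.size (); i++)
    {
      int num = loop_nums[i];
      visited.push_back (num);

      int epilogue = transform (num);
      if (epilogue >= 0)
        // Insert right after the current entry so the epilogue is the
        // next loop processed and stays adjacent in dump output.
        loop_nums.insert (loop_nums.begin () + i + 1, epilogue);
    }
  return visited;
}
```

With loops 1 and 2 producing epilogues 3 and 4, the visiting order in this model is 1, 3, 2, 4 rather than 1, 2, 3, 4.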

Hi,

Here is an updated version with alias checks disabled for loop epilogues.
Instead of calling vect_analyze_data_ref_dependence I just use the VF of the
original loop as MAX_VF for the epilogue.
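
As a rough model of that change (not the real analysis), the sketch below uses a hypothetical `analyze_dependences` in which each dependence is reduced to a single distance: a distance of 1 is assumed to block vectorization, and a distance of 2 or more that is below the current cap lowers it. The epilogue path skips the walk entirely and takes the original loop's VF as the cap.

```cpp
#include <vector>

// Simplified, hypothetical model of the patched control flow.
// DEP_DISTANCES is an assumption of this sketch, not a GCC data structure.
bool
analyze_dependences (bool is_epilogue, int orig_vf,
                     const std::vector<int> &dep_distances, int *max_vf)
{
  if (is_epilogue)
    {
      // No per-dependence walk: reuse the original loop's VF as the cap.
      *max_vf = orig_vf;
      return true;
    }

  for (int dist : dep_distances)
    {
      if (dist == 1)
        return false;           // modelled as blocking vectorization
      if (dist >= 2 && dist < *max_vf)
        *max_vf = dist;         // a short distance lowers the cap
    }
  return true;
}
```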

Thanks,
Ilya
--
gcc/

2016-05-24  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-if-conv.c (tree_if_conversion): Make public.
	* tree-if-conv.h: New file.
	* tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Avoid
	dynamic alias checks for epilogues.
	(vect_enhance_data_refs_alignment): Don't try to enhance alignment
	for epilogues.
	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
	created loop.
	* tree-vect-loop.c: Include tree-if-conv.h.
	(destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
	loop->aux.
	(vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
	loop->aux.
	(vect_analyze_loop): Reset loop->aux.
	(vect_transform_loop): Check if created epilogue should be returned
	for further vectorization.  If-convert epilogue if required.
	* tree-vectorizer.c (vectorize_loops): Add a queue of loops to
	process and insert vectorized loop epilogues into this queue.
	* tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
	loop.
	(vect_transform_loop): Return created loop.
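
The loop->aux handling listed for destroy_loop_vec_info and vect_analyze_loop_form can be pictured with the standalone sketch below. The types `loop` and `loop_info` and the helpers `create_info` and `destroy_info` are hypothetical simplifications, not the GCC definitions. The point is only the round trip: creating an info picks up whatever aux holds as the original loop's info, and destroying it puts that pointer back so a later analysis attempt can find it again.

```cpp
struct loop_info;

struct loop
{
  void *aux = nullptr;   // untyped slot, modelled on loop->aux
};

struct loop_info
{
  loop *owner;
  loop_info *orig;       // info of the original loop, or nullptr
  int vf;
};

loop_info *
create_info (loop *l, int vf)
{
  // Whatever aux holds at this point is taken to be the original
  // loop's info (nullptr for a loop that is not an epilogue).
  loop_info *orig = static_cast<loop_info *> (l->aux);
  loop_info *info = new loop_info{l, orig, vf};
  l->aux = info;
  return info;
}

void
destroy_info (loop_info *info)
{
  // Hand the original info back to aux so a retry can pick it up.
  info->owner->aux = info->orig;
  delete info;
}
```

In this model an epilogue can be analyzed, discarded, and analyzed again with a different vector size without losing the link to its original loop.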


diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 5914a78..b790ca9 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2808,7 +2808,7 @@ ifcvt_local_dce (basic_block bb)
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
 
-static unsigned int
+unsigned int
 tree_if_conversion (struct loop *loop)
 {
   unsigned int todo = 0;
diff --git a/gcc/tree-if-conv.h b/gcc/tree-if-conv.h
new file mode 100644
index 0000000..3a732c2
--- /dev/null
+++ b/gcc/tree-if-conv.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TREE_IF_CONV_H
+#define GCC_TREE_IF_CONV_H
+
+unsigned int tree_if_conversion (struct loop *);
+
+#endif  /* GCC_TREE_IF_CONV_H  */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 36d302a..a902a50 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -472,9 +472,15 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, int *max_vf)
 				LOOP_VINFO_LOOP_NEST (loop_vinfo), true))
     return false;
 
-  FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr)
-    if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf))
-      return false;
+  /* For epilogues we either have no aliases or alias versioning
+     was applied to original loop.  Therefore we may just get max_vf
+     using VF of original loop.  */
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+    *max_vf = LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo);
+  else
+    FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr)
+      if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf))
+	return false;
 
   return true;
 }
@@ -1595,7 +1601,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
-      || loop->inner)
+      || loop->inner
+      /* Required peeling was performed in prologue and
+	 is not required for epilogue.  */
+      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
     do_peeling = false;
 
   if (do_peeling
@@ -1875,7 +1884,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   do_versioning =
 	optimize_loop_nest_for_speed_p (loop)
-	&& (!loop->inner); /* FORNOW */
+	&& (!loop->inner) /* FORNOW */
+        /* Required versioning was performed for the
+	   original loop and is not required for epilogue.  */
+	&& !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
 
   if (do_versioning)
     {
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 7ec6dae..fab5879 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1742,9 +1742,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
    NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).
 
    COND_EXPR and COND_EXPR_STMT_LIST are combined with a new generated
-   test.  */
+   test.
 
-void
+   Return created loop.  */
+
+struct loop *
 vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
 				tree ni_name, tree ratio_mult_vf_name,
 				unsigned int th, bool check_profitability)
@@ -1812,6 +1814,8 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   scev_reset ();
 
   free_original_copy_tables ();
+
+  return new_loop;
 }
 
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 90d42b5..d48f565 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "gimple-fold.h"
 #include "cgraph.h"
+#include "tree-if-conv.h"
 
 /* Loop Vectorization Pass.
 
@@ -1214,8 +1215,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
   destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
   loop_vinfo->scalar_cost_vec.release ();
 
+  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
   free (loop_vinfo);
-  loop->aux = NULL;
 }
 
 
@@ -1501,13 +1502,24 @@ vect_analyze_loop_form (struct loop *loop)
 
   if (! vect_analyze_loop_form_1 (loop, &loop_cond, &number_of_iterationsm1,
 				  &number_of_iterations, &inner_loop_cond))
-    return NULL;
+    {
+      loop->aux = NULL;
+      return NULL;
+    }
 
   loop_vec_info loop_vinfo = new_loop_vec_info (loop);
   LOOP_VINFO_NITERSM1 (loop_vinfo) = number_of_iterationsm1;
   LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
   LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = number_of_iterations;
 
+  /* For an epilogue we want to vectorize, aux holds the
+     loop_vec_info of the original loop.  */
+  if (loop->aux)
+    {
+      gcc_assert (LOOP_VINFO_VECTORIZABLE_P ((loop_vec_info)loop->aux));
+      LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = (loop_vec_info)loop->aux;
+    }
+
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       if (dump_enabled_p ())
@@ -1524,7 +1536,6 @@ vect_analyze_loop_form (struct loop *loop)
     STMT_VINFO_TYPE (vinfo_for_stmt (inner_loop_cond))
       = loop_exit_ctrl_vec_info_type;
 
-  gcc_assert (!loop->aux);
   loop->aux = loop_vinfo;
   return loop_vinfo;
 }
@@ -2284,7 +2295,10 @@ vect_analyze_loop (struct loop *loop)
       if (fatal
 	  || vector_sizes == 0
 	  || current_vector_size == 0)
-	return NULL;
+	{
+	  loop->aux = NULL;
+	  return NULL;
+	}
 
       /* Try the next biggest vector size.  */
       current_vector_size = 1 << floor_log2 (vector_sizes);
@@ -6573,10 +6587,11 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
    Vectorize the loop - created vectorized stmts to replace the scalar
    stmts in the loop, and update the loop exit condition.  */
 
-void
+struct loop *
 vect_transform_loop (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  struct loop *epilogue = NULL;
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
   int i;
@@ -6658,8 +6673,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 	ni_name = vect_build_loop_niters (loop_vinfo);
       vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf,
 				       &ratio);
-      vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, ratio_mult_vf,
-				      th, check_profitability);
+      epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name,
+						 ratio_mult_vf, th,
+						 check_profitability);
     }
   else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
@@ -6964,6 +6980,59 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (loop_vinfo), i, instance)
     vect_free_slp_instance (instance);
   LOOP_VINFO_SLP_INSTANCES (loop_vinfo).release ();
+
+  /* Don't vectorize epilogue for epilogue.  */
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+    epilogue = NULL;
+  /* Scalar epilogue is not vectorized in case
+     we use combined vector epilogue.  */
+  else if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+    epilogue = NULL;
+
+  if (epilogue)
+    {
+      if (!LOOP_VINFO_MASK_EPILOGUE (loop_vinfo))
+	{
+	  unsigned int vector_sizes
+	    = targetm.vectorize.autovectorize_vector_sizes ();
+	  vector_sizes &= current_vector_size - 1;
+
+	  if (!(flag_tree_vectorize_epilogues & VECT_EPILOGUE_NOMASK))
+	    epilogue = NULL;
+	  else if (!vector_sizes)
+	    epilogue = NULL;
+	  else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		   && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+	    {
+	      int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
+	      int ratio = current_vector_size / smallest_vec_size;
+	      int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+		- LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+	      eiters = eiters % vectorization_factor;
+
+	      epilogue->nb_iterations_upper_bound = eiters - 1;
+
+	      if (eiters < vectorization_factor / ratio)
+		epilogue = NULL;
+	    }
+	}
+    }
+
+  if (epilogue)
+    {
+      epilogue->force_vectorize = loop->force_vectorize;
+      epilogue->safelen = loop->safelen;
+      epilogue->dont_vectorize = false;
+
+      /* We may need to if-convert epilogue to vectorize it.  */
+      if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
+	tree_if_conversion (epilogue);
+
+      gcc_assert (!epilogue->aux);
+      epilogue->aux = loop_vinfo;
+    }
+
+  return epilogue;
 }
 
 /* The code below is trying to perform simple optimization - revert
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2669813..1fc8b65 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -491,14 +491,16 @@ vectorize_loops (void)
 {
   unsigned int i;
   unsigned int num_vectorized_loops = 0;
-  unsigned int vect_loops_num;
+  unsigned int vect_loops_num = number_of_loops (cfun);
   struct loop *loop;
   hash_table<simduid_to_vf> *simduid_to_vf_htab = NULL;
   hash_table<simd_array_to_simduid> *simd_array_to_simduid_htab = NULL;
   bool any_ifcvt_loops = false;
   unsigned ret = 0;
+  auto_vec<unsigned int> loops (vect_loops_num);
 
-  vect_loops_num = number_of_loops (cfun);
+  FOR_EACH_LOOP (loop, 0)
+    loops.quick_push (loop->num);
 
   /* Bail out if there are no loops.  */
   if (vect_loops_num <= 1)
@@ -514,14 +516,18 @@ vectorize_loops (void)
   /* If some loop was duplicated, it gets bigger number
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
-  FOR_EACH_LOOP (loop, 0)
-    if (loop->dont_vectorize)
+  for (i = 0; i < loops.length (); i++)
+    if (!(loop = get_loop (cfun, loops[i])))
+      continue;
+    else if (loop->dont_vectorize)
       any_ifcvt_loops = true;
     else if ((flag_tree_loop_vectorize
-	      && optimize_loop_nest_for_speed_p (loop))
+	      && (optimize_loop_nest_for_speed_p (loop)
+		  || loop->aux))
 	     || loop->force_vectorize)
       {
 	loop_vec_info loop_vinfo;
+	struct loop *new_loop;
 	vect_location = find_loop_location (loop);
         if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
 	    && dump_enabled_p ())
@@ -551,12 +557,21 @@ vectorize_loops (void)
 	    && dump_enabled_p ())
           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
                            "loop vectorized\n");
-	vect_transform_loop (loop_vinfo);
+	new_loop = vect_transform_loop (loop_vinfo);
 	num_vectorized_loops++;
 	/* Now that the loop has been vectorized, allow it to be unrolled
 	   etc.  */
 	loop->force_vectorize = false;
 
+	/* Add new loop to a processing queue.  To make it easier
+	   to match loop and its epilogue vectorization in dumps
+	   put new loop as the next loop to process.  */
+	if (new_loop)
+	  {
+	    loops.safe_insert (i + 1, new_loop->num);
+	    vect_loops_num = number_of_loops (cfun);
+	  }
+
 	if (loop->simduid)
 	  {
 	    simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2c6cdbf..26d84b4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -984,8 +984,8 @@ extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
 struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
 						     struct loop *, edge);
 extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
-extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree,
-					    unsigned int, bool);
+extern struct loop *vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree,
+						    unsigned int, bool);
 extern void vect_do_peeling_for_alignment (loop_vec_info, tree,
 					   unsigned int, bool);
 extern source_location find_loop_location (struct loop *);
@@ -1099,7 +1099,7 @@ extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *, bool,
 /* Drive for loop analysis stage.  */
 extern loop_vec_info vect_analyze_loop (struct loop *);
 /* Drive for loop transformation stage.  */
-extern void vect_transform_loop (loop_vec_info);
+extern struct loop *vect_transform_loop (loop_vec_info);
 extern loop_vec_info vect_analyze_loop_form (struct loop *);
 extern bool vectorizable_live_operation (gimple *, gimple_stmt_iterator *,
 					 gimple **);
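
For reference, the known-trip-count check added to vect_transform_loop can be worked through with the standalone sketch below. The function name `epilogue_worth_vectorizing` and its parameters are hypothetical; the arithmetic follows the hunk above (mask off sizes not smaller than the current one, take the smallest remaining size, and compare the leftover iterations against VF / ratio).

```cpp
#include <cassert>

// VECTOR_SIZES is a bitmask of supported vector sizes in bytes, as
// returned by the target hook in the patch; the rest are plain integers.
bool
epilogue_worth_vectorizing (unsigned vector_sizes,
                            unsigned current_vector_size,
                            int niters, int peel_iters, int vf)
{
  assert (current_vector_size != 0 && vf > 0);

  // Keep only sizes strictly smaller than the one just used.
  vector_sizes &= current_vector_size - 1;
  if (!vector_sizes)
    return false;

  // Lowest set bit, i.e. the smallest remaining vector size.
  unsigned smallest = vector_sizes & (~vector_sizes + 1);
  int ratio = current_vector_size / smallest;

  // Iterations left over for the epilogue after the main vector loop.
  int eiters = (niters - peel_iters) % vf;

  // Too few iterations to fill even the smallest vector: give up.
  return eiters >= vf / ratio;
}
```

For example, with 32-byte and 16-byte vectors available (mask 48), a current size of 32, VF 8 and 100 iterations, the epilogue has 4 iterations, the smallest vector covers 8 / 2 = 4 of them, and the check passes; with 99 iterations only 3 remain and it fails.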

