New GCC options for loop vectorization

Xinliang David Li davidxl@google.com
Thu Sep 12 21:09:00 GMT 2013


Currently -ftree-vectorize turns on both loop and SLP vectorization,
but there is no simple way to turn on loop vectorization alone. The
logic for the default -O3 setting is also complicated.

In this patch, two new options are introduced:

1) -ftree-loop-vectorize

This option turns on loop vectorization only. The option
-ftree-slp-vectorize also becomes a first-class citizen, and no funny
business with Init(2) is needed.  With this change, -ftree-vectorize
becomes a simple alias for -ftree-loop-vectorize plus
-ftree-slp-vectorize.

For instance, to turn on only SLP vectorization at -O3, the old way is:

     -O3 -fno-tree-vectorize -ftree-slp-vectorize

With the new change it becomes:

    -O3 -fno-tree-loop-vectorize


To turn on only loop vectorization at -O2, the old way is:

    -O2 -ftree-vectorize -fno-tree-slp-vectorize

The new way is:

    -O2 -ftree-loop-vectorize
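
To make the intended semantics concrete, here is a minimal sketch in C of
how the two flags would resolve. It is not the patch code: struct
vect_flags and resolve_vect_flags are hypothetical names, and the model
only mirrors the rule in the opts.c hunk (an explicitly given
-f[no-]tree-loop-vectorize or -f[no-]tree-slp-vectorize is not overridden
by -ftree-vectorize). It assumes the -O3 defaults are applied before the
command-line flags, which are then handled left to right.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two flags plus their "explicitly set" bits
   (the role opts_set plays in the patch).  */
struct vect_flags
{
  int loop, slp;
  bool loop_set, slp_set;
};

/* Resolve the flags for optimization level OPT_LEVEL and the N
   command-line arguments in ARGS, processed left to right.  */
static struct vect_flags
resolve_vect_flags (int opt_level, const char *const *args, int n)
{
  struct vect_flags f = { 0, 0, false, false };

  assert (n == 0 || args != NULL);

  /* Assumption: the -O3 defaults are applied before any explicit flag.  */
  if (opt_level >= 3)
    f.loop = f.slp = 1;

  for (int i = 0; i < n; i++)
    {
      const char *name = args[i];
      int value = 1;

      /* Strip "-fno-" (value 0) or "-f" (value 1); skip anything else.  */
      if (strncmp (name, "-fno-", 5) == 0)
        {
          value = 0;
          name += 5;
        }
      else if (strncmp (name, "-f", 2) == 0)
        name += 2;
      else
        continue;

      if (strcmp (name, "tree-loop-vectorize") == 0)
        {
          f.loop = value;
          f.loop_set = true;
        }
      else if (strcmp (name, "tree-slp-vectorize") == 0)
        {
          f.slp = value;
          f.slp_set = true;
        }
      else if (strcmp (name, "tree-vectorize") == 0)
        {
          /* The alias only fills in flags the user has not set.  */
          if (!f.loop_set)
            f.loop = value;
          if (!f.slp_set)
            f.slp = value;
        }
    }
  return f;
}
```

In this model, "-fno-tree-loop-vectorize" at level 3 leaves only SLP
enabled, and "-ftree-loop-vectorize" at level 2 leaves only loop
vectorization enabled, matching the two examples above.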



2) -ftree-vect-loop-peeling

This option turns loop peeling for alignment on or off.  In the
long run, this should be folded into the cheap cost model proposed by
Richard.  The option is also useful in scenarios where peeling can
introduce runtime problems
(http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html), which happen to be
common in practice.
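
For readers unfamiliar with peeling for alignment, here is a hand-written
sketch of the transformation in C. It only illustrates the idea and is
not what the vectorizer emits: VEC_BYTES, peel_count and scale_peeled are
hypothetical names, a 16-byte vector is assumed, and the fixed-length
inner loop stands in for one aligned vector operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width in bytes (a 128-bit SIMD register).  */
#define VEC_BYTES 16

/* Number of leading scalar iterations needed before &a[i] becomes
   VEC_BYTES-aligned, capped at N.  */
static size_t
peel_count (const float *a, size_t n)
{
  uintptr_t misalign = (uintptr_t) a % VEC_BYTES;

  /* Peeling can only reach alignment if the elements themselves are
     naturally aligned.  */
  assert ((uintptr_t) a % sizeof (float) == 0);

  if (misalign == 0)
    return 0;

  size_t peel = (VEC_BYTES - misalign) / sizeof (float);
  return peel < n ? peel : n;
}

/* Multiply the N elements of A by K using prologue / main / epilogue
   loops, the shape that peeling for alignment produces.  */
static void
scale_peeled (float *a, size_t n, float k)
{
  const size_t lanes = VEC_BYTES / sizeof (float);
  size_t peel = peel_count (a, n);
  size_t i = 0;

  /* Prologue: the peeled scalar iterations.  */
  for (; i < peel; i++)
    a[i] *= k;

  /* Main loop: a + i is now VEC_BYTES-aligned; each outer iteration
     models one aligned vector operation over LANES elements.  */
  for (; i + lanes <= n; i += lanes)
    for (size_t j = 0; j < lanes; j++)
      a[i + j] *= k;

  /* Epilogue: leftover elements that do not fill a whole vector.  */
  for (; i < n; i++)
    a[i] *= k;
}
```

The prologue's trip count depends on the run-time address, which is why
the alignment has to be computed (or assumed) when the loop is entered;
turning the proposed option off would leave the loop without such a
prologue.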



Patch attached. Compiler bootstrapped. Ok after testing?


thanks,

David
-------------- next part --------------
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 202481)
+++ omp-low.c	(working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
       || optimize_debug
-      || (!flag_tree_vectorize
-	  && global_options_set.x_flag_tree_vectorize))
+      || (!flag_tree_loop_vectorize
+	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
 	  loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
 	  cfun->has_simduid_loops = true;
 	}
-      /* If not -fno-tree-vectorize, hint that we want to vectorize
+      /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
 	 the loop.  */
-      if ((flag_tree_vectorize
-	   || !global_options_set.x_flag_tree_vectorize)
+      if ((flag_tree_loop_vectorize
+	   || !global_options_set.x_flag_tree_loop_vectorize)
 	  && loop->safelen > 1)
 	{
 	  loop->force_vect = true;
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202481)
+++ ChangeLog	(working copy)
@@ -1,3 +1,24 @@
+2013-09-12  Xinliang David Li  <davidxl@google.com>
+
+	* tree-if-conv.c (main_tree_if_conversion): Check new flag.
+	* omp-low.c (omp_max_vf): Ditto.
+	(expand_omp_simd): Ditto.
+	* tree-vectorizer.c (vectorize_loops): Ditto.
+	(gate_vect_slp): Ditto.
+	(gate_increase_alignment): Ditto.
+	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Ditto.
+	* tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+	* tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+	(gate_tree_vectorize): Name change.
+	(tree_vectorize): Ditto.
+	(pass_vectorize::gate): Call new function.
+	(pass_vectorize::execute): Ditto.
+	* opts.c: O3 default setting change.
+	(finish_options): Check new flag.
+	* doc/invoke.texi: Document new flags.
+	* common.opt: New flags.
+
+
 2013-09-10  Richard Earnshaw  <rearnsha@arm.com>
 
 	PR target/58361
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202481)
+++ doc/invoke.texi	(working copy)
@@ -419,10 +419,12 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-if-convert-stores -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vect-loop-peeling -ftree-vectorize @gol
+-ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6748,8 +6750,8 @@ invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
+@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
 @option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
 
 @item -O0
@@ -6766,7 +6768,7 @@ optimizations designed to reduce code si
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
 -falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
--fprefetch-loop-arrays  -ftree-vect-loop-version}
+-fprefetch-loop-arrays  -ftree-vect-loop-version -ftree-vect-loop-peeling}
 
 @item -Ofast
 @opindex Ofast
@@ -8008,14 +8010,29 @@ higher.
 
 @item -ftree-vectorize
 @opindex ftree-vectorize
+Perform vectorization on trees. This flag enables @option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.
+
+@item -ftree-loop-vectorize
+@opindex ftree-loop-vectorize
 Perform loop vectorization on trees. This flag is enabled by default at
-@option{-O3}.
+@option{-O3} and when @option{-ftree-vectorize} is enabled.
 
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
 Perform basic block vectorization on trees. This flag is enabled by default at
 @option{-O3} and when @option{-ftree-vectorize} is enabled.
 
+@item -ftree-vect-loop-peeling
+@opindex ftree-vect-loop-peeling
+Perform loop peeling when doing loop vectorization on trees.  When a loop
+appears to be vectorizable except that data alignment cannot be determined
+at compile time, the loop is peeled to enhance alignment for one or more
+data accesses chosen by the compiler.  After loop peeling, those accesses
+become well aligned, so that more efficient SIMD instructions can be used
+for them.  This option is enabled by default except at level @option{-Os},
+where it is disabled.
+
 @item -ftree-vect-loop-version
 @opindex ftree-vect-loop-version
 Perform loop versioning when doing loop vectorization on trees.  When a loop
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c	(revision 202481)
+++ tree-if-conv.c	(working copy)
@@ -1789,7 +1789,7 @@ main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     if (flag_tree_loop_if_convert == 1
 	|| flag_tree_loop_if_convert_stores == 1
-	|| flag_tree_vectorize
+	|| flag_tree_loop_vectorize
 	|| loop->force_vect)
     changed |= tree_if_conversion (loop);
 
@@ -1815,7 +1815,7 @@ main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return (((flag_tree_vectorize || cfun->has_force_vect_loops)
+  return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
 	   && flag_tree_loop_if_convert != 0)
 	  || flag_tree_loop_if_convert == 1
 	  || flag_tree_loop_if_convert_stores == 1);
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c	(revision 202481)
+++ tree-vect-data-refs.c	(working copy)
@@ -1404,7 +1404,9 @@ vect_enhance_data_refs_alignment (loop_v
 	continue;
 
       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
-      do_peeling = vector_alignment_reachable_p (dr);
+      do_peeling = (flag_tree_vect_loop_peeling
+	            && optimize_loop_nest_for_speed_p (loop)
+                    && vector_alignment_reachable_p (dr));
       if (do_peeling)
         {
           if (known_alignment_for_access_p (dr))
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 202481)
+++ tree-ssa-pre.c	(working copy)
@@ -3026,7 +3026,7 @@ inhibit_phi_insertion (basic_block bb, p
   unsigned i;
 
   /* If we aren't going to vectorize we don't inhibit anything.  */
-  if (!flag_tree_vectorize)
+  if (!flag_tree_loop_vectorize)
     return false;
 
   /* Otherwise we inhibit the insertion when the address of the
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 202481)
+++ tree-vectorizer.c	(working copy)
@@ -341,7 +341,7 @@ vectorize_loops (void)
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
-    if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop))
+    if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
 	|| loop->force_vect)
       {
 	loop_vec_info loop_vinfo;
@@ -486,10 +486,7 @@ execute_vect_slp (void)
 static bool
 gate_vect_slp (void)
 {
-  /* Apply SLP either if the vectorizer is on and the user didn't specify
-     whether to run SLP or not, or if the SLP flag was set by the user.  */
-  return ((flag_tree_vectorize != 0 && flag_tree_slp_vectorize != 0)
-          || flag_tree_slp_vectorize == 1);
+  return flag_tree_slp_vectorize != 0;
 }
 
 namespace {
@@ -579,7 +576,7 @@ increase_alignment (void)
 static bool
 gate_increase_alignment (void)
 {
-  return flag_section_anchors && flag_tree_vectorize;
+  return flag_section_anchors && flag_tree_loop_vectorize;
 }
 
 
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c	(revision 202481)
+++ tree-ssa-loop.c	(working copy)
@@ -303,7 +303,7 @@ make_pass_predcom (gcc::context *ctxt)
 /* Loop autovectorization.  */
 
 static unsigned int
-tree_vectorize (void)
+tree_loop_vectorize (void)
 {
   if (number_of_loops (cfun) <= 1)
     return 0;
@@ -312,9 +312,9 @@ tree_vectorize (void)
 }
 
 static bool
-gate_tree_vectorize (void)
+gate_tree_loop_vectorize (void)
 {
-  return flag_tree_vectorize || cfun->has_force_vect_loops;
+  return flag_tree_loop_vectorize || cfun->has_force_vect_loops;
 }
 
 namespace {
@@ -342,8 +342,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate () { return gate_tree_vectorize (); }
-  unsigned int execute () { return tree_vectorize (); }
+  bool gate () { return gate_tree_loop_vectorize (); }
+  unsigned int execute () { return tree_loop_vectorize (); }
 
 }; // class pass_vectorize
 
Index: opts.c
===================================================================
--- opts.c	(revision 202481)
+++ opts.c	(working copy)
@@ -498,7 +498,8 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@ finish_options (struct gcc_options *opts
 
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
-  if (!opts->x_flag_tree_vectorize || !opts->x_flag_tree_loop_if_convert)
+  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
+       || !opts->x_flag_tree_loop_if_convert)
     maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
                            opts->x_param_values, opts_set->x_param_values);
 
@@ -1660,8 +1662,10 @@ common_handle_option (struct gcc_options
 	opts->x_flag_unswitch_loops = value;
       if (!opts_set->x_flag_gcse_after_reload)
 	opts->x_flag_gcse_after_reload = value;
-      if (!opts_set->x_flag_tree_vectorize)
-	opts->x_flag_tree_vectorize = value;
+      if (!opts_set->x_flag_tree_loop_vectorize)
+	opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+	opts->x_flag_tree_slp_vectorize = value;
       if (!opts_set->x_flag_vect_cost_model)
 	opts->x_flag_vect_cost_model = value;
       if (!opts_set->x_flag_tree_loop_distribute_patterns)
@@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
         opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_ftree_vectorize:
+      if (!opts_set->x_flag_tree_loop_vectorize)
+	opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+	opts->x_flag_tree_slp_vectorize = value;
+      break;
     case OPT_fshow_column:
       dc->show_column = value;
       break;
Index: common.opt
===================================================================
--- common.opt	(revision 202481)
+++ common.opt	(working copy)
@@ -2263,15 +2263,19 @@ Common Report Var(flag_var_tracking_unin
 Perform variable tracking and also tag variables that are uninitialized
 
 ftree-vectorize
-Common Report Var(flag_tree_vectorize) Optimization
-Enable loop vectorization on trees
+Common Report Optimization
+Enable vectorization on trees
 
 ftree-vectorizer-verbose=
 Common RejectNegative Joined UInteger Var(common_deferred_options) Defer
 -ftree-vectorizer-verbose=<number>	This switch is deprecated. Use -fopt-info instead.
 
+ftree-loop-vectorize
+Common Report Var(flag_tree_loop_vectorize) Optimization
+Enable loop vectorization on trees
+
 ftree-slp-vectorize
-Common Report Var(flag_tree_slp_vectorize) Init(2) Optimization
+Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees
 
 fvect-cost-model
@@ -2282,6 +2286,10 @@ ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
 
+ftree-vect-loop-peeling
+Common Report Var(flag_tree_vect_loop_peeling) Init(1) Optimization
+Enable loop peeling to enhance alignment when doing loop vectorization on trees
+
 ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.

