New GCC options for loop vectorization

Xinliang David Li davidxl@google.com
Fri Sep 13 16:41:00 GMT 2013


New patch attached.

1) the peeling part is removed
2) the new patch implements the last-one-wins logic. -ftree-vectorize
behaves like a true alias. -fno-tree-vectorize can override previous
-ftree-xxx-vectorize.

Ok for trunk after testing?

thanks,

David

On Fri, Sep 13, 2013 at 8:16 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>> but there is no simple way to turn on loop vectorization alone. The
>>> logic for default O3 setting is also complicated.
>>>
>>> In this patch, two new options are introduced:
>>>
>>> 1) -ftree-loop-vectorize
>>>
>>> This option is used to turn on loop vectorization only. option
>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>> becomes a simple alias to -ftree-loop-vectorize +
>>> -ftree-slp-vectorize.
>>>
>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>
>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>
>>> With the new change it becomes:
>>>
>>>     -O3 -fno-loop-vectorize
>>>
>>>
>>> To turn on only loop vectorize at O2, the old way is
>>>
>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>
>>> The new way is
>>>
>>>     -O2 -ftree-loop-vectorize
>>>
>>>
>>>
>>> 2) -ftree-vect-loop-peeling
>>>
>>> This option is used to turn on/off loop peeling for alignment.  In the
>>> long run, this should be folded into the cheap cost model proposed by
>>> Richard.  This option is also useful in scenarios where peeling can
>>> introduce runtime problems:
>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>> common in practice.
>>>
>>>
>>>
>>> Patch attached. Compiler boostrapped. Ok after testing?
>>
>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>
> Ok. Can you also comment on 2) ?
>
>>
>> I've stopped a quick try doing 1) myself because
>>
>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>          opts->x_flag_ipa_reference = false;
>>        break;
>>
>> +    case OPT_ftree_vectorize:
>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>> + opts->x_flag_tree_loop_vectorize = value;
>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>> + opts->x_flag_tree_slp_vectorize = value;
>> +      break;
>>
>> doesn't look obviously correct.  Does that handle
>>
>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>
>> or
>>
>>   -ftree-loop-vectorize -fno-tree-vectorize
>>
>> properly?  Currently at least
>>
>>   -ftree-slp-vectorize -fno-tree-vectorize
>>
>> doesn't "work".
>
>
> Right -- same is true for -fprofile-use option. FDO enables some
> passes, but can not re-enable them if they are flipped off before.
>
>>
>> That said, the option machinery doesn't handle an option being an alias
>> for two other options, so it's mechanism to contract positives/negatives
>> doesn't work here and the override hooks do not work reliably for
>> repeated options.
>>
>> Or am I wrong here?  Should we care at all?  Joseph?
>
> We should probably just document the behavior. Even better, we should
> deprecate the old option.
>
> thanks,
>
> David
>
>>
>> Thanks,
>> Richard.
>>
>>>
>>> thanks,
>>>
>>> David
-------------- next part --------------
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202540)
+++ ChangeLog	(working copy)
@@ -1,3 +1,22 @@
+2013-09-12  Xinliang David Li  <davidxl@google.com>
+
+	* tree-if-conv.c (main_tree_if_conversion): Check new flag.
+	* omp-low.c (omp_max_vf): Ditto.
+	(expand_omp_simd): Ditto.
+	* tree-vectorizer.c (vectorize_loops): Ditto.
+	(gate_vect_slp): Ditto.
+	(gate_increase_alignment): Ditto.
+	* tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+	* tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+	(gate_tree_vectorize): Name change.
+	(tree_vectorize): Ditto.
+	(pass_vectorize::gate): Call new function.
+	(pass_vectorize::execute): Ditto.
+	opts.c: O3 default setting change.
+	(finish_options): Check new flag.
+	* doc/invoke.texi: Document new flags.
+	* common.opt: New flags.
+
 2013-09-12  Vladimir Makarov  <vmakarov@redhat.com>
 
 	PR middle-end/58335
Index: cp/lambda.c
===================================================================
--- cp/lambda.c	(revision 202540)
+++ cp/lambda.c	(working copy)
@@ -792,7 +792,7 @@ maybe_add_lambda_conv_op (tree type)
      particular, parameter pack expansions are marked PACK_EXPANSION_LOCAL_P in
      the body CALL, but not in DECLTYPE_CALL.  */
 
-  vec<tree, va_gc> *direct_argvec;
+  vec<tree, va_gc> *direct_argvec = NULL;
   tree decltype_call = 0, call;
   tree fn_result = TREE_TYPE (TREE_TYPE (callop));
 
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c	(revision 202540)
+++ tree-ssa-loop.c	(working copy)
@@ -303,7 +303,7 @@ make_pass_predcom (gcc::context *ctxt)
 /* Loop autovectorization.  */
 
 static unsigned int
-tree_vectorize (void)
+tree_loop_vectorize (void)
 {
   if (number_of_loops (cfun) <= 1)
     return 0;
@@ -312,9 +312,9 @@ tree_vectorize (void)
 }
 
 static bool
-gate_tree_vectorize (void)
+gate_tree_loop_vectorize (void)
 {
-  return flag_tree_vectorize || cfun->has_force_vect_loops;
+  return flag_tree_loop_vectorize || cfun->has_force_vect_loops;
 }
 
 namespace {
@@ -342,8 +342,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate () { return gate_tree_vectorize (); }
-  unsigned int execute () { return tree_vectorize (); }
+  bool gate () { return gate_tree_loop_vectorize (); }
+  unsigned int execute () { return tree_loop_vectorize (); }
 
 }; // class pass_vectorize
 
Index: common.opt
===================================================================
--- common.opt	(revision 202540)
+++ common.opt	(working copy)
@@ -2263,15 +2263,19 @@ Common Report Var(flag_var_tracking_unin
 Perform variable tracking and also tag variables that are uninitialized
 
 ftree-vectorize
-Common Report Var(flag_tree_vectorize) Optimization
-Enable loop vectorization on trees
+Common Report Optimization
+Enable vectorization on trees
 
 ftree-vectorizer-verbose=
 Common RejectNegative Joined UInteger Var(common_deferred_options) Defer
 -ftree-vectorizer-verbose=<number>	This switch is deprecated. Use -fopt-info instead.
 
+ftree-loop-vectorize
+Common Report Var(flag_tree_loop_vectorize) Optimization
+Enable loop vectorization on trees
+
 ftree-slp-vectorize
-Common Report Var(flag_tree_slp_vectorize) Init(2) Optimization
+Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees
 
 fvect-cost-model
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c	(revision 202540)
+++ tree-if-conv.c	(working copy)
@@ -1789,7 +1789,7 @@ main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     if (flag_tree_loop_if_convert == 1
 	|| flag_tree_loop_if_convert_stores == 1
-	|| flag_tree_vectorize
+	|| flag_tree_loop_vectorize
 	|| loop->force_vect)
     changed |= tree_if_conversion (loop);
 
@@ -1815,7 +1815,7 @@ main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return (((flag_tree_vectorize || cfun->has_force_vect_loops)
+  return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
 	   && flag_tree_loop_if_convert != 0)
 	  || flag_tree_loop_if_convert == 1
 	  || flag_tree_loop_if_convert_stores == 1);
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 202540)
+++ tree-ssa-pre.c	(working copy)
@@ -3026,7 +3026,7 @@ inhibit_phi_insertion (basic_block bb, p
   unsigned i;
 
   /* If we aren't going to vectorize we don't inhibit anything.  */
-  if (!flag_tree_vectorize)
+  if (!flag_tree_loop_vectorize)
     return false;
 
   /* Otherwise we inhibit the insertion when the address of the
Index: opts.c
===================================================================
--- opts.c	(revision 202540)
+++ opts.c	(working copy)
@@ -498,7 +498,8 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@ finish_options (struct gcc_options *opts
 
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
-  if (!opts->x_flag_tree_vectorize || !opts->x_flag_tree_loop_if_convert)
+  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
+       || !opts->x_flag_tree_loop_if_convert)
     maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
                            opts->x_param_values, opts_set->x_param_values);
 
@@ -1660,8 +1662,10 @@ common_handle_option (struct gcc_options
 	opts->x_flag_unswitch_loops = value;
       if (!opts_set->x_flag_gcse_after_reload)
 	opts->x_flag_gcse_after_reload = value;
-      if (!opts_set->x_flag_tree_vectorize)
-	opts->x_flag_tree_vectorize = value;
+      if (!opts_set->x_flag_tree_loop_vectorize)
+	opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+	opts->x_flag_tree_slp_vectorize = value;
       if (!opts_set->x_flag_vect_cost_model)
 	opts->x_flag_vect_cost_model = value;
       if (!opts_set->x_flag_tree_loop_distribute_patterns)
@@ -1691,6 +1695,10 @@ common_handle_option (struct gcc_options
         opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_ftree_vectorize:
+      opts->x_flag_tree_loop_vectorize = value;
+      opts->x_flag_tree_slp_vectorize = value;
+      break;
     case OPT_fshow_column:
       dc->show_column = value;
       break;
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202540)
+++ doc/invoke.texi	(working copy)
@@ -419,10 +419,11 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-if-convert-stores -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6751,8 +6752,8 @@ invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
+@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
 @option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
 
 @item -O0
@@ -8011,8 +8012,13 @@ higher.
 
 @item -ftree-vectorize
 @opindex ftree-vectorize
+Perform vectorization on trees. This flag enables @option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.
+
+@item -ftree-loop-vectorize
+@opindex ftree-loop-vectorize
 Perform loop vectorization on trees. This flag is enabled by default at
-@option{-O3}.
+@option{-O3} and when @option{-ftree-vectorize} is enabled.
 
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 202540)
+++ omp-low.c	(working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
       || optimize_debug
-      || (!flag_tree_vectorize
-	  && global_options_set.x_flag_tree_vectorize))
+      || (!flag_tree_loop_vectorize
+	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
 	  loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
 	  cfun->has_simduid_loops = true;
 	}
-      /* If not -fno-tree-vectorize, hint that we want to vectorize
+      /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
 	 the loop.  */
-      if ((flag_tree_vectorize
-	   || !global_options_set.x_flag_tree_vectorize)
+      if ((flag_tree_loop_vectorize
+	   || !global_options_set.x_flag_tree_loop_vectorize)
 	  && loop->safelen > 1)
 	{
 	  loop->force_vect = true;
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 202540)
+++ tree-vectorizer.c	(working copy)
@@ -341,7 +341,7 @@ vectorize_loops (void)
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
-    if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop))
+    if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
 	|| loop->force_vect)
       {
 	loop_vec_info loop_vinfo;
@@ -486,10 +486,7 @@ execute_vect_slp (void)
 static bool
 gate_vect_slp (void)
 {
-  /* Apply SLP either if the vectorizer is on and the user didn't specify
-     whether to run SLP or not, or if the SLP flag was set by the user.  */
-  return ((flag_tree_vectorize != 0 && flag_tree_slp_vectorize != 0)
-          || flag_tree_slp_vectorize == 1);
+  return flag_tree_slp_vectorize != 0;
 }
 
 namespace {
@@ -579,7 +576,7 @@ increase_alignment (void)
 static bool
 gate_increase_alignment (void)
 {
-  return flag_section_anchors && flag_tree_vectorize;
+  return flag_section_anchors && flag_tree_loop_vectorize;
 }
 
 


More information about the Gcc-patches mailing list