New GCC options for loop vectorization
Xinliang David Li
davidxl@google.com
Thu Sep 12 21:09:00 GMT 2013
Currently -ftree-vectorize turns on both loop and SLP vectorization,
but there is no simple way to turn on loop vectorization alone. The
logic for the default -O3 setting is also complicated.
In this patch, two new options are introduced:
1) -ftree-loop-vectorize
This option turns on loop vectorization only. The option
-ftree-slp-vectorize also becomes a first-class citizen, so the
Init(2) workaround is no longer needed. With this change,
-ftree-vectorize becomes a simple alias for -ftree-loop-vectorize plus
-ftree-slp-vectorize.
For instance, to turn on only SLP vectorization at O3, the old way is:
-O3 -fno-tree-vectorize -ftree-slp-vectorize
With the new change it becomes:
-O3 -fno-tree-loop-vectorize
To turn on only loop vectorization at O2, the old way is:
-O2 -ftree-vectorize -fno-tree-slp-vectorize
The new way is:
-O2 -ftree-loop-vectorize
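The alias behaviour described above can be modelled in a few lines of
C. This is only a sketch of the intended semantics, not GCC code: the
struct vect_flags type and the resolve_vect_flags function are
hypothetical names made up for illustration, and the model assumes
that options are processed left to right, that -O3 enables both flags
by default, and that -ftree-vectorize only touches a sub-flag the user
has not set explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two vectorizer flags (not a GCC type).  */
struct vect_flags
{
  bool loop, slp;          /* effective values */
  bool loop_set, slp_set;  /* true once given explicitly by the user */
};

/* Resolve the flags for an optimization level and a list of -f options.
   Illustrative only; real option handling lives in opts.c.  */
static struct vect_flags
resolve_vect_flags (int opt_level, const char *const *args, size_t n)
{
  /* Assumed default: both kinds of vectorization are on at -O3.  */
  struct vect_flags f = { opt_level >= 3, opt_level >= 3, false, false };

  for (size_t i = 0; i < n; i++)
    {
      const char *a = args[i];
      if (strncmp (a, "-f", 2) != 0)
        continue;

      /* "-fno-foo" turns foo off, "-ffoo" turns it on.  */
      bool value = strncmp (a, "-fno-", 5) != 0;
      const char *name = a + (value ? 2 : 5);

      if (strcmp (name, "tree-loop-vectorize") == 0)
        {
          f.loop = value;
          f.loop_set = true;
        }
      else if (strcmp (name, "tree-slp-vectorize") == 0)
        {
          f.slp = value;
          f.slp_set = true;
        }
      else if (strcmp (name, "tree-vectorize") == 0)
        {
          /* The alias never overrides an explicitly given sub-flag.  */
          if (!f.loop_set)
            f.loop = value;
          if (!f.slp_set)
            f.slp = value;
        }
    }
  return f;
}
```

Under these assumptions, level 3 with "-fno-tree-loop-vectorize"
leaves only SLP enabled, and level 2 with "-ftree-loop-vectorize"
leaves only loop vectorization enabled.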
2) -ftree-vect-loop-peeling
This option turns loop peeling for alignment on or off. In the long
run, this should be folded into the cheap cost model proposed by
Richard. The option is also useful in scenarios where peeling can
introduce runtime problems, which happen to be common in practice:
http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html
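For readers unfamiliar with the transformation, here is a hand-written
C sketch of what peeling for alignment amounts to. It is not output
from the vectorizer: the function name add_one_peeled, the 16-byte
VEC_BYTES width and the chunk size VEC_ELEMS are assumptions chosen
for illustration, and the inner loop merely stands in for a single
aligned vector operation.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16                           /* assumed vector width */
#define VEC_ELEMS (VEC_BYTES / sizeof (float)) /* floats per vector */

/* Add 1.0f to each of the n elements of a, peeling scalar iterations
   off the front until the remaining accesses start on a vector
   boundary.  Hypothetical example, not vectorizer output.  */
void
add_one_peeled (float *a, size_t n)
{
  size_t i = 0;

  /* Prologue: the peeled iterations, executed one element at a time
     until &a[i] is a multiple of VEC_BYTES.  */
  while (i < n && ((uintptr_t) &a[i] % VEC_BYTES) != 0)
    {
      a[i] += 1.0f;
      i++;
    }

  /* Main loop: every chunk now starts at an aligned address, so a
     compiler could use aligned vector loads and stores here.  */
  for (; i + VEC_ELEMS <= n; i += VEC_ELEMS)
    for (size_t j = 0; j < VEC_ELEMS; j++)
      a[i + j] += 1.0f;

  /* Epilogue: leftover elements that do not fill a whole vector.  */
  for (; i < n; i++)
    a[i] += 1.0f;
}
```

Note that in this sketch, if a is not itself a multiple of
sizeof (float), the prologue never reaches an aligned address and ends
up processing all n elements one at a time.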
Patch attached. Compiler bootstrapped. Ok after testing?
thanks,
David
-------------- next part --------------
Index: omp-low.c
===================================================================
--- omp-low.c (revision 202481)
+++ omp-low.c (working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
{
if (!optimize
|| optimize_debug
- || (!flag_tree_vectorize
- && global_options_set.x_flag_tree_vectorize))
+ || (!flag_tree_loop_vectorize
+ && global_options_set.x_flag_tree_loop_vectorize))
return 1;
int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
cfun->has_simduid_loops = true;
}
- /* If not -fno-tree-vectorize, hint that we want to vectorize
+ /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
the loop. */
- if ((flag_tree_vectorize
- || !global_options_set.x_flag_tree_vectorize)
+ if ((flag_tree_loop_vectorize
+ || !global_options_set.x_flag_tree_loop_vectorize)
&& loop->safelen > 1)
{
loop->force_vect = true;
Index: ChangeLog
===================================================================
--- ChangeLog (revision 202481)
+++ ChangeLog (working copy)
@@ -1,3 +1,24 @@
+2013-09-12 Xinliang David Li <davidxl@google.com>
+
+ * tree-if-conv.c (main_tree_if_conversion): Check new flag.
+ * omp-low.c (omp_max_vf): Ditto.
+ (expand_omp_simd): Ditto.
+ * tree-vectorizer.c (vectorize_loops): Ditto.
+ (gate_vect_slp): Ditto.
+ (gate_increase_alignment): Ditto.
+ * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Ditto.
+ * tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+ * tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+ (gate_tree_vectorize): Name change.
+ (tree_vectorize): Ditto.
+ (pass_vectorize::gate): Call new function.
+ (pass_vectorize::execute): Ditto.
+ * opts.c: O3 default setting change.
+ (finish_options): Check new flag.
+ * doc/invoke.texi: Document new flags.
+ * common.opt: New flags.
+
+
2013-09-10 Richard Earnshaw <rearnsha@arm.com>
PR target/58361
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 202481)
+++ doc/invoke.texi (working copy)
@@ -419,10 +419,12 @@ Objective-C and Objective-C++ Dialects}.
-ftree-loop-if-convert-stores -ftree-loop-im @gol
-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
-ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
-ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
-ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vect-loop-peeling -ftree-vectorize @gol
+-ftree-vrp @gol
-funit-at-a-time -funroll-all-loops -funroll-loops @gol
-funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
-fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6748,8 +6750,8 @@ invoking @option{-O2} on programs that u
Optimize yet more. @option{-O3} turns on all optimizations specified
by @option{-O2} and also turns on the @option{-finline-functions},
@option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
+@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
@item -O0
@@ -6766,7 +6768,7 @@ optimizations designed to reduce code si
@option{-Os} disables the following optimization flags:
@gccoptlist{-falign-functions -falign-jumps -falign-loops @gol
-falign-labels -freorder-blocks -freorder-blocks-and-partition @gol
--fprefetch-loop-arrays -ftree-vect-loop-version}
+-fprefetch-loop-arrays -ftree-vect-loop-version -ftree-vect-loop-peeling}
@item -Ofast
@opindex Ofast
@@ -8008,14 +8010,29 @@ higher.
@item -ftree-vectorize
@opindex ftree-vectorize
+Perform vectorization on trees. This flag enables @option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.
+
+@item -ftree-loop-vectorize
+@opindex ftree-loop-vectorize
Perform loop vectorization on trees. This flag is enabled by default at
-@option{-O3}.
+@option{-O3} and when @option{-ftree-vectorize} is enabled.
@item -ftree-slp-vectorize
@opindex ftree-slp-vectorize
Perform basic block vectorization on trees. This flag is enabled by default at
@option{-O3} and when @option{-ftree-vectorize} is enabled.
+@item -ftree-vect-loop-peeling
+@opindex ftree-vect-loop-peeling
+Perform loop peeling when doing loop vectorization on trees. When a loop
+appears to be vectorizable except that data alignment cannot be determined
+at compile time, the loop is peeled to enhance alignment for one or more
+data accesses determined by the compiler. After loop peeling, those accesses
+become aligned well enough that more efficient SIMD instructions can be used
+for them. This option is enabled by default except at level @option{-Os},
+where it is disabled.
+
@item -ftree-vect-loop-version
@opindex ftree-vect-loop-version
Perform loop versioning when doing loop vectorization on trees. When a loop
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c (revision 202481)
+++ tree-if-conv.c (working copy)
@@ -1789,7 +1789,7 @@ main_tree_if_conversion (void)
FOR_EACH_LOOP (li, loop, 0)
if (flag_tree_loop_if_convert == 1
|| flag_tree_loop_if_convert_stores == 1
- || flag_tree_vectorize
+ || flag_tree_loop_vectorize
|| loop->force_vect)
changed |= tree_if_conversion (loop);
@@ -1815,7 +1815,7 @@ main_tree_if_conversion (void)
static bool
gate_tree_if_conversion (void)
{
- return (((flag_tree_vectorize || cfun->has_force_vect_loops)
+ return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
&& flag_tree_loop_if_convert != 0)
|| flag_tree_loop_if_convert == 1
|| flag_tree_loop_if_convert_stores == 1);
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c (revision 202481)
+++ tree-vect-data-refs.c (working copy)
@@ -1404,7 +1404,9 @@ vect_enhance_data_refs_alignment (loop_v
continue;
supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
- do_peeling = vector_alignment_reachable_p (dr);
+ do_peeling = (flag_tree_vect_loop_peeling
+ && optimize_loop_nest_for_speed_p (loop)
+ && vector_alignment_reachable_p (dr));
if (do_peeling)
{
if (known_alignment_for_access_p (dr))
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c (revision 202481)
+++ tree-ssa-pre.c (working copy)
@@ -3026,7 +3026,7 @@ inhibit_phi_insertion (basic_block bb, p
unsigned i;
/* If we aren't going to vectorize we don't inhibit anything. */
- if (!flag_tree_vectorize)
+ if (!flag_tree_loop_vectorize)
return false;
/* Otherwise we inhibit the insertion when the address of the
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c (revision 202481)
+++ tree-vectorizer.c (working copy)
@@ -341,7 +341,7 @@ vectorize_loops (void)
than all previously defined loops. This fact allows us to run
only over initial loops skipping newly generated ones. */
FOR_EACH_LOOP (li, loop, 0)
- if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop))
+ if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
|| loop->force_vect)
{
loop_vec_info loop_vinfo;
@@ -486,10 +486,7 @@ execute_vect_slp (void)
static bool
gate_vect_slp (void)
{
- /* Apply SLP either if the vectorizer is on and the user didn't specify
- whether to run SLP or not, or if the SLP flag was set by the user. */
- return ((flag_tree_vectorize != 0 && flag_tree_slp_vectorize != 0)
- || flag_tree_slp_vectorize == 1);
+ return flag_tree_slp_vectorize != 0;
}
namespace {
@@ -579,7 +576,7 @@ increase_alignment (void)
static bool
gate_increase_alignment (void)
{
- return flag_section_anchors && flag_tree_vectorize;
+ return flag_section_anchors && flag_tree_loop_vectorize;
}
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c (revision 202481)
+++ tree-ssa-loop.c (working copy)
@@ -303,7 +303,7 @@ make_pass_predcom (gcc::context *ctxt)
/* Loop autovectorization. */
static unsigned int
-tree_vectorize (void)
+tree_loop_vectorize (void)
{
if (number_of_loops (cfun) <= 1)
return 0;
@@ -312,9 +312,9 @@ tree_vectorize (void)
}
static bool
-gate_tree_vectorize (void)
+gate_tree_loop_vectorize (void)
{
- return flag_tree_vectorize || cfun->has_force_vect_loops;
+ return flag_tree_loop_vectorize || cfun->has_force_vect_loops;
}
namespace {
@@ -342,8 +342,8 @@ public:
{}
/* opt_pass methods: */
- bool gate () { return gate_tree_vectorize (); }
- unsigned int execute () { return tree_vectorize (); }
+ bool gate () { return gate_tree_loop_vectorize (); }
+ unsigned int execute () { return tree_loop_vectorize (); }
}; // class pass_vectorize
Index: opts.c
===================================================================
--- opts.c (revision 202481)
+++ opts.c (working copy)
@@ -498,7 +498,8 @@ static const struct default_options defa
{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
- { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+ { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+ { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@ finish_options (struct gcc_options *opts
/* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
is disabled. */
- if (!opts->x_flag_tree_vectorize || !opts->x_flag_tree_loop_if_convert)
+ if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
+ || !opts->x_flag_tree_loop_if_convert)
maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
opts->x_param_values, opts_set->x_param_values);
@@ -1660,8 +1662,10 @@ common_handle_option (struct gcc_options
opts->x_flag_unswitch_loops = value;
if (!opts_set->x_flag_gcse_after_reload)
opts->x_flag_gcse_after_reload = value;
- if (!opts_set->x_flag_tree_vectorize)
- opts->x_flag_tree_vectorize = value;
+ if (!opts_set->x_flag_tree_loop_vectorize)
+ opts->x_flag_tree_loop_vectorize = value;
+ if (!opts_set->x_flag_tree_slp_vectorize)
+ opts->x_flag_tree_slp_vectorize = value;
if (!opts_set->x_flag_vect_cost_model)
opts->x_flag_vect_cost_model = value;
if (!opts_set->x_flag_tree_loop_distribute_patterns)
@@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
opts->x_flag_ipa_reference = false;
break;
+ case OPT_ftree_vectorize:
+ if (!opts_set->x_flag_tree_loop_vectorize)
+ opts->x_flag_tree_loop_vectorize = value;
+ if (!opts_set->x_flag_tree_slp_vectorize)
+ opts->x_flag_tree_slp_vectorize = value;
+ break;
case OPT_fshow_column:
dc->show_column = value;
break;
Index: common.opt
===================================================================
--- common.opt (revision 202481)
+++ common.opt (working copy)
@@ -2263,15 +2263,19 @@ Common Report Var(flag_var_tracking_unin
Perform variable tracking and also tag variables that are uninitialized
ftree-vectorize
-Common Report Var(flag_tree_vectorize) Optimization
-Enable loop vectorization on trees
+Common Report Optimization
+Enable vectorization on trees
ftree-vectorizer-verbose=
Common RejectNegative Joined UInteger Var(common_deferred_options) Defer
-ftree-vectorizer-verbose=<number> This switch is deprecated. Use -fopt-info instead.
+ftree-loop-vectorize
+Common Report Var(flag_tree_loop_vectorize) Optimization
+Enable loop vectorization on trees
+
ftree-slp-vectorize
-Common Report Var(flag_tree_slp_vectorize) Init(2) Optimization
+Common Report Var(flag_tree_slp_vectorize) Optimization
Enable basic block vectorization (SLP) on trees
fvect-cost-model
@@ -2282,6 +2286,10 @@ ftree-vect-loop-version
Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
Enable loop versioning when doing loop vectorization on trees
+ftree-vect-loop-peeling
+Common Report Var(flag_tree_vect_loop_peeling) Init(1) Optimization
+Enable loop peeling to enhance alignment when doing loop vectorization on trees
+
ftree-scev-cprop
Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
Enable copy propagation of scalar-evolution information.