[PATCH] tree-optimization/96043 - BB vectorization costing improvement
Richard Biener
rguenther@suse.de
Thu Sep 10 09:20:18 GMT 2020
On Wed, 9 Sep 2020, Michael Matz wrote:
> Hello,
>
> On Tue, 8 Sep 2020, Richard Biener wrote:
>
> > CCing some people to double-check my graph partitioning algorithm.
>
> I seem to not know the pre-existing data structures enough to say anything
> about this, but I noticed small things which might or might not indicate
> thinkos or incompleteness:
>
> > +static void
> > +vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
> > + slp_instance instance, slp_tree node,
> > + hash_map<stmt_vec_info, slp_instance> &stmt_to_instance,
> > + hash_map<slp_instance, slp_instance> &instance_leader)
> > +{
> > + stmt_vec_info stmt_info;
> > + unsigned i;
> > + bool all = true;
> > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > + {
> > + bool existed_p;
> > + slp_instance &stmt_instance
> > + = stmt_to_instance.get_or_insert (stmt_info, &existed_p);
> > + if (!existed_p)
> > + {
> > + all = false;
> > + }
> > + else if (stmt_instance != instance)
> > + {
> > + /* If we're running into a previously marked stmt make us the
> > + leader of the current ultimate leader. This keeps the
> > + leader chain acyclic and works even when the current instance
> > + connects two previously independent graph parts. */
> > + stmt_instance = get_ultimate_leader (stmt_instance, instance_leader);
> > + if (stmt_instance != instance)
> > + instance_leader.put (stmt_instance, instance);
> > + }
> > + stmt_instance = instance;
>
> This last assignment is useless.
Not quite - I refactored this a tiny bit to make the intent more clear.
> > +/* Partition the SLP graph into pieces that can be costed independently. */
> > +
> > +static void
> > +vect_bb_partition_graph (bb_vec_info bb_vinfo)
> > +{
> ...
> > + /* Then collect entries to each independent subgraphs. */
> > + for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
> > + {
> > + slp_instance leader = get_ultimate_leader (instance, instance_leader);
> > + if (leader == instance)
> > + leader->subgraph_entries.safe_push (leader);
> > + else
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "instance %p is leader of %p\n",
> > + leader, instance);
> > + leader->subgraph_entries.safe_push (instance);
> > + }
>
> So the 'leader->subgraph_entries.safe_push (instance)' is actually done
> unconditionally (the leader is leader of itself), only the debug dump is
> conditional.
Yeah, refactored as well.
I've pushed the change as follows after re-bootstrapping/testing on
x86_64-unknown-linux-gnu.
Richard.
This makes the BB vectorizer cost independent SLP subgraphs
separately. While on pristine trunk and for x86_64 I failed to
distill a testcase where the vectorizer would think _any_
basic-block vectorization opportunity is not profitable I do
have pending work that would make the cost savings of a
profitable opportunity make another independently not
profitable opportunity vectorized.
2020-09-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/96043
* tree-vectorizer.h (_slp_instance::cost_vec): New.
(_slp_instance::subgraph_entries): Likewise.
(BB_VINFO_TARGET_COST_DATA): Remove.
* tree-vect-slp.c (vect_free_slp_instance): Free
cost_vec and subgraph_entries.
(vect_analyze_slp_instance): Initialize them.
(vect_slp_analyze_operations): Defer passing costs to
the target, instead record them in the SLP graph entry.
(get_ultimate_leader): New helper for graph partitioning.
(vect_bb_partition_graph_r): Likewise.
(vect_bb_partition_graph): New function to partition the
SLP graph into independently costable parts.
(vect_bb_vectorization_profitable_p): Adjust to work on
a subgraph.
(vect_bb_vectorization_profitable_p): New wrapper,
discarding non-profitable vectorization of subgraphs.
(vect_slp_analyze_bb_1): Call vect_bb_partition_graph before
costing.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c: Adjust.
---
.../vect/costmodel/x86_64/costmodel-pr69297.c | 20 +-
gcc/tree-vect-slp.c | 185 ++++++++++++++++--
gcc/tree-vectorizer.h | 8 +-
3 files changed, 194 insertions(+), 19 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
index e65a30c06d6..ef74785f6a8 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
@@ -74,10 +74,28 @@ foo (int* diff)
d[13] = m[12] - m[13];
d[14] = m[14] + m[15];
d[15] = m[15] - m[14];
+ /* The following obviously profitable part should not make
+ the former unprofitable one profitable. */
+ diff[16 + 16] = diff[16];
+ diff[17 + 16] = diff[17];
+ diff[18 + 16] = diff[18];
+ diff[19 + 16] = diff[19];
+ diff[20 + 16] = diff[20];
+ diff[21 + 16] = diff[21];
+ diff[22 + 16] = diff[22];
+ diff[23 + 16] = diff[23];
+ diff[24 + 16] = diff[24];
+ diff[25 + 16] = diff[25];
+ diff[26 + 16] = diff[26];
+ diff[27 + 16] = diff[27];
+ diff[28 + 16] = diff[28];
+ diff[29 + 16] = diff[29];
+ diff[30 + 16] = diff[30];
+ diff[31 + 16] = diff[31];
for (k=0; k<16; k++)
satd += abs(d[k]);
return satd;
}
/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp1" } } */
-/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp1" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing SLP tree" 1 "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 7860fe3f32b..458a2c5bb30 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -126,6 +126,8 @@ vect_free_slp_instance (slp_instance instance, bool final_p)
{
vect_free_slp_tree (SLP_INSTANCE_TREE (instance), final_p);
SLP_INSTANCE_LOADS (instance).release ();
+ instance->subgraph_entries.release ();
+ instance->cost_vec.release ();
free (instance);
}
@@ -2263,6 +2265,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
SLP_INSTANCE_LOADS (new_instance) = vNULL;
SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info : NULL;
new_instance->reduc_phis = NULL;
+ new_instance->cost_vec = vNULL;
+ new_instance->subgraph_entries = vNULL;
vect_gather_slp_loads (new_instance, node);
if (dump_enabled_p ())
@@ -3133,8 +3137,8 @@ vect_slp_analyze_operations (vec_info *vinfo)
visited.add (*x);
i++;
- add_stmt_costs (vinfo, vinfo->target_cost_data, &cost_vec);
- cost_vec.release ();
+ /* Remember the SLP graph entry cost for later. */
+ instance->cost_vec = cost_vec;
}
}
@@ -3142,18 +3146,108 @@ vect_slp_analyze_operations (vec_info *vinfo)
if (bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo))
{
hash_set<stmt_vec_info> svisited;
- stmt_vector_for_cost cost_vec;
- cost_vec.create (2);
for (i = 0; vinfo->slp_instances.iterate (i, &instance); ++i)
vect_bb_slp_mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE (instance),
- instance, &cost_vec, svisited);
- add_stmt_costs (vinfo, vinfo->target_cost_data, &cost_vec);
- cost_vec.release ();
+ instance, &instance->cost_vec, svisited);
}
return !vinfo->slp_instances.is_empty ();
}
+/* Get the SLP instance leader from INSTANCE_LEADER thereby transitively
+ closing the eventual chain. */
+
+static slp_instance
+get_ultimate_leader (slp_instance instance,
+ hash_map<slp_instance, slp_instance> &instance_leader)
+{
+ auto_vec<slp_instance *, 8> chain;
+ slp_instance *tem;
+ while (*(tem = instance_leader.get (instance)) != instance)
+ {
+ chain.safe_push (tem);
+ instance = *tem;
+ }
+ while (!chain.is_empty ())
+ *chain.pop () = instance;
+ return instance;
+}
+
+/* Worker of vect_bb_partition_graph, recurse on NODE. */
+
+static void
+vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
+ slp_instance instance, slp_tree node,
+ hash_map<stmt_vec_info, slp_instance> &stmt_to_instance,
+ hash_map<slp_instance, slp_instance> &instance_leader)
+{
+ stmt_vec_info stmt_info;
+ unsigned i;
+ bool all = true;
+ FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+ {
+ bool existed_p;
+ slp_instance &stmt_instance
+ = stmt_to_instance.get_or_insert (stmt_info, &existed_p);
+ if (!existed_p)
+ all = false;
+ else if (stmt_instance != instance)
+ {
+ /* If we're running into a previously marked stmt make us the
+ leader of the current ultimate leader. This keeps the
+ leader chain acyclic and works even when the current instance
+ connects two previously independent graph parts. */
+ slp_instance stmt_leader
+ = get_ultimate_leader (stmt_instance, instance_leader);
+ if (stmt_leader != instance)
+ instance_leader.put (stmt_leader, instance);
+ }
+ stmt_instance = instance;
+ }
+ /* If not all stmts had been visited we have to recurse on children. */
+ if (all)
+ return;
+
+ slp_tree child;
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+ if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
+ vect_bb_partition_graph_r (bb_vinfo, instance, child, stmt_to_instance,
+ instance_leader);
+}
+
+/* Partition the SLP graph into pieces that can be costed independently. */
+
+static void
+vect_bb_partition_graph (bb_vec_info bb_vinfo)
+{
+ DUMP_VECT_SCOPE ("vect_bb_partition_graph");
+
+ /* First walk the SLP graph assigning each involved scalar stmt a
+ corresponding SLP graph entry and upon visiting a previously
+ marked stmt, make the stmts leader the current SLP graph entry. */
+ hash_map<stmt_vec_info, slp_instance> stmt_to_instance;
+ hash_map<slp_instance, slp_instance> instance_leader;
+ slp_instance instance;
+ for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
+ {
+ instance_leader.put (instance, instance);
+ vect_bb_partition_graph_r (bb_vinfo,
+ instance, SLP_INSTANCE_TREE (instance),
+ stmt_to_instance, instance_leader);
+ }
+
+ /* Then collect entries to each independent subgraph. */
+ for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
+ {
+ slp_instance leader = get_ultimate_leader (instance, instance_leader);
+ leader->subgraph_entries.safe_push (instance);
+ if (dump_enabled_p ()
+ && leader != instance)
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "instance %p is leader of %p\n",
+ leader, instance);
+ }
+}
/* Compute the scalar cost of the SLP node NODE and its children
and return it. Do not account defs that are marked in LIFE and
@@ -3261,18 +3355,22 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
}
}
-/* Check if vectorization of the basic block is profitable. */
+/* Check if vectorization of the basic block is profitable for the
+ subgraph denoted by SLP_INSTANCES. */
static bool
-vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
+vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
+ vec<slp_instance> slp_instances)
{
- vec<slp_instance> slp_instances = BB_VINFO_SLP_INSTANCES (bb_vinfo);
slp_instance instance;
int i;
unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0;
unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0;
- /* Calculate scalar cost. */
+ void *vect_target_cost_data = init_cost (NULL);
+
+ /* Calculate scalar cost and sum the cost for the vector stmts
+ previously collected. */
stmt_vector_for_cost scalar_costs;
scalar_costs.create (0);
hash_set<slp_tree> visited;
@@ -3284,22 +3382,25 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
vect_bb_slp_scalar_cost (bb_vinfo,
SLP_INSTANCE_TREE (instance),
&life, &scalar_costs, visited);
+ add_stmt_costs (bb_vinfo, vect_target_cost_data, &instance->cost_vec);
+ instance->cost_vec.release ();
}
/* Unset visited flag. */
stmt_info_for_cost *si;
FOR_EACH_VEC_ELT (scalar_costs, i, si)
gimple_set_visited (si->stmt_info->stmt, false);
- void *target_cost_data = init_cost (NULL);
- add_stmt_costs (bb_vinfo, target_cost_data, &scalar_costs);
+ void *scalar_target_cost_data = init_cost (NULL);
+ add_stmt_costs (bb_vinfo, scalar_target_cost_data, &scalar_costs);
scalar_costs.release ();
unsigned dummy;
- finish_cost (target_cost_data, &dummy, &scalar_cost, &dummy);
- destroy_cost_data (target_cost_data);
+ finish_cost (scalar_target_cost_data, &dummy, &scalar_cost, &dummy);
+ destroy_cost_data (scalar_target_cost_data);
- /* Complete the target-specific cost calculation. */
- finish_cost (BB_VINFO_TARGET_COST_DATA (bb_vinfo), &vec_prologue_cost,
+ /* Complete the target-specific vector cost calculation. */
+ finish_cost (vect_target_cost_data, &vec_prologue_cost,
&vec_inside_cost, &vec_epilogue_cost);
+ destroy_cost_data (vect_target_cost_data);
vec_outside_cost = vec_prologue_cost + vec_epilogue_cost;
@@ -3324,6 +3425,54 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
return true;
}
+/* For each SLP subgraph determine profitability and remove parts not so.
+ Returns true if any profitable to vectorize subgraph remains. */
+
+static bool
+vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
+{
+ slp_instance instance;
+ unsigned i;
+
+ auto_vec<slp_instance> subgraphs (BB_VINFO_SLP_INSTANCES (bb_vinfo).length ());
+ FOR_EACH_VEC_ELT (BB_VINFO_SLP_INSTANCES (bb_vinfo), i, instance)
+ if (!instance->subgraph_entries.is_empty ())
+ subgraphs.quick_push (instance);
+ BB_VINFO_SLP_INSTANCES (bb_vinfo).truncate (0);
+ for (i = 0; i < subgraphs.length ();)
+ {
+ instance = subgraphs[i];
+ if (!vect_bb_vectorization_profitable_p (bb_vinfo,
+ instance->subgraph_entries))
+ {
+ /* ??? We need to think of providing better dump/opt-report
+ locations here. */
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "not vectorized: vectorization is not "
+ "profitable.\n");
+ }
+ slp_instance entry;
+ unsigned j;
+ FOR_EACH_VEC_ELT (instance->subgraph_entries, j, entry)
+ if (entry != instance)
+ vect_free_slp_instance (entry, false);
+ vect_free_slp_instance (instance, false);
+ subgraphs.ordered_remove (i);
+ }
+ else
+ {
+ slp_instance entry;
+ unsigned j;
+ FOR_EACH_VEC_ELT (instance->subgraph_entries, j, entry)
+ BB_VINFO_SLP_INSTANCES (bb_vinfo).safe_push (entry);
+ ++i;
+ }
+ }
+ return !BB_VINFO_SLP_INSTANCES (bb_vinfo).is_empty ();
+}
+
/* Find any vectorizable constructors and add them to the grouped_store
array. */
@@ -3465,6 +3614,8 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal,
return false;
}
+ vect_bb_partition_graph (bb_vinfo);
+
/* Cost model: check if the vectorization is worthwhile. */
if (!unlimited_cost_model (NULL)
&& !vect_bb_vectorization_profitable_p (bb_vinfo))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index fb1111c8615..8bf33137395 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -183,6 +183,13 @@ public:
/* The SLP node containing the reduction PHIs. */
slp_tree reduc_phis;
+
+ /* Vector cost of this entry to the SLP graph. */
+ stmt_vector_for_cost cost_vec;
+
+ /* If this instance is the main entry of a subgraph the set of
+ entries into the same subgraph, including itself. */
+ vec<_slp_instance *> subgraph_entries;
} *slp_instance;
@@ -913,7 +920,6 @@ public:
#define BB_VINFO_SLP_INSTANCES(B) (B)->slp_instances
#define BB_VINFO_DATAREFS(B) (B)->shared->datarefs
#define BB_VINFO_DDRS(B) (B)->shared->ddrs
-#define BB_VINFO_TARGET_COST_DATA(B) (B)->target_cost_data
static inline bb_vec_info
vec_info_for_bb (basic_block bb)
--
2.26.2
More information about the Gcc-patches
mailing list