This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Fix PR84101, account for function ABI details in vectorization costs
- From: Richard Biener <rguenther at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: law at redhat dot com
- Date: Tue, 30 Jan 2018 10:59:59 +0100 (CET)
This patch tries to deal with the "easy" part of a function ABI,
the return value location, in vectorization costing. The testcase
shows that if we vectorize the returned value but the function
returns neither in memory nor in a vector register but, as in this
case, in an integer register pair (reg:TI ax) (bah, ABI details
exposed late? why's this not a parallel?), we end up spilling
badly.
The idea is to account for such spilling so that if the vectorization
benefits outweigh the spilling cost we'll vectorize anyway.
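To make the accounting concrete, here is a toy model of the penalty: one vector store for the spill plus one scalar load per vector element, both scaled by the number of vector copies. This is only a sketch. The struct, the function names and the unit costs are made up for illustration and are not GCC internals; the real numbers come from the target's cost hooks.

```c
/* Toy model, not GCC code: every name below is hypothetical.  */
struct toy_costs
{
  unsigned vector_store;  /* cost of one vector store (the spill) */
  unsigned scalar_load;   /* cost of one scalar element load */
};

/* Extra cost charged when the vectorized value has to be returned
   in non-vector registers: spill the vector, then reload each
   element separately.  */
unsigned
return_spill_penalty (const struct toy_costs *c,
                      unsigned ncopies, unsigned nunits)
{
  unsigned spill = ncopies * c->vector_store;
  unsigned reloads = ncopies * nunits * c->scalar_load;
  return spill + reloads;
}

/* Vectorization still wins only if the vector cost plus the
   penalty stays below the scalar cost.  */
int
still_profitable (unsigned scalar_cost, unsigned vector_cost,
                  unsigned penalty)
{
  return vector_cost + penalty < scalar_cost;
}
```

With unit costs, one copy and a two-element vector the penalty is 3, which is enough to reject a small case like the testcase while a larger vector gain could still absorb it.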
I think the particular testcase could be fixed in the subreg
pass, basically undoing the vectorization, but I realize that
in general this is too hard a problem and avoiding vectorization
is better. Still, this patch is somewhat fragile in that it
depends on us "seeing" that the stored-to decl is returned
(see cfun_returns).
Bootstrap & regtest running on x86_64-unknown-linux-gnu.
I'd like to hear opinions on my use of hard_function_value
and also from other target maintainers. I'm not sure we
have sufficient testsuite coverage of _profitable_ vectorization
of a return value. Feel free to add to this for your
target.
Ok for trunk?
Thanks,
Richard.
2018-01-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/84101
* tree-vect-stmts.c: Include explow.h for hard_function_value.
(cfun_returns): New helper.
(vect_model_store_cost): When vectorizing a store to a decl
we return and the function ABI returns via a non-vector
register, account for the possible spilling that will happen.
* gcc.target/i386/pr84101.c: New testcase.
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c (revision 257139)
+++ gcc/tree-vect-stmts.c (working copy)
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
#include "tree-cfg.h"
#include "tree-ssa-loop-manip.h"
#include "cfgloop.h"
+#include "explow.h"
#include "tree-ssa-loop.h"
#include "tree-scalar-evolution.h"
#include "tree-vectorizer.h"
@@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt
"prologue_cost = %d .\n", inside_cost, prologue_cost);
}
+/* Returns true if the current function returns DECL. */
+
+static bool
+cfun_returns (tree decl)
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    {
+      greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src));
+      if (ret && gimple_return_retval (ret) == decl)
+        return true;
+    }
+  return false;
+}
+
/* Function vect_model_store_cost
Models cost for stores. In the case of grouped accesses, one access
@@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm
vec_to_scalar, stmt_info, 0, vect_body);
}
+  /* When vectorizing an SLP store into the function result assign
+     a penalty if the function returns in a non-vector register.
+     In this case we assume we'll end up with having to spill the
+     vector result and do element loads. */
+  if (slp_node)
+    {
+      tree base = get_base_address (dr->ref);
+      if (base
+          && (TREE_CODE (base) == RESULT_DECL
+              || (DECL_P (base) && cfun_returns (base))))
+        {
+          rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
+          if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg)))
+              || GET_CODE (reg) == PARALLEL
+              || GET_CODE (reg) == CONCAT)
+            {
+              /* Spill. */
+              inside_cost += record_stmt_cost (body_cost_vec,
+                                               ncopies, vector_store,
+                                               stmt_info, 0, vect_body);
+              /* Element loads. */
+              unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+              inside_cost += record_stmt_cost (body_cost_vec,
+                                               ncopies * assumed_nunits,
+                                               scalar_load,
+                                               stmt_info, 0, vect_body);
+            }
+        }
+    }
+
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"vect_model_store_cost: inside_cost = %d, "
Index: gcc/testsuite/gcc.target/i386/pr84101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp2-details" } */
+
+typedef struct uint64_pair uint64_pair_t ;
+struct uint64_pair
+{
+  unsigned long w0 ;
+  unsigned long w1 ;
+} ;
+
+uint64_pair_t pair(int num)
+{
+  uint64_pair_t p ;
+
+  p.w0 = num << 1 ;
+  p.w1 = num >> 1 ;
+
+  return p ;
+}
+
+/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */