This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Biener <richard dot guenther at gmail dot com>
- Date: Tue, 19 Sep 2017 14:28:13 -0500
- Subject: Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)
- References: <7570cb71-cb74-d97f-3b7a-b161631e36c5@linux.vnet.ibm.com>
On 9/19/17 12:38 PM, Bill Schmidt wrote:
> Hi,
>
> https://gcc.gnu.org/PR82255 identifies a problem in the vector cost model
> where a vectorized load is treated as having the cost of a strided load
> in a case where we will not actually generate a strided load. This is
> simply a mismatch between the conditions tested in the cost model and
> those tested in the code that generates vectorized instructions. This
> patch fixes the problem by recognizing when only a single non-strided
> load will be generated and reporting the cost accordingly.
>
> I believe this patch is sufficient to catch all such cases, but I admit
> that the code in vectorizable_load is complex enough that I could have
> missed a trick.
>
> I've added a test in the PowerPC cost model subdirectory. Even though
> this isn't a target-specific issue, the test does rely on a 16-byte
> vector size, so this seems safest.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this ok for trunk?
After posting, I realized that I had wrongly recalculated stmt_info
in the patch. Here's a new version (also passing regstrap) that removes
that flaw.
[gcc]

2017-09-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/82255
	* tree-vect-stmts.c (vect_model_load_cost): Don't count
	vec_construct cost when a true strided load isn't present.

[gcc/testsuite]

2017-09-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/82255
	* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New file.
Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__)) __attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+    {
+      for (int b = 0; b < 16; b++)
+        tot += abs (w[b] - x[b]);
+      w += i;
+      x += j;
+    }
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct required" 0 "vect" } } */
+
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c (revision 252760)
+++ gcc/tree-vect-stmts.c (working copy)
@@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
                         prologue_cost_vec, body_cost_vec, true);
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
-    inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
-                                     stmt_info, 0, vect_body);
+    {
+      int group_size = GROUP_SIZE (stmt_info);
+      int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+      if (group_size < nunits)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: vec_construct required");
+          inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+                                           vec_construct, stmt_info, 0,
+                                           vect_body);
+        }
+    }
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,