This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)


Hello,

PR tree-optimization/53636 is a crash due to an invalid unaligned access
generated by the vectorizer.

The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO
as computed by the default data-ref analysis to decide whether an access
is sufficiently aligned for the vectorizer.

However, this analysis computes the scalar evolution relative to the
innermost loop in which the access takes place; DR_ALIGNED_TO only
reflects the known alignmnent of the *base* address according to
that evolution.

Now, if we're actually about to vectorize this particular loop, then
just checking the alignment of the base is exactly what we need to do
(subsequent accesses will usually be misaligned, but that's OK since
we're transforming those into a vector access).

However, if we're actually currently vectorizing something else, this
test is not sufficient.  The code currently already checks for the
case where we're performing outer-loop vectorization.  In this case,
we need to check alignment of the access on *every* pass through the
inner loop, as the comment states:

  /* In case the dataref is in an inner-loop of the loop that is being
     vectorized (LOOP), we use the base and misalignment information
     relative to the outer-loop (LOOP).  This is ok only if the misalignment
     stays the same throughout the execution of the inner-loop, which is why
     we have to check that the stride of the dataref in the inner-loop evenly
     divides by the vector size.  */

However, there is a second case where we need to check every pass: if
we're not actually vectorizing any loop, but are performing basic-block
SLP.  In this case, it would appear that we need the same check as
described in the comment above, i.e. to verify that the stride is a
multiple of the vector size.

The patch below adds this check, and this indeed fixes the invalid access
I was seeing in the test case (in the final assembler, we now get a vld1.16
instead of vldr).

Tested on arm-linux-gnueabi with no regressions.

OK for mainline?

Bye,
Ulrich


ChangeLog:

	gcc/
	PR tree-optimization/53636
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
	stride when doing basic-block vectorization.

	gcc/testsuite/
	PR tree-optimization/53636
	* gcc.target/arm/pr53636.c: New test.


=== added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
--- gcc/testsuite/gcc.target/arm/pr53636.c	1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/pr53636.c	2012-06-11 17:31:41 +0000
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O -ftree-vectorize" } */
+/* { dg-add-options arm_neon } */
+
+void fill (short *buf) __attribute__ ((noinline));
+void fill (short *buf)
+{
+  int i;
+
+  for (i = 0; i < 11 * 8; i++)
+    buf[i] = i;
+}
+
+void test (unsigned char *dst) __attribute__ ((noinline));
+void test (unsigned char *dst)
+{
+  short tmp[11 * 8], *tptr;
+  int i;
+
+  fill (tmp);
+
+  tptr = tmp;
+  for (i = 0; i < 8; i++)
+    {
+      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
+      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
+      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
+      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
+      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
+      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
+      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
+      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
+
+      dst += 8;
+      tptr += 11;
+    }
+}
+
+int main (void)
+{
+  char buf [8 * 8];
+
+  test (buf);
+
+  return 0;
+}
+

=== modified file 'gcc/tree-vect-data-refs.c'
--- gcc/tree-vect-data-refs.c	2012-05-31 08:46:10 +0000
+++ gcc/tree-vect-data-refs.c	2012-06-11 17:31:41 +0000
@@ -845,6 +845,24 @@
 	}
     }
 
+  /* Similarly, if we're doing basic-block vectorization, we can only use
+     base and misalignment information relative to an innermost loop if the
+     misalignment stays the same throughout the execution of the loop.
+     As above, this is the case if the stride of the dataref evenly divides
+     by the vector size.  */
+  if (!loop)
+    {
+      tree step = DR_STEP (dr);
+      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+
+      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+	{
+	  if (vect_print_dump_info (REPORT_ALIGNMENT))
+	    fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
+	  misalign = NULL_TREE;
+	}
+    }
+
   base = build_fold_indirect_ref (base_addr);
   alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
 

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]