Don't vectorise zero-step rmw operations (PR 84485)

Richard Sandiford richard.sandiford@linaro.org
Fri Mar 2 14:12:00 GMT 2018


Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> GCC 6 and 7 would vectorise:
>>>>
>>>> void
>>>> f (unsigned long incx, unsigned long incy,
>>>>    float *restrict dx, float *restrict dy)
>>>> {
>>>>   unsigned long ix = 0, iy = 0;
>>>>   for (unsigned long i = 0; i < 512; ++i)
>>>>     {
>>>>       dy[iy] += dx[ix];
>>>>       ix += incx;
>>>>       iy += incy;
>>>>     }
>>>> }
>>>>
>>>> without first proving that incy is nonzero.  This is a regression from
>>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based
>>>> on whether incy is zero, but that's obviously too invasive to backport.
>>>> This patch instead bails out for non-constant steps in the place that
>>>> trunk would try a check for zeroness.
>>>>
>>>> I did wonder about trying to use range information to prove nonzeroness
>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem
>>>> suitable for a release branch.
>>>>
>>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase
>>>> to trunk too.
>>>
>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or
>>> DR_STEP (drb)?
>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
>>> Even when not using range-info I think that using !
>>> tree_expr_nonzero_p (DR_STEP (dra))
>>> is more to the point of the issue we're fixing -- that also would catch
>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
>>> patch wouldn't so it looks like a more "complete" fix.
>>
>> OK.
>>
>>> Last but not least trunk and your patch guards all this by
>>> !loop->force_vectorize.
>>> But force_vectorize doesn't give any such guarantee that step isn't
>>> zero so I wonder
>>> why we deliberately choose to possibly miscompile stuff here?
>>
>> This was based on the pre-existing:
>>
>>   if (loop_vinfo && integer_zerop (step))
>>     {
>>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
>>       if (!nested_in_vect_loop_p (loop, stmt))
>>         return DR_IS_READ (dr);
>>       /* Allow references with zero step for outer loops marked
>>          with pragma omp simd only - it guarantees absence of
>>          loop-carried dependencies between inner loop iterations.  */
>>       if (!loop->force_vectorize)
>>         {
>>           if (dump_enabled_p ())
>>             dump_printf_loc (MSG_NOTE, vect_location,
>>                              "zero step in inner loop of nest\n");
>>           return false;
>>         }
>>     }
>>
>> AIUI #pragma omp simd really does guarantee that iterations of
>> the loop can be executed concurrently (up to the limit given by
>> safelen if present).  So something like:
>>
>>   #pragma omp simd
>>   for (int i = 0; i < n; ++i)
>>     a[i * step] += 1;
>>
>> would be incorrect for step==0.  (#pragma ordered simd forces
>> things to be executed in order where necessary.)
>
> But then we should check safelen, not force_vectorize...

OK.  Following on from irc, this version uses expr_not_equal_to
instead of tree_expr_nonzero_p.  It also adds safelen to the existing
use of force_vectorize (which seemed safer than replacing it).

Tested on aarch64-linux-gnu.  OK to install?

Richard


2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/84485
	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
	true for zero dependence distances if the step might be zero,
	and if there is no metadata that guarantees correctness.
	(vect_analyze_data_ref_access): Check safelen as well as
	force_vectorize.

gcc/testsuite/
	PR tree-optimization/84485
	* gcc.dg/vect/pr84485.c: New test.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-02-28 14:19:22.326678337 +0000
+++ gcc/tree-vect-data-refs.c	2018-03-02 13:28:06.339321095 +0000
@@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
 		}
 	    }
 
+	  unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));
+	  if (loop->safelen < 2
+	      && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "step could be zero.\n");
+	      return true;
+	    }
+
 	  continue;
 	}
 
@@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat
       /* Allow references with zero step for outer loops marked
 	 with pragma omp simd only - it guarantees absence of
 	 loop-carried dependencies between inner loop iterations.  */
-      if (!loop->force_vectorize)
+      if (!loop->force_vectorize || loop->safelen < 2)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
Index: gcc/testsuite/gcc.dg/vect/pr84485.c
===================================================================
--- /dev/null	2018-03-01 08:17:49.562264353 +0000
+++ gcc/testsuite/gcc.dg/vect/pr84485.c	2018-03-02 13:28:06.338321095 +0000
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+#define N 256
+
+void __attribute__ ((noinline, noclone))
+f (unsigned long incx, unsigned long incy,
+   float *restrict dx, float *restrict dy)
+{
+  unsigned long ix = 0, iy = 0;
+  for (unsigned long i = 0; i < N; ++i)
+    {
+      dy[iy] += dx[ix];
+      ix += incx;
+      iy += incy;
+    }
+}
+
+float a = 0.0;
+float b[N];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N; ++i)
+    b[i] = i;
+  f (1, 0, b, &a);
+  if (a != N * (N - 1) / 2)
+    __builtin_abort ();
+  return 0;
+}



More information about the Gcc-patches mailing list