This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Don't vectorise zero-step rmw operations (PR 84485)
- From: Richard Sandiford <richard dot sandiford at linaro dot org>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 02 Mar 2018 14:12:22 +0000
- Subject: Re: Don't vectorise zero-step rmw operations (PR 84485)
- Authentication-results: sourceware.org; auth=none
- References: <87muzt8ac4.fsf@linaro.org> <CAFiYyc3Hop+HxBXxMzD9gQu77hOCc83=Hi26b9Pa5fY1vo9Q-A@mail.gmail.com> <87y3jc81qq.fsf@linaro.org> <CAFiYyc3zXjbMjh8pRrXMzO8TusU=nZeninHMrov40tJdD+bKew@mail.gmail.com>
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> GCC 6 and 7 would vectorise:
>>>>
>>>> void
>>>> f (unsigned long incx, unsigned long incy,
>>>> float *restrict dx, float *restrict dy)
>>>> {
>>>> unsigned long ix = 0, iy = 0;
>>>> for (unsigned long i = 0; i < 512; ++i)
>>>> {
>>>> dy[iy] += dx[ix];
>>>> ix += incx;
>>>> iy += incy;
>>>> }
>>>> }
>>>>
>>>> without first proving that incy is nonzero. This is a regression from
>>>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based
>>>> on whether incy is zero, but that's obviously too invasive to backport.
>>>> This patch instead bails out for non-constant steps in the place that
>>>> trunk would try a check for zeroness.
>>>>
>>>> I did wonder about trying to use range information to prove nonzeroness
>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem
>>>> suitable for a release branch.
>>>>
>>>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase
>>>> to trunk too.
>>>
>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or
>>> DR_STEP (drb)?
>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
>>> Even when not using range-info I think that using !
>>> tree_expr_nonzero_p (DR_STEP (dra))
>>> is more to the point of the issue we're fixing -- that also would catch
>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
>>> patch wouldn't so it looks like a more "complete" fix.
>>
>> OK.
>>
>>> Last but not least trunk and your patch guards all this by
>>> !loop->force_vectorize.
>>> But force_vectorize doesn't give any such guarantee that step isn't
>>> zero so I wonder
>>> why we deliberately choose to possibly miscompile stuff here?
>>
>> This was based on the pre-existing:
>>
>> if (loop_vinfo && integer_zerop (step))
>> {
>> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
>> if (!nested_in_vect_loop_p (loop, stmt))
>> return DR_IS_READ (dr);
>> /* Allow references with zero step for outer loops marked
>> with pragma omp simd only - it guarantees absence of
>> loop-carried dependencies between inner loop iterations. */
>> if (!loop->force_vectorize)
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> "zero step in inner loop of nest\n");
>> return false;
>> }
>> }
>>
>> AIUI #pragma omp simd really does guarantee that iterations of
>> the loop can be executed concurrently (up to the limit given by
>> safelen if present). So something like:
>>
>> #pragma omp simd
>> for (int i = 0; i < n; ++i)
>> a[i * step] += 1;
>>
>> would be incorrect for step==0. (#pragma ordered simd forces
>> things to be executed in order where necessary.)
>
> But then we should check safelen, not force_vectorize...
OK. Following on from irc, this version uses expr_not_equal_to
instead of tree_expr_nonzero_p. It also adds safelen to the existing
use of force_vectorize (which seemed safer than replacing it).
Tested on aarch64-linux-gnu. OK to install?
Richard
2018-02-28 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/84485
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
true for zero dependence distances if the step might be zero,
and if there is no metadata that guarantees correctness.
(vect_analyze_data_ref_access): Check safelen as well as
force_vectorize.
gcc/testsuite/
PR tree-optimization/84485
* gcc.dg/vect/pr84485.c: New test.
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c 2018-02-28 14:19:22.326678337 +0000
+++ gcc/tree-vect-data-refs.c 2018-03-02 13:28:06.339321095 +0000
@@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
}
}
+ unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));
+ if (loop->safelen < 2
+ && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "step could be zero.\n");
+ return true;
+ }
+
continue;
}
@@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat
/* Allow references with zero step for outer loops marked
with pragma omp simd only - it guarantees absence of
loop-carried dependencies between inner loop iterations. */
- if (!loop->force_vectorize)
+ if (!loop->force_vectorize || loop->safelen < 2)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
Index: gcc/testsuite/gcc.dg/vect/pr84485.c
===================================================================
--- /dev/null 2018-03-01 08:17:49.562264353 +0000
+++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+#define N 256
+
+void __attribute__ ((noinline, noclone))
+f (unsigned long incx, unsigned long incy,
+ float *restrict dx, float *restrict dy)
+{
+ unsigned long ix = 0, iy = 0;
+ for (unsigned long i = 0; i < N; ++i)
+ {
+ dy[iy] += dx[ix];
+ ix += incx;
+ iy += incy;
+ }
+}
+
+float a = 0.0;
+float b[N];
+
+int
+main (void)
+{
+ check_vect ();
+
+ for (int i = 0; i < N; ++i)
+ b[i] = i;
+ f (1, 0, b, &a);
+ if (a != N * (N - 1) / 2)
+ __builtin_abort ();
+ return 0;
+}