This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Don't vectorise zero-step rmw operations (PR 84485)
- From: Richard Sandiford <richard dot sandiford at linaro dot org>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 01 Mar 2018 11:38:21 +0000
- Subject: Re: Don't vectorise zero-step rmw operations (PR 84485)
- Authentication-results: sourceware.org; auth=none
- References: <87muzt8ac4.fsf@linaro.org> <CAFiYyc3Hop+HxBXxMzD9gQu77hOCc83=Hi26b9Pa5fY1vo9Q-A@mail.gmail.com>
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> GCC 6 and 7 would vectorise:
>>
>> void
>> f (unsigned long incx, unsigned long incy,
>> float *restrict dx, float *restrict dy)
>> {
>> unsigned long ix = 0, iy = 0;
>> for (unsigned long i = 0; i < 512; ++i)
>> {
>> dy[iy] += dx[ix];
>> ix += incx;
>> iy += incy;
>> }
>> }
>>
>> without first proving that incy is nonzero. This is a regression from
>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based
>> on whether incy is zero, but that's obviously too invasive to backport.
>> This patch instead bails out for non-constant steps in the place that
>> trunk would try a check for zeroness.
>>
>> I did wonder about trying to use range information to prove nonzeroness
>> for SSA_NAMEs, but that would be entirely new code and didn't seem
>> suitable for a release branch.
>>
>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase
>> to trunk too.
>
> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)?
> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
> Even when not using range-info I think that using !
> tree_expr_nonzero_p (DR_STEP (dra))
> is more to the point of the issue we're fixing -- that also would catch
> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
> patch wouldn't so it looks like a more "complete" fix.
OK.
> Last but not least trunk and your patch guards all this by
> !loop->force_vectorize.
> But force_vectorize doesn't give any such guarantee that step isn't
> zero so I wonder
> why we deliberately choose to possibly miscompile stuff here?
This was based on the pre-existing:
if (loop_vinfo && integer_zerop (step))
{
GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
if (!nested_in_vect_loop_p (loop, stmt))
return DR_IS_READ (dr);
/* Allow references with zero step for outer loops marked
with pragma omp simd only - it guarantees absence of
loop-carried dependencies between inner loop iterations. */
if (!loop->force_vectorize)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"zero step in inner loop of nest\n");
return false;
}
}
AIUI #pragma omp simd really does guarantee that iterations of
the loop can be executed concurrently (up to the limit given by
safelen if present). So something like:
#pragma omp simd
for (int i = 0; i < n; ++i)
a[i * step] += 1;
would be incorrect for step==0. (#pragma ordered simd forces
things to be executed in order where necessary.)
Thanks,
Richard
> Thus I'd like to see a simpler
>
> + if (! tree_expr_nonzero_p (DR_STEP (dra)))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "step could be zero.\n");
> + return true;
> + }
>
> if you think that works out and also the force_vectorize guard removed from the
> trunk version.
>
> OK with that change.
>
> Thanks,
> Richard.
>
>> Richard
>>
>>
>> 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org>
>>
>> gcc/
>> PR tree-optimization/84485
>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
>> true for zero dependence distances if either step is variable, and if
>> there is no metadata that guarantees correctness.
>>
>> gcc/testsuite/
>> PR tree-optimization/84485
>> * gcc.dg/vect/pr84485.c: New test.
>>
>> Index: gcc/tree-vect-data-refs.c
>> ===================================================================
>> --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100
>> +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000
>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
>> }
>> }
>>
>> + if (!loop->force_vectorize
>> + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST
>> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))
>> + {
>> + if (dump_enabled_p ())
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> + "step could be zero.\n");
>> + return true;
>> + }
>> +
>> continue;
>> }
>>
>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c
>> ===================================================================
>> --- /dev/null 2018-02-26 10:26:41.624847060 +0000
>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000
>> @@ -0,0 +1,34 @@
>> +/* { dg-do run } */
>> +
>> +#include "tree-vect.h"
>> +
>> +#define N 256
>> +
>> +void __attribute__ ((noinline, noclone))
>> +f (unsigned long incx, unsigned long incy,
>> + float *restrict dx, float *restrict dy)
>> +{
>> + unsigned long ix = 0, iy = 0;
>> + for (unsigned long i = 0; i < N; ++i)
>> + {
>> + dy[iy] += dx[ix];
>> + ix += incx;
>> + iy += incy;
>> + }
>> +}
>> +
>> +float a = 0.0;
>> +float b[N];
>> +
>> +int
>> +main (void)
>> +{
>> + check_vect ();
>> +
>> + for (int i = 0; i < N; ++i)
>> + b[i] = i;
>> + f (1, 0, b, &a);
>> + if (a != N * (N - 1) / 2)
>> + __builtin_abort ();
>> + return 0;
>> +}