Don't vectorise zero-step rmw operations (PR 84485)

Richard Biener richard.guenther@gmail.com
Thu Mar 1 13:58:00 GMT 2018


On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> GCC 6 and 7 would vectorise:
>>>
>>> void
>>> f (unsigned long incx, unsigned long incy,
>>>    float *restrict dx, float *restrict dy)
>>> {
>>>   unsigned long ix = 0, iy = 0;
>>>   for (unsigned long i = 0; i < 512; ++i)
>>>     {
>>>       dy[iy] += dx[ix];
>>>       ix += incx;
>>>       iy += incy;
>>>     }
>>> }
>>>
>>> without first proving that incy is nonzero.  This is a regression from
>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based
>>> on whether incy is zero, but that's obviously too invasive to backport.
>>> This patch instead bails out for non-constant steps in the place that
>>> trunk would try a check for zeroness.
>>>
>>> I did wonder about trying to use range information to prove nonzeroness
>>> for SSA_NAMEs, but that would be entirely new code and didn't seem
>>> suitable for a release branch.
>>>
>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase
>>> to trunk too.
>>
>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)?
>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
>> Even when not using range-info I think that using !
>> tree_expr_nonzero_p (DR_STEP (dra))
>> is more to the point of the issue we're fixing -- that also would catch
>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
>> patch wouldn't so it looks like a more "complete" fix.
>
> OK.
>
>> Last but not least trunk and your patch guards all this by
>> !loop->force_vectorize.
>> But force_vectorize doesn't give any such guarantee that step isn't
>> zero so I wonder
>> why we deliberately choose to possibly miscompile stuff here?
>
> This was based on the pre-existing:
>
>   if (loop_vinfo && integer_zerop (step))
>     {
>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
>       if (!nested_in_vect_loop_p (loop, stmt))
>         return DR_IS_READ (dr);
>       /* Allow references with zero step for outer loops marked
>          with pragma omp simd only - it guarantees absence of
>          loop-carried dependencies between inner loop iterations.  */
>       if (!loop->force_vectorize)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "zero step in inner loop of nest\n");
>           return false;
>         }
>     }
>
> AIUI #pragma omp simd really does guarantee that iterations of
> the loop can be executed concurrently (up to the limit given by
> safelen if present).  So something like:
>
>   #pragma omp simd
>   for (int i = 0; i < n; ++i)
>     a[i * step] += 1;
>
> would be incorrect for step==0.  (#pragma ordered simd forces
> things to be executed in order where necessary.)

But then we should check safelen, not force_vectorize...

Richard.

> Thanks,
> Richard
>
>> Thus I'd like to see a simpler
>>
>> +         if (! tree_expr_nonzero_p (DR_STEP (dra)))
>> +           {
>> +             if (dump_enabled_p ())
>> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                                "step could be zero.\n");
>> +             return true;
>> +           }
>>
>> if you think that works out and also the force_vectorize guard removed from the
>> trunk version.
>>
>> OK with that change.
>>
>> Thanks,
>> Richard.
>>
>>> Richard
>>>
>>>
>>> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>
>>>
>>> gcc/
>>>         PR tree-optimization/84485
>>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
>>>         true for zero dependence distances if either step is variable, and if
>>>         there is no metadata that guarantees correctness.
>>>
>>> gcc/testsuite/
>>>         PR tree-optimization/84485
>>>         * gcc.dg/vect/pr84485.c: New test.
>>>
>>> Index: gcc/tree-vect-data-refs.c
>>> ===================================================================
>>> --- gcc/tree-vect-data-refs.c   2017-07-27 18:08:43.779978373 +0100
>>> +++ gcc/tree-vect-data-refs.c   2018-02-28 14:16:36.621113244 +0000
>>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
>>>                 }
>>>             }
>>>
>>> +         if (!loop->force_vectorize
>>> +             && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST
>>> +                 || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))
>>> +           {
>>> +             if (dump_enabled_p ())
>>> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> +                                "step could be zero.\n");
>>> +             return true;
>>> +           }
>>> +
>>>           continue;
>>>         }
>>>
>>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c
>>> ===================================================================
>>> --- /dev/null   2018-02-26 10:26:41.624847060 +0000
>>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000
>>> @@ -0,0 +1,34 @@
>>> +/* { dg-do run } */
>>> +
>>> +#include "tree-vect.h"
>>> +
>>> +#define N 256
>>> +
>>> +void __attribute__ ((noinline, noclone))
>>> +f (unsigned long incx, unsigned long incy,
>>> +   float *restrict dx, float *restrict dy)
>>> +{
>>> +  unsigned long ix = 0, iy = 0;
>>> +  for (unsigned long i = 0; i < N; ++i)
>>> +    {
>>> +      dy[iy] += dx[ix];
>>> +      ix += incx;
>>> +      iy += incy;
>>> +    }
>>> +}
>>> +
>>> +float a = 0.0;
>>> +float b[N];
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  check_vect ();
>>> +
>>> +  for (int i = 0; i < N; ++i)
>>> +    b[i] = i;
>>> +  f (1, 0, b, &a);
>>> +  if (a != N * (N - 1) / 2)
>>> +    __builtin_abort ();
>>> +  return 0;
>>> +}



More information about the Gcc-patches mailing list