This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Don't vectorise zero-step rmw operations (PR 84485)


On Fri, Mar 2, 2018 at 3:12 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> GCC 6 and 7 would vectorise:
>>>>>
>>>>> void
>>>>> f (unsigned long incx, unsigned long incy,
>>>>>    float *restrict dx, float *restrict dy)
>>>>> {
>>>>>   unsigned long ix = 0, iy = 0;
>>>>>   for (unsigned long i = 0; i < 512; ++i)
>>>>>     {
>>>>>       dy[iy] += dx[ix];
>>>>>       ix += incx;
>>>>>       iy += incy;
>>>>>     }
>>>>> }
>>>>>
>>>>> without first proving that incy is nonzero.  This is a regression from
>>>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based
>>>>> on whether incy is zero, but that's obviously too invasive to backport.
>>>>> This patch instead bails out for non-constant steps in the place that
>>>>> trunk would try a check for zeroness.
>>>>>
>>>>> I did wonder about trying to use range information to prove nonzeroness
>>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem
>>>>> suitable for a release branch.
>>>>>
>>>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase
>>>>> to trunk too.
>>>>
>>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or
>>>> DR_STEP (drb)?
>>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
>>>> Even when not using range-info I think that using !
>>>> tree_expr_nonzero_p (DR_STEP (dra))
>>>> is more to the point of the issue we're fixing -- that also would catch
>>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
>>>> patch wouldn't so it looks like a more "complete" fix.
>>>
>>> OK.
>>>
>>>> Last but not least trunk and your patch guards all this by
>>>> !loop->force_vectorize.
>>>> But force_vectorize doesn't give any such guarantee that step isn't
>>>> zero so I wonder
>>>> why we deliberately choose to possibly miscompile stuff here?
>>>
>>> This was based on the pre-existing:
>>>
>>>   if (loop_vinfo && integer_zerop (step))
>>>     {
>>>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
>>>       if (!nested_in_vect_loop_p (loop, stmt))
>>>         return DR_IS_READ (dr);
>>>       /* Allow references with zero step for outer loops marked
>>>          with pragma omp simd only - it guarantees absence of
>>>          loop-carried dependencies between inner loop iterations.  */
>>>       if (!loop->force_vectorize)
>>>         {
>>>           if (dump_enabled_p ())
>>>             dump_printf_loc (MSG_NOTE, vect_location,
>>>                              "zero step in inner loop of nest\n");
>>>           return false;
>>>         }
>>>     }
>>>
>>> AIUI #pragma omp simd really does guarantee that iterations of
>>> the loop can be executed concurrently (up to the limit given by
>>> safelen if present).  So something like:
>>>
>>>   #pragma omp simd
>>>   for (int i = 0; i < n; ++i)
>>>     a[i * step] += 1;
>>>
>>> would be incorrect for step==0.  (#pragma ordered simd forces
>>> things to be executed in order where necessary.)
>>
>> But then we should check safelen, not force_vectorize...
>
> OK.  Following on from irc, this version uses expr_not_equal_to
> instead of tree_expr_nonzero_p.  It also adds safelen to the existing
> use of force_vectorize (which seemed safer than replacing it).
>
> Tested on aarch64-linux-gnu.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         PR tree-optimization/84485
>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
>         true for zero dependence distances if the step might be zero,
>         and if there is no metadata that guarantees correctness.
>         (vect_analyze_data_ref_access): Check safelen as well as
>         force_vectorize.
>
> gcc/testsuite/
>         PR tree-optimization/84485
>         * gcc.dg/vect/pr84485.c: New test.
>
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2018-02-28 14:19:22.326678337 +0000
> +++ gcc/tree-vect-data-refs.c   2018-03-02 13:28:06.339321095 +0000
> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
>                 }
>             }
>
> +         unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));
> +         if (loop->safelen < 2
> +             && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "step could be zero.\n");
> +             return true;
> +           }
> +
>           continue;
>         }
>
> @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat
>        /* Allow references with zero step for outer loops marked
>          with pragma omp simd only - it guarantees absence of
>          loop-carried dependencies between inner loop iterations.  */
> -      if (!loop->force_vectorize)
> +      if (!loop->force_vectorize || loop->safelen < 2)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
> Index: gcc/testsuite/gcc.dg/vect/pr84485.c
> ===================================================================
> --- /dev/null   2018-03-01 08:17:49.562264353 +0000
> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +
> +#include "tree-vect.h"
> +
> +#define N 256
> +
> +void __attribute__ ((noinline, noclone))
> +f (unsigned long incx, unsigned long incy,
> +   float *restrict dx, float *restrict dy)
> +{
> +  unsigned long ix = 0, iy = 0;
> +  for (unsigned long i = 0; i < N; ++i)
> +    {
> +      dy[iy] += dx[ix];
> +      ix += incx;
> +      iy += incy;
> +    }
> +}
> +
> +float a = 0.0;
> +float b[N];
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  for (int i = 0; i < N; ++i)
> +    b[i] = i;
> +  f (1, 0, b, &a);
> +  if (a != N * (N - 1) / 2)
> +    __builtin_abort ();
> +  return 0;
> +}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]