[PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Mon Jun 12 08:19:00 GMT 2017

"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Sorry to return this old patch, but:
>>
>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>> -/* Calculate the number of iterations under which scalar loop will be
>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>>> -   iterations of prolog loop.  If it's integer const, the integer
>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>>> -   true.  This function also store upper bound of the result in BOUND.  */
>>> +/* Calculate the number of iterations above which vectorized loop will be
>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>>> +   of prolog loop.  If it's integer const, the integer number is also passed
>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>>> +   (rather than vectorized) loop will be executed.  This function stores
>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>>>
>>>  static tree
>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>>> -                          int bound_prolog, int vf, int th, int *bound,
>>> -                          bool check_profitability)
>>> +                          int bound_prolog, int vfm1, int th,
>>> +                          int *bound_scalar, bool check_profitability)
>>>  {
>>>    tree type = TREE_TYPE (niters_prolog);
>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>>> -                          build_int_cst (type, vf));
>>> +                          build_int_cst (type, vfm1));
>>>
>>> -  *bound = vf + bound_prolog;
>>> +  *bound_scalar = vfm1 + bound_prolog;
>>>    if (check_profitability)
>>>      {
>>> -      th++;
>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>>> +      compute the maximum niters of scalar loop.  */
>>> +      th--;
>>
>> Are you sure about this last change?  It looks like it should be dropping
> Hi Richard,
> Thanks for spotting this.  I vaguely remember I got this from the way
> how niter/th was checked in previous peeling code, but did't double
> check it now.  I tend to believe there is inconsistence about th,
> especially with comment like:
>
>   /* Threshold of number of iterations below which vectorzation will not be
>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>      PARAM_MIN_VECT_LOOP_BOUND. */
>   unsigned int th;
>
> I also tend to believe the inconsistence was introduced partly because
> niters in vectorizer stands for latch_niters + 1, while latch_niters
> in rest of the compiler.
>
> and...,
>
>> the increment rather than replacing it with a decrement.
>>
>> It looks like the threshold is already the maximum niters for the scalar
>> loop.  It's set by:
>>
>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>>                             * vectorization_factor) - 1);
>>
>>   /* Use the cost model only if it is more conservative than user specified
>>      threshold.  */
>>   th = (unsigned) min_scalar_loop_bound;
>>   if (min_profitable_iters
>>       && (!min_scalar_loop_bound
>>           || min_profitable_iters > min_scalar_loop_bound))
>>     th = (unsigned) min_profitable_iters;
>>
>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>>
>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>> is the minimum niters for the vector loop.)
> To be honest, min_scalar_loop_bound is more likely for something's
> lower bound which is the niters for the vector loop.  If it refers to
> the niters scalar loop, it is in actuality the "max" value we should
> use.  I am not quite sure here, partly because I am not a native
> speaker.
>
>>
>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>> which the vector loop is used, but vect_estimate_min_profitable_iters
>> instead returns the largest niters for which the scalar loop should be
>> preferred:
>>
>>   /* Cost model disabled.  */
>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>>     {
>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>>       *ret_min_profitable_niters = 0;
>>       *ret_min_profitable_estimate = 0;
>>       return;
>>     }
>>   [...]
>>   /* Because the condition we create is:
>>      if (niters <= min_profitable_iters)
>>        then skip the vectorized loop.  */
>>   min_profitable_iters--;
>>   [...]
>>   min_profitable_estimate --;
>>
>> Also, when deciding whether the threshold needs to be used, we have:
>>
>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>     {
>>       if (dump_enabled_p ())
>>         dump_printf_loc (MSG_NOTE, vect_location,
>>                          "Profitability threshold is %d loop iterations.\n",
>>                          th);
>>       check_profitability = true;
>>     }
> Yes, this indeed indicates that th refers to the maximum niters of the
> scalar loop.  Anyway, we should resolve the inconsistence and make it
> more clear in variable names etc..

Yeah, agree the variable names are really confusing.  Given the choice
between changing the names to match the current meaning and changing
the meaning to match the current names, the latter looks like it would
give cleaner code.  I'm happy to do that it that sounds like the way
to go.

FWIW, I came across this while trying make sense of the "+ 1" in:

  /* Decide whether we need to create an epilogue loop to handle
     remaining scalar iterations.  */
  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);

Thanks,
Richard