[PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Mon Jul 3 08:43:00 GMT 2017

On Mon, 3 Jul 2017, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 12 Jun 2017, Bin.Cheng wrote:
> >> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >> > "Bin.Cheng" <amker.cheng@gmail.com> writes:
> >> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> >> >> <richard.sandiford@linaro.org> wrote:
> >> >>> Sorry to return this old patch, but:
> >> >>>
> >> >>> Bin Cheng <Bin.Cheng@arm.com> writes:
> >> >>>> -/* Calculate the number of iterations under which scalar loop will be
> >> >>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> >> >>>> -   iterations of prolog loop.  If it's integer const, the integer
> >> >>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> >> >>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> >> >>>> -   true.  This function also store upper bound of the result in BOUND.  */
> >> >>>> +/* Calculate the number of iterations above which vectorized loop will be
> >> >>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> >> >>>> +   of prolog loop.  If it's integer const, the integer number is also passed
> >> >>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> >> >>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> >> >>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> >> >>>> +   (rather than vectorized) loop will be executed.  This function stores
> >> >>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
> >> >>>>
> >> >>>>  static tree
> >> >>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> >> >>>> -                          int bound_prolog, int vf, int th, int *bound,
> >> >>>> -                          bool check_profitability)
> >> >>>> +                          int bound_prolog, int vfm1, int th,
> >> >>>> +                          int *bound_scalar, bool check_profitability)
> >> >>>>  {
> >> >>>>    tree type = TREE_TYPE (niters_prolog);
> >> >>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> >> >>>> -                          build_int_cst (type, vf));
> >> >>>> +                          build_int_cst (type, vfm1));
> >> >>>>
> >> >>>> -  *bound = vf + bound_prolog;
> >> >>>> +  *bound_scalar = vfm1 + bound_prolog;
> >> >>>>    if (check_profitability)
> >> >>>>      {
> >> >>>> -      th++;
> >> >>>> +      /* TH indicates the minimum niters of vectorized loop, while we
> >> >>>> +      compute the maximum niters of scalar loop.  */
> >> >>>> +      th--;
> >> >>>
> >> >>> Are you sure about this last change?  It looks like it should be dropping
> >> >> Hi Richard,
> >> >> Thanks for spotting this.  I vaguely remember I got this from the way
> >> >> how niter/th was checked in previous peeling code, but did't double
> >> >> check it now.  I tend to believe there is inconsistence about th,
> >> >> especially with comment like:
> >> >>
> >> >>   /* Threshold of number of iterations below which vectorzation will not be
> >> >>      performed. It is calculated from MIN_PROFITABLE_ITERS and
> >> >>      PARAM_MIN_VECT_LOOP_BOUND. */
> >> >>   unsigned int th;
> >> >>
> >> >> I also tend to believe the inconsistence was introduced partly because
> >> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
> >> >> in rest of the compiler.
> >> >>
> >> >> and...,
> >> >>
> >> >>> the increment rather than replacing it with a decrement.
> >> >>>
> >> >>> It looks like the threshold is already the maximum niters for the scalar
> >> >>> loop.  It's set by:
> >> >>>
> >> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> >> >>>                             * vectorization_factor) - 1);
> >> >>>
> >> >>>   /* Use the cost model only if it is more conservative than user specified
> >> >>>      threshold.  */
> >> >>>   th = (unsigned) min_scalar_loop_bound;
> >> >>>   if (min_profitable_iters
> >> >>>       && (!min_scalar_loop_bound
> >> >>>           || min_profitable_iters > min_scalar_loop_bound))
> >> >>>     th = (unsigned) min_profitable_iters;
> >> >>>
> >> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> >> >>>
> >> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
> >> >>> is the minimum niters for the vector loop.)
> >> >> To be honest, min_scalar_loop_bound is more likely for something's
> >> >> lower bound which is the niters for the vector loop.  If it refers to
> >> >> the niters scalar loop, it is in actuality the "max" value we should
> >> >> use.  I am not quite sure here, partly because I am not a native
> >> >> speaker.
> >> >>
> >> >>>
> >> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
> >> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
> >> >>> instead returns the largest niters for which the scalar loop should be
> >> >>> preferred:
> >> >>>
> >> >>>   /* Cost model disabled.  */
> >> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> >> >>>     {
> >> >>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> >> >>>       *ret_min_profitable_niters = 0;
> >> >>>       *ret_min_profitable_estimate = 0;
> >> >>>       return;
> >> >>>     }
> >> >>>   [...]
> >> >>>   /* Because the condition we create is:
> >> >>>      if (niters <= min_profitable_iters)
> >> >>>        then skip the vectorized loop.  */
> >> >>>   min_profitable_iters--;
> >> >>>   [...]
> >> >>>   min_profitable_estimate --;
> >> >>>
> >> >>> Also, when deciding whether the threshold needs to be used, we have:
> >> >>>
> >> >>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >> >>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> >> >>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> >> >>>     {
> >> >>>       if (dump_enabled_p ())
> >> >>>         dump_printf_loc (MSG_NOTE, vect_location,
> >> >>>                          "Profitability threshold is %d loop iterations.\n",
> >> >>>                          th);
> >> >>>       check_profitability = true;
> >> >>>     }
> >> >> Yes, this indeed indicates that th refers to the maximum niters of the
> >> >> scalar loop.  Anyway, we should resolve the inconsistence and make it
> >> >> more clear in variable names etc..
> >> >
> >> > Yeah, agree the variable names are really confusing.  Given the choice
> >> > between changing the names to match the current meaning and changing
> >> > the meaning to match the current names, the latter looks like it would
> >> > give cleaner code.  I'm happy to do that it that sounds like the way
> >> > to go.
> >> I am all for this direction, but Richard B may have some to say here.
> >> BTW, will you also fix the other inconsistence you found?
> >
> > I'm all to making the code clearer but I guess it will remain a
> > tricky area.  Each time I need to change sth there I need half an
> > hour to recap how all the stuff is tied together...
> >
> > So please go forward.  Many bonus points if you manage to
> > create testcases covering the mistakes you fix.
> 
> No testcases, sorry, but I built the patched compiler for
> aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu, compiled the testsuite with -O3,
> and compared the result with:
> 
> diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c
> --- gcc/tree-vect-loop.c	2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop.c	2017-07-01 16:00:20.266729964 +0100
> @@ -2247,8 +2247,10 @@
>        /* Niters for at least one iteration of vectorized loop.  */
>        niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>        /* One additional iteration because of peeling for gap.  */
> -      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>  	niters_th++;
> +      /* Convert to maximum number of scalar iters.  */
> +      niters_th--;
>        if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
>  	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
>      }
> diff -ur gcc/tree-vect-loop-manip.c /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c
> --- gcc/tree-vect-loop-manip.c	2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c	2017-07-01 14:00:26.775542411 +0100
> @@ -1143,9 +1143,6 @@
>    *bound_scalar = vfm1 + bound_prolog;
>    if (check_profitability)
>      {
> -      /* TH indicates the minimum niters of vectorized loop, while we
> -	 compute the maximum niters of scalar loop.  */
> -      th--;
>        /* Peeling for constant times.  */
>        if (int_niters_prolog >= 0)
>  	{
> 
> which is the patch that would make everything consistent with the
> current meaning (where the "minimum number of iterations" for which
> vectorisation was profitable were actually the maximum number of
> iterations for which vectorisation wasn't profitable).
> 
> Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu.
> OK to install?

Ok.

Thanks,
Richard.

> Richard
> 
> 
> 2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound,
> 	min_profitable_iters, and th as inclusive lower bounds.
> 	Fix LOOP_VINFO_PEELING_FOR_GAPS condition.
> 	(vect_estimate_min_profitable_iters): Return inclusive lower bounds
> 	for min_profitable_iters and min_profitable_estimate.
> 	(vect_transform_loop): Treat th as an inclusive lower bound.
> 	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
> 
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	2017-07-03 08:42:51.149380545 +0100
> +++ gcc/tree-vect-loop.c	2017-07-03 08:50:43.507370205 +0100
> @@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_
>        goto again;
>      }
>  
> -  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> -			    * vectorization_factor) - 1);
> +  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> +			   * vectorization_factor);
>  
>    /* Use the cost model only if it is more conservative than user specified
>       threshold.  */
> -  th = (unsigned) min_scalar_loop_bound;
> -  if (min_profitable_iters
> -      && (!min_scalar_loop_bound
> -          || min_profitable_iters > min_scalar_loop_bound))
> -    th = (unsigned) min_profitable_iters;
> +  th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
>  
>    LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>  
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -      && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> +      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>      estimated_niter = max_niter;
>    if (estimated_niter != -1
>        && ((unsigned HOST_WIDE_INT) estimated_niter
> -          <= MAX (th, (unsigned)min_profitable_estimate)))
> +          < MAX (th, (unsigned) min_profitable_estimate)))
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
>  
>    /* Decide whether we need to create an epilogue loop to handle
>       remaining scalar iterations.  */
> -  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
> -        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> -       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
> +	 / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> +	* LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>  
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
> @@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>        /* Niters for at least one iteration of vectorized loop.  */
>        niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>        /* One additional iteration because of peeling for gap.  */
> -      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>  	niters_th++;
>        if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
>  	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
> @@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop
>    if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
>      {
>        if (vec_outside_cost <= 0)
> -        min_profitable_iters = 1;
> +        min_profitable_iters = 0;
>        else
>          {
>            min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
> @@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop
>    min_profitable_iters =
>  	min_profitable_iters < vf ? vf : min_profitable_iters;
>  
> -  /* Because the condition we create is:
> -     if (niters <= min_profitable_iters)
> -       then skip the vectorized loop.  */
> -  min_profitable_iters--;
> -
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
>                       "  Runtime profitability threshold = %d\n",
> @@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop
>       SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC  */
>  
>    if (vec_outside_cost <= 0)
> -    min_profitable_estimate = 1;
> +    min_profitable_estimate = 0;
>    else
>      {
>        min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
> @@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop
>  				 / ((scalar_single_iter_cost * vf)
>  				   - vec_inside_cost);
>      }
> -  min_profitable_estimate --;
>    min_profitable_estimate = MAX (min_profitable_estimate, min_profitable_iters);
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
> @@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_
>       run at least the vectorization factor number of times checking
>       is pointless, too.  */
>    th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> -  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> +  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
>        && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>      {
>        if (dump_enabled_p ())
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c	2017-07-03 08:42:46.863565062 +0100
> +++ gcc/tree-vect-loop-manip.c	2017-07-03 08:50:43.506370248 +0100
> @@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop
>    bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
>  
>    if (check_profitability)
> -    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> +    cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
>  			     build_int_cst (TREE_TYPE (scalar_loop_iters),
>  						       th));
>  
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)