[PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
Richard Biener
rguenther@suse.de
Mon Jul 3 08:43:00 GMT 2017
On Mon, 3 Jul 2017, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 12 Jun 2017, Bin.Cheng wrote:
> >> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >> > "Bin.Cheng" <amker.cheng@gmail.com> writes:
> >> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> >> >> <richard.sandiford@linaro.org> wrote:
> >> >>> Sorry to return this old patch, but:
> >> >>>
> >> >>> Bin Cheng <Bin.Cheng@arm.com> writes:
> >> >>>> -/* Calculate the number of iterations under which scalar loop will be
> >> >>>> - preferred than vectorized loop. NITERS_PROLOG is the number of
> >> >>>> - iterations of prolog loop. If it's integer const, the integer
> >> >>>> - number is also passed by INT_NITERS_PROLOG. VF is vector factor;
> >> >>>> - TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> >> >>>> - true. This function also store upper bound of the result in BOUND. */
> >> >>>> +/* Calculate the number of iterations above which vectorized loop will be
> >> >>>> + preferred than scalar loop. NITERS_PROLOG is the number of iterations
> >> >>>> + of prolog loop. If it's integer const, the integer number is also passed
> >> >>>> + in INT_NITERS_PROLOG. BOUND_PROLOG is the upper bound (included) of
> >> >>>> + number of iterations of prolog loop. VFM1 is vector factor minus one.
> >> >>>> + If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> >> >>>> + (rather than vectorized) loop will be executed. This function stores
> >> >>>> + upper bound (included) of the result in BOUND_SCALAR. */
> >> >>>>
> >> >>>> static tree
> >> >>>> vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> >> >>>> - int bound_prolog, int vf, int th, int *bound,
> >> >>>> - bool check_profitability)
> >> >>>> + int bound_prolog, int vfm1, int th,
> >> >>>> + int *bound_scalar, bool check_profitability)
> >> >>>> {
> >> >>>> tree type = TREE_TYPE (niters_prolog);
> >> >>>> tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> >> >>>> - build_int_cst (type, vf));
> >> >>>> + build_int_cst (type, vfm1));
> >> >>>>
> >> >>>> - *bound = vf + bound_prolog;
> >> >>>> + *bound_scalar = vfm1 + bound_prolog;
> >> >>>> if (check_profitability)
> >> >>>> {
> >> >>>> - th++;
> >> >>>> + /* TH indicates the minimum niters of vectorized loop, while we
> >> >>>> + compute the maximum niters of scalar loop. */
> >> >>>> + th--;
> >> >>>
> >> >>> Are you sure about this last change? It looks like it should be dropping
> >> >> Hi Richard,
> >> >> Thanks for spotting this. I vaguely remember I got this from the way
> >> >> how niter/th was checked in previous peeling code, but did't double
> >> >> check it now. I tend to believe there is inconsistence about th,
> >> >> especially with comment like:
> >> >>
> >> >> /* Threshold of number of iterations below which vectorzation will not be
> >> >> performed. It is calculated from MIN_PROFITABLE_ITERS and
> >> >> PARAM_MIN_VECT_LOOP_BOUND. */
> >> >> unsigned int th;
> >> >>
> >> >> I also tend to believe the inconsistence was introduced partly because
> >> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
> >> >> in rest of the compiler.
> >> >>
> >> >> and...,
> >> >>
> >> >>> the increment rather than replacing it with a decrement.
> >> >>>
> >> >>> It looks like the threshold is already the maximum niters for the scalar
> >> >>> loop. It's set by:
> >> >>>
> >> >>> min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> >> >>> * vectorization_factor) - 1);
> >> >>>
> >> >>> /* Use the cost model only if it is more conservative than user specified
> >> >>> threshold. */
> >> >>> th = (unsigned) min_scalar_loop_bound;
> >> >>> if (min_profitable_iters
> >> >>> && (!min_scalar_loop_bound
> >> >>> || min_profitable_iters > min_scalar_loop_bound))
> >> >>> th = (unsigned) min_profitable_iters;
> >> >>>
> >> >>> LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> >> >>>
> >> >>> (Note the "- 1" in the min_scalar_loop_bound. The multiplication result
> >> >>> is the minimum niters for the vector loop.)
> >> >> To be honest, min_scalar_loop_bound is more likely for something's
> >> >> lower bound which is the niters for the vector loop. If it refers to
> >> >> the niters scalar loop, it is in actuality the "max" value we should
> >> >> use. I am not quite sure here, partly because I am not a native
> >> >> speaker.
> >> >>
> >> >>>
> >> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
> >> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
> >> >>> instead returns the largest niters for which the scalar loop should be
> >> >>> preferred:
> >> >>>
> >> >>> /* Cost model disabled. */
> >> >>> if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> >> >>> {
> >> >>> dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> >> >>> *ret_min_profitable_niters = 0;
> >> >>> *ret_min_profitable_estimate = 0;
> >> >>> return;
> >> >>> }
> >> >>> [...]
> >> >>> /* Because the condition we create is:
> >> >>> if (niters <= min_profitable_iters)
> >> >>> then skip the vectorized loop. */
> >> >>> min_profitable_iters--;
> >> >>> [...]
> >> >>> min_profitable_estimate --;
> >> >>>
> >> >>> Also, when deciding whether the threshold needs to be used, we have:
> >> >>>
> >> >>> th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >> >>> if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> >> >>> && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> >> >>> {
> >> >>> if (dump_enabled_p ())
> >> >>> dump_printf_loc (MSG_NOTE, vect_location,
> >> >>> "Profitability threshold is %d loop iterations.\n",
> >> >>> th);
> >> >>> check_profitability = true;
> >> >>> }
> >> >> Yes, this indeed indicates that th refers to the maximum niters of the
> >> >> scalar loop. Anyway, we should resolve the inconsistence and make it
> >> >> more clear in variable names etc..
> >> >
> >> > Yeah, agree the variable names are really confusing. Given the choice
> >> > between changing the names to match the current meaning and changing
> >> > the meaning to match the current names, the latter looks like it would
> >> > give cleaner code. I'm happy to do that it that sounds like the way
> >> > to go.
> >> I am all for this direction, but Richard B may have some to say here.
> >> BTW, will you also fix the other inconsistence you found?
> >
> > I'm all to making the code clearer but I guess it will remain a
> > tricky area. Each time I need to change sth there I need half an
> > hour to recap how all the stuff is tied together...
> >
> > So please go forward. Many bonus points if you manage to
> > create testcases covering the mistakes you fix.
>
> No testcases, sorry, but I built the patched compiler for
> aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu, compiled the testsuite with -O3,
> and compared the result with:
>
> diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c
> --- gcc/tree-vect-loop.c 2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop.c 2017-07-01 16:00:20.266729964 +0100
> @@ -2247,8 +2247,10 @@
> /* Niters for at least one iteration of vectorized loop. */
> niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> /* One additional iteration because of peeling for gap. */
> - if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> niters_th++;
> + /* Convert to maximum number of scalar iters. */
> + niters_th--;
> if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
> LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
> }
> diff -ur gcc/tree-vect-loop-manip.c /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c
> --- gcc/tree-vect-loop-manip.c 2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c 2017-07-01 14:00:26.775542411 +0100
> @@ -1143,9 +1143,6 @@
> *bound_scalar = vfm1 + bound_prolog;
> if (check_profitability)
> {
> - /* TH indicates the minimum niters of vectorized loop, while we
> - compute the maximum niters of scalar loop. */
> - th--;
> /* Peeling for constant times. */
> if (int_niters_prolog >= 0)
> {
>
> which is the patch that would make everything consistent with the
> current meaning (where the "minimum number of iterations" for which
> vectorisation was profitable were actually the maximum number of
> iterations for which vectorisation wasn't profitable).
>
> Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu.
> OK to install?
Ok.
Thanks,
Richard.
> Richard
>
>
> 2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
>
> gcc/
> * tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound,
> min_profitable_iters, and th as inclusive lower bounds.
> Fix LOOP_VINFO_PEELING_FOR_GAPS condition.
> (vect_estimate_min_profitable_iters): Return inclusive lower bounds
> for min_profitable_iters and min_profitable_estimate.
> (vect_transform_loop): Treat th as an inclusive lower bound.
> * tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
>
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c 2017-07-03 08:42:51.149380545 +0100
> +++ gcc/tree-vect-loop.c 2017-07-03 08:50:43.507370205 +0100
> @@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_
> goto again;
> }
>
> - min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> - * vectorization_factor) - 1);
> + min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> + * vectorization_factor);
>
> /* Use the cost model only if it is more conservative than user specified
> threshold. */
> - th = (unsigned) min_scalar_loop_bound;
> - if (min_profitable_iters
> - && (!min_scalar_loop_bound
> - || min_profitable_iters > min_scalar_loop_bound))
> - th = (unsigned) min_profitable_iters;
> + th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
>
> LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>
> if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> - && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> + && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
> estimated_niter = max_niter;
> if (estimated_niter != -1
> && ((unsigned HOST_WIDE_INT) estimated_niter
> - <= MAX (th, (unsigned)min_profitable_estimate)))
> + < MAX (th, (unsigned) min_profitable_estimate)))
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
>
> /* Decide whether we need to create an epilogue loop to handle
> remaining scalar iterations. */
> - th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
> - / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> - * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
> + / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> + * LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>
> if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
> @@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
> /* Niters for at least one iteration of vectorized loop. */
> niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> /* One additional iteration because of peeling for gap. */
> - if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> niters_th++;
> if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
> LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
> @@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop
> if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
> {
> if (vec_outside_cost <= 0)
> - min_profitable_iters = 1;
> + min_profitable_iters = 0;
> else
> {
> min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
> @@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop
> min_profitable_iters =
> min_profitable_iters < vf ? vf : min_profitable_iters;
>
> - /* Because the condition we create is:
> - if (niters <= min_profitable_iters)
> - then skip the vectorized loop. */
> - min_profitable_iters--;
> -
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> " Runtime profitability threshold = %d\n",
> @@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop
> SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC */
>
> if (vec_outside_cost <= 0)
> - min_profitable_estimate = 1;
> + min_profitable_estimate = 0;
> else
> {
> min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
> @@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop
> / ((scalar_single_iter_cost * vf)
> - vec_inside_cost);
> }
> - min_profitable_estimate --;
> min_profitable_estimate = MAX (min_profitable_estimate, min_profitable_iters);
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_
> run at least the vectorization factor number of times checking
> is pointless, too. */
> th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> - if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> + if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> {
> if (dump_enabled_p ())
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c 2017-07-03 08:42:46.863565062 +0100
> +++ gcc/tree-vect-loop-manip.c 2017-07-03 08:50:43.506370248 +0100
> @@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop
> bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
>
> if (check_profitability)
> - cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> + cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
> build_int_cst (TREE_TYPE (scalar_loop_iters),
> th));
>
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
More information about the Gcc-patches
mailing list