This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
- From: Richard Sandiford <richard dot sandiford at linaro dot org>
- To: Bin Cheng <Bin dot Cheng at arm dot com>
- Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Sat, 10 Jun 2017 10:40:58 +0100
- Subject: Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
- Authentication-results: sourceware.org; auth=none
- References: <VI1PR0802MB2176844C26FACE87D72DF57FE7D20@VI1PR0802MB2176.eurprd08.prod.outlook.com>
Sorry to return this old patch, but:
Bin Cheng <Bin.Cheng@arm.com> writes:
> -/* Calculate the number of iterations under which scalar loop will be
> - preferred than vectorized loop. NITERS_PROLOG is the number of
> - iterations of prolog loop. If it's integer const, the integer
> - number is also passed by INT_NITERS_PROLOG. VF is vector factor;
> - TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> - true. This function also store upper bound of the result in BOUND. */
> +/* Calculate the number of iterations above which vectorized loop will be
> + preferred than scalar loop. NITERS_PROLOG is the number of iterations
> + of prolog loop. If it's integer const, the integer number is also passed
> + in INT_NITERS_PROLOG. BOUND_PROLOG is the upper bound (included) of
> + number of iterations of prolog loop. VFM1 is vector factor minus one.
> + If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> + (rather than vectorized) loop will be executed. This function stores
> + upper bound (included) of the result in BOUND_SCALAR. */
>
> static tree
> vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> - int bound_prolog, int vf, int th, int *bound,
> - bool check_profitability)
> + int bound_prolog, int vfm1, int th,
> + int *bound_scalar, bool check_profitability)
> {
> tree type = TREE_TYPE (niters_prolog);
> tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> - build_int_cst (type, vf));
> + build_int_cst (type, vfm1));
>
> - *bound = vf + bound_prolog;
> + *bound_scalar = vfm1 + bound_prolog;
> if (check_profitability)
> {
> - th++;
> + /* TH indicates the minimum niters of vectorized loop, while we
> + compute the maximum niters of scalar loop. */
> + th--;
Are you sure about this last change? It looks like it should be dropping
the increment rather than replacing it with a decrement.
It looks like the threshold is already the maximum niters for the scalar
loop. It's set by:
min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
* vectorization_factor) - 1);
/* Use the cost model only if it is more conservative than user specified
threshold. */
th = (unsigned) min_scalar_loop_bound;
if (min_profitable_iters
&& (!min_scalar_loop_bound
|| min_profitable_iters > min_scalar_loop_bound))
th = (unsigned) min_profitable_iters;
LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
(Note the "- 1" in the min_scalar_loop_bound. The multiplication result
is the minimum niters for the vector loop.)
min_profitable_iters sounds like it _ought_ to be the minimum niters for
which the vector loop is used, but vect_estimate_min_profitable_iters
instead returns the largest niters for which the scalar loop should be
preferred:
/* Cost model disabled. */
if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
{
dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
*ret_min_profitable_niters = 0;
*ret_min_profitable_estimate = 0;
return;
}
[...]
/* Because the condition we create is:
if (niters <= min_profitable_iters)
then skip the vectorized loop. */
min_profitable_iters--;
[...]
min_profitable_estimate --;
Also, when deciding whether the threshold needs to be used, we have:
th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
&& !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"Profitability threshold is %d loop iterations.\n",
th);
check_profitability = true;
}
where again the "- 1" seems to assume that the threshold is already the
maximum niters for the scalar loop. (Although checking for == seems a bit
pointless too.)
The corresponding profitability check for loop versioning still assumes
that th is the maximum niters for the scalar loop:
if (check_profitability)
cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters),
th));
since we use GT_EXPR instead of GE_EXPR.
Thanks,
Richard