This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch] Optimize condition reductions where the result is an integer induction variable
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Alan Hayward <alan dot hayward at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 12 Nov 2015 11:53:49 +0100
- Subject: Re: [Patch] Optimize condition reductions where the result is an integer induction variable
- Authentication-results: sourceware.org; auth=none
- References: <D268E36F dot 98B3%alan dot hayward at arm dot com> <CAFiYyc0Vuohrj0VU98Hx9abp9ybrw+5cbBvsRv2F1DRNGC+Z-g at mail dot gmail dot com> <D2693EE7 dot 98D1%alan dot hayward at arm dot com>
On Wed, Nov 11, 2015 at 7:54 PM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
> On 11/11/2015 13:25, "Richard Biener" <richard.guenther@gmail.com> wrote:
>
>>On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward <alan.hayward@arm.com>
>>wrote:
>>> Hi,
>>> I hoped to post this in time for Mondayâs cut off date, but
>>>circumstances
>>> delayed me until today. Hoping if possible this patch will still be able
>>> to go in.
>>>
>>>
>>> This patch builds upon the change for PR65947, and reduces the amount of
>>> code produced in a vectorized condition reduction where operand 2 of the
>>> COND_EXPR is an assignment of a increasing integer induction variable
>>>that
>>> won't wrap.
>>>
>>>
>>> For example (assuming all types are ints), this is a match:
>>>
>>> last = 5;
>>> for (i = 0; i < N; i++)
>>> if (a[i] < min_v)
>>> last = i;
>>>
>>> Whereas, this is not because the result is based off a memory access:
>>> last = 5;
>>> for (i = 0; i < N; i++)
>>> if (a[i] < min_v)
>>> last = a[i];
>>>
>>> In the integer induction variable case we can just use a MAX reduction
>>>and
>>> skip all the code I added in my vectorized condition reduction patch -
>>>the
>>> additional induction variables in vectorizable_reduction () and the
>>> additional checks in vect_create_epilog_for_reduction (). From the patch
>>> diff only, it's not immediately obvious that those parts will be skipped
>>> as there is no code changes in those areas.
>>>
>>> The initial value of the induction variable is force set to zero, as any
>>> other value could effect the result of the induction. At the end of the
>>> loop, if the result is zero, then we restore the original initial value.
>>
>>+static bool
>>+is_integer_induction (gimple *stmt, struct loop *loop)
>>
>>is_nonwrapping_integer_induction?
>>
>>+ tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));
>>
>>don't use TYPE_MAX_VALUE.
>>
>>+ /* Check that the induction increments. */
>>+ if (tree_int_cst_compare (step, size_zero_node) <= 0)
>>+ return false;
>>
>>tree_int_cst_sgn (step) == -1
>>
>>+ /* Check that the max size of the loop will not wrap. */
>>+
>>+ if (! max_loop_iterations (loop, &ni))
>>+ return false;
>>+ /* Convert backedges to iterations. */
>>+ ni += 1;
>>
>>just use max_stmt_executions (loop, &ni) which properly checks for
>>overflow
>>of the +1.
>>
>>+ max_loop_value = wi::add (wi::to_widest (base),
>>+ wi::mul (wi::to_widest (step), ni));
>>+
>>+ if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
>>+ return false;
>>
>>you miss a check for the wi::add / wi::mul to overflow. You can use
>>extra args to determine this.
>>
>>Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).
>>
>>I wonder if you want to skip all the overflow checks for
>>TYPE_OVERFLOW_UNDEFINED
>>IV types?
>>
>
> Ok with all the above.
>
> Tried using max_value () but this gave me a wide_int instead of a
> widest_int.
> Instead I've replaced with min_precision and GET_MODE_BITSIZE.
>
> Added an extra check for when the type is TYPE_OVERFLOW_UNDEFINED.
+ /* Set the loop-entry arg of the reduction-phi. */
+
+ if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+ == INTEGER_INDUC_COND_REDUCTION)
extra vertical space
+ tree zero = build_int_cst ( TREE_TYPE (vec_init_def_type), 0);
+ tree zero_vec = build_vector_from_val (vec_init_def_type, zero);
+
build_zero_cst (vec_init_def_type);
+ else
+ {
+ add_phi_arg (as_a <gphi *> (phi), vec_init_def,
loop_preheader_edge (loop), UNKNOWN_LOCATION);
+ }
no {}s around single stmts
+ tree comparez = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
please no l33t speech
+ tmp = build3 (COND_EXPR, scalar_type, comparez, initial_def,
+ new_temp);
+
+ epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+ gimple_assign_set_lhs (epilog_stmt, new_temp);
epilog_stmt = gimple_build_assign (make_ssa_name (new_scalar_dest),
COND_EXPR,
compare, initial_def, new_temp);
+ /* Check that the max size of the loop will not wrap. */
+
+ if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
+ {
+ return (GET_MODE_BITSIZE (TYPE_MODE (lhs_type))
+ >= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (base))));
this mode check will always be true as lhs_type and base are from the
same PHI node.
+ return (wi::min_precision (max_loop_value, TYPE_SIGN (lhs_type))
+ <= GET_MODE_BITSIZE (TYPE_MODE (lhs_type)));
please use TYPE_PRECISION (lhs_type) instead.
Ok with those changes.
Thanks,
Richard.
>
>
> Alan.
>