[PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
Richard Biener
richard.guenther@gmail.com
Fri Sep 18 12:22:00 GMT 2015
On Tue, Sep 15, 2015 at 5:32 PM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
> On 15/09/2015 13:09, "Richard Biener" <richard.guenther@gmail.com> wrote:
>
>>
>>Now comments on the patch itself.
>>
>>+ if (code == COND_EXPR)
>>+ *v_reduc_type = COND_REDUCTION;
>>
>>so why not simply use COND_EXPR instead of the new v_reduc_type?
>
> v_reduc_type is also dependant on check_reduction (which comes from
> !nested_cycle in vectorizable_reduction).
> It seemed messy to keep on checking for both those things throughout.
>
> In my patch to catch simpler condition reductions, I’ll be adding another
> value to this enum too. v_reduc_type will be set to this new value based
> on the same properties for COND_REDUCTION plus some additional constraints.
>
>>
>>+ if (check_reduction && code != COND_EXPR &&
>>+ vect_is_slp_reduction (loop_info, phi, def_stmt))
>>
>>&&s go to the next line
>
> ok
>
>>
>>+ /* Reduction of the max index and a reduction of the found
>>+ values. */
>>+ epilogue_cost += add_stmt_cost (target_cost_data, 1,
>>+ vec_to_scalar, stmt_info, 0,
>>+ vect_epilogue);
>>
>>vec_to_scalar once isn't what the comment suggests. Instead the
>>comment suggests twice what a regular reduction would do
>>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>>with the broadcast. Thus make the above two vector_stmt costs?
>>
>>+ /* A broadcast of the max value. */
>>+ epilogue_cost += add_stmt_cost (target_cost_data, 2,
>>+ scalar_to_vec, stmt_info, 0,
>>+ vect_epilogue);
>>
>>comment suggests a single broadcast.
>
> I’ve made a copy/paste error here. Just need to swap the 1 and the 2.
>
>
>>
>>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>> the final vector of induction results: */
>> exit_phi = NULL;
>> FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>>- {
>>+ {
>> gimple use_stmt = USE_STMT (use_p);
>> if (is_gimple_debug (use_stmt))
>> continue;
>>
>>please avoid unrelated whitespace changes.
>
> Ok. I was changing “8 spaces” to a tab, but happy to revert.
>
>>
>>+ case COND_EXPR:
>>+ if (v_reduc_type == COND_REDUCTION)
>>+ {
>>...
>>+ /* Fall through. */
>>+
>> case MIN_EXPR:
>> case MAX_EXPR:
>>- case COND_EXPR:
>>
>>aww, so we could already handle COND_EXPR reductions? How do they
>>differ from what you add? Did you see if that path is even exercised
>>today?
>
> Today, COND_EXPRs are only supported when they are nested inside a loop.
> See the vect-cond-*.c tests.
> For example:
>
> for (j = 0; j < M; j++)
> {
> x = x_in[j];
> curr_a = a[0];
>
> for (i = 0; i < N; i++)
> {
> next_a = a[i+1];
> curr_a = x > c[i] ? curr_a : next_a;
> }
> x_out[j] = curr_a;
> }
>
>
> In that case, only the outer loop is vectorised.
>
>>
>>+ /* Create a vector of {init_value, 0, 0, 0...}. */
>>+ vec<constructor_elt, va_gc> *v;
>>+ vec_alloc (v, nunits);
>>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>>+ if (SCALAR_FLOAT_TYPE_P (scalar_type))
>>+ for (i = 1; i < nunits; ++i)
>>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+ build_real (scalar_type,
>>dconst0));
>>+ else
>>+ for (i = 1; i < nunits; ++i)
>>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+ build_int_cst (scalar_type, 0));
>>+ init_def = build_constructor (vectype, v);
>>
>>you can unify the float/int case by using build_zero_cst (scalar_type).
>>Note that you should build a vector constant instead of a constructor
>>if init_val is a constant. The convenient way is to build the vector
>>elements into a tree[] array and use build_vector_stat in that case.
>
> Ok, will switch to build_zero_cst.
> Also, I will switch my vector to {init_value, init_value, init_value…}.
> I had {init_value, 0, 0, 0…} because I was going have the option of
> ADD_REDUC_EXPR,
> But that got removed along the way.
You can then simply use build_vector_from_val ().
>>
>>+ /* Find maximum value from the vector of found indexes. */
>>+ tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>>
>>just use make_ssa_name (index_scalar_type);
>
> Ok
>
>>
>>+ /* Convert the vector of data to the same type as the EQ. */
>>+ tree vec_data_cast;
>>+ if ( TYPE_UNSIGNED (index_vec_type))
>>+ {
>>
>>How come it never happens the element
>>sizes do not match? (int index type and double data type?)
>
> This was a little unclear.
> The induction index is originally created as an unsigned version of the
> type as the data vector.
> (see the definition of cr_index_vector_type in vectorizable_reduction(),
> which is then used to create cond_name)
>
> I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED
> (index_vec_type))
>
>
>>
>>+ /* Where the max index occured, use the value from the data
>>vector. */
>>+ tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL,
>>"");
>>+ gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
>>+ vec_compare,
>>vec_data_cast);
>>
>>that is, don't you need to do some widening/shortening on the comparison
>>result?
>>(also what happens in the case "or all of the values"?)
>>
>>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)
>
> Given the induction index is the same size as the data, this will be ok.
>
> I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other
> way of making it work.
>
>
> The case of "all of the values” is when there were no matches in the loop.
> When this happens, the data vector will contain only the default values
> {init_value, 0, 0, 0…}
> (… once I change my code due to the previous comment, we’ll have
> {init_value, init_value, init_value…} ).
> The REDUC_MAX_EXPR will turn either of those vectors into a “init_value”.
>
>>
>>+ This function also handles reduction of condition expressions, for
>>example:
>>+ for (int i = 0; i < N; i++)
>>+ if (a[i] < value)
>>+ last = a[i];
>>+ This is handled by vectorising the loop and creating an additional
>>vector
>>+ containing the loop indexes for which "a[i] < value" was true. In the
>>+ function epilogue this is reduced to a single max value and then used
>>to
>>+ index into the vector of results.
>>
>>I miss a comment that shows the kind of code we transform this to.
>>"an additional vector containing the loop indexes" can't work - the vector
>>will not be large enough ;) Naiively I would have made 'last' a vector,
>>performing the reduction element-wise and in the epilogue reduce
>>'last' itself. And it looks like we are already doing that for
>>
>>int foo (int *a)
>>{
>> int val = 0;
>> for (int i = 0; i < 1024; ++i)
>> if (a[i] > val)
>> val = a[i];
>> return val;
>>}
>>
>>I must be missing something. Yes, I think we can't do index reduction
>>yet,
>>but pr65947-10.c is alrady handled?
>
>
> I think an easier way of looking thinking about this is to see what
> happens each iteration of the loop:
>
> Assuming the example above and a vector length of 4.
> The index vector starts as (0,0,0,0).
> On the first iteration, the condition passes for, lets say, i=2 and i=3,
> then index vector is (0,0,2,3)
> on the next iteration, the condition passes for i=4, then index vector is
> now (4,0,2,3)
> on the next iteration, the condition passes for i=8 and i=9, then index
> vector is now (8,9,2,3)
> Etc
> In the end we might end up with something like index vector=(26,2,54,14).
> The “2” has not been set since the first iteration.
> The only value we care about is the last one set - the 54. The rest are
> meaningless to us.
> At the same time the data vector has been storing values at the same time
> as the index vector, and so now contains (a[26], a[2], a[54], a[14])
>
> In the epilogue, we reduce the index vector down to “54”, then create a
> vector (54,54,54,54).
> This is matched against the index vector to give us
> (false,false,true,false).
> Then we use that vector to index the data vector, giving us (0, 0, a[54],
> 0).
> Which we then reduce down to a single value.
Ok, I see.
That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to
int foo (int *a)
{
int val = 0;
for (int i = 0; i < 1024; ++i)
if (a[i] > val)
val = a[i] + 1;
return val;
}
makes it no longer handled by current code.
>
>
>
>
>>
>>Stopping here.
>
> Thanks for the comments so far.
> I’ll put together a new patch with the changes above.
>
>
> Alan.
>
>
More information about the Gcc-patches
mailing list