This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
- From: Alan Hayward <alan dot hayward at arm dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 01 Oct 2015 16:21:49 +0100
- Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
- Authentication-results: sourceware.org; auth=none
- References: <D217578B dot 7FE4%alan dot hayward at arm dot com> <CAFiYyc0tYw9p9_z=5at7+2V8r9LzN0iRdA=w6TQ1==ZiqXb0bA at mail dot gmail dot com> <D21DD124 dot 813A%alan dot hayward at arm dot com> <CAFiYyc1gxyLSS3VcHiJtE43DpmxLPpR12tBKKGLhU_pO1hUBwg at mail dot gmail dot com> <55FC110D dot 1070806 at arm dot com> <D221D55E dot 8386%alan dot hayward at arm dot com> <D2286A5B dot 84EE%alan dot hayward at arm dot com> <CAFiYyc0GSS1Jb0yzHKuCWiK7Det3ZNTzUaqdrjYeVdUexjQoSw at mail dot gmail dot com>
On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:
>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>wrote:
>>
>>
>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>
>>>
>>>
>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>
>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>
>>>>> Ok, I see.
>>>>>
>>>>> That this case is already vectorized is because it implements
>>>>>MAX_EXPR,
>>>>> modifying it slightly to
>>>>>
>>>>> int foo (int *a)
>>>>> {
>>>>> int val = 0;
>>>>> for (int i = 0; i < 1024; ++i)
>>>>> if (a[i] > val)
>>>>> val = a[i] + 1;
>>>>> return val;
>>>>> }
>>>>>
>>>>> makes it no longer handled by current code.
>>>>>
>>>>
>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>expressions
>>>>like
>>>>
>>>>int foo (int *a)
>>>>{
>>>> int val = 0;
>>>> for (int i = 0; i < 1024; ++i)
>>>> if (some_expression (i))
>>>> val = another_expression (i);
>>>> return val;
>>>>}
>>>
>>>Yes, thatâs correct. Hopefully my new test cases should cover
>>>everything.
>>>
>>
>> Attached is a new version of the patch containing all the changes
>> requested by Richard.
>
>+ /* Compare the max index vector to the vector of found indexes to
>find
>+ the postion of the max value. This will result in either a
>single
>+ match or all of the values. */
>+ tree vec_compare = make_ssa_name (index_vec_type_signed);
>+ gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>EQ_EXPR,
>+ induction_index,
>+ max_index_vec);
>
>I'm not sure all targets can handle this. If I deciper the code
>correctly then we do
>
> mask = induction_index == max_index_vec;
> vec_and = mask & vec_data;
>
>plus some casts. So this is basically
>
> vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>
>without the need to relate the induction index vector type to the data
>vector type.
>I believe this is also the form all targets support.
Ok, Iâll replace this.
>
>I am missing a comment before all this code-generation that shows the
>transform
>result with the variable names used in the code-gen. I have a hard
>time connecting
>things here.
Ok, Iâll add some comments.
>
>+ tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>scalar_type,
>+ matched_data_reduc);
>+ epilog_stmt = gimple_build_assign (new_scalar_dest,
>+ matched_data_reduc_cast);
>+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>+ gimple_assign_set_lhs (epilog_stmt, new_temp);
>
>this will leave the stmt unsimplified. scalar sign-changes should use
>NOP_EXPR,
>not VIEW_CONVERT_EXPR. The easiest fix is to use fold_convert instead.
>Also just do like before - first make_ssa_name and then directly use it
>in the
>gimple_build_assign.
We need the VIEW_CONVERT_EXPR for the cases where we have float data
values. The index is always integer.
>
>The patch is somewhat hard to parse with all the indentation changes. A
>context
>diff would be much easier to read in those contexts.
Ok, Iâll make the next patch like that
>
>+ if (v_reduc_type == COND_REDUCTION)
>+ {
>+ widest_int ni;
>+
>+ if (! max_loop_iterations (loop, &ni))
>+ {
>+ if (dump_enabled_p ())
>+ dump_printf_loc (MSG_NOTE, vect_location,
>+ "loop count not known, cannot create cond "
>+ "reduction.\n");
>
>ugh. That's bad.
>
>+ /* The additional index will be the same type as the condition.
>Check
>+ that the loop can fit into this less one (because we'll use up
>the
>+ zero slot for when there are no matches). */
>+ tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>+ if (wi::geu_p (ni, wi::to_widest (max_index)))
>+ {
>+ if (dump_enabled_p ())
>+ dump_printf_loc (MSG_NOTE, vect_location,
>+ "loop size is greater than data size.\n");
>+ return false;
>
>Likewise.
We could do better if we made the index type larger.
But as a first implementation of this optimisation, I didnât want to
overcomplicate things more.
>
>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>gimple_stmt_iterator *gsi,
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>
>+ STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>+
> /* FORNOW: Multiple types are not supported for condition. */
> if (code == COND_EXPR)
>
>this change looks odd (or wrong). The type should be _only_ set/changed
>during
>analysis.
The problem is, for COND_EXPRs, this function calls
vectorizable_condition(), which sets STMT_VINFO_TYPE to
condition_vec_info_type.
Therefore we need something to restore it back to reduc_vec_info_type on
the non-analysis call.
I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
the call to vectorizable_condition(), but that looked worse.
I could back up the value of STMT_VINFO_TYPE before calling
vectorizable_condition() and then restore it after? I think thatâll look a
lot better.
>
>+
>+ /* For cond reductions we need to add an additional conditional
>based on
>+ the loop index. */
>+ if (v_reduc_type == COND_REDUCTION)
>+ {
>+ int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>+ int k;
>...
>+ STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>+ set_vinfo_for_stmt (index_condition, index_vec_info);
>+
>+ /* Update the phi with the vec cond. */
>+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>+ UNKNOWN_LOCATION);
>
>same as before - I am missing a comment that shows the generated code
>and connects
>the local vars used.
Ok, Iâll add something
>
>
>+ tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>+ gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>ccompare);
>+ gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
>+ gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>
>hum - are you sure this works with ncopies > 1? Will it use the
>correct vec_stmt?
We donât support this when ncopies >1.
In vectorizable_reduction():
if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"multiple types in double reduction or condition "
"reduction.\n");
return false;
}
>
>I still dislike the v_reduc_type plastered and passed everywhere. Can
>you explore
>adding the reduction kind to stmt_info?
Ok, I can do that.
Thanks for the comments.
Iâll put together a patch with the above changes.
Thanks,
Alan.