This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)



On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:

>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>wrote:
>>
>>
>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>
>>>
>>>
>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>
>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>
>>>>> Ok, I see.
>>>>>
>>>>> That this case is already vectorized is because it implements
>>>>>MAX_EXPR,
>>>>> modifying it slightly to
>>>>>
>>>>> int foo (int *a)
>>>>> {
>>>>>    int val = 0;
>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>      if (a[i] > val)
>>>>>        val = a[i] + 1;
>>>>>    return val;
>>>>> }
>>>>>
>>>>> makes it no longer handled by current code.
>>>>>
>>>>
>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>expressions
>>>>like
>>>>
>>>>int foo (int *a)
>>>>{
>>>>    int val = 0;
>>>>    for (int i = 0; i < 1024; ++i)
>>>>      if (some_expression (i))
>>>>        val = another_expression (i);
>>>>    return val;
>>>>}
>>>
>>>Yes, thatâs correct. Hopefully my new test cases should cover
>>>everything.
>>>
>>
>> Attached is a new version of the patch containing all the changes
>> requested by Richard.
>
>+      /* Compare the max index vector to the vector of found indexes to
>find
>+        the postion of the max value.  This will result in either a
>single
>+        match or all of the values.  */
>+      tree vec_compare = make_ssa_name (index_vec_type_signed);
>+      gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>EQ_EXPR,
>+                                                    induction_index,
>+                                                    max_index_vec);
>
>I'm not sure all targets can handle this.  If I deciper the code
>correctly then we do
>
>  mask = induction_index == max_index_vec;
>  vec_and = mask & vec_data;
>
>plus some casts.  So this is basically
>
>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>
>without the need to relate the induction index vector type to the data
>vector type.
>I believe this is also the form all targets support.


Ok, Iâll replace this.

>
>I am missing a comment before all this code-generation that shows the
>transform
>result with the variable names used in the code-gen.  I have a hard
>time connecting
>things here.

Ok, Iâll add some comments.

>
>+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>scalar_type,
>+                                            matched_data_reduc);
>+      epilog_stmt = gimple_build_assign (new_scalar_dest,
>+                                        matched_data_reduc_cast);
>+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>
>this will leave the stmt unsimplified.  scalar sign-changes should use
>NOP_EXPR,
>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>Also just do like before - first make_ssa_name and then directly use it
>in the
>gimple_build_assign.

We need the VIEW_CONVERT_EXPR for the cases where we have float data
values. The index is always integer.


>
>The patch is somewhat hard to parse with all the indentation changes.  A
>context
>diff would be much easier to read in those contexts.

Ok, Iâll make the next patch like that

>
>+  if (v_reduc_type == COND_REDUCTION)
>+    {
>+      widest_int ni;
>+
>+      if (! max_loop_iterations (loop, &ni))
>+       {
>+         if (dump_enabled_p ())
>+           dump_printf_loc (MSG_NOTE, vect_location,
>+                            "loop count not known, cannot create cond "
>+                            "reduction.\n");
>
>ugh.  That's bad.
>
>+      /* The additional index will be the same type as the condition.
>Check
>+        that the loop can fit into this less one (because we'll use up
>the
>+        zero slot for when there are no matches).  */
>+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>+       {
>+         if (dump_enabled_p ())
>+           dump_printf_loc (MSG_NOTE, vect_location,
>+                            "loop size is greater than data size.\n");
>+         return false;
>
>Likewise.

We could do better if we made the index type larger.
But as a first implementation of this optimisation, I didnât want to
overcomplicate things more.

>
>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>gimple_stmt_iterator *gsi,
>   if (dump_enabled_p ())
>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>
>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>+
>   /* FORNOW: Multiple types are not supported for condition.  */
>   if (code == COND_EXPR)
>
>this change looks odd (or wrong).  The type should be _only_ set/changed
>during
>analysis.


The problem is, for COND_EXPRs, this function calls
vectorizable_condition(), which sets STMT_VINFO_TYPE to
condition_vec_info_type.

Therefore we need something to restore it back to reduc_vec_info_type on
the non-analysis call.

I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
the call to vectorizable_condition(), but that looked worse.
I could back up the value of STMT_VINFO_TYPE before calling
vectorizable_condition() and then restore it after? I think thatâll look a
lot better.


>
>+
>+      /* For cond reductions we need to add an additional conditional
>based on
>+        the loop index.  */
>+      if (v_reduc_type == COND_REDUCTION)
>+       {
>+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>+         int k;
>...
>+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>+
>+         /* Update the phi with the vec cond.  */
>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>+                      UNKNOWN_LOCATION);
>
>same as before - I am missing a comment that shows the generated code
>and connects
>the local vars used.

Ok, Iâll add something

>
>
>+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>+         gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>ccompare);
>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>
>hum - are you sure this works with ncopies > 1?  Will it use the
>correct vec_stmt?

We donât support this when ncopies >1.

In vectorizable_reduction():

if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			 "multiple types in double reduction or condition "
			 "reduction.\n");
      return false;
    }



>
>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>you explore
>adding the reduction kind to stmt_info?

Ok, I can do that.


Thanks for the comments.
Iâll put together a patch with the above changes.

Thanks,
Alan.




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]