This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: making the new if-converter not mangle IR that is already vectorizer-friendly


On Fri, Jul 3, 2015 at 11:37 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Abe wrote:
>
>> In other words, the problem about which I was concerned is not going to be
>> triggered by e.g. "if (c)  x = ..."
>> which lacks an attached "else  x = ..." in a multithreaded program without
>> enough locking just because 'x' is global/static.
>>
>> The only remaining case to consider is if some code being compiler takes
>> the address of something thread-local and then "gives"
>> that pointer to another thread.  Even for _that_ extreme case, Sebastian
>> says that the gimplifier will detect this
>> "address has been taken" situation and do the right thing such that the
>> new if converter also does the right thing.
>
>
> Great :). I don't understand much/anything about how gcc deals with
> thread-locals, but everything before that, all sounds good...
>
>> [Alan wrote:]
>>
>>> Can you give an example?
>>
>>
>> The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and
>> "gcc.dg/vect/vect-mask-load-1.c"
>> currently test as: the new if-converter is "converting" something that`s
>> already vectorizer-friendly...
>
>> [snip]
>>
>> However, TTBOMK the vectorizer already "understands" that in cases where
>> its input looks like:
>>
>>    x = c ? y : z;
>>
>> ... and 'y' and 'z' are both pure [side-effect-free] -- including, but not
>> limited to, they must be non-"volatile" --
>> it may vectorize a loop containing code like the preceding, ignoring for
>> this particular instance the C mandate
>> that only one of {y, z} be evaluated...
>
>
> My understanding, is that any decision as to whether one or both of y or z
> is evaluated (when 'evaluation' involves doing any work, e.g. a load), has
> already been encoded into the gimple/tree IR. Thus, if we are to only
> evaluate one of 'y' or 'z' in your example, the IR will (prior to
> if-conversion), contain basic blocks and control flow, that means we jump
> around the one that's not evaluated.
>
> This appears to be the case in pr61194.c: prior to if-conversion, the IR for
> the loop in barX is
>
>  <bb 3>:
>   # i_16 = PHI <i_13(7), 0(2)>
>   # ivtmp_21 = PHI <ivtmp_20(7), 1024(2)>
>   _5 = x[i_16];
>   _6 = _5 > 0.0;
>   _7 = w[i_16];
>   _8 = _7 < 0.0;
>   _9 = _6 & _8;
>   if (_9 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   iftmp.0_10 = z[i_16];
>   goto <bb 6>;
>
>   <bb 5>:
>   iftmp.0_11 = y[i_16];
>
>   <bb 6>:
>   # iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)>
>   z[i_16] = iftmp.0_2;
>   i_13 = i_16 + 1;
>   ivtmp_20 = ivtmp_21 - 1;
>   if (ivtmp_20 != 0)
>     goto <bb 7>;
>   else
>     goto <bb 8>;
>
>   <bb 7>:
>   goto <bb 3>;
>
> which clearly contains (unvectorizable!) control flow. Without
> -ftree-loop-if-convert-stores, if-conversion leaves this alone, and
> vectorization fails (i.e. the vectorizer bails out because the loop has >2
> basic blocks). With -ftree-loop-if-convert-stores, if-conversion produces
>
>  <bb 3>:
>   # i_16 = PHI <i_13(4), 0(2)>
>   # ivtmp_21 = PHI <ivtmp_20(4), 1024(2)>
>   _5 = x[i_16];
>   _6 = _5 > 0.0;
>   _7 = w[i_16];
>   _8 = _7 < 0.0;
>   _9 = _6 & _8;
>   iftmp.0_10 = z[i_16]; // <== here
>   iftmp.0_11 = y[i_16]; // <== here
>   iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11;
>   z[i_16] = iftmp.0_2;
>   i_13 = i_16 + 1;
>   ivtmp_20 = ivtmp_21 - 1;
>   if (ivtmp_20 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   goto <bb 3>;
>
> where I have commented the conditional loads that have become unconditional.
> (Hence, "-ftree-loop-if-convert-stores" looks misnamed - it affects how the
> if-conversion phase converts loads, too - please correct me if I
> misunderstand (Richard?) ?!) This contains no control flow, and so is
> vectorizable.

Not sure why the above is only handled with -ftree-loop-if-convert-stores.  It's
clearly if-converting the load as the store is unconditional.  Ok, the dump says

iftmp.0_10 = z[i_16];
tree could trap...

from

  if (gimple_assign_rhs_could_trap_p (stmt))
    {
      if (ifcvt_can_use_mask_load_store (stmt))
        {
          gimple_set_plf (stmt, GF_PLF_2, true);
          *any_mask_load_store = true;
          return true;
        }
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "tree could trap...\n");
      return false;

I suppose that should use ifcvt_could_trap_p instead (clearly we write to
z[i_16] unconditionally after the conditional load).  Note that it says
the access may trap because the check is somewhat too conservative,
not using range-info of i_16.

Looks like to do this we'd need to enable some analysis code
currently run conditional on flag_tree_loop_if_convert_stores only.

Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c  (revision 225599)
+++ gcc/tree-if-conv.c  (working copy)
@@ -874,7 +874,7 @@ if_convertible_gimple_assign_stmt_p (gim
       return true;
     }

-  if (gimple_assign_rhs_could_trap_p (stmt))
+  if (ifcvt_could_trap_p (stmt, refs))
     {
       if (ifcvt_can_use_mask_load_store (stmt))
        {
@@ -1297,18 +1297,15 @@ if_convertible_loop_p_1 (struct loop *lo
          }
     }

-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
+  data_reference_p dr;

-      for (i = 0; refs->iterate (i, &dr); i++)
-       {
-         dr->aux = XNEW (struct ifc_dr);
-         DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
-         DR_RW_UNCONDITIONALLY (dr) = -1;
-       }
-      predicate_bbs (loop);
+  for (i = 0; refs->iterate (i, &dr); i++)
+    {
+      dr->aux = XNEW (struct ifc_dr);
+      DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
+      DR_RW_UNCONDITIONALLY (dr) = -1;
     }
+  predicate_bbs (loop);

   for (i = 0; i < loop->num_nodes; i++)
     {
@@ -1323,9 +1320,8 @@ if_convertible_loop_p_1 (struct loop *lo
            return false;
     }

-  if (flag_tree_loop_if_convert_stores)
-    for (i = 0; i < loop->num_nodes; i++)
-      free_bb_predicate (ifc_bbs[i]);
+  for (i = 0; i < loop->num_nodes; i++)
+    free_bb_predicate (ifc_bbs[i]);

   /* Checking PHIs needs to be done after stmts, as the fact whether there
      are any masked loads or stores affects the tests.  */
@@ -1399,14 +1395,10 @@ if_convertible_loop_p (struct loop *loop
   res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs,
                                 any_mask_load_store);

-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-      unsigned int i;
-
-      for (i = 0; refs.iterate (i, &dr); i++)
-       free (dr->aux);
-    }
+  data_reference_p dr;
+  unsigned int i;
+  for (i = 0; refs.iterate (i, &dr); i++)
+    free (dr->aux);

   free_data_refs (refs);
   free_dependence_relations (ddrs);

fixes the testcase for me.

Richard.

>
> (This is all without your scratchpad patch, of course.) IOW this being
> vectorized, or not, relies upon the preceding if-conversion phase removing
> the control flow.
>
> HTH
> Alan
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]