This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/3] Extended if-conversion


Richard,

Here is updated patch2 with the following changes:
1. Delete functions  phi_has_two_different_args and find_insertion_point.
2. Use only one function for extended predication -
predicate_extended_scalar_phi.
3. Save gsi before insertion of predicate computations for basic
blocks if it has 2 predecessors and
both incoming edges are critical or it gas more than 2 predecessors
and at least one incoming edge
is critical. This saved iterator can be used by extended phi predication.

Here is motivated test-case which explains this point.
Test-case is attached (t5.c) and it must be compiled with -O2
-ftree-loop-vectorize -fopenmp options.
The problem phi is in bb-7:

  bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
  {
    <bb 5>:
    xmax_edge_18 = xmax_edge_36 + 1;
    if (xmax_17 == xmax_27)
      goto <bb 7>;
    else
      goto <bb 9>;

  }
  bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
  {
    <bb 6>:
    if (xmax_17 == xmax_27)
      goto <bb 7>;
    else
      goto <bb 8>;

  }
  bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
  {
    <bb 7>:
    # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
    xmax_edge_19 = xmax_edge_39 + 1;
    goto <bb 11>;

  }

Note that both incoming edges to bb_7 are critical. If we comment out
restoring gsi in predicate_all_scalar_phi:
#if 0
 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
     || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
   gsi = bb_insert_point (bb);
 else
#endif
   gsi = gsi_after_labels (bb);

we will get ICE:
t5.c: In function 'foo':
t5.c:9:6: error: definition in block 4 follows the use
 void foo (int n)
      ^
for SSA_NAME: _1 in statement:
_52 = _1 & _3;
t5.c:9:6: internal compiler error: verify_ssa failed

smce predicate computations were inserted in bb_7.

ChangeLog is

2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c : Include hash-map.h.
(struct bb_predicate_s): Add new field to save copy of gimple
statement iterator.
(bb_insert_point): New function.
(set_bb_insert_point): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Allow bb has more than 2 predecessors if
AGGRESSIVE_IF_CONV is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchange PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(predicate_scalar_phi): Add new arguments for call of
is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
to true if BB containing phi has more than 2 predecessors or both
incoming edges are critical. Invoke find_phi_replacement_condition and
predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
has 2 predecessors and both incoming edges are critical or it has more
than 2 predecessors and atleast one incoming edge is critical.
Use standard gsi_after_labels otherwise.
Invoke predicate_extended_scalar_phi if EXTENDED is true.
(insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
to save gsi before insertion of predicate computations. SEt-up it to
true for BB with 2 predecessors and critical incoming edges either
        number of predecessors is geater 2 and at least one incoming edge is
critical.
Add check that non-predicated block may have statements to insert.
Insert predicate computation of BB just after label if
EXTENDED_PREDICATION is true.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field.




2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I did simple change by saving gsi iterator for each bb that has
>> critical edges by adding additional field to bb_predicate_s:
>>
>> typedef struct bb_predicate_s {
>>
>>   /* The condition under which this basic block is executed.  */
>>   tree predicate;
>>
>>   /* PREDICATE is gimplified, and the sequence of statements is
>>      recorded here, in order to avoid the duplication of computations
>>      that occur in previous conditions.  See PR44483.  */
>>   gimple_seq predicate_gimplified_stmts;
>>
>>   /* Insertion point for blocks having incoming critical edges.  */
>>   gimple_stmt_iterator gsi;
>> } *bb_predicate_p;
>>
>> and this iterator is saved in  insert_gimplified_predicates before
>> insertion code for predicate computation. I checked that this fix
>> works.
>
> Huh?  I still wonder what the issue is with inserting everything
> after the PHI we predicate.
>
> Well, your updated patch will come with testcases for the testsuite
> that will hopefully fail if doing that.
>
> Richard.
>
>>
>> Now I am implementing merging of predicate_extended.. and
>> predicate_arbitrary.. functions as you proposed.
>>
>> Best regards.
>> Yuri.
>>
>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Thanks Richard for your quick reply!
>>>>
>>>> 1. I agree that we can combine predicate_extended_ and
>>>> predicate_arbitrary_ to one function as you proposed.
>>>> 2. What is your opinion about using more simple decision about
>>>> insertion point - if bb has use of phi result insert phi predication
>>>> before it and at the bb end otherwise. I assume that critical edge
>>>> splitting is not a good decision.
>>>
>>> Why not always insert before the use?  Which would be after labels,
>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>> can only happen for backedges but those you can't remove in any case.
>>>
>>> Richard.
>>>
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>> I also very sorry that I sent you bad patch.
>>>>>>
>>>>>> Now let me answer on your questions related to second patch.
>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>
>>>>>> Let's consider the following simple test-case:
>>>>>>
>>>>>>   #pragma omp simd safelen(8)
>>>>>>   for (i=0; i<512; i++)
>>>>>>   {
>>>>>>     float t = a[i];
>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>> res += 1;
>>>>>>   }
>>>>>>
>>>>>> we can see the following phi node correspondent to res:
>>>>>>
>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>
>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>
>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>> treat it as an 'else' case.  That even works for
>>>>>
>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>
>>>>> where the condition for edges 5 and 7 can be computed as
>>>>> ! (condition for 3 || condition for 4).
>>>>>
>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>> used for edges 3 and 4 combined.
>>>>>
>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>> only critical incoming edges and both contain code computing edge
>>>>>> predicates, e.g.
>>>>>>
>>>>>> <bb 7>:
>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>> _46 = xmax_17 == xmax_37;
>>>>>> _47 = xmax_17 == xmax_27;
>>>>>> _48 = _46 & _47;
>>>>>> _53 = xmax_17 == xmax_37;
>>>>>> _54 = ~_53;
>>>>>> _55 = xmax_17 == xmax_27;
>>>>>> _56 = _54 & _55;
>>>>>> _57 = _48 | _56;
>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>> goto <bb 11>;
>>>>>>
>>>>>> It is evident that we can not put phi predication at the block
>>>>>> beginning but need to put it after predicate computations.
>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>> have use in this block too, so we can't put predication code to the
>>>>>> block end.
>>>>>
>>>>> So the issue is that predicate insertion for edge predicates does
>>>>> not happen on the edge but somewhere else (generally impossible
>>>>> for critical edges unless you split them).
>>>>>
>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>> GENERIC expressions.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Let me know if you still have any questions.
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>
>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>> lead to less efficient binaries.
>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>> scalar reduction is applied.
>>>>>>>> This is correspondent to the following statement:
>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>> are already inserted above this block and
>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>> after it (with some minor exceptions).
>>>>>>>
>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>
>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>
>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>
>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>
>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>
>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>> non-critical incoming edge.
>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>> statement before/after gsi point.
>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>> EXTENDED value.
>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>> is copy of inner or outer loop field force_vectorize.
#define N 512
#define max(x,y) (x) >= (y)? (x) : (y)
#define min(x,y) (x) <= (y)? (x) : (y)
int c_X[N];
int x_max;
int x_min;
extern int nx;

void foo (int n)
{
  int i, x;
  int xmin, xmax;
  int xmin_edge, xmax_edge;

  x = c_X[0];
  xmin = xmax = x;
  xmin_edge = xmax_edge  = 1;
#pragma omp simd safelen(8)
  for (i = 1; i<n; i++) {
    x = c_X[i];
    x = max(min(x,nx),1);
    if (x == xmin) {  
       xmin_edge++;
    }
    if (x == xmax) {
       xmax_edge++;
    }
    else if (x < xmin) {
       xmin = x;
       xmin_edge = 1;
    }
    else if (x > xmax) {
       xmax = x;
       xmax_edge = 1;
    }

  }
    x_max = xmax_edge;
    x_min = xmin_edge;
}

   

Attachment: if-conv.patch2.2.1
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]