This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/3] Extended if-conversion


Richard,

I did simple change by saving gsi iterator for each bb that has
critical edges by adding additional field to bb_predicate_s:

typedef struct bb_predicate_s {

  /* The condition under which this basic block is executed.  */
  tree predicate;

  /* PREDICATE is gimplified, and the sequence of statements is
     recorded here, in order to avoid the duplication of computations
     that occur in previous conditions.  See PR44483.  */
  gimple_seq predicate_gimplified_stmts;

  /* Insertion point for blocks having incoming critical edges.  */
  gimple_stmt_iterator gsi;
} *bb_predicate_p;

and this iterator is saved in  insert_gimplified_predicates before
insertion code for predicate computation. I checked that this fix
works.

Now I am implementing merging of predicate_extended.. and
predicate_arbitrary.. functions as you proposed.

Best regards.
Yuri.

2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard for your quick reply!
>>
>> 1. I agree that we can combine predicate_extended_ and
>> predicate_arbitrary_ to one function as you proposed.
>> 2. What is your opinion about using more simple decision about
>> insertion point - if bb has use of phi result insert phi predication
>> before it and at the bb end otherwise. I assume that critical edge
>> splitting is not a good decision.
>
> Why not always insert before the use?  Which would be after labels,
> what we do for two-arg PHIs.  That is, how can it be that you predicate
> a PHI in BB1 and then for an edge predicate on one of its incoming
> edges you get SSA uses with defs that are in BB1 itself?  That
> can only happen for backedges but those you can't remove in any case.
>
> Richard.
>
>>
>> Best regards.
>> Yuri.
>>
>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> I resend you patch1 and patch2 with minor changes:
>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>> I also very sorry that I sent you bad patch.
>>>>
>>>> Now let me answer on your questions related to second patch.
>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>> predicate_arbitrary_scalar_phi?
>>>>
>>>> Let's consider the following simple test-case:
>>>>
>>>>   #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>       if (c[i] != 0)  /* c is integer array. */
>>>> res += 1;
>>>>   }
>>>>
>>>> we can see the following phi node correspondent to res:
>>>>
>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>
>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>> and only one check can be used for phi predication (for reduction in
>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>> even if we sort all phi argument values since we still have to produce
>>>> a chain of cond expressions to perform phi predication (see comments
>>>> for predicate_arbitrary_scalar_phi).
>>>
>>> How so?  We can always use !(condition) for the "last" value, thus
>>> treat it as an 'else' case.  That even works for
>>>
>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>
>>> where the condition for edges 5 and 7 can be computed as
>>> ! (condition for 3 || condition for 4).
>>>
>>> Of course it is worthwhile to also sort single-occurances first
>>> so your case gets just the condiiton for edge 5 and its inversion
>>> used for edges 3 and 4 combined.
>>>
>>>> 2. Why we need to introduce find_insertion_point?
>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>> only critical incoming edges and both contain code computing edge
>>>> predicates, e.g.
>>>>
>>>> <bb 7>:
>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>> _46 = xmax_17 == xmax_37;
>>>> _47 = xmax_17 == xmax_27;
>>>> _48 = _46 & _47;
>>>> _53 = xmax_17 == xmax_37;
>>>> _54 = ~_53;
>>>> _55 = xmax_17 == xmax_27;
>>>> _56 = _54 & _55;
>>>> _57 = _48 | _56;
>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>> goto <bb 11>;
>>>>
>>>> It is evident that we can not put phi predication at the block
>>>> beginning but need to put it after predicate computations.
>>>> Note also that if there are no critical edges for phi arguments
>>>> insertion point will be "after labels" Note also that phi result can
>>>> have use in this block too, so we can't put predication code to the
>>>> block end.
>>>
>>> So the issue is that predicate insertion for edge predicates does
>>> not happen on the edge but somewhere else (generally impossible
>>> for critical edges unless you split them).
>>>
>>> I think I've told you before that I prefer simple solutions to such issues,
>>> like splitting the edge!  Certainly not involving a function walking
>>> GENERIC expressions.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Let me know if you still have any questions.
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Here is the second patch related to extended predication.
>>>>>> Few comments which explain a main goal of design.
>>>>>>
>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>> lead to less efficient binaries.
>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>> scalar reduction is applied.
>>>>>> This is correspondent to the following statement:
>>>>>>     if (q1 && q2 && q3) var++
>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>> are already inserted above this block and
>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>> block. But this is not true for critical edges for which predicate
>>>>>> computations are  in the block where code for phi predication must be
>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>> simply found out the last statement in block defining predicates
>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>> after it (with some minor exceptions).
>>>>>
>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>
>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>
>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>
>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>> required for the predicates are defined upward - and the complex CFG
>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>> inserted code if you insert after labels just like for the other case.
>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>
>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>> times that would have been nice to vectorize.  I suggest to
>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>
>>>>> Otherwise patch 2 looks ok to me.
>>>>>
>>>>> Richard.
>>>>>
>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>> non-critical incoming edge.
>>>>>> (phi_has_two_different_args): New function.
>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>> containing reduction statement candidate is predecessor
>>>>>> of phi-block since phi may have more than two arguments.
>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>> statement before/after gsi point.
>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>> convert_scalar_cond_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>> EXTENDED value.
>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>> is copy of inner or outer loop field force_vectorize.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]