This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/3] Extended if-conversion


On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I did simple change by saving gsi iterator for each bb that has
> critical edges by adding additional field to bb_predicate_s:
>
> typedef struct bb_predicate_s {
>
>   /* The condition under which this basic block is executed.  */
>   tree predicate;
>
>   /* PREDICATE is gimplified, and the sequence of statements is
>      recorded here, in order to avoid the duplication of computations
>      that occur in previous conditions.  See PR44483.  */
>   gimple_seq predicate_gimplified_stmts;
>
>   /* Insertion point for blocks having incoming critical edges.  */
>   gimple_stmt_iterator gsi;
> } *bb_predicate_p;
>
> and this iterator is saved in  insert_gimplified_predicates before
> insertion code for predicate computation. I checked that this fix
> works.

Huh?  I still wonder what the issue is with inserting everything
after the PHI we predicate.

Well, your updated patch will come with testcases for the testsuite
that will hopefully fail if doing that.

Richard.

>
> Now I am implementing merging of predicate_extended.. and
> predicate_arbitrary.. functions as you proposed.
>
> Best regards.
> Yuri.
>
> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks Richard for your quick reply!
>>>
>>> 1. I agree that we can combine predicate_extended_ and
>>> predicate_arbitrary_ to one function as you proposed.
>>> 2. What is your opinion about using more simple decision about
>>> insertion point - if bb has use of phi result insert phi predication
>>> before it and at the bb end otherwise. I assume that critical edge
>>> splitting is not a good decision.
>>
>> Why not always insert before the use?  Which would be after labels,
>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>> a PHI in BB1 and then for an edge predicate on one of its incoming
>> edges you get SSA uses with defs that are in BB1 itself?  That
>> can only happen for backedges but those you can't remove in any case.
>>
>> Richard.
>>
>>>
>>> Best regards.
>>> Yuri.
>>>
>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I resend you patch1 and patch2 with minor changes:
>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>> I also very sorry that I sent you bad patch.
>>>>>
>>>>> Now let me answer on your questions related to second patch.
>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>> predicate_arbitrary_scalar_phi?
>>>>>
>>>>> Let's consider the following simple test-case:
>>>>>
>>>>>   #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>> res += 1;
>>>>>   }
>>>>>
>>>>> we can see the following phi node correspondent to res:
>>>>>
>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>
>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>> and only one check can be used for phi predication (for reduction in
>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>> even if we sort all phi argument values since we still have to produce
>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>> for predicate_arbitrary_scalar_phi).
>>>>
>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>> treat it as an 'else' case.  That even works for
>>>>
>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>
>>>> where the condition for edges 5 and 7 can be computed as
>>>> ! (condition for 3 || condition for 4).
>>>>
>>>> Of course it is worthwhile to also sort single-occurances first
>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>> used for edges 3 and 4 combined.
>>>>
>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>> only critical incoming edges and both contain code computing edge
>>>>> predicates, e.g.
>>>>>
>>>>> <bb 7>:
>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>> _46 = xmax_17 == xmax_37;
>>>>> _47 = xmax_17 == xmax_27;
>>>>> _48 = _46 & _47;
>>>>> _53 = xmax_17 == xmax_37;
>>>>> _54 = ~_53;
>>>>> _55 = xmax_17 == xmax_27;
>>>>> _56 = _54 & _55;
>>>>> _57 = _48 | _56;
>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>> goto <bb 11>;
>>>>>
>>>>> It is evident that we can not put phi predication at the block
>>>>> beginning but need to put it after predicate computations.
>>>>> Note also that if there are no critical edges for phi arguments
>>>>> insertion point will be "after labels" Note also that phi result can
>>>>> have use in this block too, so we can't put predication code to the
>>>>> block end.
>>>>
>>>> So the issue is that predicate insertion for edge predicates does
>>>> not happen on the edge but somewhere else (generally impossible
>>>> for critical edges unless you split them).
>>>>
>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>> like splitting the edge!  Certainly not involving a function walking
>>>> GENERIC expressions.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Let me know if you still have any questions.
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Here is the second patch related to extended predication.
>>>>>>> Few comments which explain a main goal of design.
>>>>>>>
>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>> lead to less efficient binaries.
>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>> scalar reduction is applied.
>>>>>>> This is correspondent to the following statement:
>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>> are already inserted above this block and
>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>> simply found out the last statement in block defining predicates
>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>> after it (with some minor exceptions).
>>>>>>
>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>
>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>
>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>
>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>
>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>
>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>> non-critical incoming edge.
>>>>>>> (phi_has_two_different_args): New function.
>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>> containing reduction statement candidate is predecessor
>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>> statement before/after gsi point.
>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>> convert_scalar_cond_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>> EXTENDED value.
>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>> is copy of inner or outer loop field force_vectorize.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]