This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/3] Extended if-conversion


Richard,

Sorry that I forgot to delete debug dump from my fix.
I have few questions about your comments.

1. You wrote :
> You also still have two functions for PHI predication.  And the
> new extended variant doesn't commonize the 2-args and general
> path
 Did you mean that I must combine predicate_scalar_phi and
predicate_extended scalar phi to one function?
Please note that if additional flag was not set up (i.e.
aggressive_if_conv is false) extended predication is required more
compile time since it builds hash_map.

2. About critical edge splitting.

Did you mean that we should perform it (1) under aggressive_if_conv
option only; (2) should we split all critical edges.
Note that this leads to recomputing of topological order.

It is worth noting that in current implementation bb's with 2
predecessors and both are on critical edges are accepted without
additional option.

Thanks ahead.
Yuri.
2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Here is updated patch2 with the following changes:
>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>> 2. Use only one function for extended predication -
>> predicate_extended_scalar_phi.
>> 3. Save gsi before insertion of predicate computations for basic
>> blocks if it has 2 predecessors and
>> both incoming edges are critical or it gas more than 2 predecessors
>> and at least one incoming edge
>> is critical. This saved iterator can be used by extended phi predication.
>>
>> Here is motivated test-case which explains this point.
>> Test-case is attached (t5.c) and it must be compiled with -O2
>> -ftree-loop-vectorize -fopenmp options.
>> The problem phi is in bb-7:
>>
>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>   {
>>     <bb 5>:
>>     xmax_edge_18 = xmax_edge_36 + 1;
>>     if (xmax_17 == xmax_27)
>>       goto <bb 7>;
>>     else
>>       goto <bb 9>;
>>
>>   }
>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>   {
>>     <bb 6>:
>>     if (xmax_17 == xmax_27)
>>       goto <bb 7>;
>>     else
>>       goto <bb 8>;
>>
>>   }
>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>   {
>>     <bb 7>:
>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>     xmax_edge_19 = xmax_edge_39 + 1;
>>     goto <bb 11>;
>>
>>   }
>>
>> Note that both incoming edges to bb_7 are critical. If we comment out
>> restoring gsi in predicate_all_scalar_phi:
>> #if 0
>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>    gsi = bb_insert_point (bb);
>>  else
>> #endif
>>    gsi = gsi_after_labels (bb);
>>
>> we will get ICE:
>> t5.c: In function 'foo':
>> t5.c:9:6: error: definition in block 4 follows the use
>>  void foo (int n)
>>       ^
>> for SSA_NAME: _1 in statement:
>> _52 = _1 & _3;
>> t5.c:9:6: internal compiler error: verify_ssa failed
>>
>> smce predicate computations were inserted in bb_7.
>
> The issue is obviously that the predicates have already been emitted
> in the target BB - that's of course the wrong place.  This is done
> by insert_gimplified_predicates.
>
> This just shows how edge predicate handling is broken - we don't
> seem to have a sequence of gimplified stmts for edge predicates
> but push those to e->dest which makes this really messy.
>
> Rather than having a separate phase where we insert all
> gimplified bb predicates we should do that on-demand when
> predicating a PHI.
>
> Your patch writes to stderr - that's bad - use dump_file and guard
> the printfs properly.
>
> You also still have two functions for PHI predication.  And the
> new extended variant doesn't commonize the 2-args and general
> paths.
>
> I'm not at all happy with this code.  It may be existing if-conv codes
> fault but making it even worse is not an option.
>
> Again - what's wrong with simply splitting critical edges if
> aggressive_if_conv?  I think that would very much simplify
> things here.  Or alternatively use gsi_insert_on_edge and
> commit edge insertions before merging the blocks.
>
> Thanks,
> Richard.
>
>> ChangeLog is
>>
>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c : Include hash-map.h.
>> (struct bb_predicate_s): Add new field to save copy of gimple
>> statement iterator.
>> (bb_insert_point): New function.
>> (set_bb_insert_point): New function.
>> (has_pred_critical_p): New function.
>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>> AGGRESSIVE_IF_CONV is true.
>> (if_convertible_bb_p): Delete check that bb has at least one
>> non-critical incoming edge.
>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>> Allow interchange PHI arguments if EXTENDED is false.
>> Change check that block containing reduction statement candidate
>> is predecessor of phi-block since phi may have more than two arguments.
>> (predicate_scalar_phi): Add new arguments for call of
>> is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (struct phi_args_hash_traits): New type.
>> (phi_args_hash_traits::hash): New function.
>> (phi_args_hash_traits::equal_keys): New function.
>> (gen_phi_arg_condition): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>> to true if BB containing phi has more than 2 predecessors or both
>> incoming edges are critical. Invoke find_phi_replacement_condition and
>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>> has 2 predecessors and both incoming edges are critical or it has more
>> than 2 predecessors and atleast one incoming edge is critical.
>> Use standard gsi_after_labels otherwise.
>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>> to save gsi before insertion of predicate computations. SEt-up it to
>> true for BB with 2 predecessors and critical incoming edges either
>>         number of predecessors is geater 2 and at least one incoming edge is
>> critical.
>> Add check that non-predicated block may have statements to insert.
>> Insert predicate computation of BB just after label if
>> EXTENDED_PREDICATION is true.
>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>> is copy of inner or outer loop force_vectorize field.
>>
>>
>>
>>
>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I did simple change by saving gsi iterator for each bb that has
>>>> critical edges by adding additional field to bb_predicate_s:
>>>>
>>>> typedef struct bb_predicate_s {
>>>>
>>>>   /* The condition under which this basic block is executed.  */
>>>>   tree predicate;
>>>>
>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>      recorded here, in order to avoid the duplication of computations
>>>>      that occur in previous conditions.  See PR44483.  */
>>>>   gimple_seq predicate_gimplified_stmts;
>>>>
>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>   gimple_stmt_iterator gsi;
>>>> } *bb_predicate_p;
>>>>
>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>> insertion code for predicate computation. I checked that this fix
>>>> works.
>>>
>>> Huh?  I still wonder what the issue is with inserting everything
>>> after the PHI we predicate.
>>>
>>> Well, your updated patch will come with testcases for the testsuite
>>> that will hopefully fail if doing that.
>>>
>>> Richard.
>>>
>>>>
>>>> Now I am implementing merging of predicate_extended.. and
>>>> predicate_arbitrary.. functions as you proposed.
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard for your quick reply!
>>>>>>
>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>> 2. What is your opinion about using more simple decision about
>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>> splitting is not a good decision.
>>>>>
>>>>> Why not always insert before the use?  Which would be after labels,
>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>
>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>
>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>
>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>   {
>>>>>>>>     float t = a[i];
>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>> res += 1;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>
>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>
>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>
>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>
>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>
>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>
>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>> used for edges 3 and 4 combined.
>>>>>>>
>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>> predicates, e.g.
>>>>>>>>
>>>>>>>> <bb 7>:
>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>> _48 = _46 & _47;
>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>> _54 = ~_53;
>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>> _56 = _54 & _55;
>>>>>>>> _57 = _48 | _56;
>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>> goto <bb 11>;
>>>>>>>>
>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>> block end.
>>>>>>>
>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>> for critical edges unless you split them).
>>>>>>>
>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>> GENERIC expressions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Let me know if you still have any questions.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>
>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>> are already inserted above this block and
>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>
>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>
>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>
>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>
>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>
>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>
>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>> EXTENDED value.
>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>> is copy of inner or outer loop field force_vectorize.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]