[PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

Yuri Rumyantsev ysrumyan@gmail.com
Fri Nov 7 14:08:00 GMT 2014


Richard,

Did you have a chance to look at it?

Thanks.
Yuri.

2014-10-24 14:21 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Patch containing new core related to extended predication is attached.
> Here is few comments which explain a main goal of design.
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries (I remember some performance issue
> when we designed lazy code motion algorithm in SPARC compiler).
> 2. One special case of extended PHI node predication was introduced
> when #arguments is more than 2 but only two arguments are different
> and one argument has the only occurrence. For such PHI conditional
> scalar reduction is applied.
> This is correspondent to the following:
>     if (q1 && q2 && q3) var++
>  New function phi_has_two_different_args was introduced to detect such phi.
> 3. Original algorithm for PHI predication used assumption that at
> least one incoming edge for blocks containing PHI is not critical - it
> guarantees that all computations related to predicate of normal edge
> are already inserted above this block and
> core related to PHI predication can be inserted at the beginning of
> block. But this is not true for critical edges for which predicate
> computations are  in the block where code for phi predication must be
> inserted. So new function find_insertion_point is introduced which is
> simply found out the last statement in block defining predicates
> correspondent to all incoming edges and insert phi predication code
> after it (with some minor exceptions).
>
> If you need more comments or something unclear will let me know.
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
> FLAG_FORCE_VECTORIZE instead of loop flag.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> FLAG_FORCE_VECTORIZE is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (phi_has_two_different_args): New function.
> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
> to phi arguments. Invoke phi_has_two_different_args to get phi
> arguments if EXTENDED is true. Change check that block
> containing reduction statement candidate is predecessor
> of phi-block since phi may have more than two arguments.
> (convert_scalar_cond_reduction): Add argument BEFORE to insert
> statement before/after gsi point.
> (predicate_scalar_phi): Add argument false (which means non-extended
> predication) to call of is_cond_scalar_reduction. Add argument
> true (which correspondent to argument BEFORE) to call of
> convert_scalar_cond_reduction.
> (get_predicate_for_edge): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (find_insertion_point): New function.
> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
> BEFORE. Initialize EXTENDED to true if BB containing phi has more
> than 2 predecessors or both incoming edges are critical. Invoke
> find_phi_replacement_condition and predicate_scalar_phi or
> find_insertion_point and predicate_extended_scalar_phi depending on
> EXTENDED value.
> (insert_gimplified_predicates): Add check that non-predicated block
> may have statements to insert. Insert predicate of BB just after label
> if FLAG_FORCE_VECTORIZE is true.
> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
> is copy of inner or outer loop field force_vectorize.
>
>
> 2014-10-24 13:12 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> In my initial design I did such splitting but before start real
>>> if-conversion but I decided to not perform it since code size for
>>> if-converted loop is growing (number of phi nodes is increased). It is
>>> worth noting also that for phi with #nodes > 2 we need to get all
>>> predicates (except one) to do phi-predication and it means that block
>>> containing such phi can have only 1 critical edge.
>>
>> Can you point me to the patch with the special insertion code then?
>> I definitely want to avoid the mess we ran into with the reassoc
>> code "clever" insertion code.
>>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I saw the sources of these functions, but I can't understand why I
>>>>>> should use something else? Note that all predicate computations are
>>>>>> located in basic blocks ( by design of if-conv) and there is special
>>>>>> function that put these computations in bb
>>>>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>>>>> computations. New function - find_insertion_point() does very simple
>>>>>> search - it finds out the latest (in current bb) operand def-stmt of
>>>>>> predicates taken from all incoming edges.
>>>>>> In original algorithm the predicate of non-critical edge is taken to
>>>>>> perform phi-node predication since for critical edge it does not work
>>>>>> properly.
>>>>>>
>>>>>> My question is: does your comments mean that I should re-design my extensions?
>>>>>
>>>>> Well, we have infrastructure for inserting code on edges and you've
>>>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>>>> I realize this is very similar to my initial suggestion to simply split
>>>>> critical edges in loops you want to if-convert but delays splitting
>>>>> until it turns out to be necessary (which might be good for the
>>>>> !force_vect case).
>>>>>
>>>>> For edge predicates you simply can emit their computation on the
>>>>> edge, no?
>>>>>
>>>>> Btw, I very originally suggested to rework if-conversion to only
>>>>> record edge predicates - having both block and edge predicates
>>>>> somewhat complicates the code and makes it harder to
>>>>> maintain (thus also the suggestion to simply split critical edges
>>>>> if necessary to make BB predicates work always).
>>>>>
>>>>> Your patches add a lot of code and to me it seems we can avoid
>>>>> doing so much special casing.
>>>>
>>>> For example attacking the critical edge issue by a simple
>>>>
>>>> Index: tree-if-conv.c
>>>> ===================================================================
>>>> --- tree-if-conv.c      (revision 216508)
>>>> +++ tree-if-conv.c      (working copy)
>>>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>>>         if (EDGE_COUNT (e->src->succs) == 1)
>>>>           found = true;
>>>>        if (!found)
>>>> -       {
>>>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>>>> -           fprintf (dump_file, "only critical predecessors\n");
>>>> -         return false;
>>>> -       }
>>>> +       split_edge (EDGE_PRED (bb, 0));
>>>>      }
>>>>
>>>>    return true;
>>>>
>>>> it changes the number of blocks in the loop, so
>>>> get_loop_body_in_if_conv_order should probably be re-done with the
>>>> above eventually signalling that it created a new block.  Or the above
>>>> should populate a vector of edges to split and do that after the
>>>> loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> BTW Jeff did initial review of my changes related to predicate
>>>>>> computation for join blocks. I presented him updated patch with
>>>>>> test-case and some minor changes in patch. But still did not get any
>>>>>> feedback on it. Could you please take a look also on it?
>>>>>>
>>>>>>
>>>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>>>>> with critical incoming edges only performs another function which is
>>>>>>>> absent (predicate_extended_scalar_phi).
>>>>>>>>
>>>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>>>> only but you propose to use it for tree also.
>>>>>>>> Did I miss something?
>>>>>>>
>>>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>>>> if you want easy access to the newly created basic block to push
>>>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>>>>> in the future (more precise in patch.4).
>>>>>>>>>
>>>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>>>
>>>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>>>
>>>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please review it.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>>>> after adding support for extended predication.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your answer!
>>>>>>>>>>>
>>>>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>>>> did you mean by:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>>correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>>>> is ok.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>>>> +             return false;
>>>>>>>>>>>>>> +           }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.



More information about the Gcc-patches mailing list