[PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

Yuri Rumyantsev ysrumyan@gmail.com
Tue Oct 21 14:36:00 GMT 2014


Richard,

In my initial design I did such splitting but before start real
if-conversion but I decided to not perform it since code size for
if-converted loop is growing (number of phi nodes is increased). It is
worth noting also that for phi with #nodes > 2 we need to get all
predicates (except one) to do phi-predication and it means that block
containing such phi can have only 1 critical edge.

Thanks.
Yuri.

2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I saw the sources of these functions, but I can't understand why I
>>> should use something else? Note that all predicate computations are
>>> located in basic blocks ( by design of if-conv) and there is special
>>> function that put these computations in bb
>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>> computations. New function - find_insertion_point() does very simple
>>> search - it finds out the latest (in current bb) operand def-stmt of
>>> predicates taken from all incoming edges.
>>> In original algorithm the predicate of non-critical edge is taken to
>>> perform phi-node predication since for critical edge it does not work
>>> properly.
>>>
>>> My question is: does your comments mean that I should re-design my extensions?
>>
>> Well, we have infrastructure for inserting code on edges and you've
>> made critical edges predicated correctly.  So why re-invent the wheel?
>> I realize this is very similar to my initial suggestion to simply split
>> critical edges in loops you want to if-convert but delays splitting
>> until it turns out to be necessary (which might be good for the
>> !force_vect case).
>>
>> For edge predicates you simply can emit their computation on the
>> edge, no?
>>
>> Btw, I very originally suggested to rework if-conversion to only
>> record edge predicates - having both block and edge predicates
>> somewhat complicates the code and makes it harder to
>> maintain (thus also the suggestion to simply split critical edges
>> if necessary to make BB predicates work always).
>>
>> Your patches add a lot of code and to me it seems we can avoid
>> doing so much special casing.
>
> For example attacking the critical edge issue by a simple
>
> Index: tree-if-conv.c
> ===================================================================
> --- tree-if-conv.c      (revision 216508)
> +++ tree-if-conv.c      (working copy)
> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>         if (EDGE_COUNT (e->src->succs) == 1)
>           found = true;
>        if (!found)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "only critical predecessors\n");
> -         return false;
> -       }
> +       split_edge (EDGE_PRED (bb, 0));
>      }
>
>    return true;
>
> it changes the number of blocks in the loop, so
> get_loop_body_in_if_conv_order should probably be re-done with the
> above eventually signalling that it created a new block.  Or the above
> should populate a vector of edges to split and do that after the
> loop calling if_convertible_bb_p.
>
> Richard.
>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> BTW Jeff did initial review of my changes related to predicate
>>> computation for join blocks. I presented him updated patch with
>>> test-case and some minor changes in patch. But still did not get any
>>> feedback on it. Could you please take a look also on it?
>>>
>>>
>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>> with critical incoming edges only performs another function which is
>>>>> absent (predicate_extended_scalar_phi).
>>>>>
>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>> only but you propose to use it for tree also.
>>>>> Did I miss something?
>>>>
>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>> if you want easy access to the newly created basic block to push
>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>>
>>>>>
>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>> in the future (more precise in patch.4).
>>>>>>
>>>>>> But the same reasoning applies to this version of the patch when
>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>
>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>
>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Could you please review it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_preds_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>> after adding support for extended predication.
>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your answer!
>>>>>>>>
>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>> did you mean by:
>>>>>>>>>>
>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>correctly.
>>>>>>>>>>
>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>
>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>
>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>> is ok.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>
>>>>>>>>>> I attached modified patch.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>
>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>> +             return false;
>>>>>>>>>>> +           }
>>>>>>>>>>> +
>>>>>>>>>>> +        }
>>>>>>>>>>>
>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>
>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>
>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>> +
>>>>>>>>>>> +static inline bool
>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>> +{
>>>>>>>>>>>
>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>
>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>> +       return false;
>>>>>>>>>>> +    }
>>>>>>>>>>>
>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>
>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>> correctly.
>>>>>>>>>>>
>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>> more.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>> +       {
>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>
>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>
>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>
>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.



More information about the Gcc-patches mailing list