This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.


On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> In my initial design I did such splitting but before start real
> if-conversion but I decided to not perform it since code size for
> if-converted loop is growing (number of phi nodes is increased). It is
> worth noting also that for phi with #nodes > 2 we need to get all
> predicates (except one) to do phi-predication and it means that block
> containing such phi can have only 1 critical edge.

Can you point me to the patch with the special insertion code then?
I definitely want to avoid the mess we ran into with the reassoc
code "clever" insertion code.

Richard.

> Thanks.
> Yuri.
>
> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I saw the sources of these functions, but I can't understand why I
>>>> should use something else? Note that all predicate computations are
>>>> located in basic blocks ( by design of if-conv) and there is special
>>>> function that put these computations in bb
>>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>>> computations. New function - find_insertion_point() does very simple
>>>> search - it finds out the latest (in current bb) operand def-stmt of
>>>> predicates taken from all incoming edges.
>>>> In original algorithm the predicate of non-critical edge is taken to
>>>> perform phi-node predication since for critical edge it does not work
>>>> properly.
>>>>
>>>> My question is: does your comments mean that I should re-design my extensions?
>>>
>>> Well, we have infrastructure for inserting code on edges and you've
>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>> I realize this is very similar to my initial suggestion to simply split
>>> critical edges in loops you want to if-convert but delays splitting
>>> until it turns out to be necessary (which might be good for the
>>> !force_vect case).
>>>
>>> For edge predicates you simply can emit their computation on the
>>> edge, no?
>>>
>>> Btw, I very originally suggested to rework if-conversion to only
>>> record edge predicates - having both block and edge predicates
>>> somewhat complicates the code and makes it harder to
>>> maintain (thus also the suggestion to simply split critical edges
>>> if necessary to make BB predicates work always).
>>>
>>> Your patches add a lot of code and to me it seems we can avoid
>>> doing so much special casing.
>>
>> For example attacking the critical edge issue by a simple
>>
>> Index: tree-if-conv.c
>> ===================================================================
>> --- tree-if-conv.c      (revision 216508)
>> +++ tree-if-conv.c      (working copy)
>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>         if (EDGE_COUNT (e->src->succs) == 1)
>>           found = true;
>>        if (!found)
>> -       {
>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>> -           fprintf (dump_file, "only critical predecessors\n");
>> -         return false;
>> -       }
>> +       split_edge (EDGE_PRED (bb, 0));
>>      }
>>
>>    return true;
>>
>> it changes the number of blocks in the loop, so
>> get_loop_body_in_if_conv_order should probably be re-done with the
>> above eventually signalling that it created a new block.  Or the above
>> should populate a vector of edges to split and do that after the
>> loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> BTW Jeff did initial review of my changes related to predicate
>>>> computation for join blocks. I presented him updated patch with
>>>> test-case and some minor changes in patch. But still did not get any
>>>> feedback on it. Could you please take a look also on it?
>>>>
>>>>
>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>>> with critical incoming edges only performs another function which is
>>>>>> absent (predicate_extended_scalar_phi).
>>>>>>
>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>> only but you propose to use it for tree also.
>>>>>> Did I miss something?
>>>>>
>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>> if you want easy access to the newly created basic block to push
>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks ahead.
>>>>>>
>>>>>>
>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>>> in the future (more precise in patch.4).
>>>>>>>
>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>
>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>
>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Could you please review it.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>> after adding support for extended predication.
>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>> algorithm is used.
>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>
>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your answer!
>>>>>>>>>
>>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>> did you mean by:
>>>>>>>>>>>
>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>correctly.
>>>>>>>>>>>
>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>
>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>
>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>> is ok.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>
>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>
>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>> +             return false;
>>>>>>>>>>>> +           }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        }
>>>>>>>>>>>>
>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>
>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>
>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>> +{
>>>>>>>>>>>>
>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>> +       return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>
>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>
>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>> correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoÑation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoÑation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]