This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.


Richard,

Thanks for your answer!

In current implementation phi node conversion assume that one of
incoming edge to bb containing given phi has at least one non-critical
edge and choose it to insert predicated code. But if we choose
critical edge we need to determine insert point and insertion
direction (before/after) since in other case we can get invalid ssa
form (use before def). This is done by my new function which is not in
current patch ( I will present this patch later). SO I assume that we
need to leave this patch as it is to not introduce new bugs.

Thanks.
Yuri.

2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I reworked the patch as you proposed, but I didn't understand what
>> did you mean by:
>>
>>>So please rework the patch so critical edges are always handled
>>>correctly.
>>
>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>> with more than 2 arguments; (2) to reject basic blocks with only
>> critical incoming edges since support for extended predication of phi
>> nodes will be in next patch.
>
> I mean that (2) should not be rejected dependent on flag_force_vectorize.
> It was rejected because if-cvt couldn't handle it correctly before but with
> this patch this is fixed.  I see no reason to still reject this then even
> for !flag_force_vectorize.
>
> Rejecting PHIs with more than two arguments with flag_force_vectorize
> is ok.
>
> Richard.
>
>> Could you please clarify your statement.
>>
>> I attached modified patch.
>>
>> ChangeLog:
>>
>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>> if destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Use call of all_preds_critical_p
>> to reject block if-conversion with incoming critical edges only if
>> FLAG_FORCE_VECTORIZE was not set-up.
>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>> to compute predicate instead of fold_build2_loc.
>> Add zeroing of edge 'aux' field.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and atleast one incoming edge is not critical original
>> algorithm is used.
>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>> Nullify 'aux' field of edges for blocks with two successors.
>>
>>
>>
>>
>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>> Could you please look at it ( I have already sent the patch with
>>>> changes in add_to_predicate_list for review).
>>>
>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>> +             return false;
>>> +           }
>>> +
>>> +        }
>>>
>>> Excess vertical space.
>>>
>>>
>>> +/* Assumes that BB has more than 2 predecessors.
>>>
>>> More than 1 predecessor?
>>>
>>> +   Returns false if at least one successor is not on critical edge
>>> +   and true otherwise.  */
>>> +
>>> +static inline bool
>>> +all_edges_are_critical (basic_block bb)
>>> +{
>>>
>>> "all_preds_critical_p" would be a better name
>>>
>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>> +    {
>>> +      if (!flag_force_vectorize)
>>> +       return false;
>>> +    }
>>>
>>> as I said in the last review I don't think we should restrict edge
>>> predicates to flag_force_vectorize.  At least I can't see how
>>> if-conversion is magically more expensive for that case?
>>>
>>> So please rework the patch so critical edges are always handled
>>> correctly.
>>>
>>> Ok with that and the above suggested changes.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> Thanks.
>>>> Yuri.
>>>> ChangeLog
>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>> if destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>> to reject block if-conversion with incoming critical edges only if
>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>> to compute predicate instead of fold_build2_loc.
>>>> Add zeroing of edge 'aux' field.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal 2 and atleast one incoming edge is not critical original
>>>> algorithm is used.
>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>
>>>>
>>>>
>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>
>>>>>> Second part of patch will be sent later.
>>>>>
>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>> more.
>>>>>
>>>>>  static inline void
>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>  {
>>>>> ...
>>>>>
>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>> +      if (dom_bb != loop->header
>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>> +       {
>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>> +         bc = bb_predicate (dom_bb);
>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>
>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>
>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>> +
>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>> +    set_edge_predicate (e, cond);
>>>>>
>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>> the case where e->src dominates e->dest).
>>>>>
>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>> for that).  So stuff like
>>>>>
>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>> +         if (flag_force_vectorize)
>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>
>>>>> shouldn't be necessary.
>>>>>
>>>>> I think the edge predicate handling should also be unconditionally
>>>>> and not depend on flag_force_vectorize.
>>>>>
>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>> +      if (bb == loop->latch
>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>
>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>> loop->latch bb predicate the change looks broken.
>>>>>
>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>> +            For example, such conversion is generated for sequence:
>>>>> +               _Bool _7, _8, _9;
>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>> +
>>>>> +         if (CONVERT_EXPR_P (c)
>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>
>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>
>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>> -                                    unshare_expr (cond), c2);
>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>> +                                    unshare_expr (c2));
>>>>>
>>>>> why is it necessary to unshare c2?
>>>>>
>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>> that in detail).
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>
>>>>>> Changelog.
>>>>>>
>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>> for join blocks if it exists.
>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>> that extended predication must be applied).
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block begining for extended if-conversion.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>> for innermost loop marked with pragma omp simd and
>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>> for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>> Let's me also answer on your comments.
>>>>>>>
>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>
>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>     c = bb_predicate (b);
>>>>>>>   else
>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>     c = edge_predicate (e);
>>>>>>>
>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>> local-dce function in next patch.
>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>> scalar reduction can be applied also.
>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>> omp simd only.
>>>>>>>
>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (convert_name_to_cmp): New function.
>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>> for join blocks if it exists.
>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>> that extended predication must be applied).
>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>> for blocks with two successors.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard!
>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>
>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>> negate_predicate was deleted.
>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>> be critical.
>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>> blocks to simplify it.
>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>> of kind
>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>> arguments.
>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>
>>>>>>>>> Updated patch is attached.
>>>>>>>>>
>>>>>>>>> Any comments will be appreciated.
>>>>>>>>
>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>> it hard to review.
>>>>>>>>
>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>> all this added logic.
>>>>>>>>
>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>
>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>
>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>> +   conditional expressions.  */
>>>>>>>>
>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>
>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>
>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>
>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>
>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>> add_to_predicate_list.
>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>> (predicate_scalar_phi): Add invoÑation of convert_name_to_cmp if cond
>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>> predication to build mask.
>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>
>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>> include:
>>>>>>>>>>>
>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>    predecessors.
>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>
>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>
>>>>>>>>>> No?
>>>>>>>>>>
>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>> with some limitations:
>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>> with next edge.
>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>
>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>
>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>
>>>>>>>>>> becomes
>>>>>>>>>>
>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>
>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>
>>>>>>>>>> becomes
>>>>>>>>>>
>>>>>>>>>>   bb 5:
>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>
>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>
>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>
>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>
>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>   {
>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>> res += 1;
>>>>>>>>>>>   }
>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>
>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>
>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>> equal to false.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>> (predicate_scalar_phi): Add invoÑation of convert_name_to_cmp if cond
>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>> predication to build mask.
>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>> innermost loop marked with pragma omp simd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]