This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/3] Extended if-conversion


On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> I resend you patch1 and patch2 with minor changes:
> 1. I renamed flag_force_vectorize to aggressive_if_conv.
> 2. Use static cast for the first argument of gimple_phi_arg_edge.
> I also very sorry that I sent you bad patch.
>
> Now let me answer on your questions related to second patch.
> 1. Why we need both predicate_extended_scalar_phi and
> predicate_arbitrary_scalar_phi?
>
> Let's consider the following simple test-case:
>
>   #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)  /* c is integer array. */
> res += 1;
>   }
>
> we can see the following phi node correspondent to res:
>
> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>
> It is clear that we can optimize it to phi node with 2 arguments only
> and only one check can be used for phi predication (for reduction in
> our case), namely predicate of bb_5. In general case we can't do it
> even if we sort all phi argument values since we still have to produce
> a chain of cond expressions to perform phi predication (see comments
> for predicate_arbitrary_scalar_phi).

How so?  We can always use !(condition) for the "last" value, thus
treat it as an 'else' case.  That even works for

# res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>

where the condition for edges 5 and 7 can be computed as
! (condition for 3 || condition for 4).

Of course it is worthwhile to also sort single-occurances first
so your case gets just the condiiton for edge 5 and its inversion
used for edges 3 and 4 combined.

> 2. Why we need to introduce find_insertion_point?
>  Let's consider another test-case extracted from 175.vpr ( t5.c is
> attached) and we can see that bb_7 and bb_9 containig phi nodes has
> only critical incoming edges and both contain code computing edge
> predicates, e.g.
>
> <bb 7>:
> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
> _46 = xmax_17 == xmax_37;
> _47 = xmax_17 == xmax_27;
> _48 = _46 & _47;
> _53 = xmax_17 == xmax_37;
> _54 = ~_53;
> _55 = xmax_17 == xmax_27;
> _56 = _54 & _55;
> _57 = _48 | _56;
> xmax_edge_19 = xmax_edge_39 + 1;
> goto <bb 11>;
>
> It is evident that we can not put phi predication at the block
> beginning but need to put it after predicate computations.
> Note also that if there are no critical edges for phi arguments
> insertion point will be "after labels" Note also that phi result can
> have use in this block too, so we can't put predication code to the
> block end.

So the issue is that predicate insertion for edge predicates does
not happen on the edge but somewhere else (generally impossible
for critical edges unless you split them).

I think I've told you before that I prefer simple solutions to such issues,
like splitting the edge!  Certainly not involving a function walking
GENERIC expressions.

Thanks,
Richard.

> Let me know if you still have any questions.
>
> Best regards.
> Yuri.
>
>
>
>
> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is the second patch related to extended predication.
>>> Few comments which explain a main goal of design.
>>>
>>> 1. I don't want to insert any critical edge splitting since it may
>>> lead to less efficient binaries.
>>> 2. One special case of extended PHI node predication was introduced
>>> when #arguments is more than 2 but only two arguments are different
>>> and one argument has the only occurrence. For such PHI conditional
>>> scalar reduction is applied.
>>> This is correspondent to the following statement:
>>>     if (q1 && q2 && q3) var++
>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>> 3. Original algorithm for PHI predication used assumption that at
>>> least one incoming edge for blocks containing PHI is not critical - it
>>> guarantees that all computations related to predicate of normal edge
>>> are already inserted above this block and
>>> code related to PHI predication can be inserted at the beginning of
>>> block. But this is not true for critical edges for which predicate
>>> computations are  in the block where code for phi predication must be
>>> inserted. So new function find_insertion_point is introduced which is
>>> simply found out the last statement in block defining predicates
>>> correspondent to all incoming edges and insert phi predication code
>>> after it (with some minor exceptions).
>>
>> Unfortunately the patch doesn't apply for me - I get
>>
>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>> predicate_all_scalar_phis (struct loop *loop)
>>
>> a few remarks nevertheless.  I don't see how we need both
>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>> Couldn't we simply sort an array of (edge, value) pairs after value
>> and handle equal values specially in predicate_extended_scalar_phi?
>> That would even make PHI <a, a, b, c, c> more optimal.
>>
>> I don't understand the need for find_insertion_point.  All SSA names
>> required for the predicates are defined upward - and the complex CFG
>> is squashed to a single basic-block, thus the defs will dominate the
>> inserted code if you insert after labels just like for the other case.
>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>> to happen in dominator order - but I guess that happens already?)
>>
>> I'd like the extended PHI handling to be enablable by a flag even
>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>> times that would have been nice to vectorize.  I suggest to
>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>> followup, but please rename the local flag_force_vectorize flag
>> to something less looking like a flag, like simply 'aggressive'.
>>
>> Otherwise patch 2 looks ok to me.
>>
>> Richard.
>>
>>
>>> ChangeLog:
>>>
>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>> FLAG_FORCE_VECTORIZE is true.
>>> (if_convertible_bb_p): Delete check that bb has at least one
>>> non-critical incoming edge.
>>> (phi_has_two_different_args): New function.
>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>> arguments if EXTENDED is true. Change check that block
>>> containing reduction statement candidate is predecessor
>>> of phi-block since phi may have more than two arguments.
>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>> statement before/after gsi point.
>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>> predication) to call of is_cond_scalar_reduction. Add argument
>>> true (which correspondent to argument BEFORE) to call of
>>> convert_scalar_cond_reduction.
>>> (get_predicate_for_edge): New function.
>>> (predicate_arbitrary_scalar_phi): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (find_insertion_point): New function.
>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>> than 2 predecessors or both incoming edges are critical. Invoke
>>> find_phi_replacement_condition and predicate_scalar_phi or
>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>> EXTENDED value.
>>> (insert_gimplified_predicates): Add check that non-predicated block
>>> may have statements to insert. Insert predicate of BB just after label
>>> if FLAG_FORCE_VECTORIZE is true.
>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>> is copy of inner or outer loop field force_vectorize.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]