This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH 2/3] Extended if-conversion

From: Richard Biener <richard dot guenther at gmail dot com>
To: Yuri Rumyantsev <ysrumyan at gmail dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, Igor Zamyatin <izamyatin at gmail dot com>
Date: Wed, 14 Jan 2015 15:40:10 +0100
Subject: Re: [PATCH 2/3] Extended if-conversion
Authentication-results: sourceware.org; auth=none
References: <CAEoMCqSS=M5vOV46ZdjNKpELqGqFHOi5ct39QMRg5NdO5uzfWg at mail dot gmail dot com> <CAFiYyc3ONiHL1nsLK1MSEqZR94pHYmb+rK-M24n2fzACvFDiVA at mail dot gmail dot com> <CAEoMCqQX5qEExZ7Xtz_JNzus5FQizvWkbFSW5y+TGNjt71BHaQ at mail dot gmail dot com> <CAFiYyc2TKP4FxbZ7L8=iDVgx-nQC_Dorz5tqaidSa+5vS_EAPQ at mail dot gmail dot com> <CAEoMCqR3i8PgA3RF9dS6QrNBpBAujiaEN0xtLqeMUrcgMCgLfA at mail dot gmail dot com> <CAFiYyc3Cj0E37qBc5scafkS5OoecczKqt0zfv_upkGPAM-9Q2g at mail dot gmail dot com> <CAEoMCqTC5XHC2QjLrMPjNGj7f6APpgF6FXMxj+8ZNN_F=Bx6fA at mail dot gmail dot com> <CAFiYyc0dmN=tNAEhbwpKt8UVmdZfEdWZENVzAkQmE75n+Cc4PQ at mail dot gmail dot com> <CAEoMCqTB59Z-mcvgme4G7zpVTjiHE4T_vRf86HN3ECnrwmiR=Q at mail dot gmail dot com> <CAFiYyc3-3zUrpFe1KCWWisZbk6o+vooWuda_mV-W923Frbkufg at mail dot gmail dot com> <CAEoMCqTfFPZYWd0jU9XATtSeQ8ccfSzmvUeh+UDbAFWXENOphg at mail dot gmail dot com> <CAFiYyc0fVbik5mcWGOarAomBsFJMva6nXRFhsU_Pf=KufzjUmA at mail dot gmail dot com> <CAEoMCqQuyQx0Hu9oi_80Jbp5cq+aJBr+MC56yQeFkM31sG9SRw at mail dot gmail dot com> <CAFiYyc2LB3hnfp6uagZr+J_k6PT+K61ufxx_SsouKYE=h7=t2Q at mail dot gmail dot com> <CAEoMCqQz+LgqMAYVCrO9JGYVyURcU+FY19u4BA3GK3Kk78cEkg at mail dot gmail dot com> <CAFiYyc269c8r=gABMsbu2GKu5mL=wjg-D1j-b+v-9pD8kz9vBg at mail dot gmail dot com> <CAEoMCqT0SFS8w0CLwTMPOBqZraAVcn0pvQ8S4XQRMpDXgzRB_g at mail dot gmail dot com> <CAFiYyc3uGOyHPgmZdutu_DTJUvjbLdbT8aA-uqyY+KXhPwebuQ at mail dot gmail dot com> <CAEoMCqRyFyKLgNUo-nmF1-B2ANVd3JrP_ZRcyMWxF1fbkyZC3g at mail dot gmail dot com> <CAFiYyc2MF7TzmT4a5iJYzHiHZMUcH9uVrjFy6jijYF_7sd9pjg at mail dot gmail dot com> <CAEoMCqQB2VzgZSptTrgzjK2praR8w=SSngOhUqiy+k8bp94B7Q at mail dot gmail dot com>
On Wed, Jan 14, 2015 at 2:14 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I did all changes proposed by you and add couple tests.
> Bootstrap, including aggressive one proposed by you, and regression
> testing did not show any new failures.
>
> Is it OK for trunk?

+++ b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-fopenmp" } */

please use { dg-additional-options "-fopenmp-simd" } instead and
vect_simd_clones target, not fopenmp target.

+/* { dg-options "-mavx2 -ffast-math -O3 -fopenmp -fdump-tree-vect-details" } */

likewise (for -ffast-math - don't use -mavx2, instead require a proper
vect target on the scan-tree-dump-times line

It would also be nice to have these runtime testcases so it can
be verified the code executes correctly instead of possibly producing
random crap ;)

The patch is ok with the above changes.

Thanks,
Richard.

> ChangeLog:
>
> 2015-01-14  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c: Include hash-map.h.
> (aggressive_if_conv): New variable.
> (fold_build_cond_expr): Add simplification of non-zero condition.
> (add_to_dst_predicate_list): Invoke add_to_predicate_list if edge
> destination block is not always executed.
> (if_convertible_phi_p): Fix commentary, allow phi nodes have more
> than two predecessors if AGGRESSIVE_IF_CONV is true.
> (if_convertible_stmt_p): Fix commentary.
> (all_preds_critical_p): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true
> BB can have more than two predecessors and all incoming edges can be
> critical.
> (predicate_bbs): Skip predication for loop exit block, use build2_loc
> to compute predicate for true edge.
> (find_phi_replacement_condition): Delete this function.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (phi_args_hash_traits): New helper structure.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_scalar_phi): Add handling of phi nodes with more than two
> arguments, delete COND and TRUE_BB arguments, insert body of
> find_phi_replacement_condition to predicate ordinary phi nodes.
> (predicate_all_scalar_phis): Skip blocks with the only predecessor,
> delete call of find_phi_replacement_condition and invoke
> predicate_scalar_phi with two arguments.
> (insert_gimplified_predicates): Add assert that non-predicated block
> don't have statements to insert.
> (ifcvt_split_critical_edges): New function.
> (ifcvt_split_def_stmt): Likewise.
> (ifcvt_walk_pattern_tree): Likewise.
> (stmt_is_root_of_bool_pattern): Likewise.
> (ifcvt_repair_bool_pattern): Likewise.
> (ifcvt_local_dce): Likewise.
> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
> is copy of inner or outer loop force_vectorize field, invoke
> ifcvt_split_critical_edges, ifcvt_local_dce and
> ifcvt_repair_bool_pattern for aggressive if-conversion.
>
> gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/vect-aggressive-1.c: New test.
> * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
>
> 2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I changed algorithm for bool pattern repair.
>>> It turned out that ifcvt_local_dce phaase is required since for
>>> test-case I sent you in previous mail vectorization is not performed
>>> without dead code elimination:
>>>
>>> For the loop
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>
>>> I've got the following message from vectorizer:
>>>
>>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>>>
>>> t3.c:10:11: note: bit-precision arithmetic not supported.
>>> t3.c:10:11: note: not vectorized: relevant stmt not supported:
>>> _ifc__39 = t_5 > 0.0;
>>>
>>> It is caused by the following dead predicate computations after
>>> critical edge splitting:
>>>
>>> (after combine blocks):
>>>
>>> <bb 3>:
>>> # res_15 = PHI <res_1(7), 0(19)>
>>> # i_16 = PHI <i_11(7), 0(19)>
>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
>>> t_5 = a[i_16];
>>> _6 = t_5 > 0.0;
>>> _7 = t_5 < 9.9999998430674944e+16;
>>> _8 = _6 & _7;
>>> _10 = &c[i_16];
>>> _ifc__36 = _8 ? 4294967295 : 0;
>>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>> _28 = _8;
>>> _29 = _9 != 0;
>>> _30 = _28 & _29;
>>> // Statements below are dead!!
>>> _31 = _8;
>>> _32 = _9 != 0;
>>> _33 = ~_32;
>>> _34 = _31 & _33;
>>> // End of dead statements.
>>> _ifc__35 = _30 ? 1 : 0;
>>> res_1 = res_15 + _ifc__35;
>>> i_11 = i_16 + 1;
>>> ivtmp_13 = ivtmp_14 - 1;
>>> if (ivtmp_13 != 0)
>>>   goto <bb 7>;
>>> else
>>>   goto <bb 8>;
>>>
>>> But if we delete these statements loop will be vectorized.
>>
>> Hm, ok.  We insert predicates too early obviously and not only when
>> needed.  But let's fix that later.
>>
>>> New patch is attached.
>>
>>  fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
>>  {
>>    tree rhs1, lhs1, cond_expr;
>> +
>> +  /* If COND is comparison r != 0 and r has boolean type, convert COND
>> +     to SSA_NAME to accept by vect bool pattern.  */
>> +  if (TREE_CODE (cond) == NE_EXPR)
>> +    {
>> +      tree op0 = TREE_OPERAND (cond, 0);
>> +      tree op1 = TREE_OPERAND (cond, 1);
>> +      if (TREE_CODE (op0) == SSA_NAME
>> +         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
>> +         && (integer_zerop (op1)))
>> +       cond = op0;
>> +      else if (TREE_CODE (op1) == SSA_NAME
>> +              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
>> +              && (integer_zerop (op0)))
>> +       cond = op1;
>>
>> The 2nd form, 0 != SSA_NAME doesn't happen due to operand
>> canonicalization.  Please remove its handling.
>>
>> +      if (gimple_phi_num_args (phi) != 2)
>> +       {
>> +         if (!aggressive_if_conv)
>>
>> && !aggressive_if_conv
>>
>> +  if (EDGE_COUNT (bb->preds) > 2)
>> +    {
>> +      if (!aggressive_if_conv)
>>
>> Likewise.
>>
>> -      gimple reduc;
>> +         && (rhs = gimple_phi_arg_def (phi, 0)))) {
>>
>> the { goes to the next line
>>
>>  static void
>>  predicate_mem_writes (loop_p loop)
>>  {
>> -  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> +  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
>> +  tree mask_vec[10];
>>
>> an upper limit of 10?
>>
>> +      for (j=0; j<10; j++)
>>
>> spaces around '<' and '='
>>
>> +       mask_vec[j] = NULL_TREE;
>> +
>>
>> +           gcc_assert (exact_log2 (bitsize) != -1);
>> +           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
>> +             {
>>
>> this seems to be a completely separate "optimization"?  Note that
>> there are targets with non-power-of-two bitsize modes (PSImode),
>> so the assert will likely trigger.  I would prefer if you separate this
>> part of the patch.
>>
>> +      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
>> +       continue;
>>
>> no space before gimple_code
>>
>> +  imm_use_iterator imm_iter;
>> +
>> +
>> +  worklist.create (64);
>>
>> excessive vertical space.
>>
>> The patch misses the addition of new testcases - please add some,
>> otherwise the code will be totally untested.
>>
>> I assume the patch passes bootstrap and regtest (you didn't say so).
>> Can you also do a bootstrap with aggressive_if_conv forced to
>> true and --with-build-config=bootstrap-O3 --disable-werror?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I am sending you full patch (~1000 lines) but if you need only patch.1
>>>>> and patch.2 will let me know and i'll send you reduced patch.
>>>>>
>>>>> Below are few comments regarding your remarks for patch.3.
>>>>>
>>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>>>> when dead code elimination is required to vectorize loop, i.e. dead
>>>>> statement is marked as relevant.
>>>>> 2. You wrote:
>>>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>>>> times instead of just doing sth like
>>>>>>
>>>>>>  while (!has_single_use (lhs))
>>>>>>    {
>>>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>>>      ifcvt_walk_pattern_tree (copy);
>>>>>>    }
>>>>>>
>>>>>> thus returning the copy you create and re-process it (the copy should
>>>>>> now have a single-use).
>>>>>
>>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>>>> but some intermediate variables too. For example, for the following
>>>>> test-case
>>>>>
>>>>> float a[1000];
>>>>> int c[1000];
>>>>>
>>>>> int foo()
>>>>> {
>>>>>   int i, res = 0;
>>>>> #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)
>>>>> res += 1;
>>>>>   }
>>>>>   return res;
>>>>> }
>>>>>
>>>>> After combine_blocks we have the following bb:
>>>>>
>>>>> <bb 3>:
>>>>> # res_15 = PHI <res_1(7), 0(15)>
>>>>> # i_16 = PHI <i_11(7), 0(15)>
>>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>>>> t_5 = a[i_16];
>>>>> _6 = t_5 > 0.0;
>>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>>> _8 = _6 & _7;
>>>>> _10 = &c[i_16];
>>>>> _ifc__32 = _8 ? 4294967295 : 0;
>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>>>> _28 = _8;
>>>>> _29 = _9 != 0;
>>>>> _30 = _28 & _29;
>>>>> _ifc__31 = _30 ? 1 : 0;
>>>>> res_1 = res_15 + _ifc__31;
>>>>> i_11 = i_16 + 1;
>>>>> ivtmp_13 = ivtmp_14 - 1;
>>>>> if (ivtmp_13 != 0)
>>>>>   goto <bb 7>;
>>>>> else
>>>>>   goto <bb 8>;
>>>>>
>>>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>>>> _8 = _6 & _7
>>>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>>>> iterative algorithm as the simplest one.
>>>>
>>>> But it walks the entire pattern again and again while you only need to
>>>> ensure you walk the pattern tree of the now single-use DEF again
>>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>>>> you should pass down the user_operand_p that you need to make
>>>> single-use).
>>>>
>>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>>>> for stmt_is_root_of_bool_pattern.
>>>>>
>>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>>>> to check on COND_EXPR only since PHI predication and memory access
>>>>> predication produced only such statements,i.e. it can look like
>>>>>
>>>>> static bool
>>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>>>> {
>>>>>   enum tree_code code;
>>>>>   tree lhs, rhs;
>>>>>
>>>>>   code = gimple_assign_rhs_code (stmt);
>>>>>   if (code == COND_EXPR)
>>>>>     {
>>>>>       rhs = gimple_assign_rhs1 (stmt);
>>>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>>> return false;
>>>>>       *var = rhs;
>>>>>       return true;
>>>>>     }
>>>>>   return false;
>>>>> }
>>>>>
>>>>> I also did few minor changes in patch.2.
>>>>>
>>>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>>>> do loop version if explicit option "-ftree-loop-if-convert" was not
>>>>> passed to compiler, i.e. we perform if-conversion for loop
>>>>> vectorization only and if it does not take place, we should delete
>>>>> if-converted version of loop.
>>>>> What is your opinion?
>>>>
>>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>>>> looks in need of some refactoring to avoid duplicate code.  We can
>>>> do that a followup.
>>>>
>>>> Part 3 still needs the iteration to be resolved and make the use we
>>>> actually care about single-use, not a random one so we can avoid
>>>> iterating completely.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Here is updated patch which includes
>>>>>>> (1) split critical edges for aggressive if conversion.
>>>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>>>
>>>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>>>> aggressive if conversion by default.
>>>>>>>
>>>>>>> What is your opinion?
>>>>>>
>>>>>> Looks ok overall, but please simply do
>>>>>>
>>>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>>>       split_edge (e);
>>>>>>
>>>>>> for all blocks apart from the latch.
>>>>>>
>>>>>> Can you please send a combined patch up to this one?  Looking at
>>>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>>>> patches from patch1 to this one.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your reply!
>>>>>>>>>
>>>>>>>>> I didn't understand your point:
>>>>>>>>>
>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>>>
>>>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>>>
>>>>>>>> I don't mind means I am fine with it.
>>>>>>>>
>>>>>>>>> Also I assume that
>>>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>>>
>>>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>>>> be surprised if so but that may be possible.
>>>>>>>>
>>>>>>>>> I prefer to do something
>>>>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>>>> destination bb belongs to loop).
>>>>>>>>
>>>>>>>> That works for me as well but it is more complicated to implement.
>>>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>>>> I have few questions about your comments.
>>>>>>>>>>>
>>>>>>>>>>> 1. You wrote :
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> path
>>>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>>>>> compile time since it builds hash_map.
>>>>>>>>>>
>>>>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>>>>> non-aggressive if-conversion.
>>>>>>>>>>
>>>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>>>
>>>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>>>>
>>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>>>> do something like
>>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>>>      return 0;
>>>>>>>>>>
>>>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>>>> +      {
>>>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>>>> +         {
>>>>>>>>>> +           split_critical_edges ();
>>>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>>>> +         }
>>>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>>>> +      }
>>>>>>>>>>
>>>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>>>    {
>>>>>>>>>>
>>>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>>>> additional option.
>>>>>>>>>>
>>>>>>>>>> Yes, I know.
>>>>>>>>>>
>>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Thanks ahead.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>>>> #if 0
>>>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>>>  else
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>>>
>>>>>>>>>>>>> we will get ICE:
>>>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>>>       ^
>>>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>>>
>>>>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>>>>
>>>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>>>
>>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>>>
>>>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>>>> predicating a PHI.
>>>>>>>>>>>>
>>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>>>> the printfs properly.
>>>>>>>>>>>>
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> paths.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>>>>
>>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>>>> statement iterator.
>>>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>>>>>> critical.
>>>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
References:
- Re: [PATCH 2/3] Extended if-conversion
  - From: Richard Biener
- Re: [PATCH 2/3] Extended if-conversion
  - From: Yuri Rumyantsev
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]