This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def


On November 23, 2015 4:37:18 PM GMT+01:00, Tom de Vries <Tom_deVries@mentor.com> wrote:
>On 23/11/15 12:31, Richard Biener wrote:
>>>>  From the dump below I understand you want no memory references in
>>>> > >the outer loop?
>>>> > >So the issue seems to be that store motion fails
>>>> > >to insert the preheader load / exit store to the outermost loop
>>>> > >possible and thus another LIM pass is needed to "store motion"
>those
>>>> > >again?
>>> >
>>> >Yep.
>>> >
>>>> > >  But a simple testcase
>>>> > >
>>>> > >int a;
>>>> > >int *p = &a;
>>>> > >int foo (int n)
>>>> > >{
>>>> > >    for (int i = 0; i < n; ++i)
>>>> > >      for (int j = 0; j < 100; ++j)
>>>> > >        *p += j + i;
>>>> > >    return a;
>>>> > >}
>>>> > >
>>>> > >shows that LIM can do this in one step.
>>> >
>>> >I've filed a FTR PR68465 - "pass_lim doesn't detect identical loop
>entry
>>> >conditions" for a test-case where that doesn't happen (when using
>>> >-fno-tree-dominator-opts).
>>> >
>>>> > >Which means it should
>>>> > >be investigated why it doesn't do this properly for your
>testcase
>>>> > >(store motion of *_25).
>>> >
>>> >There seems to be two related problems:
>>> >1. the store has tree_could_trap_p (ref->mem.ref) true, which
>should be
>>> >    false. I'll work on a fix for this.
>>> >2. Give that the store can trap, I  was running into PR68465. I
>managed
>>> >    to eliminate the 2nd pass_lim by moving the pass_dominator
>instance
>>> >    before the pass_lim instance.
>>> >
>>> >Attached patch shows the pass group with only one pass_lim. I hope
>to be able
>>> >to eliminate the first pass_dominator instance before pass_lim once
>I fix 1.
>>> >
>>>> > >Simply adding two LIM passes either papers over a wrong-code
>>>> > >bug (in LIM or in DOM) or over a missed-optimization in LIM.
>>> >
>>> >AFAIU now, it's PR68465, a missed optimization in LIM.
>> Ok, it's not really LIMs job to cleanup loop header copying that way.
>>
>> DOM performs jump-threading for this but FRE should also be able
>> to handle this just fine.  Ah, it doesn't because the outer loop
>> header directly contains the condition
>>
>> Index: gcc/tree-ssa-sccvn.c
>> ===================================================================
>> --- gcc/tree-ssa-sccvn.c        (revision 230737)
>> +++ gcc/tree-ssa-sccvn.c        (working copy)
>> @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b
>>
>>     /* If we have a single predecessor record the equivalence from a
>>        possible condition on the predecessor edge.  */
>> -  if (single_pred_p (bb))
>> +  edge pred_e = NULL;
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +    {
>> +      if (e->flags & EDGE_DFS_BACK)
>> +       continue;
>> +      if (! pred_e)
>> +       pred_e = e;
>> +      else
>> +       {
>> +         pred_e = NULL;
>> +         break;
>> +       }
>> +    }
>> +  if (pred_e)
>>       {
>> -      edge e = single_pred_edge (bb);
>>         /* Check if there are multiple executable successor edges in
>>           the source block.  Otherwise there is no additional info
>>           to be recorded.  */
>>         edge e2;
>> -      FOR_EACH_EDGE (e2, ei, e->src->succs)
>> -       if (e2 != e
>> +      FOR_EACH_EDGE (e2, ei, pred_e->src->succs)
>> +       if (e2 != pred_e
>>              && e2->flags & EDGE_EXECUTABLE)
>>            break;
>>         if (e2 && (e2->flags & EDGE_EXECUTABLE))
>>          {
>> -         gimple *stmt = last_stmt (e->src);
>> +         gimple *stmt = last_stmt (pred_e->src);
>>            if (stmt
>>                && gimple_code (stmt) == GIMPLE_COND)
>>              {
>> @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b
>>                tree lhs = gimple_cond_lhs (stmt);
>>                tree rhs = gimple_cond_rhs (stmt);
>>                record_conds (bb, code, lhs, rhs,
>> -                           (e->flags & EDGE_TRUE_VALUE) != 0);
>> +                           (pred_e->flags & EDGE_TRUE_VALUE) != 0);
>>                code = invert_tree_comparison (code, HONOR_NANS
>(lhs));
>>                if (code != ERROR_MARK)
>>                  record_conds (bb, code, lhs, rhs,
>> -                             (e->flags & EDGE_TRUE_VALUE) == 0);
>> +                             (pred_e->flags & EDGE_TRUE_VALUE) ==
>0);
>>              }
>>          }
>>       }
>>
>> fixes this for me (for a small testcase).  Does it help yours?
>>
>
>Yes, it has the desired effect (of not needing pass_dominator before 
>pass_lim) . But, patch "Mark by_ref mem_ref in build_receiver_ref as 
>non-trapping" committed as r230738, also has that effect, so AFAIU I 
>don't require this tree-ssa-sccvn.c fix.

OK, I committed it anyway already.

Richard.

>Thanks,
>- Tom
>
>> Otherwise untested of course (I hope EDGE_DFS_BACK is good enough,
>> it's supposed to match edges that have the src dominated by the
>dest).
>> Testing the above now.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]