This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- From: Richard Biener <rguenther at suse dot de>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>,"gcc-patches at gnu dot org" <gcc-patches at gnu dot org>,Jakub Jelinek <jakub at redhat dot com>
- Date: Mon, 23 Nov 2015 17:36:28 +0100
- Subject: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Authentication-results: sourceware.org; auth=none
- References: <5640BD31 dot 2060602 at mentor dot com> <5640FB07 dot 6010008 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511111159040 dot 4884 at t29 dot fhfr dot qr> <5649C41A dot 40403 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511161341420 dot 4884 at t29 dot fhfr dot qr> <564A64B3 dot 7080305 at mentor dot com> <CAFiYyc0i3BX=9Ae1w-a=ySUwLbXNLE-TbKLOjqYRGExPJJ_A_Q at mail dot gmail dot com> <564B3F69 dot 50600 at mentor dot com> <564D1930 dot 8040104 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511201119180 dot 4884 at t29 dot fhfr dot qr> <56502AFB dot 8050105 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511231217040 dot 4884 at t29 dot fhfr dot qr> <565332AE dot 1020907 at mentor dot com>
On November 23, 2015 4:37:18 PM GMT+01:00, Tom de Vries <Tom_deVries@mentor.com> wrote:
>On 23/11/15 12:31, Richard Biener wrote:
>>>> From the dump below I understand you want no memory references in
>>>> > >the outer loop?
>>>> > >So the issue seems to be that store motion fails
>>>> > >to insert the preheader load / exit store to the outermost loop
>>>> > >possible and thus another LIM pass is needed to "store motion"
>those
>>>> > >again?
>>> >
>>> >Yep.
>>> >
>>>> > > But a simple testcase
>>>> > >
>>>> > >int a;
>>>> > >int *p = &a;
>>>> > >int foo (int n)
>>>> > >{
>>>> > > for (int i = 0; i < n; ++i)
>>>> > > for (int j = 0; j < 100; ++j)
>>>> > > *p += j + i;
>>>> > > return a;
>>>> > >}
>>>> > >
>>>> > >shows that LIM can do this in one step.
>>> >
>>> >I've filed a FTR PR68465 - "pass_lim doesn't detect identical loop
>entry
>>> >conditions" for a test-case where that doesn't happen (when using
>>> >-fno-tree-dominator-opts).
>>> >
>>>> > >Which means it should
>>>> > >be investigated why it doesn't do this properly for your
>testcase
>>>> > >(store motion of *_25).
>>> >
>>> >There seems to be two related problems:
>>> >1. the store has tree_could_trap_p (ref->mem.ref) true, which
>should be
>>> > false. I'll work on a fix for this.
>>> >2. Give that the store can trap, I was running into PR68465. I
>managed
>>> > to eliminate the 2nd pass_lim by moving the pass_dominator
>instance
>>> > before the pass_lim instance.
>>> >
>>> >Attached patch shows the pass group with only one pass_lim. I hope
>to be able
>>> >to eliminate the first pass_dominator instance before pass_lim once
>I fix 1.
>>> >
>>>> > >Simply adding two LIM passes either papers over a wrong-code
>>>> > >bug (in LIM or in DOM) or over a missed-optimization in LIM.
>>> >
>>> >AFAIU now, it's PR68465, a missed optimization in LIM.
>> Ok, it's not really LIMs job to cleanup loop header copying that way.
>>
>> DOM performs jump-threading for this but FRE should also be able
>> to handle this just fine. Ah, it doesn't because the outer loop
>> header directly contains the condition
>>
>> Index: gcc/tree-ssa-sccvn.c
>> ===================================================================
>> --- gcc/tree-ssa-sccvn.c (revision 230737)
>> +++ gcc/tree-ssa-sccvn.c (working copy)
>> @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b
>>
>> /* If we have a single predecessor record the equivalence from a
>> possible condition on the predecessor edge. */
>> - if (single_pred_p (bb))
>> + edge pred_e = NULL;
>> + FOR_EACH_EDGE (e, ei, bb->preds)
>> + {
>> + if (e->flags & EDGE_DFS_BACK)
>> + continue;
>> + if (! pred_e)
>> + pred_e = e;
>> + else
>> + {
>> + pred_e = NULL;
>> + break;
>> + }
>> + }
>> + if (pred_e)
>> {
>> - edge e = single_pred_edge (bb);
>> /* Check if there are multiple executable successor edges in
>> the source block. Otherwise there is no additional info
>> to be recorded. */
>> edge e2;
>> - FOR_EACH_EDGE (e2, ei, e->src->succs)
>> - if (e2 != e
>> + FOR_EACH_EDGE (e2, ei, pred_e->src->succs)
>> + if (e2 != pred_e
>> && e2->flags & EDGE_EXECUTABLE)
>> break;
>> if (e2 && (e2->flags & EDGE_EXECUTABLE))
>> {
>> - gimple *stmt = last_stmt (e->src);
>> + gimple *stmt = last_stmt (pred_e->src);
>> if (stmt
>> && gimple_code (stmt) == GIMPLE_COND)
>> {
>> @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b
>> tree lhs = gimple_cond_lhs (stmt);
>> tree rhs = gimple_cond_rhs (stmt);
>> record_conds (bb, code, lhs, rhs,
>> - (e->flags & EDGE_TRUE_VALUE) != 0);
>> + (pred_e->flags & EDGE_TRUE_VALUE) != 0);
>> code = invert_tree_comparison (code, HONOR_NANS
>(lhs));
>> if (code != ERROR_MARK)
>> record_conds (bb, code, lhs, rhs,
>> - (e->flags & EDGE_TRUE_VALUE) == 0);
>> + (pred_e->flags & EDGE_TRUE_VALUE) ==
>0);
>> }
>> }
>> }
>>
>> fixes this for me (for a small testcase). Does it help yours?
>>
>
>Yes, it has the desired effect (of not needing pass_dominator before
>pass_lim) . But, patch "Mark by_ref mem_ref in build_receiver_ref as
>non-trapping" committed as r230738, also has that effect, so AFAIU I
>don't require this tree-ssa-sccvn.c fix.
OK, I committed it anyway already.
Richard.
>Thanks,
>- Tom
>
>> Otherwise untested of course (I hope EDGE_DFS_BACK is good enough,
>> it's supposed to match edges that have the src dominated by the
>dest).
>> Testing the above now.
- References:
- [PATCH series, 16] Use parloops to parallelize oacc kernels regions
- [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def