This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta
- From: Richard Biener <rguenther at suse dot de>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: Thomas Schwinge <thomas at codesourcery dot com>, Jakub Jelinek <jakub at redhat dot com>, "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>
- Date: Thu, 3 Dec 2015 12:12:55 +0100 (CET)
- Subject: Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta
- References: <565C0F47 dot 5020604 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511301010570 dot 4884 at t29 dot fhfr dot qr> <565C3CEC dot 9040209 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511301423530 dot 4884 at t29 dot fhfr dot qr> <565C7B09 dot 6000206 at mentor dot com> <565DADE6 dot 8020908 at mentor dot com> <87zixsloli dot fsf at kepler dot schwinge dot homeip dot net> <565F7F68 dot 1080903 at mentor dot com> <565F8881 dot 90609 at mentor dot com> <565F8C69 dot 1070906 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1512030958510 dot 4884 at t29 dot fhfr dot qr> <566022D0 dot 2030906 at mentor dot com>
On Thu, 3 Dec 2015, Tom de Vries wrote:
> On 03/12/15 09:59, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Tom de Vries wrote:
> >
> > > On 03/12/15 01:10, Tom de Vries wrote:
> > > >
> > > > I've managed to reproduce it. The difference between pass and fail is
> > > > whether the compiler is configured with or without accelerator.
> > > >
> > > > I'll look into it.
> > >
> > > > In the configuration with accelerator, the flag node->force_output is on
> > > > for foo._omp.fn.
> > >
> > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> > > optimization to fail.
> > >
> > > > The flag is described as:
> > > ...
> > > /* The symbol will be assumed to be used in an invisible way (like
> > > by an toplevel asm statement). */
> > > ...
> > >
> > > > Looks like I have to ignore the force_output flag as well in
> > > > ipa_pta_execute for this sort of node.
> >
> > It rather looks like the flag shouldn't be set. The fn after all has
> > its address taken!(?)
> >
>
> The flag is set here in expand_omp_target:
> ...
> /* Prevent IPA from removing child_fn as unreachable, since there are no
>    refs from the parent function to child_fn in offload LTO mode. */
> if (ENABLE_OFFLOADING)
>   cgraph_node::get (child_fn)->mark_force_output ();
> ...
>
How are there no refs from the "parent"? Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?
I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.
> I guess setting forced_by_abi instead would also mean child_fn is not removed
> as unreachable, while still allowing optimizations:
> ...
> /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
> to be exported. Unlike FORCE_OUTPUT this flag gets cleared to
> symbols promoted to static and it does not inhibit
> optimization. */
> unsigned forced_by_abi : 1;
> ...
>
> But I suspect that other optimizations (than ipa-pta) might break things.
How so?
> Essentially we have two situations:
> - in the host compiler, there is no need for the force_output flag,
> and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
>
> I wonder if setting the force_output flag only when streaming the bytecode for
> offloading would work. That way, it wouldn't be set in the host compiler,
> while being set in the accelerator compiler.
Yeah, that was my original thinking btw.
Richard.
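
For illustration only, the idea discussed above (set force_output only on
the copy streamed for offloading, so the host compiler never sees it) can be
sketched with a toy model. Everything here is an assumption made for the
sketch: ToyNode, toy_nonlocal_p and toy_stream_for_offload are hypothetical
names, not GCC's cgraph_node, ipa-pta or LTO streaming code, and the real
nonlocal_p check in ipa_pta_execute may look at more than this one flag.

```cpp
// Toy model only: these types and functions are hypothetical and are not
// GCC's cgraph_node, ipa-pta, or LTO streaming code.
struct ToyNode
{
  // Hypothetical: the function body is also compiled for the accelerator.
  bool offloadable = false;
  // Modelled on the force_output flag quoted in the thread.
  bool force_output = false;
};

// Per the thread, a set force_output makes ipa_pta_execute treat the
// function as nonlocal, which blocks the optimization.  This toy check
// looks at that one flag only.
bool
toy_nonlocal_p (const ToyNode &node)
{
  return node.force_output;
}

// Hypothetical streaming step: only the copy written out for the
// accelerator compiler gets force_output; the host node is left untouched.
ToyNode
toy_stream_for_offload (const ToyNode &host_node)
{
  ToyNode streamed = host_node;
  if (host_node.offloadable)
    streamed.force_output = true;
  return streamed;
}
```

In this model, a host node with offloadable set stays local for the toy
check, while the node returned by toy_stream_for_offload has force_output
set and is treated as nonlocal.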