This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta


On Thu, 3 Dec 2015, Tom de Vries wrote:

> On 03/12/15 09:59, Richard Biener wrote:
> > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > 
> > > On 03/12/15 01:10, Tom de Vries wrote:
> > > > 
> > > > I've managed to reproduce it. The difference between pass and fail is
> > > > whether the compiler is configured with or without accelerator.
> > > > 
> > > > I'll look into it.
> > > 
> > > In the configuration with accelerator, the flag node->force_output is on
> > > for
> > > foo._omp.fn.
> > > 
> > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the
> > > optimization to fail.
> > > 
> > > The flag is decribed as:
> > > ...
> > >    /* The symbol will be assumed to be used in an invisible way (like
> > >       by an toplevel asm statement).  */
> > >   ...
> > > 
> > > Looks like I have to ignore the force_output flag as well in
> > > ipa_pta_execute
> > > for this sort of node.
> > 
> > It rather looks like the flag shouldn't be set.  The fn after all has
> > its address taken!(?)
> > 
> 
> The flag is set here in expand_omp_target:
> ...
> 12682         /* Prevent IPA from removing child_fn as unreachable,
>                  since there are no
> 12683            refs from the parent function to child_fn in offload
>                  LTO mode.  */
> 12684         if (ENABLE_OFFLOADING)
> 12685           cgraph_node::get (child_fn)->mark_force_output ();
> ...
> 

How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

I think the above needs sorting out in somw way, making the refs
explicit rather than implicit via force_output.

> I guess setting forced_by_abi instead would also mean child_fn is not removed
> as unreachable, while still allowing optimizations:
> ...
>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>      symbols promoted to static and it does not inhibit
>      optimization.  */
>   unsigned forced_by_abi : 1;
> ...
> 
> But I suspect that other optimizations (than ipa-pta) might break things.

How so?

> Essentially we have two situations:
> - in the host compiler, there is no need for the forced_output flag,
>   and it inhibits optimization
> - in the accelerator compiler, it (or some equivalent) is needed
> 
> I wonder if setting the force_output flag only when streaming the bytecode for
> offloading would work. That way, it wouldn't be set in the host compiler,
> while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]