This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: IPA/cgraph: propagating node frequencies to offloaded functions


Hi,

On Fri, Feb 12, 2016 at 05:40:58PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> As I'm touching areas of GCC here that I have no noteworthy experience
> with (IPA optimizations, cgraph), I'm asking for your help.  Thanks!
> 
> This is primarily to implement a better "avoid offloading" policy for
> un-parallelized OpenACC kernels constructs with nvptx offloading,
> <http://news.gmane.org/find-root.php?message_id=%3C87twlf7ego.fsf%40hertz.schwinge.homeip.net%3E>,
> but I imagine that potentially also any non-nvptx offloading can benefit
> from this, by better parameterization of GCC's optimization passes.
> 
> For this consideration, simplified, "offloading" means that:
> 
>     int main()
>     {
>       [block 1]
>       #pragma omp target
>         {
>           [block 2]
>         }
>       [block 3]
>     }
> 
> ... is re-written into something like:
> 
>     void main_offloaded()
>     {
>       [block 2]
>     }
> 
>     int main()
>     {
>       [block 1]
>       GOMP_target(&main_offloaded);
>       [block 3]
>     }
> 
> ..., and the code of main_offloaded is then not run on the CPU but is in
> GOMP_target launched to execute on an offloading device (not actually
> important for this consideration).
> 
> GOMP_target, described by gcc/omp-builtins.def:BUILT_IN_GOMP_TARGET, is
> implemented externally to the compiler (in libgomp), but we "control" its
> implementation, and so we're free to have the compiler make certain
> assumptions about its behavior.
> 
> (I guess) due to it being passed to the intermediary GOMP_target call,
> the address of main_offloaded "escapes".  But, as we know how/what for
> GOMP_target is using it, we should be able to (teach the compiler to)
> handle it like a "static" function -- I hope.
> 
> For reference, I'm assuming the example above should look/work a bit like
> the following example:
> 
>     static __attribute__((noinline)) int fun_s(int x)
>     {
>       return x + 1;
>     }
>     
>     int main()
>     {
>       return fun_s(10);
>     }
> 
> ..., where I do observe that, for example, main's
> NODE_FREQUENCY_EXECUTED_ONCE is propagated to fun_s.  (For main
> initialized in gcc/predict.c:compute_function_frequency, and the
> propagation to fun_s then happens in
> gcc/ipa-profile.c:ipa_propagate_frequency.)  See
> test.c.067i.profile_estimate: "Node fun_s promoted to executed once".
> However, this node frequency propagation does not currently happen for
> main_offloaded.  Looking into this for a bit, I think I have identified
> two issues.
> 
> IPA/cgraph does not consider main_offloaded to be "local" (as in struct
> cgraph_local_info; handled/set in
> gcc/ipa-visibility.c:cgraph_node::local_p), so does not even attempt to
> do such optimizations.  (But, per my comment above, we should actually be
> able to treat main_offloaded at least similar to a "static" function.)

For cgraph/IPA purposes, a cgraph_node is local if we have all of its
potential call sites under control.  So in the general case, a function
whose address has escaped cannot be local, regardless of its static-ness.
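
To make the rule concrete, here is a deliberately simplified toy
predicate in plain C.  It is not the real cgraph_node::local_p, which
checks more conditions; the struct and the function toy_local_p are
hypothetical names invented for this sketch, with fields only loosely
modelled on the symbol table flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a call graph node; NOT GCC's struct cgraph_node.  */
struct toy_fn
{
  bool externally_visible;  /* may be called from other translation units */
  bool address_taken;       /* address escapes, so indirect calls may exist */
};

/* A function counts as "local" in this toy model only if every possible
   call site is visible to us: it is not externally visible and its
   address has not escaped.  */
static bool
toy_local_p (const struct toy_fn *fn)
{
  assert (fn != NULL);
  return !fn->externally_visible && !fn->address_taken;
}
```

In this model a function like fun_s (static, address never taken) is
local, while main_offloaded (address passed to GOMP_target) is not.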

> 
> If I hack gcc/ipa-visibility.c:cgraph_node::local_p to forcefully mark
> main_offloaded as "local", the node frequency propagation still doesn't
> work correctly, because no callers (cgraph_edge) have been registered for
> the respective cgraph_node, so gcc/ipa-profile.c:ipa_propagate_frequency
> doesn't have the data available that it needs in order to set the
> ipa_propagate_frequency_data, which later will be used to set the node
> frequency.

If you only care about ipa-profile propagation, then I would suggest
hacking just ipa-profile.c so that whenever it sees a cgraph_edge
going to gomp_target (and perhaps also gomp_parallel, gomp_task etc.),
it knows it has to propagate something to the corresponding parameter
of that function (the parameter value can easily be extracted from
ipa-prop/ipa-cp jump functions because it is always going to be
constant).  That way, you do not need to make any changes to symtab
core.
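
As a rough illustration of that idea, the following self-contained C
sketch models it with toy data structures.  Nothing here is GCC code:
the toy_* types and functions are hypothetical, the "constant function
argument" field stands in for what would really be read out of the jump
functions, and the promotion rule (promote when every effective caller
is executed once) is a simplification that ignores loops and profile
data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy frequency classes; NOT GCC's enum node_frequency.  */
enum toy_freq { TOY_NORMAL, TOY_EXECUTED_ONCE };

struct toy_node
{
  const char *name;
  enum toy_freq freq;
  bool is_launcher;         /* plays the role of GOMP_target and friends */
};

struct toy_edge
{
  struct toy_node *caller;
  struct toy_node *callee;
  struct toy_node *fn_arg;  /* constant function argument, or NULL */
};

/* Return the node that effectively receives control through edge E:
   for a call to a launcher with a known function argument, that is the
   function passed in, otherwise the direct callee.  */
static struct toy_node *
toy_effective_callee (const struct toy_edge *e)
{
  if (e->callee->is_launcher && e->fn_arg != NULL)
    return e->fn_arg;
  return e->callee;
}

/* Promote NODE to TOY_EXECUTED_ONCE if it has at least one effective
   caller among the N edges and all of them are executed once.
   Return true if NODE's frequency changed.  */
static bool
toy_propagate (struct toy_node *node, const struct toy_edge *edges, size_t n)
{
  bool seen_caller = false;

  assert (node != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (toy_effective_callee (&edges[i]) != node)
        continue;
      seen_caller = true;
      if (edges[i].caller->freq != TOY_EXECUTED_ONCE)
        return false;
    }

  if (!seen_caller || node->freq == TOY_EXECUTED_ONCE)
    return false;
  node->freq = TOY_EXECUTED_ONCE;
  return true;
}
```

With a single edge from an executed-once main to the launcher, carrying
main_offloaded as its function argument, toy_propagate promotes
main_offloaded without any change to how edges themselves are
represented.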

If, on the other hand, you want all of IPA, such as IPA-CP (and
associated inlining heuristics), to work seamlessly across gomp_ calls,
you probably want to invent some new kind of special call graph edge
for such invocations.  This would be a bigger task because a
large portion of code dealing with cgraph_edge would need to be made
aware of the new kind.  I am not sure it would be worth it.

> 
> As it already has some support for gomp_parallel and gomp_task (which I
> understand to have similar semantics in that regard, compared to
> gomp_target), maybe extending gcc/cgraphbuild.c will help with the latter
> issue?

The special support for gomp_parallel and gomp_task in cgraphbuild.c
only makes sure we extract references correctly from the special omp
statements so that we don't remove them as unreachable (it seems that
gomp_target should be handled there as well, really).  I don't think
that can help frequency propagation in any way.  Are the frequencies
to gomp_parallel really propagated?

> (That bit of code handling gomp_* looks like its purpose is to
> "pass through" function addresses, such as main_offloaded, which are
> called through intermediary functions, such as GOMP_target, but I have
> not yet confirmed if that's exactly what's being done there.)
> 
> What to do about the cgraph_node "local" bit for main_offloaded?

Keep it as it is.  We remove local nodes with no direct callers, and
actually ICE when we fail to do that.

Martin

