OpenACC support in 4.9

Tue May 7 10:42:00 GMT 2013

On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <burnus@net-b.de> wrote:
> Richard Biener wrote:
>>
>> We're going to look at supporting HSA from GCC (which would make it more
>> or less trivial to also target OpenCL, I think)
>
>
> For the friends of link-time optimization (LTO):
>
> Unless I missed some fine point in OpenACC and OpenMP's target, they only
> work with directives which are locally visible. Thus, if one does a function
> call in the device/target section, it can only be placed on the accelerator
> if the function can be inlined.
>
> Thus, it would be useful if LTO could be used to inline such functions into
> device code. I know one OpenACC code which calls functions in different
> translation units (TU) - and the Cray compiler handles this via LTO. Thus,
> it would be great if the HSA/OpenMP target/OpenACC middle-end infrastructure
> could do likewise, which also means deferring the error that an external
> function cannot be used to the middle-end/LTO FE instead of emitting it in the
> FE. In the mentioned code, the called function does not have any OpenACC
> annotation but only consists of constructs which are permitted by the
> accelerator; thus, no accelerator code is generated automatically for
> that TU.
>
> (I just want to mention this to ensure that this kind of LTO/accelerator
> inlining is kept in mind when implementing the infrastructure for
> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
> supported initially.)

In my view we'd get the "regular" OpenMP processing done during omp
lowering/expansion (which happens before LTO) which should mark the
generated worker functions appropriately.  Emitting accelerator code should
then happen at LTRANS time, thus after all IPA inlining has taken place.  The
interesting bit we can borrow from OMP is basically the marking of functions
that are a) interesting, b) possible to transform.  Unmarked functions / loops
will have to go the autopar way, thus we have to prove via dependence analysis
that executing iterations in parallel is possible.
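
To make the marked/unmarked distinction concrete, here is a minimal sketch in C.
It assumes OpenMP 4.0 style "target" directives; the function names are
hypothetical and only for illustration. Without -fopenmp the pragmas are
ignored and everything runs on the host.

```c
#include <assert.h>

/* Hypothetical helper called from the device region.  In the scenario
   from the quoted mail it could live in another TU, so only LTO-time
   inlining would make it available to the accelerator code. */
#pragma omp declare target
static float scale(float x) { return 2.0f * x; }
#pragma omp end declare target

/* Marked case: the directives say the loop is worth offloading and
   that its iterations may run in parallel. */
void scale_marked(float *a, int n)
{
    assert(n >= 0);
    #pragma omp target map(tofrom: a[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = scale(a[i]);
}

/* Unmarked but independent: an autopar-style pass has to prove on its
   own that no iteration reads what another iteration writes. */
void scale_unmarked(float *restrict dst, const float *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}

/* Unmarked with a loop-carried dependence: iteration i reads a[i - 1],
   so dependence analysis must reject naive parallel execution. */
void prefix_sum(float *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

The first function is the case where OMP lowering can simply record the
marking; the other two are what dependence analysis would have to tell apart.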

Richard.

> Tobias