This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: OpenACC support in 4.9
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Tobias Burnus <burnus at net-b dot de>
- Cc: Jeff Law <law at redhat dot com>, Evgeny Gavrin <e dot gavrin at samsung dot com>, gcc at gcc dot gnu dot org, GarbuzovViacheslav <v dot garbuzov at samsung dot com>, dtemirbulatov at gmail dot com
- Date: Tue, 7 May 2013 12:46:12 +0200
- Subject: Re: OpenACC support in 4.9
- References: <51879F4E dot 10402 at samsung dot com> <5187B30F dot 1050709 at net-b dot de> <5187C958 dot 9020606 at redhat dot com> <CAFiYyc3bnnFL=k8w-ZqJnL3UtQrFjdNmrdNmiA7mCiuGVtK_aQ at mail dot gmail dot com> <5188C310 dot 5050305 at net-b dot de> <CAFiYyc24FM9Z9meh2DF94bv3VV0gaUgP-nc3CWAZXotBb4ZA0w at mail dot gmail dot com>
On Tue, May 7, 2013 at 12:42 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <burnus@net-b.de> wrote:
>> Richard Biener wrote:
>>>
>>> We're going to look at supporting HSA from GCC (which would make it more
>>> or less trivial to also target OpenCL, I think)
>>
>>
>> For the friends of link-time optimization (LTO):
>>
>> Unless I missed some fine point in OpenACC and OpenMP's target construct,
>> they only work with directives which are locally visible. Thus, if a function
>> is called in the device/target region, the call can only be placed on the
>> accelerator if the function can be inlined.
>>
>> Thus, it would be useful if LTO could be used to inline such functions into
>> device code. I know one OpenACC code which calls functions in different
>> translation units (TUs), and the Cray compiler handles this via LTO. Thus,
>> it would be great if the HSA/OpenMP target/OpenACC middle-end infrastructure
>> could do likewise. This also means deferring the error that an external
>> function cannot be used to the middle end/LTO FE instead of emitting it in
>> the FE. In the mentioned code, the called function does not have any OpenACC
>> annotation but only consists of constructs which are permitted on the
>> accelerator; thus, no automatic generation of accelerator code happens for
>> that TU.
>>
>> (I just want to mention this to ensure that this kind of LTO/accelerator
>> inlining is kept in mind when implementing the infrastructure for
>> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
>> supported initially.)
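To make Tobias's point concrete, here is a minimal C sketch; the names `axpy_elem` and `daxpy` are hypothetical and not taken from the thread. The helper carries no OpenACC annotation, so it can only end up in accelerator code because its body is visible in the same translation unit and can be inlined into the `parallel loop` region. Were it defined in another TU, only LTO could make that body available for inlining. A compiler without OpenACC enabled (e.g. `gcc` without `-fopenacc`) ignores the pragma, and the loop then runs on the host with the same result.

```c
/* Helper with NO OpenACC annotation (hypothetical example).
   Because it is defined in this translation unit, the compiler can
   inline it into the accelerator region below. If it lived in a
   different TU, its body would only be visible through LTO. */
static inline double axpy_elem(double a, double x, double y)
{
    return a * x + y;
}

/* y[i] = a * x[i] + y[i] for i in [0, n).
   With OpenACC enabled the loop is a candidate for offloading;
   otherwise the pragma is ignored and the loop runs on the host. */
void daxpy(int n, double a, const double *x, double *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = axpy_elem(a, x[i], y[i]);
}
```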
>
> In my view we'd get the "regular" OpenMP processing done during OMP
> lowering/expansion (which happens before LTO), which should mark the
> generated worker functions appropriately. Emitting accelerator code should
> then happen at LTRANS time, thus after all IPA inlining has taken place. The
> interesting bits we can borrow from OMP are basically the marking of functions
> that are a) interesting and b) possible to transform. Unmarked functions/loops
> will have to go the autopar way; thus we have to prove via dependence analysis
> that executing iterations in parallel is possible.
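For unmarked loops that "go the autopar way", the compiler must show that iterations cannot conflict. As a rough illustration only, here is the classic textbook GCD dependence test in C. It is not claimed to be what GCC's autopar pass implements, and the function names are hypothetical. For a write to `A[a1*i + b1]` and a read of `A[a2*j + b2]`, a shared element requires an integer solution of `a1*i - a2*j = b2 - b1`, which exists only if `gcd(a1, a2)` divides `b2 - b1`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of two non-negative integers (Euclid). */
static long gcd(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* GCD test for a write to A[a1*i + b1] and a read of A[a2*j + b2].
   Returns false when the two references can never touch the same
   element (no dependence). Returns true when a dependence MAY exist:
   the test is conservative and ignores loop bounds entirely. */
bool gcd_test_may_depend(long a1, long b1, long a2, long b2)
{
    long g = gcd(labs(a1), labs(a2));
    if (g == 0)                 /* both subscripts are loop-invariant */
        return b1 == b2;
    return (b2 - b1) % g == 0;  /* solvable iff g divides b2 - b1 */
}
```

For example, writing `A[2*i]` while reading `A[2*i + 1]` gives gcd 2 and difference 1, so independence is proven; writing `A[i]` while reading `A[i - 1]` cannot be ruled out. A real dependence analysis also uses loop bounds and distance vectors, which this sketch omits.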
Btw, we plan to re-use the GOMP runtime, as otherwise any synchronisation
between accelerator code and regular thread code is impossible. This
means changing the GOMP runtime so that it can be passed a descriptor
which possibly contains accelerator code (and a fallback regular function,
so you can disable accelerator usage at runtime).
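The descriptor idea can be sketched roughly as follows. Everything here is hypothetical (`struct offload_desc`, `run_region`, the launcher hook, the example region); it is not libgomp's actual interface, only an illustration of a descriptor that carries optional accelerator code plus a regular host function, with a runtime switch that forces the host path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface for illustration; NOT libgomp's real API. */

/* Host version of an outlined region: receives a pointer to its shared data. */
typedef void (*host_fn)(void *data);

/* Hook that tries to run an accelerator image; returns false on failure. */
typedef bool (*accel_launch_fn)(const void *image, void *data);

struct offload_desc {
    const void *accel_image;  /* accelerator code, or NULL if none was generated */
    host_fn host_fallback;    /* regular function implementing the same region */
};

/* Run the region on the accelerator if code exists, accelerator use is
   enabled and the launch succeeds; otherwise run the host fallback. */
void run_region(const struct offload_desc *desc, void *data,
                bool accel_enabled, accel_launch_fn launch)
{
    if (accel_enabled && desc->accel_image != NULL && launch != NULL
        && launch(desc->accel_image, data))
        return;
    desc->host_fallback(data);
}

/* Example region body (host version): doubles an int in place. */
static void double_int_host(void *data)
{
    int *p = data;
    *p *= 2;
}

/* Stand-in launcher for a system without a usable device: always fails. */
static bool no_device_launch(const void *image, void *data)
{
    (void)image;
    (void)data;
    return false;
}
```

Keeping the host function mandatory means the runtime always has something to run, whether no accelerator code was generated, the device launch fails, or accelerator use has been disabled.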
Richard.
> Richard.
>
>> Tobias