This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: OpenACC support in 4.9


On Tue, May 7, 2013 at 12:42 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <burnus@net-b.de> wrote:
>> Richard Biener wrote:
>>>
>>> We're going to look at supporting HSA from GCC (which would make it more
>>> or less trivial to also target openCL I think)
>>
>>
>> For the friends of link-time optimization (LTO):
>>
>> Unless I missed some fine point in OpenACC and OpenMP's target, they only
>> work with directives which are locally visible. Thus, if one does a function
>> call in the device/target section, it can only be placed on the accelerator
>> if the function can be inlined.
>>
>> Thus, it would be useful if LTO could be used to inline such functions into
>> device code. I know one OpenACC code which calls functions in different
>> translation units (TU) - and the Cray compiler handles this via LTO. Thus,
>> it would be great if the HSA/OpenMP target/OpenACC middle-end infrastructure
>> could do likewise, which also means deferring the error that an external
>> function cannot be used to the middle-end/LTO FE and not placing it into the
>> FE. - In the mentioned code, the called function does not have any OpenACC
>> annotation but only consists of constructs which are permitted by the
>> accelerator - thus, no automatic code gen of accelerator code happens for
>> that TU.
>>
>> (I just want to mention this to ensure that this kind of LTO/accelerator
>> inlining is kept in mind when implementing the infrastructure for
>> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
>> supported initially.)
>
> In my view we'd get the "regular" OpenMP processing done during omp
> lowering/expansion (which happens before LTO) which should mark the
> generated worker functions appropriately.  Emitting accelerator code should
> then happen at LTRANS time, thus after all IPA inlining took place.  The
> interesting bit we can borrow from OMP is basically the marking of functions
> that are a) interesting, b) possible to transform.  Unmarked functions / loops
> will have to go the autopar way, thus we have to prove via dependence analysis
> that executing iterations in parallel is possible.
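To make the marked/unmarked distinction concrete, here is a minimal sketch
in C.  The function names are made up for illustration and the code says
nothing about how GCC would implement any of this; it only shows the three
kinds of loop the paragraph above distinguishes.

```c
#include <assert.h>

/* Marked: the directive tells the compiler the iterations are independent,
   so omp lowering/expansion can outline the loop body without proving it.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
void saxpy_marked(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Unmarked but parallelisable: no directive, so the "autopar way" applies.
   Dependence analysis has to show that iteration i touches only x[i] and
   y[i]; the restrict qualifiers rule out aliasing between x and y. */
void saxpy_unmarked(int n, float a, const float *restrict x,
                    float *restrict y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Unmarked and not parallelisable as written: iteration i reads y[i - 1],
   which iteration i - 1 wrote (a loop-carried dependence). */
void prefix_sum(int n, float *y)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```

Only the first function carries the information a front end can hand to
later passes; for the other two the compiler is on its own, and for the
last one the analysis has to say no.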

Btw, we plan to re-use the GOMP runtime, as otherwise any synchronisation
between accelerator code and regular thread code is impossible.  That
means changing the GOMP runtime so that it can be passed a descriptor
which may contain accelerator code (plus a fallback regular function, so
you can disable accelerator usage at runtime).
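
A rough sketch of what such a descriptor could look like, in C.  Everything
here is hypothetical: struct region_desc, accel_enabled, accel_launch and
run_region are names invented for illustration, not existing libgomp
interfaces, and the "launch" is a stub that always declines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor, NOT an actual libgomp type. */
typedef void (*region_fn)(void *data);

struct region_desc {
    region_fn host_fn;        /* regular fallback function, always present */
    const void *accel_image;  /* opaque accelerator code, or NULL if none */
    size_t accel_image_size;  /* size of accel_image in bytes */
};

/* Hypothetical runtime switch for disabling accelerator usage. */
static bool accel_enabled = true;

/* Hypothetical stand-in for a device launch.  There is no real device in
   this sketch, so it always reports failure. */
static bool accel_launch(const struct region_desc *d, void *data)
{
    (void)d;
    (void)data;
    return false;
}

/* Run a region: try the accelerator first, else call the fallback.
   Returns 1 if the accelerator ran the region, 0 if the host did. */
int run_region(const struct region_desc *d, void *data)
{
    assert(d != NULL && d->host_fn != NULL);
    if (accel_enabled && d->accel_image != NULL && accel_launch(d, data))
        return 1;
    d->host_fn(data);
    return 0;
}

/* Example outlined region: scale an int array in place. */
struct scale_args { int *v; int n; int factor; };

void scale_host(void *data)
{
    struct scale_args *a = data;
    for (int i = 0; i < a->n; i++)
        a->v[i] *= a->factor;
}
```

Passing a struct region_desc with host_fn set to scale_host and accel_image
set to NULL runs the loop on the host.  The point of the shape is that the
caller does not change when accelerator code is present, absent, or
switched off at runtime.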

Richard.

> Richard.
>
>> Tobias

