This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: OpenACC support in 4.9


On Wed, May 8, 2013 at 10:25 PM, Torvald Riegel <triegel@redhat.com> wrote:
> On Tue, 2013-05-07 at 12:46 +0200, Richard Biener wrote:
>> On Tue, May 7, 2013 at 12:42 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> > On Tue, May 7, 2013 at 11:02 AM, Tobias Burnus <burnus@net-b.de> wrote:
>> >> Richard Biener wrote:
>> >>>
>> >>> We're going to look at supporting HSA from GCC (which would make it more
>> >>> or less trivial to also target openCL I think)
>> >>
>> >>
>> >> For the friends of link-time optimization (LTO):
>> >>
>> >> Unless I missed some fine point in OpenACC and OpenMP's target, they only
>> >> work with directives which are locally visible. Thus, if one does a function
>> >> call in the device/target section, it can only be placed on the accelerator
>> >> if the function can be inlined.
>> >>
>> >> Thus, it would be useful if LTO could be used to inline such functions into
>> >> device code. I know one OpenACC code which calls functions in different
>> >> translation units (TU) - and the Cray compiler handles this via LTO. Thus,
>> >> it would be great if the HSA/OpenMP target/OpenACC middle-end infrastructure
>> >> could do likewise, which also means deferring the error that an external
>> >> function cannot be used to the middle-end/LTO FE and not placing it into the
>> >> FE. - In the mentioned code, the called function does not have any OpenACC
>> >> annotation but only consists of constructs which are permitted by the
>> >> accelerator - thus, no automatic code gen of accelerator code happens for
>> >> that TU.
>> >>
>> >> (I just want to mention this to ensure that this kind of LTO/accelerator
>> >> inlining is kept in mind when implementing the infrastructure for
>> >> HSA/OpenACC/OpenMP target/OpenCL - even if cross-TU inlining is not
>> >> supported initially.)
>> >
>> > In my view we'd get the "regular" OpenMP processing done during omp
>> > lowering/expansion (which happens before LTO) which should mark the
>> > generated worker functions appropriately.  Emitting accelerator code should
>> > then happen at LTRANS time, thus after all IPA inlining took place.  The
>> > interesting bits we can borrow from OMP are basically the marking of functions
>> > that are a) interesting, b) possible to transform.  Unmarked functions / loops
>> > will have to go the autopar way, thus we have to prove via dependence analysis
>> > that executing iterations in parallel is possible.
>>
>> Btw, we plan to re-use the GOMP runtime as otherwise any synchronisation
>> between accelerator code and regular thread code is impossible.
>
> I can't follow this line of reasoning.  Can you elaborate?  Which kind
> of synchronization are you referring to?
>
> As far as parallel execution and resource management are concerned,
> libgomp has just the kinds of scheduler that you need in the OpenMP rule
> set.  Work-stealing schedulers such as Cilk's are others, and might
> actually become the more common approach.  And there are other thread
> pools that programs might use; e.g., there's lots of discussion about
> all this in ISO C++ study group 1 on parallelism and concurrency, and
> several different proposals.
>
> With that in mind, I'm wondering whether the cooperative scheduling that
> we likely need should be at a lower level than libgomp or the Cilk
> runtime.  Otherwise, libgomp needs to become the scheduler that runs
> them all (that is, if you want it to work well when combined with other
> abstractions for parallelism), and I'm not sure whether that's the right
> approach.

See my other mail.

>> Which
>> means changing the GOMP runtime in a way to be able to pass a descriptor
>> which eventually has accelerator code (and a fallback regular function so
>> you can disable accelerator usage at runtime).
>
> It probably should be a list of different codes -- you might have more
> than one suitable accelerator available.

Of course.  And the descriptor should be versioned to avoid future ABI
changes.  Note that I'd always generate code for the CPU as fallback.
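
For illustration only, here is a minimal sketch in C of what such a
versioned descriptor might look like.  Every name in it (offload_desc,
offload_variant, offload_select, offload_run, the OFFLOAD_* constants) is
hypothetical; none of this is existing libgomp API, and the actual device
launch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not libgomp's interface. */

#define OFFLOAD_DESC_VERSION 1u

typedef void (*host_fn_t)(void *data);

/* Kinds of accelerator code a descriptor may carry (assumed set). */
enum offload_kind { OFFLOAD_KIND_HSA = 1, OFFLOAD_KIND_OPENCL = 2 };

struct offload_variant {
    enum offload_kind kind;  /* which accelerator this image targets */
    const void *image;       /* opaque device code blob */
    size_t image_size;
};

struct offload_desc {
    uint32_t version;                        /* checked by the runtime */
    host_fn_t host_fallback;                 /* CPU code, always present */
    size_t n_variants;                       /* may be zero */
    const struct offload_variant *variants;  /* list of accelerator codes */
};

/* Return the first variant whose kind is set in the available_kinds
   bit mask, or NULL if the CPU fallback should be used.  A descriptor
   with an unknown version is treated as "no usable accelerator code".  */
const struct offload_variant *
offload_select(const struct offload_desc *desc, unsigned available_kinds,
               int allow_accel)
{
    assert(desc != NULL && desc->host_fallback != NULL);
    if (!allow_accel || desc->version != OFFLOAD_DESC_VERSION)
        return NULL;
    for (size_t i = 0; i < desc->n_variants; i++)
        if (available_kinds & (1u << desc->variants[i].kind))
            return &desc->variants[i];
    return NULL;
}

/* Run the region.  Returns 0 if the CPU fallback was executed, otherwise
   the kind of the selected variant.  A real runtime would hand the image
   to the device driver at that point; this sketch does not launch it.  */
int
offload_run(const struct offload_desc *desc, unsigned available_kinds,
            int allow_accel, void *data)
{
    const struct offload_variant *v =
        offload_select(desc, available_kinds, allow_accel);
    if (v == NULL) {
        desc->host_fallback(data);
        return 0;
    }
    return (int) v->kind;
}

/* Example host-side worker function: increments an int. */
void
increment_on_host(void *data)
{
    ++*(int *) data;
}
```

Passing allow_accel == 0 corresponds to disabling accelerator usage at
runtime: the selection is skipped and the host function runs.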

> BTW: What about putting this topic on the Cauldron agenda?  Is there
> still time available to discuss what GCC might do regarding accelerators
> and HW heterogeneity?

I am not able to attend, but certainly the topic is interesting.

Richard.

>
> Torvald
>

