This is the mail archive of the
mailing list for the GCC project.
Re: [gomp4] OpenACC async re-work
- From: Cesar Philippidis <cesar at codesourcery dot com>
- To: Chung-Lin Tang <chunglin_tang at mentor dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Thomas Schwinge <thomas at codesourcery dot com>
- Date: Mon, 26 Jun 2017 15:45:28 -0700
- Subject: Re: [gomp4] OpenACC async re-work
- Authentication-results: sourceware.org; auth=none
- References: <firstname.lastname@example.org>
I still need more time to review this, but ...
On 06/24/2017 12:54 AM, Chung-Lin Tang wrote:
> Hi Cesar, Thomas,
> This patch is the re-implementation of OpenACC async we talked about.
> The changes are rather large, so I am putting it here for a few days before
> actually committing them to gomp-4_0-branch. Would appreciate if you guys
> take a look.
> To overall describe the highlights of the changes:
> (1) Instead of essentially implementing the entire OpenACC async support
> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
> by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
> plugin functions. Most of the OpenACC specific logic is pulled into
I'm not sure if plugins need to maintain backwards compatibility.
However, I don't see any changes inside libgomp.map, so maybe it's not
> (2) CUDA events are no longer used. The limitation of no CUDA calls inside
> CUDA callbacks were a problem for resource freeing, but we now stash
> them onto the ptx_device and free them later.
> (3) For 'wait + async', we now add a local thread synchronize, instead
> of just ordering the streams.
> (4) To work with the (3) change, some front end changes were added to
> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
> represent a 'wait all'.
What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
easier to make that change in the gimplifier?
> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit
> this after the weekend (or Tues.)
> * plugin/plugin-nvptx.c (struct cuda_map): Remove.
> (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
> (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.
These two functions seem extremely similar. I wonder if you should
Overall, I like how you were able eliminate the externally managed map_*
data structure which was used to pass in arguments to nvptx_exec.
Although I wonder if we should just pass in those individual arguments
directly to cuLaunchKernel. But that's a big change in itself.