This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [gomp4] OpenACC async re-work

I still need more time to review this, but ...

On 06/24/2017 12:54 AM, Chung-Lin Tang wrote:
> Hi Cesar, Thomas,
> This patch is the re-implementation of OpenACC async we talked about.
> The changes are rather large, so I am putting it here for a few days before
> actually committing them to gomp-4_0-branch. Would appreciate if you guys
> take a look.
> To overall describe the highlights of the changes:
> (1) Instead of essentially implementing the entire OpenACC async support
> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
> by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
> plugin functions. Most of the OpenACC specific logic is pulled into
> libgomp/oacc-async.c

I'm not sure if plugins need to maintain backwards compatibility.
However, I don't see any changes inside, so maybe it's not

> (2) CUDA events are no longer used. The limitation of no CUDA calls inside
> CUDA callbacks were a problem for resource freeing, but we now stash
> them onto the ptx_device and free them later.


> (3) For 'wait + async', we now add a local thread synchronize, instead
> of just ordering the streams.
> (4) To work with the (3) change, some front end changes were added to
> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
> represent a 'wait all'.

What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
easier to make that change in the gimplifier?

> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit
> this after the weekend (or Tues.)

> 	* plugin/plugin-nvptx.c (struct cuda_map): Remove.
>         (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
>         (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.

These two functions seem extremely similar.  I wonder if you should
consolidate them.

Overall, I like how you were able eliminate the externally managed map_*
data structure which was used to pass in arguments to nvptx_exec.
Although I wonder if we should just pass in those individual arguments
directly to cuLaunchKernel. But that's a big change in itself.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]