[PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime

Thomas Schwinge thomas@codesourcery.com
Tue Jul 27 10:01:18 GMT 2021


Hi!

On 2021-06-29T16:42:03-0700, Julian Brown <julian@codesourcery.com> wrote:
> This patch fixes several places in libgomp/target.c where "ephemeral" data
> (on the stack or in temporary heap locations) may be used as the source of
> an asynchronous host-to-device copy that may not complete before the host
> data disappears.  Versions of the patch have been posted several times
> before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves
> all logic into target.c rather than pushing it out to each target plugin.

Thanks for the re-work!

> An existing, but flawed, workaround for this problem in the AMD GCN
> libgomp offloading plugin is currently present on mainline, and was
> posted for the og9 branch here:
>
>   https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html
>
> and previous versions of this patch were posted here (for mainline/og9):
>
>   https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
>   https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

... but this version here I like best!  ;-)


> This patch exposes a problem with OpenACC profiling support that is
> fixed by the next patch in the series. The acc_prof-parallel-1.c test
> is XFAILed for now.

(Ought to XFAIL 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c' for GCN
only.  Also, 'libgomp.oacc-c-c++-common/acc_prof-init-1.c' did similarly
FAIL for GCN, intermittently.)

But, actually no XFAILing is necessary, given my recent
commit 29ddaf43f70e19fd1110b539e8b3d0436c757e34 "[OpenACC]
Clarify sequencing of 'async' data copying vs. profiling events
in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'".


A few more comments:

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

> @@ -2934,7 +2931,7 @@ gomp_offload_free (void *ptr)
>
>  static void
>  queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src,
> -              size_t len, bool free_src)
> +              size_t len)

Also have to update function comment.

> @@ -3916,15 +3912,7 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src,
>  {
>    struct agent_info *agent = get_agent_info (device);
>    assert (agent == aq->agent);
> -  /* The source data does not necessarily remain live until the deferred
> -     copy happens.  Taking a snapshot of the data here avoids reading
> -     uninitialised data later, but means that (a) data is copied twice and
> -     (b) modifications to the copied data between the "spawning" point of
> -     the asynchronous kernel and when it is executed will not be seen.
> -     But, that is probably correct.  */
> -  void *src_copy = GOMP_PLUGIN_malloc (n);
> -  memcpy (src_copy, src, n);
> -  queue_push_copy (aq, dst, src_copy, n, true);
> +  queue_push_copy (aq, dst, src, n);
>    return true;
>  }

:-)

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> -static inline void
> -goacc_device_copy_async (struct gomp_device_descr *devicep,
> -                      bool (*copy_func) (int, void *, const void *, size_t,
> -                                         struct goacc_asyncqueue *),
> -                      const char *dst, void *dstaddr,
> -                      const char *src, const void *srcaddr,
> -                      size_t size, struct goacc_asyncqueue *aq)
> -{
> -  if (!copy_func (devicep->target_id, dstaddr, srcaddr, size, aq))
> -    {
> -      gomp_mutex_unlock (&devicep->lock);
> -      gomp_fatal ("Copying of %s object [%p..%p) to %s object [%p..%p) failed",
> -               src, srcaddr, srcaddr + size, dst, dstaddr, dstaddr + size);
> -    }
> -}

For symmetry with 'gomp_device_copy', I prefer to keep (and thus
restored) 'goacc_device_copy_async', adding a 'srcaddr_orig' parameter
for error reporting purposes.

Pushed 'Fix OpenACC "ephemeral" asynchronous host-to-device copies' to
master branch in commit 9c41f5b9cddd93f1b56eb71bff87b255d37d16f4, see
attached.

Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-1.c'.


> +/* Copy host memory to an offload device.  In asynchronous mode (if AQ is
> +   non-NULL), when the source data is stack or may otherwise be deallocated
> +   before the asynchronous copy takes place, EPHEMERAL must be passed as
> +   TRUE.  The CBUF isn't used for non-ephemeral asynchronous copies, because
> +   the host data might not be computed yet (by an earlier asynchronous compute
> +   region).  */
> +
>  [gomp_copy_host2dev]

Code changes related to the latter sentence have moved into a separate
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to
master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, see
attached.

Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-2.c'.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-OpenACC-ephemeral-asynchronous-host-to-device-co.patch
Type: text/x-diff
Size: 15434 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210727/134d4189/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Don-t-use-libgomp-cbuf-buffering-with-OpenACC-async.patch
Type: text/x-diff
Size: 6192 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210727/134d4189/attachment-0003.bin>


More information about the Gcc-patches mailing list