[PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime
Thomas Schwinge
thomas@codesourcery.com
Tue Jul 27 10:01:18 GMT 2021
Hi!
On 2021-06-29T16:42:03-0700, Julian Brown <julian@codesourcery.com> wrote:
> This patch fixes several places in libgomp/target.c where "ephemeral" data
> (on the stack or in temporary heap locations) may be used as the source of
> an asynchronous host-to-device copy that may not complete before the host
> data disappears. Versions of the patch have been posted several times
> before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves
> all logic into target.c rather than pushing it out to each target plugin.
Thanks for the re-work!
> An existing, but flawed, workaround for this problem in the AMD GCN
> libgomp offloading plugin is currently present on mainline, and was
> posted for the og9 branch here:
>
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html
>
> and previous versions of this patch were posted here (for mainline/og9):
>
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html
... but this version here I like best! ;-)
> This patch exposes a problem with OpenACC profiling support that is
> fixed by the next patch in the series. The acc_prof-parallel-1.c test
> is XFAILed for now.
(Ought to XFAIL 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c' for GCN
only. Also, 'libgomp.oacc-c-c++-common/acc_prof-init-1.c' did similarly
FAIL for GCN, intermittently.)
But, actually no XFAILing is necessary, given my recent
commit 29ddaf43f70e19fd1110b539e8b3d0436c757e34 "[OpenACC]
Clarify sequencing of 'async' data copying vs. profiling events
in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'".
A few more comments:
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -2934,7 +2931,7 @@ gomp_offload_free (void *ptr)
>
> static void
> queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src,
> - size_t len, bool free_src)
> + size_t len)
Also have to update function comment.
> @@ -3916,15 +3912,7 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src,
> {
> struct agent_info *agent = get_agent_info (device);
> assert (agent == aq->agent);
> - /* The source data does not necessarily remain live until the deferred
> - copy happens. Taking a snapshot of the data here avoids reading
> - uninitialised data later, but means that (a) data is copied twice and
> - (b) modifications to the copied data between the "spawning" point of
> - the asynchronous kernel and when it is executed will not be seen.
> - But, that is probably correct. */
> - void *src_copy = GOMP_PLUGIN_malloc (n);
> - memcpy (src_copy, src, n);
> - queue_push_copy (aq, dst, src_copy, n, true);
> + queue_push_copy (aq, dst, src, n);
> return true;
> }
:-)
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> -static inline void
> -goacc_device_copy_async (struct gomp_device_descr *devicep,
> - bool (*copy_func) (int, void *, const void *, size_t,
> - struct goacc_asyncqueue *),
> - const char *dst, void *dstaddr,
> - const char *src, const void *srcaddr,
> - size_t size, struct goacc_asyncqueue *aq)
> -{
> - if (!copy_func (devicep->target_id, dstaddr, srcaddr, size, aq))
> - {
> - gomp_mutex_unlock (&devicep->lock);
> - gomp_fatal ("Copying of %s object [%p..%p) to %s object [%p..%p) failed",
> - src, srcaddr, srcaddr + size, dst, dstaddr, dstaddr + size);
> - }
> -}
For symmetry with 'gomp_device_copy', I prefer to keep (and thus
restored) 'goacc_device_copy_async', adding a 'srcaddr_orig' parameter
for error reporting purposes.
Pushed 'Fix OpenACC "ephemeral" asynchronous host-to-device copies' to
master branch in commit 9c41f5b9cddd93f1b56eb71bff87b255d37d16f4, see
attached.
Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-1.c'.
> +/* Copy host memory to an offload device. In asynchronous mode (if AQ is
> + non-NULL), when the source data is stack or may otherwise be deallocated
> + before the asynchronous copy takes place, EPHEMERAL must be passed as
> + TRUE. The CBUF isn't used for non-ephemeral asynchronous copies, because
> + the host data might not be computed yet (by an earlier asynchronous compute
> + region). */
> +
> [gomp_copy_host2dev]
Code changes related to the latter sentence have moved into a separate
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to
master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, see
attached.
Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-2.c'.
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-OpenACC-ephemeral-asynchronous-host-to-device-co.patch
Type: text/x-diff
Size: 15434 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210727/134d4189/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Don-t-use-libgomp-cbuf-buffering-with-OpenACC-async.patch
Type: text/x-diff
Size: 6192 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210727/134d4189/attachment-0003.bin>
More information about the Gcc-patches
mailing list