[PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
Thomas Schwinge
thomas@codesourcery.com
Tue Jun 8 17:32:22 GMT 2021
Hi Chung-Lin!
;-) It's been a while:
On 2018-09-10T23:04:18+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> * testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
> * testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
> * testsuite/libgomp.oacc-fortran/lib-16.f90: New test.
Do you happen to remember why in these testcases you're using the
following pattern:
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i;
> + int async = 8;
> + unsigned char *h;
> +
> + h = (unsigned char *) malloc (N);
> +
> + for (i = 0; i < N; i++)
> + {
> + h[i] = i;
> + }
> +
> + acc_copyin_async (h, N, async);
> +
> + memset (h, 0, N);
> +
> + acc_wait (async);
You first issue 'acc_copyin_async', then (while potentially that's still
accessing 'h') already 'memset' 'h' (potentially overwriting data that
'acc_copyin_async' is still working on), and only then 'acc_wait'?
My understanding of OpenACC would swap 'memset' and 'acc_wait', but maybe
you have a specific reason to do it in this way?
In particular, the GCC nvptx offloading implementation "doesn't seem to
care" (as discussed elsewhere; 'OpenACC "ephemeral" asynchronous
host-to-device copies', etc.) -- but I suppose if you meant to test such
implementation traits here, you'd have commented that?
> +
> + acc_copyout_async (h, N, async + 1);
> +
> + acc_wait (async + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> +
> + return 0;
> +}
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
> @@ -0,0 +1,45 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i, q = 5;
> + unsigned char *h, *g;
> + void *d;
> +
> + h = (unsigned char *) malloc (N);
> + g = (unsigned char *) malloc (N);
> + for (i = 0; i < N; i++)
> + {
> + g[i] = i;
> + }
> +
> + acc_create_async (h, N, q);
> +
> + acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
> + memset (&h[0], 0, N);
> +
> + acc_wait (q);
Similar here.
> + acc_update_self_async (h, N, q + 1);
> + acc_delete_async (h, N, q + 1);
> +
> + acc_wait (q + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> + free (g);
> +
> + return 0;
> +}
> --- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
(Later also similarly copied into 'libgomp.oacc-fortran/lib-16-2.f90'.)
Similar:
> @@ -0,0 +1,57 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +program main
> + use openacc
> + implicit none
> +
> + integer, parameter :: N = 256
> + integer, allocatable :: h(:)
> + integer :: i
> + integer :: async = 5
> +
> + allocate (h(N))
> +
> + do i = 1, N
> + h(i) = i
> + end do
> +
> + call acc_copyin (h)
> +
> + do i = 1, N
> + h(i) = i + i
> + end do
> +
> + call acc_update_device_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
Don't we need 'acc_wait' here (while 'acc_update_device_async' may still
be reading from 'h'), before overwriting 'h' here:
> +
> + h(:) = 0
> +
> + call acc_copyout_async (h, sizeof (h), async)
> +
> + call acc_wait (async)
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_copyin (h, sizeof (h))
> +
> + h(:) = 0
> +
> + call acc_update_self_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
Don't we need 'acc_wait' here (to make sure we finish device to host copy
of 'h'), before evaluating 'h' here:
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_delete_async (h, async)
> +
> + call acc_wait (async)
> +
> + if (acc_is_present (h) .neqv. .FALSE.) call abort
> +
> +end program
Julian has patches for most of these (as part of other commits).
Grüße
Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
More information about the Gcc-patches
mailing list