[PATCH, nvptx, libgomp] Avoid use of GOMP_PLUGIN_acc_thread() in nvptx_free()

Chung-Lin Tang cltang@codesourcery.com
Thu Aug 20 13:03:20 GMT 2020

Hi Tom,
this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a
"GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under
OpenMP offloading.

This check was originally used to determine if nvptx_free() was running under
CUDA callback context, when freeing resources from an OpenACC asynchronous compute
region. Since CUDA API calls are not allowed inside callback context, we have
to save the freed block to ptx_dev->free_blocks, and cuMemFree it later.

The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal host thread
vs. callback thread worked under -fopenacc, but since the OpenACC per-thread data
is not created under -fopenmp, and always returned NULL, we have a leak situation
where OpenMP offloading kept accumulating freed device memory blocks until failing;
nvptx_free() never reaches the part where cuMemFree() is actually called.

I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is returned
for such CUDA calls inside callback context, and it appears to be enough to replace
the current check, so the new code sees if this error is returned on the first
cuMemGetAddressRange call to determine callback context. This should now work
for both OpenACC/OpenMP.

(Tobias, Catherine, the earlier internal patch to re-organize this callback context
checking did not work in general, since OpenACC also uses the .queue_callback
functionality to free the struct target_mem_desc asynchronously, so in general we
have to ensure nvptx_free() could be used under both normal/callback context)

This patch has been libgomp tested for x86_64-linux with nvptx offloading without
regressions, and should be applied for mainline and GCC10. Is this okay?


2020-08-20  Chung-Lin Tang  <cltang@codesourcery.com>

	* plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread () == NULL"
	test into check of CUDA_ERROR_NOT_PERMITTED status for cuMemGetAddressRange.
	Adjust comments.
-------------- next part --------------
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index ec103a2f40b..188a34f1d04 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p)
 static bool
 nvptx_free (void *p, struct ptx_device *ptx_dev)
-  /* Assume callback context if this is null.  */
-  if (GOMP_PLUGIN_acc_thread () == NULL)
+  CUdeviceptr pb;
+  size_t ps;
+  CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps,
+				  (CUdeviceptr) p);
+      /* We assume that this error indicates we are in a CUDA callback context,
+	 where all CUDA calls are not allowed. Arrange to free this piece of
+	 device memory later.  */
       struct ptx_free_block *n
 	= GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block));
       n->ptr = p;
       pthread_mutex_lock (&ptx_dev->free_blocks_lock);
       n->next = ptx_dev->free_blocks;
       ptx_dev->free_blocks = n;
       pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
       return true;
-  CUdeviceptr pb;
-  size_t ps;
-  CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p);
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r));
+      return false;
+    }
   if ((CUdeviceptr) p != pb)
       GOMP_PLUGIN_error ("invalid device address");
       return false;

More information about the Gcc-patches mailing list