[PATCH, og8] Add OpenACC 2.6 `acc_get_property' support

Chung-Lin Tang chunglin_tang@mentor.com
Wed Dec 5 10:12:00 GMT 2018


Hi Maciej, please see below:

On 2018/12/4 12:51 AM, Maciej W. Rozycki wrote:
> +module openacc_c_string
> +  implicit none
> +
> +  interface
> +    function strlen (s) bind (C, name = "strlen")
> +      use iso_c_binding, only: c_ptr, c_size_t
> +      type (c_ptr), intent(in), value :: s
> +      integer (c_size_t) :: strlen
> +    end function
> +  end interface
> +
> +end module

> +subroutine acc_get_property_string_h (n, d, p, s)
> +  use iso_c_binding, only: c_char, c_int, c_ptr, c_f_pointer
> +  use openacc_internal, only: acc_get_property_string_l
> +  use openacc_c_string, only: strlen
> +  use openacc_kinds
...
> +  pint = int (p, c_int)
> +  cptr = acc_get_property_string_l (n, d, pint)
> +  clen = int (strlen (cptr))
> +  call c_f_pointer (cptr, sptr, [clen])

AFAIK, things like strlen are already available in iso_c_binding, in forms like "C_strlen".
Can you check again if that 'openacc_c_string' module is really necessary?

> +union gomp_device_property_value
> +GOMP_OFFLOAD_get_property (int n, int prop)
> +{
> +  union gomp_device_property_value propval = { .val = 0 };
> +
> +  pthread_mutex_lock (&ptx_dev_lock);
> +
> +  if (!nvptx_init () || n >= nvptx_get_num_devices ())
> +    {
> +      pthread_mutex_unlock (&ptx_dev_lock);
> +      return propval;
> +    }
> +
> +  switch (prop)
> +    {
> +    case GOMP_DEVICE_PROPERTY_MEMORY:
> +      {
> +	size_t total_mem;
> +	CUdevice dev;
> +
> +	CUDA_CALL_ERET (propval, cuDeviceGet, &dev, n);
> +	CUDA_CALL_ERET (propval, cuDeviceTotalMem, &total_mem, dev);
> +	propval.val = total_mem;
> +      }
> +      break;
> +    case GOMP_DEVICE_PROPERTY_FREE_MEMORY:
> +      {
> +	size_t total_mem;
> +	size_t free_mem;
> +	CUdevice ctxdev;
> +	CUdevice dev;
> +
> +	CUDA_CALL_ERET (propval, cuCtxGetDevice, &ctxdev);
> +	CUDA_CALL_ERET (propval, cuDeviceGet, &dev, n);
> +	if (dev == ctxdev)
> +	  CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem);
> +	else if (ptx_devices[n])
> +	  {
> +	    CUcontext old_ctx;
> +
> +	    CUDA_CALL_ERET (propval, cuCtxPushCurrent, ptx_devices[n]->ctx);
> +	    CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem);
> +	    CUDA_CALL_ASSERT (cuCtxPopCurrent, &old_ctx);
> +	  }
> +	else
> +	  {
> +	    CUcontext new_ctx;
> +
> +	    CUDA_CALL_ERET (propval, cuCtxCreate, &new_ctx, CU_CTX_SCHED_AUTO,
> +			    dev);
> +	    CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem);
> +	    CUDA_CALL_ASSERT (cuCtxDestroy, new_ctx);
> +	  }

(I'm CCing Tom here, as he is the maintainer for these parts.)

As we discussed earlier on our internal list, I think using GOMP_OFFLOAD_init_device
is the right way to do this, rather than the lower-level CUDA context create/destroy calls.

I did not mean that you should first init the device and then immediately tear it down
with GOMP_OFFLOAD_fini_device just to obtain the property. Rather, take the opportunity
to initialize the device for use, and leave it initialized until program exit. That
should save resources overall.
(BTW, CUDA contexts are probably quite expensive to create and destroy, so a
cuCtxCreate/cuCtxDestroy pair is probably almost as slow as a proper device init.)
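
To illustrate the shape I have in mind, here is a minimal sketch in plain C with no
CUDA calls. All names in it (fake_device, fake_init_device, fake_get_free_memory) are
hypothetical stand-ins and not the actual libgomp or nvptx plugin interfaces; the
property values are made up. The only point is that the property query initializes
the device on first use and then leaves it initialized, so the expensive setup runs
at most once per device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device plugin state; not libgomp's
   real data structure.  */
struct fake_device
{
  bool initialized;   /* Set on first use, stays set until program exit.  */
  int init_count;     /* How many times the expensive setup has run.  */
  size_t free_mem;    /* Made-up property value.  */
};

#define FAKE_NUM_DEVICES 2
static struct fake_device fake_devices[FAKE_NUM_DEVICES];

/* Stand-in for the expensive device setup (context creation and so on).
   In this sketch it always succeeds.  */
static bool
fake_init_device (int n)
{
  struct fake_device *dev = &fake_devices[n];

  dev->init_count++;
  dev->free_mem = 1024u * (size_t) (n + 1);
  dev->initialized = true;
  return true;
}

/* Query a property.  The device is initialized on the first query and
   is NOT torn down afterwards, so later queries and later offloading
   reuse the same state.  Returns 0 for an invalid device number or a
   failed initialization.  */
static size_t
fake_get_free_memory (int n)
{
  if (n < 0 || n >= FAKE_NUM_DEVICES)
    return 0;
  if (!fake_devices[n].initialized && !fake_init_device (n))
    return 0;
  return fake_devices[n].free_mem;
}
```

Calling fake_get_free_memory (1) twice runs the setup for device 1 only once, and
device 0 is never touched. A real version would still need the locking that the
quoted patch does with ptx_dev_lock.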

Tom, do you have any comments on how to best write this part?

Thanks,
Chung-Lin


