The OpenACC library uses the CUDA Driver API, and may interact with programs that use the Runtime library directly, or another library based on the Runtime library, e.g., CUBLAS (see the interoperability references at the end of this chapter). This chapter describes the use cases and the changes required to use both the OpenACC library and the CUBLAS and Runtime libraries within a program.
In the first use case (see the code below), a function in the CUBLAS library,
specifically cublasCreate(), is called before any of the functions in the
OpenACC library.
When invoked, cublasCreate() initializes the CUBLAS library and allocates the hardware resources on the host and the device on behalf of the caller. Once the initialization and allocation have completed, a handle is returned to the caller. The OpenACC library also requires initialization and allocation of hardware resources. Since the CUBLAS library has already allocated the hardware resources for the device, all that is left to do is to initialize the OpenACC library and acquire the hardware resources on the host.
Before calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to cublasCreate(). Calling the Runtime
library function cudaGetDevice() accomplishes this. Once acquired, the device
number is passed along with the device type as parameters to the OpenACC
library function acc_set_device_num().
Once the call to acc_set_device_num() has completed, the OpenACC library
uses the context that was created during the call to cublasCreate(). In
other words, both libraries share the same context.
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess) {
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    }

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);
In the second use case (see the code below), a function in the OpenACC
library, specifically acc_set_device_num(), is called before any of the
functions in the CUBLAS library.
In this use case, acc_set_device_num() both initializes the OpenACC library
and allocates the hardware resources on the host and the device. The call
parameters specify which device to use and what device type to use, i.e.,
acc_device_nvidia. Note that this is only one way to initialize the OpenACC
library and allocate the appropriate hardware resources. Other methods are
available through environment variables, which are discussed in the next
section.
Once the call to acc_set_device_num() has completed, other OpenACC functions
can be called, as shown by the multiple calls to acc_copyin(). Functions in
the CUBLAS library can be called as well. In this use case, cublasCreate()
is called after the calls to acc_copyin().
As seen in the previous use case, a call to cublasCreate() normally
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. Here, however, the device has already been allocated,
so cublasCreate() only initializes the CUBLAS library and allocates the
appropriate hardware resources on the host. The context that was created as
part of the OpenACC initialization is shared with the CUBLAS library, as in
the first use case.
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL) {
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    }

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL) {
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    }

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
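The cublasSaxpy() call above computes y = alpha * x + y with unit strides. One way to check the values copied back from the device is to compute the same operation on the host and compare. The following is a minimal sketch of such a check, not part of either library; the helper names saxpy_host and max_abs_diff are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: host-side reference for y = alpha * x + y,
   using unit strides as in the cublasSaxpy() call above. */
void saxpy_host(size_t n, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* Hypothetical helper: largest absolute element-wise difference
   between two arrays of length n. */
float max_abs_diff(size_t n, const float *a, const float *b)
{
    float worst = 0.0f;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A program could keep an untouched host copy of the second input, run saxpy_host() on it, and compare it with h_Y1 after acc_memcpy_from_device(). A small tolerance is usually more appropriate than an exact comparison, since the device and the host may round intermediate results differently.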
There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
ACC_DEVICE_TYPE and ACC_DEVICE_NUM, respectively. These two environment
variables can be used as an alternative to calling acc_set_device_num(). In
the second use case, the device type and device number were specified using
acc_set_device_num(). If, however, the aforementioned environment variables
were set, then the call to acc_set_device_num() would not be required.
The environment variables are only relevant when an OpenACC function is
called before cublasCreate(). If cublasCreate() is called before any OpenACC
function, then you must call acc_set_device_num().
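An application in which OpenACC is initialized first might want to call acc_set_device_num() explicitly only when the environment does not already select a device. The following is a minimal sketch of such an application-side check. It only inspects the two variables with getenv(); how the OpenACC library itself interprets their values is not shown here, and the helper names parse_device_num and acc_env_selects_device are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative device number from a string
   such as the value of ACC_DEVICE_NUM. Returns 1 and stores the number
   in *out on success; returns 0 if the string is NULL, empty, negative,
   out of range, or not a plain decimal number. */
int parse_device_num(const char *s, int *out)
{
    char *end;
    long v;

    assert(out != NULL);
    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
        return 0;
    *out = (int) v;
    return 1;
}

/* Hypothetical helper: returns 1 when ACC_DEVICE_TYPE is set to a
   non-empty value and ACC_DEVICE_NUM holds a usable number (stored in
   *dev); returns 0 otherwise. */
int acc_env_selects_device(int *dev)
{
    const char *type = getenv("ACC_DEVICE_TYPE");
    const char *num = getenv("ACC_DEVICE_NUM");

    if (type == NULL || *type == '\0')
        return 0;
    return parse_device_num(num, dev);
}
```

When acc_env_selects_device() returns 0, the program would fall back to an explicit acc_set_device_num() call as in the second use case. This check does not apply when cublasCreate() is called first, since in that case acc_set_device_num() must be called regardless.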
See section 2.26, "Interactions with the CUDA Driver API" in "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, for additional information on library interoperability.
More complete information about ACC_DEVICE_TYPE and ACC_DEVICE_NUM can be
found in sections 4.1 and 4.2 of "The OpenACC Application Programming
Interface", Version 2.6.