9 OpenACC Library Interoperability

9.1 Introduction

The OpenACC library uses the CUDA Driver API, and may interact with programs that use the Runtime library directly, or another library based on the Runtime library, e.g., CUBLAS (3). This chapter describes two use cases and the changes required to use the OpenACC library together with the CUBLAS and Runtime libraries within a program.

9.2 First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library, specifically cublasCreate(), is called prior to any of the functions in the OpenACC library.

When invoked, the function initializes the library and allocates the hardware resources on the host and the device on behalf of the caller. Once the initialization and allocation have completed, a handle is returned to the caller. The OpenACC library also requires initialization and allocation of hardware resources. Since the CUBLAS library has already allocated the hardware resources for the device, all that is left to do is to initialize the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and allocates the host hardware resources, you need to acquire the device number that was allocated during the call to cublasCreate(). Calling the Runtime library function cudaGetDevice() accomplishes this. Once acquired, the device number is passed along with the device type as parameters to the OpenACC library function acc_set_device_num().

Once the call to acc_set_device_num() has completed, the OpenACC library uses the context that was created during the call to cublasCreate(). In other words, both libraries share the same context.

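    /* Assumes <stdio.h>, <stdlib.h>, <openacc.h>, <cuda_runtime_api.h>
       and <cublas_v2.h> have been included */
    cublasHandle_t h;
    cublasStatus_t s;
    cudaError_t e;
    int dev;

    /* Create the handle */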
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    {
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    }

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

Use Case 1

9.3 First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library, specifically acc_set_device_num(), is called prior to any of the functions in the CUBLAS library.

In the use case presented here, the function acc_set_device_num() both initializes the OpenACC library and allocates the hardware resources on the host and the device. The call parameters specify which device to use and what device type to use, i.e., acc_device_nvidia. This is only one method of initializing the OpenACC library and allocating the appropriate hardware resources. Other methods are available through the use of environment variables; these are discussed in the next section.

Once the call to acc_set_device_num() has completed, other OpenACC functions can be called, as seen in the multiple calls to acc_copyin(). In addition, calls can be made to functions in the CUBLAS library. In this use case, a call to cublasCreate() is made after the calls to acc_copyin(). As seen in the previous use case, a call to cublasCreate() normally initializes the CUBLAS library and allocates the hardware resources on the host and the device. However, since the device has already been allocated, cublasCreate() only initializes the CUBLAS library and allocates the appropriate hardware resources on the host. The context that was created as part of the OpenACC initialization is shared with the CUBLAS library, as in the first use case.

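    /* Assumes h_X and h_Y1 are host arrays of N floats, alpha is a float,
       d_X and d_Y are float pointers, h is a cublasHandle_t,
       s is a cublasStatus_t and dev is an int */
    dev = 0;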

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    {
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    }

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    {
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    }

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    {
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    {
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    }

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

Use Case 2
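
The result copied back into h_Y1 can be checked against a host computation of the same operation, since cublasSaxpy() computes y = alpha * x + y element by element. The following is a minimal sketch of such a check, not part of the use case itself. The helper names saxpy_host() and arrays_match() are hypothetical, and the sketch assumes the caller kept an untouched host copy of the original Y values to feed to the host computation.

```c
#include <stddef.h>

/* Host reference for saxpy: y[i] = alpha * x[i] + y[i].
   Hypothetical helper, not an OpenACC or CUBLAS function. */
void saxpy_host(size_t n, float alpha, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* Return 1 if every pair of elements differs by at most tol,
   0 otherwise.  A NaN difference counts as a mismatch. */
int arrays_match(size_t n, const float *a, const float *b, float tol)
{
    for (size_t i = 0; i < n; i++)
    {
        float d = a[i] - b[i];

        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))
            return 0;
    }
    return 1;
}
```

With a second host array holding the same initial values as h_Y1, a caller could run saxpy_host() on that array and then pass both arrays to arrays_match() after the call to acc_memcpy_from_device(). The tolerance is left to the caller because host and device arithmetic may round differently.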

9.4 OpenACC library and environment variables

There are two environment variables associated with the OpenACC library that may be used to control the device type and device number: ACC_DEVICE_TYPE and ACC_DEVICE_NUM, respectively. These two environment variables can be used as an alternative to calling acc_set_device_num(). As seen in the second use case, the device type and device number were specified using acc_set_device_num(). If, however, the aforementioned environment variables were set, then the call to acc_set_device_num() would not be required.

The use of the environment variables is only relevant when an OpenACC function is called prior to a call to cublasCreate(). If cublasCreate() is called prior to a call to an OpenACC function, then you must call acc_set_device_num() (4).
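
A program that supports both paths may want to know whether the environment already selects a device before deciding to call acc_set_device_num() itself. The following is a minimal sketch using only the standard C library. The function names parse_device_num() and env_selects_device() are hypothetical, and the sketch assumes that a usable ACC_DEVICE_NUM value is a non-negative decimal integer and that any non-empty ACC_DEVICE_TYPE value is acceptable. It does not validate the device type against what a particular implementation supports.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a device number such as the value of ACC_DEVICE_NUM.
   Return 0 and store the number in *out on success, -1 otherwise.
   Hypothetical helper, not part of the OpenACC library. */
int parse_device_num(const char *text, int *out)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > INT_MAX)
        return -1;

    *out = (int) value;
    return 0;
}

/* Return 1 if the given ACC_DEVICE_TYPE and ACC_DEVICE_NUM values
   together select a device, storing the device number in *dev.
   Return 0 if either value is missing or the number is malformed. */
int env_selects_device(const char *type, const char *num, int *dev)
{
    if (type == NULL || *type == '\0')
        return 0;
    return parse_device_num(num, dev) == 0;
}
```

A caller could pass getenv("ACC_DEVICE_TYPE") and getenv("ACC_DEVICE_NUM") to env_selects_device(). When it returns 0, or when cublasCreate() has already been called, the explicit call to acc_set_device_num() is still needed.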


Footnotes

(3)

See section 2.26, "Interactions with the CUDA Driver API" in "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, for additional information on library interoperability.

(4)

More complete information about ACC_DEVICE_TYPE and ACC_DEVICE_NUM can be found in sections 4.1 and 4.2 of "The OpenACC Application Programming Interface", Version 2.6.