
Nvptx GPU offloading using OpenMP 4 and GCC 7.2


Hi, 

I have been trying to install the GCC 7.2 compiler with offloading capabilities for nvptx, but so far I have been unsuccessful. 

I mainly based my approach on https://gcc.gnu.org/wiki/Offloading and https://kristerw.blogspot.nl/2017/04/building-gcc-with-support-for-nvidia.html. The script I finally used for compilation is attached (compileScript.sh). To help you read the script: 'module load cuda/8.0.61' sets the CUDA_PATH variable, and you can ignore the #SBATCH lines, which are only there because the compilation was run as a batch job. 
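
In case the attachment doesn't come through, the script boils down to the usual three-stage recipe sketched below. Treat this only as a condensed sketch: the two configure lines are copied from the gcc -v output quoted further down in this mail, while the nvptx-tools step and the directory names are assumptions based on kristerw's write-up, so the attached compileScript.sh remains authoritative.

install=$HOME/GCC_with_nvptx/work/install

# 1) nvptx-tools provides the assembler and linker for nvptx-none.
cd nvptx-tools
./configure --prefix=$install
make && make install

# 2) The accelerator (nvptx) compiler; newlib has to be symlinked
#    into the gcc source tree first, as in kristerw's post.
mkdir build-nvptx && cd build-nvptx
../gcc-7.2.0/configure --target=nvptx-none \
    --with-build-time-tools=$install/nvptx-none/bin \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --disable-sjlj-exceptions --enable-newlib-io-long-long \
    --disable-multilib --enable-languages=c,c++,fortran,lto \
    --prefix=$install
make && make install

# 3) The host compiler, with offloading to nvptx-none enabled
#    (CUDA_PATH comes from the module load mentioned above).
mkdir build-host && cd build-host
../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu \
    --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu \
    --enable-offload-targets=nvptx-none=$install \
    --with-cuda-driver-include=$CUDA_PATH/include \
    --with-cuda-driver-lib=$CUDA_PATH/lib64 \
    --disable-multilib --enable-languages=c,c++,fortran,lto \
    --prefix=$install
make && make install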

I've managed to compile nvptx-tools and both the GCC-nvptx and GCC-host compilers without errors. However, when I compile a minimal example of a for loop offloaded with an OpenMP 4 "#pragma omp target" construct (gcc -fopenmp -o openMP_GPU_minimal openMP_GPU_minimal.c), the compiler returns the following error: 

gcc: warning: '-x lto' after last input file has no effect 
gcc: fatal error: no input files 

Attached you'll find the compiler output I get with the -v option; it may be more informative than the rather vague warning above. 

I found the thread https://gcc.gnu.org/ml/gcc-help/2016-04/msg00111.html, which deals with exactly the same issue, but the suggestion there to install everything (nvptx-tools and the gcc-host and gcc-accelerator compilers) into the same <something>/install directory didn't help me: I had already done that to begin with, as kristerw's script suggests. 
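
For reference, this is roughly what the shared install prefix looks like here after all three installs (the listing below is from memory and certainly not complete):

ls $HOME/GCC_with_nvptx/work/install/bin
#   gcc, x86_64-pc-linux-gnu-gcc, ...               <- host compiler
#   x86_64-pc-linux-gnu-accel-nvptx-none-gcc, ...   <- accelerator compiler
#   nvptx-none-as, nvptx-none-ld                    <- nvptx-tools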

Another thing I noticed: if I add the -flto option, the compilation completes without errors. However, when I then call omp_is_initial_device() inside the #pragma omp target region, it returns 'true', indicating that the code is running on the host and not, as intended, on the accelerator (GPU). Note that omp_get_num_devices() correctly returns 2 (there are 2 GPUs in the system), but I don't think this tells me anything about whether I can successfully offload code: I believe omp_get_num_devices() is just host code, defined in libgomp.so. So at best it tells me that I'm using a libgomp.so that supports detecting these accelerators. 
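
For completeness, these are the exact commands for the -flto workaround, plus a run with libgomp's GOMP_DEBUG output switched on (just an idea for seeing what libgomp does at run time; I haven't verified how verbose it is in 7.2):

gcc -fopenmp -flto -o openMP_GPU_minimal openMP_GPU_minimal.c
GOMP_DEBUG=1 ./openMP_GPU_minimal   # should show plugin/PTX activity when offloading actually happens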

For the sake of completeness, let me also include the output of gcc -v for the host and accelerator compilers, so you can check whether those make sense. 

For the host compiler (gcc or x86_64-pc-linux-gnu-gcc, both return the same): 
Using built-in specs. 
COLLECT_GCC=/home/casparl/GCC_with_nvptx/work/install/bin/gcc 
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none 
Target: x86_64-pc-linux-gnu 
Configured with: ../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/home/casparl/GCC_with_nvptx/work/install --with-cuda-driver-include=/hpc/sw/cuda/8.0.61//include --with-cuda-driver-lib=/hpc/sw/cuda/8.0.61//lib64 --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install 
Thread model: posix 
gcc version 7.2.0 (GCC) 

For the accelerator compiler (x86_64-pc-linux-gnu-accel-nvptx-none-gcc -v): 
Using built-in specs. 
COLLECT_GCC=x86_64-pc-linux-gnu-accel-nvptx-none-gcc 
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/accel/nvptx-none/lto-wrapper 
Target: nvptx-none 
Configured with: ../gcc-7.2.0/configure --target=nvptx-none --with-build-time-tools=/home/casparl/GCC_with_nvptx/work/install/nvptx-none/bin --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-sjlj-exceptions --enable-newlib-io-long-long --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install 
Thread model: single 
gcc version 7.2.0 (GCC) 

I'm afraid I don't have enough insight into what the gcc warning indicates (e.g. whether the problem lies with the options of my host or accelerator compiler, or with which compilers/linkers are being used, etc.). Any help to get me going is greatly appreciated, because I've exhausted all the potential solutions I could think of (and I'd love to give OpenMP offloading a try!). 

Cheers, 

Caspar van Leeuwen

Attachment: compileScript.sh
Description: application/shellscript

Attachment: compilerOutput.txt
Description: Text document

#include <stdio.h>
#include <omp.h>

/* C (A_rows x B_cols) += A (A_rows x A_cols) * B (A_cols x B_cols).
   The matrix data must be mapped explicitly: without map clauses the
   target region would dereference host pointers on the device. */
void gemm_OpenMP_GPU(float *A, float *B, float *C,
                 const int A_rows, const int A_cols, const int B_cols)
{
  int i, j, k;
  /* Collapse the i and j loops (not i and k: that would let several
     threads update the same C element concurrently). */
  #pragma omp target teams distribute parallel for collapse(2) schedule(static,1) \
          map(to: A[0:A_rows*A_cols], B[0:A_cols*B_cols]) \
          map(tofrom: C[0:A_rows*B_cols]) private(i, j, k)
  for (i = 0; i < A_rows; i++)
  {
    for (j = 0; j < B_cols; j++)
    {
      float sum = 0.0f;
      for (k = 0; k < A_cols; k++)
      {
        sum += A[i*A_cols + k] * B[k*B_cols + j];
      }
      C[i*B_cols + j] += sum;
    }
  }
}


int main(int argc, char **argv)
{
  int A_rows=5;
  int A_cols=5;
  int B_cols=5;

  double dtime;

  float A[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
  float B[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
  float C[5][5] = { {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0} };

  dtime = omp_get_wtime();
  gemm_OpenMP_GPU(&A[0][0], &B[0][0], &C[0][0], A_rows, A_cols, B_cols);
  dtime = omp_get_wtime() - dtime;
  printf("Time with OpenMP: %f\n", dtime);

  return 0;
}


