This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Hi,

I have been trying to install the GCC 7.2 compiler with offload capabilities for nvptx, but so far without success. I mainly followed https://gcc.gnu.org/wiki/Offloading and https://kristerw.blogspot.nl/2017/04/building-gcc-with-support-for-nvidia.html. The script I finally used for compilation is attached (compileScript.sh). For the sake of understanding the script, note that "module load cuda/8.0.61" sets the CUDA_PATH variable; you can ignore the #SBATCH lines, which are for compilation from a batch job. I've managed to compile nvptx-tools, the GCC nvptx compiler and the GCC host compiler without errors.

However, when I compile a minimal example of a for loop distributed using an OpenMP 4 "#pragma omp target" directive (gcc -fopenmp -o openMP_GPU_minimal openMP_GPU_minimal.c), the compiler returns the following error:

gcc: warning: '-x lto' after last input file has no effect
gcc: fatal error: no input files

Attached you'll find the compiler output I get with the -v option; it may be more informative than the rather vague warning above. I found the thread https://gcc.gnu.org/ml/gcc-help/2016-04/msg00111.html that deals with exactly the same issue, but the suggestion there to install everything (nvptx-tools, the gcc host and accelerator compilers) in the same <something>/install directory didn't help me: I had already done that to begin with, as suggested by kristerw's script.

Another thing I noticed is that if I add the -flto option, the compilation completes without errors. However, when I then call omp_is_initial_device() inside the #pragma omp target region, it returns 'true', indicating that the code is running on the host, and not on the accelerator (GPU), as intended. Note that omp_get_num_devices() correctly returns 2 (there are 2 GPUs in the system), but I don't think this tells me anything about whether I can successfully offload code: I believe omp_get_num_devices() is just host code, defined in libgomp.so.
So at best, it tells me that I'm using a libgomp.so that supports detecting these accelerators.

For the sake of completeness, let me also include the output of gcc -v for the host and accelerator compilers, so you can check whether that makes sense.

For the host compiler (gcc or x86_64-pc-linux-gnu-gcc, both return the same):

Using built-in specs.
COLLECT_GCC=/home/casparl/GCC_with_nvptx/work/install/bin/gcc
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-7.2.0/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/home/casparl/GCC_with_nvptx/work/install --with-cuda-driver-include=/hpc/sw/cuda/8.0.61//include --with-cuda-driver-lib=/hpc/sw/cuda/8.0.61//lib64 --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install
Thread model: posix
gcc version 7.2.0 (GCC)

For the accelerator compiler (x86_64-pc-linux-gnu-accel-nvptx-none-gcc -v):

Using built-in specs.
COLLECT_GCC=x86_64-pc-linux-gnu-accel-nvptx-none-gcc
COLLECT_LTO_WRAPPER=/nfs/home4/casparl/GCC_with_nvptx/work/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/accel/nvptx-none/lto-wrapper
Target: nvptx-none
Configured with: ../gcc-7.2.0/configure --target=nvptx-none --with-build-time-tools=/home/casparl/GCC_with_nvptx/work/install/nvptx-none/bin --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-sjlj-exceptions --enable-newlib-io-long-long --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/home/casparl/GCC_with_nvptx/work/install
Thread model: single
gcc version 7.2.0 (GCC)

I'm afraid I don't have enough insight into what the gcc warning indicates (e.g. whether the problem is with the options of my host or accelerator compilers, or with which compilers/linkers are used, etc.).
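To make the long "Configured with:" lines above easier to compare, here they are distilled into the two configure invocations. $PREFIX and $CUDA are placeholders of my own for /home/casparl/GCC_with_nvptx/work/install and /hpc/sw/cuda/8.0.61; everything else is taken verbatim from the -v output above:

```shell
# Accelerator (nvptx) compiler, built first:
../gcc-7.2.0/configure \
    --target=nvptx-none \
    --with-build-time-tools=$PREFIX/nvptx-none/bin \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --disable-sjlj-exceptions \
    --enable-newlib-io-long-long \
    --disable-multilib \
    --enable-languages=c,c++,fortran,lto \
    --prefix=$PREFIX

# Host compiler, pointed at the same install prefix:
../gcc-7.2.0/configure \
    --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
    --target=x86_64-pc-linux-gnu \
    --enable-offload-targets=nvptx-none=$PREFIX \
    --with-cuda-driver-include=$CUDA/include \
    --with-cuda-driver-lib=$CUDA/lib64 \
    --disable-multilib \
    --enable-languages=c,c++,fortran,lto \
    --prefix=$PREFIX
```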
Any help to get me going is greatly appreciated, because I've exhausted all potential solutions I could think of (and I'd love to give OpenMP offloading a try!). Cheers, Caspar van Leeuwen
Attachment:
compileScript.sh
Description: application/shellscript
Attachment:
compilerOutput.txt
Description: Text document
#include <omp.h>

void gemm_OpenMP_GPU(float *A, float *B, float *C,
                     const int A_rows, const int A_cols, const int B_rows)
{
    int i, j, k;
    #pragma omp target teams distribute parallel for collapse(2) schedule(static,1) shared(A, B, C) private(i, j, k)
    for (i = 0; i < A_rows; i++) {
        for (k = 0; k < A_cols; k++) {
            for (j = 0; j < B_rows; j++) {
                C[i*B_rows + j] += A[i*A_cols + k] * B[k*B_rows + j];
            }
        }
    }
}

int main(int argc, char **argv)
{
    int A_rows = 5;
    int A_cols = 5;
    int B_rows = 5;
    int B_cols = 5;
    double dtime;
    float A[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
    float B[5][5] = { {0,0,1,4,5}, {1,2,7,8,3}, {2,4,1,7,8}, {3,6,2,5,6}, {4,8,7,2,1} };
    float C[5][5] = { {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0}, {0,0,0,0,0} };

    dtime = omp_get_wtime();
    gemm_OpenMP_GPU(&A[0][0], &B[0][0], &C[0][0], A_rows, A_cols, B_cols);
    dtime = omp_get_wtime() - dtime;
    // std::cout << "Time with OpenMp: " << dtime << std::endl;

    return 0;
}