This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
Re: OpenACC-Library-Interoperability
- From: Vikram Singh <vikramsingh001 at gmail dot com>
- To: Thomas Schwinge <thomas at codesourcery dot com>
- Cc: Salvatore Filippone <filippone dot salvatore at gmail dot com>, Vladimír Fuka <vladimir dot fuka at gmail dot com>, James Norris <jnorris at codesourcery dot com>, Chung-Lin Tang <cltang at codesourcery dot com>, Fortran List <fortran at gcc dot gnu dot org>
- Date: Wed, 3 Aug 2016 18:52:09 +0300
- Subject: Re: OpenACC-Library-Interoperability
Sorry, I realized I have to send it as plain text.

I finally found some time to repeat my previous attempts, but with gcc 6.1. I'll try to be a little detailed so that people don't have to redo the legwork I already did.
I used the same script as before:

https://github.com/olcf/OLCFHack15/blob/master/GCC5OffloadTest/auto-gcc5-offload-openacc-build-install.sh

and modified the paths in it for gcc 6.1. For some reason the accelerator compiler

x86_64-pc-linux-gnu-accel-nvptx-none-gcc

that the build puts in install/bin does not search the directories in $LD_LIBRARY_PATH. I found this out using the -print-search-dirs option:

x86_64-pc-linux-gnu-accel-nvptx-none-gcc -print-search-dirs
So I got "not found" errors for libraries that were in fact present in install/lib64. I simply copied those libraries to install/nvptx-none/lib/, and with that I can now use gcc 6.
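The workaround was along these lines (the exact set of libraries is whatever the link step complains about, e.g. libgfortran and libgomp; paths relative to the build directory):

cp install/lib64/libgfortran.so* install/lib64/libgomp.so* install/nvptx-none/lib/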
Onto the next task. I made a simple test case and will paste the whole thing here. Let's call it test.f90:
program example_dgemm

  use iso_c_binding
  implicit none

  integer :: N = 8
  real(c_double), allocatable :: A(:, :), B(:, :), C(:, :)
  integer :: size_of_real, i, j
  integer*8 :: devPtrA, devPtrB, devPtrC

  ! A real(c_double) is 8 bytes.
  size_of_real = 8

  allocate(A(N, N))
  allocate(B(N, N))
  allocate(C(N, N))

  ! Initialize A and B in OpenACC parallel regions on the GPU.
  !$ACC PARALLEL COPY(A)
  do i = 1, N
     do j = 1, N
        A(i, j) = i + j
     end do
  end do
  !$ACC END PARALLEL

  !$ACC PARALLEL COPY(B)
  do i = 1, N
     do j = 1, N
        B(i, j) = j
     end do
  end do
  !$ACC END PARALLEL

  ! Allocate device buffers through CUBLAS.
  call cublas_Alloc(N*N, size_of_real, devPtrA)
  call cublas_Alloc(N*N, size_of_real, devPtrB)
  call cublas_Alloc(N*N, size_of_real, devPtrC)

  ! Copy the matrices to the GPU.  (The copy of C is not strictly
  ! needed, since beta = 0 below overwrites it anyway.)
  call cublas_Set_Matrix(N, N, size_of_real, A, N, devPtrA, N)
  call cublas_Set_Matrix(N, N, size_of_real, C, N, devPtrC, N)
  call cublas_Set_Matrix(N, N, size_of_real, B, N, devPtrB, N)

  ! Do C = A*B on the GPU.
  call cublas_DGEMM('N', 'N', N, N, N, &
       1.0_c_double, devPtrA, N, devPtrB, N, 0.0_c_double, devPtrC, N)

  ! Copy the result back from the GPU.
  call cublas_Get_Matrix(N, N, size_of_real, devPtrC, N, C, N)

  call cublas_Free(devPtrA)
  call cublas_Free(devPtrB)
  call cublas_Free(devPtrC)

  deallocate(A)
  deallocate(B)
  deallocate(C)

end program example_dgemm
Basically, I can test both CUBLAS and OpenACC together using this simple code. But using CUBLAS needs some extra glue: the Fortran-callable wrappers that NVIDIA ships in source form. So I copy fortran.h, fortran_common.h, and fortran.c from /usr/local/cuda/src into my working directory.
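That is, assuming the default CUDA install location:

cp /usr/local/cuda/src/fortran.c /usr/local/cuda/src/fortran.h /usr/local/cuda/src/fortran_common.h .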
Now I compile the CUBLAS Fortran wrappers:

./rungcc6.sh gcc -Wall -g -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -c fortran.c

Finally, I build the test case:

./rungcc6.sh gfortran -Wall -g test.f90 fortran.o -fopenacc -foffload=nvptx-none -foffload=-O3 -O3 -o gpu.x -L/usr/local/cuda/lib64 -lcublas -lcudart
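For completeness, rungcc6.sh is just a little wrapper that puts the gcc 6.1 offload install first on the search paths before running the given command; roughly the following, with /path/to/install adjusted to wherever the script above installed:

#!/bin/sh
# Assumed wrapper: prepend the offloading gcc 6.1 install to the
# search paths, then run whatever command was passed in.
export PATH=/path/to/install/bin:$PATH
export LD_LIBRARY_PATH=/path/to/install/lib64:$LD_LIBRARY_PATH
exec "$@"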
Then nvprof ./gpu.x gives this output:
==9155== Profiling application: ./gpu.x
==9155== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 23.60%  13.952us      1  13.952us  13.952us  13.952us  MAIN__$_omp_fn$0
 23.55%  13.920us      1  13.920us  13.920us  13.920us  MAIN__$_omp_fn$1
 18.57%  10.976us     16     686ns     576ns  1.0880us  [CUDA memcpy HtoD]
 13.72%  8.1080us      2  4.0540us  2.0430us  6.0650us  [CUDA memcpy HtoH]
 12.07%  7.1360us      1  7.1360us  7.1360us  7.1360us  void gemm_kernel2x2_core<double, bool=0, bool=0, bool=0, bool=0, bool=0>(double*, double const *, double const *, int, int, int, int, int, int, double*, double*, double, double, int)
  8.50%  5.0240us      3  1.6740us  1.5680us  1.8880us  [CUDA memcpy DtoH]
Clearly the OpenACC loops run on the GPU (the two MAIN__$_omp_fn$ entries), and the cuBLAS DGEMM kernel (gemm_kernel2x2_core) runs on the GPU as well.

So the only thing left now is host_data support, so that the whole round trip through cublas_Alloc, cublas_Set_Matrix, and cublas_Get_Matrix is no longer needed.
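In the meantime, here is my rough, untested sketch of what I expect the host_data version to look like. Note the assumptions: it binds directly to the legacy cublasDgemm C API (so the fortran.c wrappers are not needed at all), and the interface below is my own guess at the interoperable declaration, so it may need adjustment:

  ! Untested sketch: direct binding to the legacy cublasDgemm C API.
  interface
     subroutine cublasDgemm(transa, transb, m, n, k, alpha, &
          A, lda, B, ldb, beta, C, ldc) bind(c, name='cublasDgemm')
       use iso_c_binding
       character(kind=c_char), value :: transa, transb
       integer(c_int), value :: m, n, k, lda, ldb, ldc
       real(c_double), value :: alpha, beta
       real(c_double) :: A(lda, *), B(ldb, *), C(ldc, *)
     end subroutine cublasDgemm
  end interface

  ...

  ! The data region keeps A, B, and C on the device; host_data then
  ! exposes their device addresses to the cuBLAS call.
  !$acc data copy(A, B, C)
  !$acc host_data use_device(A, B, C)
  call cublasDgemm('N', 'N', N, N, N, &
       1.0_c_double, A, N, B, N, 0.0_c_double, C, N)
  !$acc end host_data
  !$acc end data

With that, all of the explicit cublas_Alloc / cublas_Set_Matrix / cublas_Get_Matrix calls go away; the data clauses handle allocation and the transfers.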
If you can tell me how to apply the patch, I can test it with this simple example.
Regards,
Vikram
On Mon, Aug 1, 2016 at 6:58 PM, Vikram Singh <vikramsingh001@gmail.com> wrote:
> [...]
>
> On Thu, May 12, 2016 at 6:34 PM, Thomas Schwinge <thomas@codesourcery.com>
> wrote:
>>
>> Hi!
>>
>> On Fri, 15 Apr 2016 13:59:53 +0300, Vikram Singh
>> <vikramsingh001@gmail.com> wrote:
>> > I checked libgomp.oacc-c-c++-common/context-*.c, and they seem to be
>> > exactly what I needed to start with.
>> >
>> > If I understand it correctly, I'll anyway be trying to implement
>> > something similar with PGI compilers, so I'll have something as a
>> > starting point.
>>
>> Any progress on that already? I had a very quick look myself, but it's
>> not as easy as I thought... A "courageous" use of "use cublas" in
>> Fortran code compiled with gfortran results in: "Fatal Error: Can't open
>> module file 'cublas.mod' for reading". The problem is: Fortran
>> interfacing to C libraries (which cuBLAS is).
>>
>> <http://docs.nvidia.com/cuda/cublas/index.html#appendix-b-cublas-fortran-bindings>
>> has some instructions on how to do it. Anyone got that to work already?
>>
>> > But again, it will need the OpenACC host_data construct to be setup
>> > for gfortran to test.
>>
>> In
>>
>> <http://news.gmane.org/find-root.php?message_id=%3C2b4f59d5-be38-2814-27bb-73aa7ffb4b8f%40codesourcery.com%3E>,
>> Chung-Lin has now posted a patch (pending review) that should make the
>> OpenACC host_data construct usable in GCC Fortran. (Problem discussed in
>>
>> <http://news.gmane.org/find-root.php?message_id=%3C878u0o6wwj.fsf%40kepler.schwinge.homeip.net%3E>
>> before.)
>>
>> For reference:
>>
>> > On Fri, Apr 15, 2016 at 11:57 AM, Thomas Schwinge
>> > <thomas@codesourcery.com> wrote:
>> > > On Fri, 15 Apr 2016 11:35:06 +0300, Vikram Singh
>> > > <vikramsingh001@gmail.com> wrote:
>> > >> Yes, I came to the conclusion that host_data
>> > >> would be the only way to do it in fortran.
>> > >>
>> > >> On the other hand, I thought there were no plans to implement it in
>> > >> gfortran 6 either
>> > >
>> > > I still hope we'll get this (that is, <https://gcc.gnu.org/PR70598>)
>> > > fixed in time for the GCC 6.1 release. I'll keep you posted.
>> > >
>> > >
>> > >> > I'm copying Jim, who is the author of this chapter in the
>> > >> > documentation
>> > >> > as well as the
>> > >> > libgomp/testsuite/libgomp.oacc-c-c++-common/context-*.c
>> > >> > test cases, and much of the relevant libgomp code, too, and who
>> > >> > should
>> > >> > please correct me if I'm wrong. I'll make a note for later, that
>> > >> > we
>> > >> > should translate the libgomp.oacc-c-c++-common/context-*.c test
>> > >> > cases to
>> > >> > Fortran, and also replicate them using the OpenACC host_data
>> > >> > construct
>> > >> > (like in
>> > >> > libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c), and
>> > >> > the same for the documentation you referenced. (Vikram, please
>> > >> > tell if
>> > >> > you're interested to work on these items.)
>> > >>
>> > >> I am not sure I understand what you want me to work on exactly. I am
>> > >> not really much of a C programmer, so I wouldn't be good at it. I
>> > >> would still like to help.
>> > >
>> > > Sorry for being unclear. My idea/question has been whether you're
>> > > interested in helping by translating the documentation as well as the
>> > > libgomp.oacc-c-c++-common/context-*.c test cases from C to Fortran
>> > > (using
>> > > the OpenACC host_data construct instead of the acc_* functions). If
>> > > yes,
>> > > then that's great, if not, then one of us will do it at some point.
>>
>>
>> Grüße
>> Thomas