This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[nvptx, libgomp, testsuite, PR85519] Reduce recursion depth in declare_target-{1,2}.f90
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 25 Apr 2018 12:58:47 +0200
- Subject: [nvptx, libgomp, testsuite, PR85519] Reduce recursion depth in declare_target-{1,2}.f90
Hi,
when running the libgomp tests with nvptx accelerator on an Nvidia Titan
V, we run into these failures:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O1 execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O2 execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -Os execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O1 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O2 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -Os execution test
...
These tests contain recursive functions, and the failures occur because
execution runs out of thread stack. The symptom is:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
which we can turn into this symptom:
...
libgomp: cuStreamSynchronize error: an illegal instruction was encountered
...
by using GOMP_NVPTX_JIT=-O0, which inserts a valid thread stack check
after the thread stack decrement at the start of each function.
The thread stack limit defaults to 1024 on all the boards that I've
checked, including Titan V. The tests have a recursion depth of ~25, so
when the frame size of the recursive function exceeds ~40, we can be
sure to run out of thread stack. [ It also may happen at a smaller
frame size, given that some thread stack space may have already been
consumed before calling the recursive function. ]
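The budget estimate above can be written out as a small sketch. This is
only an illustration of the arithmetic, not part of the patch. It assumes
the limit and the frame sizes are in the same unit (bytes, which the mail
does not state) and that the recursion nests one frame per level; the
names max_frame_size and fits are hypothetical helpers.

```python
# Rough thread stack budget for a recursive device function.
# Assumption: limit and frame size share one unit (bytes assumed here).

STACK_LIMIT = 1024  # default thread stack limit quoted in the mail


def max_frame_size(stack_limit, depth, already_used=0):
    """Largest per-call frame size that still fits `depth` nested calls."""
    return (stack_limit - already_used) // depth


def fits(frame_size, depth, stack_limit, already_used=0):
    """True if `depth` nested frames of `frame_size` fit in the limit."""
    return already_used + frame_size * depth <= stack_limit


if __name__ == "__main__":
    for depth in (25, 23):
        print(depth, max_frame_size(STACK_LIMIT, depth))
```

With a limit of 1024, a depth of 25 leaves room for frames of at most 40,
and a depth of 23 for frames of at most 44. Any stack already consumed
before the first call (the already_used argument) lowers both bounds.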
[ The nvptx libgomp port uses a 128k per-warp stack in the global
memory, avoiding the use of the .local directive in offloading
functions, which would be mapped onto thread stack. But doing so does
not eliminate the thread stack usage. For instance, device routine
parameters can be stored on thread stack. ]
In conclusion, these tests run out of thread stack on Nvidia Titan V because
the recursive functions have a larger frame size than we've seen for the
Nvidia architecture flavours that we've tested before.
The patch fixes this by reducing the recursion depth.
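To illustrate how the argument to fib relates to the recursion depth, here
is a minimal sketch. It assumes a naive doubly-recursive Fibonacci with
base cases at n <= 1; the body of fib in the tests is not shown in this
mail, so the definition below and the name fib_with_depth are assumptions
for illustration only.

```python
# Naive recursive Fibonacci that also reports the deepest call nesting.
# Assumption: base cases at n <= 1, one stack frame per nesting level.


def fib_with_depth(n, depth=1):
    """Return (fib(n), deepest nesting level reached while computing it)."""
    if n <= 1:
        return n, depth
    a, depth_a = fib_with_depth(n - 1, depth + 1)
    b, depth_b = fib_with_depth(n - 2, depth + 1)
    return a + b, max(depth_a, depth_b)


if __name__ == "__main__":
    for n in (23, 25):
        value, depth = fib_with_depth(n)
        print(n, value, depth)
```

Under these assumptions the deepest nesting equals the argument, so going
from fib (25) to fib (23) removes two frames from the deepest call chain.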
OK for stage4 trunk?
Thanks,
- Tom
[nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90
2018-04-25 Tom de Vries <tom@codesourcery.com>
PR target/85519
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth from 25 to 23.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
---
libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 | 4 +++-
libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 | 6 ++++--
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index df941ee..51de6b2 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -27,5 +27,7 @@ end module
program e_53_1
use e_53_1_mod, only : fib, fib_wrapper
if (fib (15) /= fib_wrapper (15)) STOP 1
- if (fib (25) /= fib_wrapper (25)) STOP 2
+ ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+ ! Nvidia Titan V.
+ if (fib (23) /= fib_wrapper (23)) STOP 2
end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index 9c31569..76cce01 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -4,9 +4,11 @@ program e_53_2
!$omp declare target (fib)
integer :: x, fib
!$omp target map(from: x)
- x = fib (25)
+ ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+ ! Nvidia Titan V.
+ x = fib (23)
!$omp end target
- if (x /= fib (25)) STOP 1
+ if (x /= fib (23)) STOP 1
end program
integer recursive function fib (n) result (f)