[nvptx, libgomp, testsuite, PR85519] Reduce recursion depth in declare_target-{1,2}.f90

Tom de Vries Tom_deVries@mentor.com
Wed Apr 25 11:03:00 GMT 2018


Hi,

when running the libgomp tests with nvptx accelerator on an Nvidia Titan 
V, we run into these failures:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -Os  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -Os  execution test
...

These tests contain recursive functions, and the failures occur because 
the tests run out of thread stack during execution. The symptom is:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
which we can turn into this symptom:
...
libgomp: cuStreamSynchronize error: an illegal instruction was encountered
...
by using GOMP_NVPTX_JIT=-O0, which inserts a valid thread stack check 
after the thread stack decrement at the start of each function.

The thread stack limit defaults to 1024 bytes on all the boards that I've 
checked, including the Titan V. The tests have a recursion depth of ~25, 
so when the frame size of the recursive function exceeds ~40 bytes, we 
can be sure to run out of thread stack. [ It may also happen at a smaller 
frame size, given that some thread stack space may already have been 
consumed before the recursive function is called. ]

[ The nvptx libgomp port uses a 128k per-warp stack in global memory, 
avoiding the use of the .local directive in offloading functions, which 
would be mapped onto the thread stack. But doing so does not eliminate 
thread stack usage entirely. For instance, device routine parameters can 
be stored on the thread stack. ]


In conclusion, these tests run out of thread stack on the Nvidia Titan V 
because the recursive functions have a larger frame size there than 
we've seen on the Nvidia architecture flavours that we've tested before.

The patch fixes this by reducing the recursion depth.

OK for stage4 trunk?

Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-nvptx-libgomp-testsuite-Reduce-recursion-depth-in-declare_target-1-2-.f90.patch
Type: text/x-patch
Size: 1887 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180425/f3f01009/attachment.bin>

