[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

vries at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Mar 25 09:40:12 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Doesn't whatever driver/library API we use from libgomp to invoke workloads
> report actual errors?  Maybe we need to improve there.

This:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
...
seems to be the string for cudaErrorLaunchTimeout (CUDA_ERROR_LAUNCH_TIMEOUT in
the driver API that plugin-nvptx.c uses), which AFAICT is dedicated to this
situation. So we could treat that error code specially in cuda_error in
plugin-nvptx.c and emit a custom message.

Say:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated (5
second time-out caused by launching on a device running a display manager)
...
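A minimal sketch of the special-casing idea, assuming the hint is simply appended to the driver's own description. It is not libgomp's actual cuda_error. To keep it compilable without cuda.h, stub_get_error_string stands in for cuGetErrorString, and LAUNCH_TIMEOUT_HINT and format_cuda_error are hypothetical names used only for illustration. The numeric codes match the driver API's CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE and CUDA_ERROR_LAUNCH_TIMEOUT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds without cuda.h; the values
   match the driver API's error codes of the same names.  */
typedef int CUresult;
#define CUDA_SUCCESS 0
#define CUDA_ERROR_INVALID_VALUE 1
#define CUDA_ERROR_LAUNCH_TIMEOUT 702

/* Hypothetical hint text, taken from the message proposed above.  */
#define LAUNCH_TIMEOUT_HINT \
  " (5 second time-out caused by launching on a device running a display" \
  " manager)"

/* Hypothetical stand-in for cuGetErrorString: knows two descriptions and
   reports CUDA_ERROR_INVALID_VALUE for anything else.  */
static CUresult
stub_get_error_string (CUresult r, const char **desc)
{
  if (r == CUDA_ERROR_LAUNCH_TIMEOUT)
    *desc = "the launch timed out and was terminated";
  else if (r == CUDA_ERROR_INVALID_VALUE)
    *desc = "invalid argument";
  else
    return CUDA_ERROR_INVALID_VALUE;
  return CUDA_SUCCESS;
}

/* Format "libgomp: <api> error: <description>" into BUF, appending the
   display-manager hint only for the launch time-out code.  Returns the
   snprintf result.  */
static int
format_cuda_error (char *buf, size_t size, const char *api, CUresult r)
{
  const char *desc;

  assert (buf != NULL && size > 0);
  if (stub_get_error_string (r, &desc) != CUDA_SUCCESS)
    desc = "unknown cuda error";
  return snprintf (buf, size, "libgomp: %s error: %s%s", api, desc,
		   r == CUDA_ERROR_LAUNCH_TIMEOUT ? LAUNCH_TIMEOUT_HINT : "");
}
```

Appending the hint rather than replacing the description keeps the driver's wording intact. Users searching for "the launch timed out and was terminated" would still find the message.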

Alternatively, we could detect cudaDeviceProp::kernelExecTimeoutEnabled
(CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT in the driver API) and emit a warning
when initializing the device or before launching the first kernel.
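A minimal sketch of the warn-once alternative, assuming the warning is issued per device at most once. In the driver API the flag would come from cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT. Here a hypothetical stub, stub_kernel_exec_timeout_enabled, replaces that query so the sketch compiles without cuda.h. The names maybe_warn_exec_timeout and MAX_DEVICES, and the warning text, are also illustrative assumptions, not existing libgomp code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical upper bound on device numbers tracked by this sketch.  */
#define MAX_DEVICES 16

/* Hypothetical stand-in for
     cuDeviceGetAttribute (&value, CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, dev)
   Here device 0 pretends to be the one driving the display.  */
static int
stub_kernel_exec_timeout_enabled (int dev)
{
  return dev == 0;
}

/* Warn at most once per device, e.g. from device initialization or just
   before the first kernel launch.  Returns 1 if this call printed the
   warning, 0 otherwise.  */
static int
maybe_warn_exec_timeout (int dev, FILE *out)
{
  static int warned[MAX_DEVICES];

  assert (dev >= 0 && dev < MAX_DEVICES && out != NULL);
  if (warned[dev] || !stub_kernel_exec_timeout_enabled (dev))
    return 0;

  warned[dev] = 1;
  fprintf (out, "libgomp: warning: device %d has a kernel execution time-out"
	   " enabled (is it running a display manager?); long-running"
	   " kernels may be terminated\n", dev);
  return 1;
}
```

Warning up front helps users before anything fails. The error-message approach only speaks up after a kernel has already been killed.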

