[Bug target/88939] [nvptx, openacc, libgomp] Assertion `!s->map->active' failed for synchronous parallel with abort

vries at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jan 21 09:53:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88939

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Furthermore, after running into the cuMemFree error, the process hangs
indefinitely with this callstack:
...
libgomp: cuStreamSynchronize error: an illegal instruction was encountered

libgomp: cuMemFree error: an illegal instruction was encountered
^C
Thread 1 "abort-1.exe" received signal SIGINT, Interrupt.
futex_wait (val=-1, addr=0x60ec28)
    at /home/vries/oacc/trunk/source-gcc/libgomp/config/linux/x86/futex.h:44
44        if (__builtin_expect (res == -ENOSYS, 0))
(gdb) bt
#0  futex_wait (val=-1, addr=0x60ec28)
    at /home/vries/oacc/trunk/source-gcc/libgomp/config/linux/x86/futex.h:44
#1  gomp_mutex_lock_slow (mutex=mutex@entry=0x60ec28, oldval=<optimized out>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/config/linux/mutex.c:47
#2  0x00007ffff78baa90 in gomp_mutex_lock (mutex=0x60ec28)
    at /home/vries/oacc/trunk/source-gcc/libgomp/config/linux/mutex.h:57
#3  GOMP_offload_unregister_ver (version=<optimized out>,
    host_table=<optimized out>, target_type=<optimized out>,
    target_data=0x400cc0 <target_data>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/target.c:1397
#4  0x000000000040084f in fini ()
#5  0x00007ffff7de7de7 in _dl_fini () at dl-fini.c:235
#6  0x00007ffff72e9ff8 in __run_exit_handlers (status=status@entry=1, 
    listp=0x7ffff76745f8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#7  0x00007ffff72ea045 in __GI_exit (status=status@entry=1) at exit.c:104
#8  0x00007ffff78a2b93 in gomp_vfatal (fmt=<optimized out>,
list=list@entry=0x7fffffffd668)
    at /home/vries/oacc/trunk/source-gcc/libgomp/error.c:80
#9  0x00007ffff78bcdfc in GOMP_PLUGIN_fatal (msg=msg@entry=0x7ffff6ea9953
"cuMemFree error: %s")
    at /home/vries/oacc/trunk/source-gcc/libgomp/libgomp-plugin.c:78
#10 0x00007ffff6ea63f2 in cuda_map_destroy (map=0xa2b4b0)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:240
#11 0x00007ffff6ea6996 in map_fini (s=<optimized out>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:273
#12 0x00007ffff6ea807c in fini_streams_for_device (ptx_dev=0x60ea40)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:503
#13 nvptx_close_device (ptx_dev=0x60ea40)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:828
#14 GOMP_OFFLOAD_fini_device (n=<optimized out>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:1885
#15 0x00007ffff78b73a5 in gomp_target_fini ()
    at /home/vries/oacc/trunk/source-gcc/libgomp/target.c:2704
#16 0x00007ffff72e9ff8 in __run_exit_handlers (status=status@entry=1, 
    listp=0x7ffff76745f8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#17 0x00007ffff72ea045 in __GI_exit (status=status@entry=1) at exit.c:104
#18 0x00007ffff78a2b93 in gomp_vfatal (fmt=<optimized out>,
list=list@entry=0x7fffffffd838)
    at /home/vries/oacc/trunk/source-gcc/libgomp/error.c:80
#19 0x00007ffff78bcdfc in GOMP_PLUGIN_fatal (
    msg=msg@entry=0x7ffff6ea9af1 "cuStreamSynchronize error: %s")
    at /home/vries/oacc/trunk/source-gcc/libgomp/libgomp-plugin.c:78
#20 0x00007ffff6ea6f3c in nvptx_exec (fn=0xa574c0, mapnum=0,
    devaddrs=<optimized out>, async=-2, dims=0x7fffffffd9fc,
    targ_mem_desc=<optimized out>, hostaddrs=<optimized out>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/plugin/plugin-nvptx.c:1373
#21 0x00007ffff78bd1a9 in GOACC_parallel_keyed (flags_m=<optimized out>,
fn=<optimized out>, 
    mapnum=0, hostaddrs=0x0, sizes=<optimized out>, kinds=<optimized out>)
    at /home/vries/oacc/trunk/source-gcc/libgomp/oacc-parallel.c:249
#22 0x0000000000400784 in main ()
... 
The hang is due to &devicep->lock being locked twice by the same thread: once
in gomp_target_fini (frame #15), and once more in GOMP_offload_unregister_ver
(frame #3), where the second lock attempt blocks in futex_wait.
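
A minimal sketch of that double-lock pattern, not libgomp code: device_lock,
try_lock, unlock, unregister_image and target_fini_with_fatal_error are
hypothetical stand-ins, and the lock is modelled with a C11 atomic_flag rather
than libgomp's mutex. It uses a try-lock so that the conflict is reported
instead of hanging; a blocking lock would wait forever at the marked spot.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for devicep->lock: a plain, non-recursive lock
   that does not track which thread owns it.  */
static atomic_flag device_lock = ATOMIC_FLAG_INIT;

/* Return true if the lock was acquired, false if it was already held.  */
static bool
try_lock (atomic_flag *l)
{
  return !atomic_flag_test_and_set (l);
}

static void
unlock (atomic_flag *l)
{
  atomic_flag_clear (l);
}

/* Stand-in for the exit handler that unregisters the offload image.
   It needs the device lock.  */
static bool
unregister_image (void)
{
  if (!try_lock (&device_lock))
    return false;		/* A blocking lock would hang here.  */
  unlock (&device_lock);
  return true;
}

/* Stand-in for the device teardown handler: it holds the device lock,
   hits a fatal error, and the nested exit handling reaches the other
   handler while the lock is still held.  */
static bool
target_fini_with_fatal_error (void)
{
  bool ok;

  if (!try_lock (&device_lock))
    return false;
  ok = unregister_image ();	/* Simulates the nested exit handler.  */
  unlock (&device_lock);
  return ok;
}
```

Called on its own, unregister_image () succeeds; called via
target_fini_with_fatal_error () it fails, because the lock is still held by
the caller further up the same stack.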

Note btw that GOMP_PLUGIN_fatal calls exit, to allow the atexit handlers to
run, and that here GOMP_PLUGIN_fatal is itself called from an exit handler
(frame #9), so exit is called again from within an atexit handler. glibc
handles this robustly, so the nested exit by itself doesn't cause the hang.

The hang is similar to what I described here (
http://gcc.gnu.org/ml/gcc-patches/2017-11/msg01444.html ), and the RFC patch
fixes the hang, as well as the cuMemFree error (though the patch may not be a
precise and minimal fix).

