On Linux/x86-64 with 96 cores, r11-5227 with -m32 compiles 30_threads/latch/3.cc into a program that hangs while using a full CPU:

2850299 hjl 20 0 23804 2364 2192 R 99.7 0.0 240:37.77 3.exe

(gdb) bt
#0 0xf7f9155d in __kernel_vsyscall ()
#1 0xf7babd7b in syscall () from /lib/libc.so.6
#2 0x08049566 in std::__detail::__platform_wait<int> (__val=<optimized out>, __addr=<optimized out>) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/32/libstdc++-v3/include/bits/atomic_wait.h:99
#3 std::__atomic_wait<int, std::latch::wait() const::{lambda()#1}>(int const*, int, std::latch::wait() const::{lambda()#1}) (__addr=0xff85ea30, __old=1, __pred=...) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/32/libstdc++-v3/include/bits/atomic_wait.h:276
#4 0x08049883 in std::latch::wait (this=0xff85ea30) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/32/libstdc++-v3/include/latch:75
#5 std::latch::arrive_and_wait (__update=1, this=0xff85ea30) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/32/libstdc++-v3/include/latch:82
#6 test01 () at /export/gnu/import/git/gcc-test-master-intel64-native/src-master/libstdc++-v3/testsuite/30_threads/latch/3.cc:43
#7 0x080492ef in main () at /export/gnu/import/git/gcc-test-master-intel64-native/src-master/libstdc++-v3/testsuite/30_threads/latch/3.cc:66
(gdb)

futex(0xff85ea30, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xff85ea30, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xff85ea30, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xff85ea30, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
(the same call keeps repeating, each time returning EAGAIN)
Also: FAIL: 29_atomics/atomic_integral/wait_notify.cc
Also 30_threads/semaphore/try_acquire_until.cc.
(gdb) bt
#0 0x00007f6b5984330d in syscall () from /lib64/libc.so.6
#1 0x0000000000401429 in std::__detail::__platform_wait<int> (__addr=__addr@entry=0x7ffc848e7014, __val=__val@entry=1) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/atomic_wait.h:99
#2 0x000000000040150b in std::__atomic_wait<int, std::__atomic_base<int>::wait(int, std::memory_order) const::{lambda()#1}>(int const*, int, std::__atomic_base<int>::wait(int, std::memory_order) const::{lambda()#1}) (__addr=__addr@entry=0x7ffc848e7014, __old=__old@entry=1, __pred=...) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/atomic_wait.h:276
#3 0x0000000000401788 in std::__atomic_base<int>::wait (__m=std::memory_order::seq_cst, __old=1, this=0x7ffc848e7014) at /export/gnu/import/git/gcc-test-master-intel64-native/bld/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/atomic_base.h:607
#4 test02 () at /export/gnu/import/git/gcc-test-master-intel64-native/src-master/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc:87
#5 0x00000000004012c2 in main () at /export/gnu/import/git/gcc-test-master-intel64-native/src-master/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc:93
(gdb)
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:a3313a2214a6253672ab4fa37a2dcf57fd0f8dce

commit r11-5326-ga3313a2214a6253672ab4fa37a2dcf57fd0f8dce
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Nov 24 23:22:01 2020 +0000

    libstdc++: Disable failing tests [PR 97936]

    These tests are unstable and causing failures due to timeouts. Disable
    them until the cause can be found, so that testing doesn't have to wait
    for them to timeout.

    libstdc++-v3/ChangeLog:

            PR libstdc++/97936
            PR libstdc++/97944
            * testsuite/29_atomics/atomic_integral/wait_notify.cc: Disable.
            Do not require pthreads, but add -pthread when appropriate.
            * testsuite/30_threads/jthread/95989.cc: Likewise.
            * testsuite/30_threads/latch/3.cc: Likewise.
            * testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
(In reply to H.J. Lu from comment #1)
> Also:
>
> FAIL: 29_atomics/atomic_integral/wait_notify.cc

This looks like a bug in the test:

      std::atomic<Tp> a(val1);
      std::thread t([&]
                    {
                      cv.notify_one();
                      a.wait(val1);
                      if (a.load() != val2)
                        a = val1;
                    });
      std::unique_lock<std::mutex> l(m);
      cv.wait(l);

The new thread might run cv.notify_one() before cv.wait(l), so we get a missed notification and block forever.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:ad9cbcee543ecccd79fa49dafcd925532d2ce210

commit r11-5330-gad9cbcee543ecccd79fa49dafcd925532d2ce210
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 25 10:26:09 2020 +0000

    libstdc++: Fix handling of futex wake [PR 97936]

    The __platform_wait function is supposed to wait until *addr != old.
    The futex syscall checks the initial value and returns EAGAIN if
    *addr != old is already true, which should cause __platform_wait to
    return. Instead it loops and keeps doing a futex wait, which keeps
    returning EAGAIN.

    libstdc++-v3/ChangeLog:

            PR libstdc++/97936
            * include/bits/atomic_wait.h (__platform_wait): Return if futex
            sets EAGAIN.
            * testsuite/30_threads/latch/3.cc: Re-enable test.
            * testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
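To make the commit message concrete, here is a hypothetical, Linux-only sketch of a futex wait loop that treats EAGAIN as "the value has already changed, stop waiting" instead of retrying. The name `platform_wait_sketch` is made up and this is not the libstdc++ `__platform_wait` source; it only calls the raw `futex(2)` syscall through `syscall()`.

```cpp
#include <cassert>
#include <cerrno>
#include <linux/futex.h>   // FUTEX_WAIT_PRIVATE
#include <sys/syscall.h>   // SYS_futex
#include <unistd.h>        // syscall()

// Hypothetical helper: block while *addr == old.
// Returns 0 if woken by FUTEX_WAKE (or spuriously; the caller must re-check
// *addr), EAGAIN if *addr already differed from old when the call was made,
// or some other errno value on failure.
inline int platform_wait_sketch(const int* addr, int old)
{
  for (;;)
    {
      long r = syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, old,
                       nullptr, nullptr, 0);
      if (r == 0)
        return 0;       // woken up: let the caller re-check the value
      if (errno == EINTR)
        continue;       // interrupted by a signal: wait again
      // EAGAIN means the kernel saw *addr != old, so waiting is pointless.
      // Looping here instead is what produced the endless EAGAIN calls.
      return errno;
    }
}
```

For example, with `int x = 1;`, the call `platform_wait_sketch(&x, 0)` returns `EAGAIN` straight away rather than spinning.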
This one was failing too:

WARNING: 30_threads/semaphore/try_acquire_for.cc execution test program timed out.
FAIL: 30_threads/semaphore/try_acquire_for.cc execution test

I think that should be fixed by r11-5330.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:a5ccfd04605d940daded7e95474389f1c7dfad61

commit r11-5331-ga5ccfd04605d940daded7e95474389f1c7dfad61
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 25 12:16:07 2020 +0000

    libstdc++: Fix silly typos [PR 97936]

    libstdc++-v3/ChangeLog:

            PR libstdc++/97936
            * include/bits/atomic_wait.h (__platform_wait): Check errno,
            not just the value of EAGAIN.
            (__waiters::__waiters()): Fix name of data member.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f76cad692a62d44ed32d010200bad74f36c73092

commit r11-5383-gf76cad692a62d44ed32d010200bad74f36c73092
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 25 14:39:54 2020 +0000

    libstdc++: Fix testsuite helper functions [PR 97936]

    This fixes a race condition in the util/atomic/wait_notify_util.h
    header used by several tests, which should make the tests work
    properly.

    libstdc++-v3/ChangeLog:

            PR libstdc++/97936
            * testsuite/29_atomics/atomic/wait_notify/bool.cc: Re-enable
            test.
            * testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
            * testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
            * testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
            * testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.
            * testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise.
            * testsuite/util/atomic/wait_notify_util.h: Fix missed
            notifications by making the new thread wait until the parent
            thread is waiting on the condition variable.
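The ChangeLog describes the wait_notify_util.h fix only in words. A hypothetical sketch of that handshake (the helper `check_wait_notify_sketch` is made up; this is not the testsuite code) follows. The parent locks the mutex before starting the thread, and the new thread must acquire the same mutex before calling `notify_one()`. It can only do that once `cv.wait(l)` has atomically released the mutex, so the notification can no longer arrive before the parent is waiting. `std::atomic::wait` needs C++20, so the sketch falls back to polling when `__cpp_lib_atomic_wait` is not defined.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper modelled on the ChangeLog description.
template<typename Tp>
Tp check_wait_notify_sketch(Tp val1, Tp val2)
{
  std::mutex m;
  std::condition_variable cv;
  std::atomic<Tp> a(val1);

  // The parent holds m until cv.wait(l) atomically releases it.
  std::unique_lock<std::mutex> l(m);

  std::thread t([&] {
    {
      // Cannot be acquired until the parent is blocked in cv.wait(l),
      // so the notify_one() below cannot be missed.
      std::lock_guard<std::mutex> ll(m);
    }
    cv.notify_one();
#if defined(__cpp_lib_atomic_wait)
    a.wait(val1);                 // C++20: block until a != val1
#else
    while (a.load() == val1)      // pre-C++20 fallback: poll
      std::this_thread::yield();
#endif
  });

  cv.wait(l);
  // Release m so that, after a spurious wakeup, the new thread can still
  // get past its lock_guard instead of deadlocking against join().
  l.unlock();

  a.store(val2);
#if defined(__cpp_lib_atomic_wait)
  a.notify_one();
#endif
  t.join();
  return a.load();
}
```

Unlocking right after `cv.wait(l)` is a choice made for this sketch. The lock handoff alone prevents the missed notification, and releasing the mutex also keeps a spurious wakeup from turning into a deadlock.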
I hope this is fixed now.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:10522ed1089277e2aa6cd708205aa5c730179cf0

commit r11-5447-g10522ed1089277e2aa6cd708205aa5c730179cf0
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Nov 26 12:55:47 2020 +0000

    libstdc++: Fix some more deadlocks in tests [PR 97936]

    The missed notifications fixed in r11-5383 also happen in some other
    tests which have similar code.

    libstdc++-v3/ChangeLog:

            PR libstdc++/97936
            * testsuite/29_atomics/atomic/wait_notify/bool.cc: Fix missed
            notifications by making the new thread wait until the parent
            thread is waiting on the condition variable.
            * testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
            * testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
            * testsuite/29_atomics/atomic_ref/wait_notify.cc: Likewise.
This one still fails sometimes on Solaris: FAIL: 30_threads/semaphore/try_acquire_until.cc execution test
Created attachment 49987
a test case that fails in a similar way

Running this program with argument 16 spawns 16 threads, but it seems some get stuck at the latch even though other threads complete (and latch::wait returns).

Program output, under gdb:

[New Thread 0x7ffff7a5a700 (LWP 1112060)]
[New Thread 0x7ffff7259700 (LWP 1112061)]
[New Thread 0x7ffff6a58700 (LWP 1112062)]
[New Thread 0x7ffff6257700 (LWP 1112063)]
[New Thread 0x7ffff5a56700 (LWP 1112064)]
[New Thread 0x7ffff5255700 (LWP 1112065)]
[New Thread 0x7ffff4a54700 (LWP 1112066)]
[New Thread 0x7ffff4253700 (LWP 1112067)]
[New Thread 0x7ffff3a52700 (LWP 1112068)]
[New Thread 0x7ffff3251700 (LWP 1112069)]
[New Thread 0x7ffff2a50700 (LWP 1112070)]
[New Thread 0x7ffff224f700 (LWP 1112071)]
[New Thread 0x7ffff1a4e700 (LWP 1112072)]
[New Thread 0x7ffff124d700 (LWP 1112073)]
[New Thread 0x7ffff0a4c700 (LWP 1112074)]
[New Thread 0x7ffff024b700 (LWP 1112075)]
All 16 threads started
Thread a73f34e0d4ae6a49 started
Thread cb3cc899c037915d started
Thread ade34b17621c1815 started
Thread 7245e72e408faec2 started
Thread a7089aace412318c started
Thread 75736a54d10fb0e9 started
Thread 995f48c2014cc0ec started
Thread 5302d144ddde4d68 started
Thread b20c0c361ed65c90 started
Thread ae85377ab49db7c7 started
All threads notified
Thread b20c0c361ed65c90 stopped
Thread ade34b17621c1815 stopped
Thread a73f34e0d4ae6a49 stopped
Thread 5302d144ddde4d68 stopped
Thread a7089aace412318c stopped
Thread ae85377ab49db7c7 stopped
Thread 995f48c2014cc0ec stopped
Thread 7245e72e408faec2 stopped
Thread cb3cc899c037915d stopped
Thread 75736a54d10fb0e9 stopped
[Thread 0x7ffff4253700 (LWP 1112067) exited]
[Thread 0x7ffff3251700 (LWP 1112069) exited]
[Thread 0x7ffff3a52700 (LWP 1112068) exited]
[Thread 0x7ffff4a54700 (LWP 1112066) exited]
[Thread 0x7ffff5255700 (LWP 1112065) exited]
[Thread 0x7ffff5a56700 (LWP 1112064) exited]
[Thread 0x7ffff6257700 (LWP 1112063) exited]
[Thread 0x7ffff6a58700 (LWP 1112062) exited]
[Thread 0x7ffff7259700 (LWP 1112061) exited]
[Thread 0x7ffff7a5a700 (LWP 1112060) exited]
# at this point... the program hangs, but threads are still chewing CPU
# Ctrl-C

Thread 1 "synchro" received signal SIGINT, Interrupt.
__pthread_clockjoin_ex (threadid=140737264289536, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
(gdb) threads
Undefined command: "threads". Try "help".
(gdb) info thread
  Id   Target Id                                      Frame
* 1    Thread 0x7ffff7a5b740 (LWP 1112056) "synchro" __pthread_clockjoin_ex (threadid=140737264289536, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
  12   Thread 0x7ffff2a50700 (LWP 1112070) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  13   Thread 0x7ffff224f700 (LWP 1112071) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  14   Thread 0x7ffff1a4e700 (LWP 1112072) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  15   Thread 0x7ffff124d700 (LWP 1112073) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  16   Thread 0x7ffff0a4c700 (LWP 1112074) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  17   Thread 0x7ffff024b700 (LWP 1112075) "synchro" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
(gdb) thread 13
[Switching to thread 13 (Thread 0x7ffff224f700 (LWP 1112071))]
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000055555555a676 in std::__detail::__platform_wait<int> (__addr=0x5555555a8440 <std::__detail::__waiters::_S_for(void const*)::__w+1152>, __val=0) at /opt/gcc11/include/c++/11.0.0/bits/atomic_wait.h:98
#2 0x000055555555967a in std::__detail::__waiters::_M_do_wait (this=0x5555555a8440 <std::__detail::__waiters::_S_for(void const*)::__w+1152>, __version=0) at /opt/gcc11/include/c++/11.0.0/bits/atomic_wait.h:150
#3 0x0000555555559748 in std::__detail::__waiter::_M_do_wait (this=0x7ffff224ed30) at /opt/gcc11/include/c++/11.0.0/bits/atomic_wait.h:213
#4 0x000055555555a7c7 in std::__atomic_wait<long, std::latch::wait() const::{lambda()#1}>(long const*, long, std::latch::wait() const::{lambda()#1}) (__addr=0x7fffffffe5b8, __old=4, __pred=...) at /opt/gcc11/include/c++/11.0.0/bits/atomic_wait.h:271
#5 0x000055555555865e in std::latch::wait (this=0x7fffffffe5b8) at /opt/gcc11/include/c++/11.0.0/latch:77
#6 std::latch::arrive_and_wait (__update=1, this=0x7fffffffe5b8) at /opt/gcc11/include/c++/11.0.0/latch:84
#7 operator() (__closure=0x5555555bc1a0, st=...) at /home/florin/work/sandbox2/src/synchro.cpp:64
#8 0x00005555555592d8 in std::__invoke_impl<void, start(int)::<lambda(std::stop_token)>, std::stop_token>(std::__invoke_other, struct {...} &&) (__f=...) at /opt/gcc11/include/c++/11.0.0/bits/invoke.h:60
#9 0x000055555555933d in std::__invoke<start(int)::<lambda(std::stop_token)>, std::stop_token>(struct {...} &&) (__fn=...) at /opt/gcc11/include/c++/11.0.0/bits/invoke.h:95
#10 0x0000555555559461 in std::thread::_Invoker<std::tuple<start(int)::<lambda(std::stop_token)>, std::stop_token> >::_M_invoke<0, 1>(std::_Index_tuple<0, 1>) (this=0x5555555bc198) at /opt/gcc11/include/c++/11.0.0/bits/std_thread.h:253
#11 0x0000555555559288 in std::thread::_Invoker<std::tuple<start(int)::<lambda(std::stop_token)>, std::stop_token> >::operator()(void) (this=0x5555555bc198) at /opt/gcc11/include/c++/11.0.0/bits/std_thread.h:260
#12 0x000055555555926c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<start(int)::<lambda(std::stop_token)>, std::stop_token> > >::_M_run(void) (this=0x5555555bc190) at /opt/gcc11/include/c++/11.0.0/bits/std_thread.h:211
#13 0x00007ffff7e57824 in std::execute_native_thread_routine (__p=0x5555555bc190) at ../../../../../gcc/libstdc++-v3/src/c++11/thread.cc:82
#14 0x00007ffff7f98ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007ffff7b5ddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
For reference:

$ /opt/gcc11/bin/g++-11 -v
Using built-in specs.
COLLECT_GCC=/opt/gcc11/bin/g++-11
COLLECT_LTO_WRAPPER=/opt/gcc11/libexec/gcc/x86_64-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../gcc/configure --prefix=/opt/gcc11 --with-local-prefix=/opt/gcc11 --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --program-suffix=-11 --with-abi=m64 --with-default-libstdcxx-abi=new --with-linker-hash-style=gnu --with-tune=generic --disable-multilib --disable-nls --disable-vtable-verify --enable-c99 --enable-checking=release --enable-__cxa_atexit --enable-default-pie --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=c,c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-linker-build-id --enable-long-long --enable-shared --enable-threads=posix
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20210110 (experimental) (GCC)
What's the status on this PR?
I believe it is addressed in the most recent patch I have submitted for the atomic wait/notify, barriers, latches, semaphores functionality.
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
(In reply to Thomas Rodgers from comment #16)
> I believe it is addressed in the most recent patch I have submitted for the
> atomic wait/notify, barriers, latches, semaphores functionality.

Closing as fixed. Please reopen if you're still seeing it.
I believe I still see the issue (these tests randomly fail like this):

=== libstdc++ tests ===

Running target unix
FAIL: 27_io/filesystem/iterators/error_reporting.cc (test for excess errors)
WARNING: program timed out.
FAIL: 29_atomics/atomic_float/wait_notify.cc execution test
WARNING: program timed out.
FAIL: 29_atomics/atomic_integral/wait_notify.cc execution test
WARNING: program timed out.
FAIL: 29_atomics/atomic_ref/wait_notify.cc execution test
WARNING: program timed out.
FAIL: 30_threads/latch/3.cc execution test
WARNING: program timed out.
==

This is Solaris with GCC 12.1.0 and 11.3.0.
Solaris uses the non-futex wait/notify path. There has been a recent PR opened indicating a likely algorithmic issue with the non-futex implementation. See - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106183 I am going to re-open this issue while I investigate the new report.
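For context on what "the non-futex path" means: the GCC 11 backtrace from the attached test case shows waits going through a `__waiters` object picked by `__waiters::_S_for(addr)` from a static table. The sketch below shows the general shape of such a fallback, built on a hashed table of `std::mutex`/`std::condition_variable` pairs. It is an illustration under stated assumptions only. All names (`waiter_slot`, `slot_for`, `wait_while_equal`, `store_and_notify`) are hypothetical, it is not the libstdc++ code, and it makes no claim about what PR 106183 found.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical non-futex wait/notify: one mutex/condvar pair per slot,
// with the slot chosen by hashing the address being waited on.
struct waiter_slot
{
  std::mutex mtx;
  std::condition_variable cv;
};

inline waiter_slot& slot_for(const void* addr)
{
  static waiter_slot table[16];
  auto key = reinterpret_cast<std::uintptr_t>(addr);
  return table[(key >> 2) % 16];
}

// Block until a.load() != old.
inline void wait_while_equal(const std::atomic<int>& a, int old)
{
  waiter_slot& s = slot_for(&a);
  std::unique_lock<std::mutex> lk(s.mtx);
  // The value is re-checked while holding the slot mutex, and notifiers take
  // the same mutex before notifying, so a store plus notify cannot slip in
  // between the check and cv.wait() (the classic lost-wakeup window).
  s.cv.wait(lk, [&] { return a.load() != old; });
}

// Publish a new value and wake every thread waiting on this slot.
inline void store_and_notify(std::atomic<int>& a, int value)
{
  a.store(value);
  waiter_slot& s = slot_for(&a);
  {
    std::lock_guard<std::mutex> lk(s.mtx);  // serialize with waiters' check
  }
  s.cv.notify_all();  // unrelated addresses may share a slot, so wake all
}
```

`notify_all()` is used because waiters on different addresses can hash to the same slot. Each woken waiter re-checks its own atomic and goes back to sleep if its value has not changed.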
Reclosing; the problem with the non-futex path was dealt with as PR 106183.