Created attachment 27947 [details] a possible patch to solve the problem The c11 standard (draft 3337, paragraph 30.5.1.5) states that condition_variable may be destructed even if not all wait() calls have returned, so long as all of those calls are blocking on the associated lock rather than on *this. However, in gcc 4.7.1 the following snippets sometimes fail with a segfault in thread A or simply inconsistent results: condition_variable *volatile cond = new condition_variable; // in thread A cond->wait(); // in thread B, later cond->notify_all(); delete cond; It appears that the underlying libpthread also allows the destruction of cond immediately after all blocking threads have been notified, by means of a block in __pthread_cond_destroy while all of the wait() calls wake and begin to reacquire their locks. However, the current implementation in condition_variable.cc uses a default destructor: condition_variable::~condition_variable() noexcept = default; rather than calling __gthread_cond_destroy, and therefore the pthread cond object is deleted without the block provided by __pthread_cond_destroy. The attached patch seems to fix the problem with gcc-4.7.1 (verified on my system, x86_64-pc-linux-gnu). The patch undoes the replacement (in the presence of __GTHREAD_COND_INIT) of ~condition_variable with "= default" from http://gcc.gnu.org/viewcvs?view=revision&revision=180411, which seems to have been inadvertent.
please provide a proper testcase, not snippets of code
Created attachment 27952 [details] testcase
Because the problem is in thread-handling code, it's impossible to create a completely deterministic testcase; however, the newly-attached testcase.cc (compiled with 'g++-4.7.1 --std=c++11 -pthread') consistently behaves as described for me. (Adjusting the NUM_THREADS seems to help reproduce the problem on different machines.) The testcase fails with a segfault in pthread_cond_wait using the unpatched libstdc++ and completes normally with the patched lib.
Thanks very much. For this sort of test it doesn't need to fail every time, the GCC testsuite is run often enough on a wide enough variety of systems that even if the test only fails occasionally it will be good enough to catch regressions.
Can you use sleep() to induce the failure reliably?
I don't have this_thread::sleep_for in my build, but presumably it wouldn't help as the segfault comes inside the pthread_cond_wait function. The testcase as written already ensures that NUM_THREADS-1 threads are actually waiting at cond->wait when notify_all is called (thanks to the lock); the problem seems to be a race between the final thread doing 'delete cc' and the other threads clearing the part of pthread_cond_wait which assumes the condition variable's address still points to a valid object. To be clear, this isn't a bug with pthread, just the way which libstdc++ uses it. If/when ~condition_variable (via delete) calls gthread_cond_destroy, libpthread properly avoids the race.
I need a ChangeLog entry before I can commit the change, which needs your name, could you provide a ChangeLog entry? Thanks. Unless I'm mistaken the volatile qualifiers in the testcase don't serve any purpose, so I've removed them from the patch I plan to commit. I have seen it fail, but can't make it fail now, but I think it's useful to commit anyway.
Created attachment 28005 [details] proposed changelog I wasn't sure about the testcase file name, so I just guessed.
Perfect - thanks. I'll get it committed tonight.
Author: redi Date: Mon Aug 13 19:56:50 2012 New Revision: 190356 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190356 Log: 2012-08-13 David Adler <d.adler.s@gmail.com> PR libstdc++/54185 * src/c++11/condition_variable.cc (condition_variable): Always destroy native type in destructor. * testsuite/30_threads/condition_variable/54185.cc: New. Added: trunk/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/src/c++11/condition_variable.cc
Author: redi Date: Mon Aug 13 19:57:31 2012 New Revision: 190357 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190357 Log: 2012-08-13 David Adler <d.adler.s@gmail.com> PR libstdc++/54185 * src/c++11/condition_variable.cc (condition_variable): Always destroy native type in destructor. * testsuite/30_threads/condition_variable/54185.cc: New. Added: branches/gcc-4_7-branch/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc Modified: branches/gcc-4_7-branch/libstdc++-v3/ChangeLog branches/gcc-4_7-branch/libstdc++-v3/src/c++11/condition_variable.cc
fixed for 4.7.2
Created attachment 43228 [details] Buggy sampler output from inspecting 54185.exe This testcase was running for like 3 hours in my last run of the testsuite; I had to kill it manually to get the rest of it to complete. It also made my system's process sampler glitch when I tried to inspect it; I've attached the output from it.
Created attachment 49061 [details] Fix for random pthread_create() failures Hello- I have problems with the test case for this bug (libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc) failing randomly. It happens more often with parallel make, and especially if running the testsuite as a whole more than once in parallel. On my system at least (x86-64, Ubuntu 18), it seems that pthread_create() will return EAGAIN sometimes in any loop like the one this testcase uses: for(int i = 0; i != 1000; ++i) {pthread_create(...); pthread_join(...);} If such a loop is running in 3 or more processes in parallel it happens more than 50% of the time. I am not sure what global resource gets used up, it happens even if ulimit -a shows nothing being limited like processes or memory. The attached patch fixes the failures for me; if thread creation fails, it just gives up and moves on to the next iteration instead. I had to convert the cond->wait() to the predicate form since it becomes possible for the notify to take place prior to all threads calling wait. I can submit this to gcc-patches if it makes sense to you? Thanks... -Lewis
(In reply to Lewis Hyatt from comment #14) > I have problems with the test case for this bug > (libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc) failing > randomly. It fails reliably on AIX too. > I can submit this to gcc-patches if it makes sense to you? Thanks... Yes please, CCing the libstdc++ list.
(In reply to Jonathan Wakely) > It fails reliably on AIX too. I think that is because of non-conformant behavior of AIX's pthread_mutex_destroy. It will return EBUSY as long as there are still threads executing a wait function, even if they're already blocked on re-acquiring the mutex. If you ignore the error and simply releases/unmap the memory of the CV, you have a good chance of getting a SIGSEGV in the wait function later (after it has re-acquired the mutex and presumably proceeds to mark the CV "non-busy"). In our tests, we get the EBUSY every time if we delete the CV while holding the mutex on the same thread. The SIGSEGV also reproduces quite well, although for some reason only with 32 bit builds. (AIX 7.1, oslevel 7100-05-03-1837)
Yes, I've raised this with the AIX pthreads team, but they disagreed with my interpretation of POSIX. They said my reduced example was incorrect because pthread_cond_destroy() and pthread_mutex_destroy() can return EBUSY, and so libstdc++ is buggy for not handling that case. Quoting from my replies to them ... -- >8 -- I disagree with that interpretation. EBUSY is allowed if threads are still waiting, but my test wakes them up before doing pthread_cond_destroy. See the (admittedly informative) EXAMPLES section at https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html which says: "A condition variable can be destroyed immediately after all the threads that are blocked on it are awakened." The example doesn't bother to check the return code, because it is correct by construction (the destroy follows a broadcast) and so can't fail. In any case, that error code isn't allowed in the latest version of POSIX. The changes for Issue 7 say: "The [EBUSY] error for a condition variable already in use or an already initialized condition variable is removed; this condition results in undefined behavior." This confirms that EBUSY is not an acceptable result for correct uses of pthread_cond_destroy (i.e. ones where there are no waiters, as in my example). -- >8 -- I'm not sure if I convinced them.