Bug 54185 - [4.7/4.8 Regression] condition_variable not properly destructed
Summary: [4.7/4.8 Regression] condition_variable not properly destructed
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.7.1
: P3 normal
Target Milestone: 4.7.2
Assignee: Jonathan Wakely
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-06 04:19 UTC by David Adler
Modified: 2022-09-21 20:52 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-08-06 00:00:00


Attachments
a possible patch to solve the problem (348 bytes, application/octet-stream)
2012-08-06 04:19 UTC, David Adler
Details
testcase (373 bytes, text/plain)
2012-08-06 14:19 UTC, David Adler
Details
proposed changelog (169 bytes, text/plain)
2012-08-13 14:09 UTC, David Adler
Details
Buggy sampler output from inspecting 54185.exe (1.67 KB, application/x-bzip2)
2018-01-23 23:58 UTC, Eric Gallager
Details
Fix for random pthread_create() failures (827 bytes, patch)
2020-08-13 21:37 UTC, Lewis Hyatt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Adler 2012-08-06 04:19:16 UTC
Created attachment 27947 [details]
a possible patch to solve the problem

The c11 standard (draft 3337, paragraph 30.5.1.5) states that condition_variable may be destructed even if not all wait() calls have returned, so long as all of those calls are blocking on the associated lock rather than on *this. However, in gcc 4.7.1 the following snippets sometimes fail with a segfault in thread A or simply inconsistent results:
  condition_variable *volatile cond = new condition_variable;
  // in thread A
    cond->wait();
  // in thread B, later
    cond->notify_all();
    delete cond;

It appears that the underlying libpthread also allows the destruction of cond immediately after all blocking threads have been notified, by means of a block in __pthread_cond_destroy while all of the wait() calls wake and begin to reacquire their locks.

However, the current implementation in condition_variable.cc uses a default destructor:
  condition_variable::~condition_variable() noexcept = default;
rather than calling __gthread_cond_destroy, and therefore the pthread cond object is deleted without the block provided by __pthread_cond_destroy.

The attached patch seems to fix the problem with gcc-4.7.1 (verified on my system, x86_64-pc-linux-gnu).

The patch undoes the replacement (in the presence of __GTHREAD_COND_INIT) of ~condition_variable with "= default" from http://gcc.gnu.org/viewcvs?view=revision&revision=180411, which seems to have been inadvertent.
Comment 1 Jonathan Wakely 2012-08-06 07:41:14 UTC
please provide a proper testcase, not snippets of code
Comment 2 David Adler 2012-08-06 14:19:16 UTC
Created attachment 27952 [details]
testcase
Comment 3 David Adler 2012-08-06 14:20:03 UTC
Because the problem is in thread-handling code, it's impossible to create a completely deterministic testcase; however, the newly-attached testcase.cc (compiled with 'g++-4.7.1 --std=c++11 -pthread') consistently behaves as described for me. (Adjusting the NUM_THREADS seems to help reproduce the problem on different machines.)

The testcase fails with a segfault in pthread_cond_wait using the unpatched libstdc++ and completes normally with the patched lib.
Comment 4 Jonathan Wakely 2012-08-06 14:30:31 UTC
Thanks very much. For this sort of test it doesn't need to fail every time, the GCC testsuite is run often enough on a wide enough variety of systems that even if the test only fails occasionally it will be good enough to catch regressions.
Comment 5 Alexander Monakov 2012-08-06 15:29:28 UTC
Can you use sleep() to induce the failure reliably?
Comment 6 David Adler 2012-08-06 17:18:26 UTC
I don't have this_thread::sleep_for in my build, but presumably it wouldn't help as the segfault comes inside the pthread_cond_wait function. The testcase as written already ensures that NUM_THREADS-1 threads are actually waiting at cond->wait when notify_all is called (thanks to the lock); the problem seems to be a race between the final thread doing 'delete cc' and the other threads clearing the part of pthread_cond_wait which assumes the condition variable's address still points to a valid object.

To be clear, this isn't a bug with pthread, just the way which libstdc++ uses it. If/when ~condition_variable (via delete) calls gthread_cond_destroy, libpthread properly avoids the race.
Comment 7 Jonathan Wakely 2012-08-12 16:22:28 UTC
I need a ChangeLog entry before I can commit the change, which needs your name, could you provide a ChangeLog entry?  Thanks.

Unless I'm mistaken the volatile qualifiers in the testcase don't serve any purpose, so I've removed them from the patch I plan to commit.  I have seen it fail, but can't make it fail now, but I think it's useful to commit anyway.
Comment 8 David Adler 2012-08-13 14:09:16 UTC
Created attachment 28005 [details]
proposed changelog

I wasn't sure about the testcase file name, so I just guessed.
Comment 9 Jonathan Wakely 2012-08-13 14:35:21 UTC
Perfect - thanks. I'll get it committed tonight.
Comment 10 Jonathan Wakely 2012-08-13 19:56:55 UTC
Author: redi
Date: Mon Aug 13 19:56:50 2012
New Revision: 190356

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190356
Log:
2012-08-13  David Adler  <d.adler.s@gmail.com>

	PR libstdc++/54185
	* src/c++11/condition_variable.cc (condition_variable): Always
	destroy native type in destructor.
	* testsuite/30_threads/condition_variable/54185.cc: New.

Added:
    trunk/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/src/c++11/condition_variable.cc
Comment 11 Jonathan Wakely 2012-08-13 19:57:36 UTC
Author: redi
Date: Mon Aug 13 19:57:31 2012
New Revision: 190357

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190357
Log:
2012-08-13  David Adler  <d.adler.s@gmail.com>

	PR libstdc++/54185
	* src/c++11/condition_variable.cc (condition_variable): Always
	destroy native type in destructor.
	* testsuite/30_threads/condition_variable/54185.cc: New.

Added:
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/src/c++11/condition_variable.cc
Comment 12 Jonathan Wakely 2012-08-13 20:00:40 UTC
fixed for 4.7.2
Comment 13 Eric Gallager 2018-01-23 23:58:03 UTC
Created attachment 43228 [details]
Buggy sampler output from inspecting 54185.exe

This testcase was running for like 3 hours in my last run of the testsuite; I had to kill it manually to get the rest of it to complete. It also made my system's process sampler glitch when I tried to inspect it; I've attached the output from it.
Comment 14 Lewis Hyatt 2020-08-13 21:37:16 UTC
Created attachment 49061 [details]
Fix for random pthread_create() failures

Hello-

I have problems with the test case for this bug (libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc) failing randomly. It happens more often with parallel make, and especially if running the testsuite as a whole more than once in parallel. On my system at least (x86-64, Ubuntu 18), it seems that pthread_create() will return EAGAIN sometimes in any loop like the one this testcase uses:

for(int i = 0; i != 1000; ++i) {pthread_create(...); pthread_join(...);}

If such a loop is running in 3 or more processes in parallel it happens more than 50% of the time. I am not sure what global resource gets used up, it happens even if ulimit -a shows nothing being limited like processes or memory.

The attached patch fixes the failures for me; if thread creation fails, it just gives up and moves on to the next iteration instead. I had to convert the cond->wait() to the predicate form since it becomes possible for the notify to take place prior to all threads calling wait.

I can submit this to gcc-patches if it makes sense to you? Thanks...

-Lewis
Comment 15 Jonathan Wakely 2020-08-13 21:55:04 UTC
(In reply to Lewis Hyatt from comment #14)
> I have problems with the test case for this bug
> (libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc) failing
> randomly. 

It fails reliably on AIX too.

> I can submit this to gcc-patches if it makes sense to you? Thanks...

Yes please, CCing the libstdc++ list.
Comment 16 Paul Groke 2022-09-21 18:30:06 UTC
(In reply to  Jonathan Wakely)

> It fails reliably on AIX too.

I think that is because of non-conformant behavior of AIX's pthread_mutex_destroy. It will return EBUSY as long as there are still threads executing a wait function, even if they're already blocked on re-acquiring the mutex. If you ignore the error and simply releases/unmap the memory of the CV, you have a good chance of getting a SIGSEGV in the wait function later (after it has re-acquired the mutex and presumably proceeds to mark the CV "non-busy").

In our tests, we get the EBUSY every time if we delete the CV while holding the mutex on the same thread. The SIGSEGV also reproduces quite well, although for some reason only with 32 bit builds. (AIX 7.1, oslevel 7100-05-03-1837)
Comment 17 Jonathan Wakely 2022-09-21 20:52:50 UTC
Yes, I've raised this with the AIX pthreads team, but they disagreed with my interpretation of POSIX. They said my reduced example was incorrect because pthread_cond_destroy() and pthread_mutex_destroy() can return EBUSY, and so libstdc++ is buggy for not handling that case.

Quoting from my replies to them ...

-- >8 --
I disagree with that interpretation. EBUSY is allowed if threads are
still waiting, but my test wakes them up before doing
pthread_cond_destroy. See the (admittedly informative) EXAMPLES
section at https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html which says:

"A condition variable can be destroyed immediately after all the
threads that are blocked on it are awakened."

The example doesn't bother to check the return code, because it is
correct by construction (the destroy follows a broadcast) and so can't
fail.

In any case, that error code isn't allowed in the latest version of
POSIX.

The changes for Issue 7 say:
"The [EBUSY] error for a condition variable already in use or an
already initialized condition variable is removed; this condition
results in undefined behavior."

This confirms that EBUSY is not an acceptable result for correct uses
of pthread_cond_destroy (i.e. ones where there are no waiters, as in
my example).
-- >8 --

I'm not sure if I convinced them.