Thread-safe static initialization (the behavior that '-fno-threadsafe-statics' disables) is purely a pessimization for POSIX pthreads code. Since the POSIX standard already requires a mutex around operations that modify shared data, and the C++ standard specifies lazy initialization, POSIX code must already have mutexes that make static initialization thread safe. The code is, therefore, already thread safe and cannot be made more so. The implicit locking does allow non-compliant code to work by relaxing some of the requirements of the POSIX standard. That's a good option to have, but it should not be on by default. Code should, by default, get the benefit of the optimizations the standard allows. In summary, behavior that is a pure pessimization for compliant code, and is only required to make non-compliant code work, should not be on by default when that standard is invoked. This is the case for other options in this category (writable strings, relaxed aliasing, and so on). So '-pthreads' should imply '-fno-threadsafe-statics', and an option like '-fthreadsafe-statics' should be created to request the implicit locking.
Note that -fthreadsafe-statics never made it into the FSF's 3.4.x releases, only 4.0.0. Also note that this behavior is required by the C++ ABI.
Actually, this code was added because it is hard to deal with mutexes in these cases.
Do we agree that this is a pure pessimization for POSIX-compliant code? Do we agree that POSIX already requires a mutex to protect code that might modify an object?
No, see the following code: class A { A(); int t; }; void f() { static A a; } How can someone know that A will modify memory? In fact, the mutex ensures that 'a' is initialized only once.
In your example: class A { A(); int t; }; void f() { static A a; } I don't get it. What's the problem with this? Obviously, if you plan to call 'f()' from multiple threads, you must do it while holding a mutex because it might modify 'a'. This is just like any other function that modifies (or might modify) data.
You say, "how can someone know that A will modify memory"? The answer is, the C++ standard says so, section 7.1.2. They simply read that section of the standard, and they know that the function might modify memory. If they know for sure that it won't in a particular case, they are safe. If they don't know, and it might modify memory, then POSIX requires them to put in a mutex. You cannot create code that works with this option and doesn't work without it except by violating the POSIX standard. So POSIX code should not have this option enabled by default -- it's a pure pessimization.
No, the point is if you expose an API to the user, why should the user of the API know that you modify the memory. Let the compiler do the work for you instead of doing the work in this case.
> You cannot create code that works with this option and doesn't work without it
> except by violating the POSIX standard. So POSIX code should not have this
> option enabled by default -- it's a pure pessimization.

POSIX doesn't say anything about C++, and C++ doesn't say anything about threads - we are already outside the scope of both standards. A reasonable extension of C++ to multithreading should give static locals semantics analogous to pthread_once. This allows one to write code which uses lazy initialization and is MT-safe, yet doesn't use a threading library explicitly - and is therefore portable to C++ implementations without threads. Note that this doesn't break any correct program which doesn't make use of it.
There is certainly the eternal argument whether a class should implement its own locks internally or whether the caller should implement them. The first case simplifies calling at the expense of overhead when you don't need it. The second makes callers need more knowledge of how the functions they're calling work, but makes possible optimizations in cases where the locks aren't needed, and only the caller knows this.

POSIX made this decision. The POSIX memory visibility rules specifically require a mutex to be held in cases where an object may be modified by another concurrent thread. So POSIX code *must* already contain such mutexes. This requirement in POSIX is as clear as anything can be. The rationale is that on some platforms locks may be very expensive, so requiring them implicitly was carefully avoided. POSIX requires shared data to be explicitly locked by application code when data might be modified in one thread and read in another concurrently.

While the C++ standard doesn't say anything about threads, it says a lot about how static objects are initialized. It specifies initialization on first use; thus first use is a modification, and a use that might be a first use might be a modification. There is no other sensible reading of the POSIX standard. There is no other sensible reading of the C++ standard.

The arguments presented are disingenuous. They would equally well defend a decision to initialize all static objects before calling 'main' in multithreaded code. After all, C++ doesn't say anything about multithreaded code and POSIX doesn't say anything about initializing C++ static objects. In fact, the arguments would defend crashing on any multithreaded C++ code, which is obviously not what anyone wants. If there is going to be C++/POSIX code, the C++ standard and the POSIX standard will have to be made to coexist. POSIX requires the application to figure out when objects may be accessed concurrently, and does so for good reason.
In a non-POSIX context, it may make sense to have the compiler try to automatically insert mutexes, but it definitely doesn't in the POSIX case. The compiler should, by default, make the optimizations that the standards permit. This is the way every other similar option has been implemented.
> There is certainly the eternal argument whether a class should implement its own
> locks internally or whether the caller should implement them.

And my guideline is as follows: it should implement its own locks if it provides a single global object, and it should leave locking to the callers if it provides objects created dynamically. The reasoning is that dynamically created objects are often used locally in one thread, in which case locking would be unnecessary, while a singleton is always accessible to all threads.

> POSIX made this decision. The POSIX memory visibility rules specifically require
> a mutex to be held in cases where an object may be modified by another
> concurrent thread. So POSIX code *must* already contain such mutexes.

POSIX provides mechanisms for synchronization. In particular, pthread_once can be used to guard initialization, so that the object doesn't need an additional mutex if it's not modified later. pthread_once provides the semantics "initialize before first use (in an MT-safe way)". Static locals in C++ are an equivalent to pthread_once in C/POSIX.
> The reasoning is that dynamically created objects are often used locally in one
> thread, in which case locking would be unnecessary, while a singleton is always
> accessible to all threads.

Accessible, but not necessarily accessed. In fact, there is no way you can know whether this synchronization overhead is necessary, and for POSIX it's only needed if the code violates the memory visibility rules.

> Static locals in C++ are an equivalent to pthread_once in C/POSIX.

Even in the single-threaded case, C++ leaves it undefined what happens if you reenter a function that invokes a static initializer from that static initializer. To argue that this means it should be defined for the multithreaded case is absurd. C++ requires the initialization to be complete before you are allowed to pass the initializer again. POSIX requires locks when data that might be shared might be modified. Bluntly, this is totally opposite to the entire philosophy of the POSIX standard and the wording of the C++ standard.
> > Static locals in C++ are an equivalent to pthread_once in C/POSIX.
>
> Even in the single-threaded case, C++ leaves it undefined what happens if you
> reenter a function that invokes a static initializer from that static
> initializer. To argue that this means it should be defined for the
> multi-threaded case is absurd.

There is a fundamental difference between a second access made from the same thread and one made from a different thread.

Access from the same thread means that there is a contradiction in the program: the data depends on itself. This is always a bug. The implementation may be helpful and detect it (gcc4 throws the exception recursive_init), but in general undefined behavior is consistent with the philosophy of C/C++. The same applies to recursive pthread_once and to a recursive mutex (except that with mutexes you have the option to allow recursive locking).

Access from a different thread most often means that the timing was unfortunate: two threads happened to try to perform initialization on first use almost simultaneously, closer in time than the time needed for the initialization. There is a remote possibility that the threads were not independent - the second thread was a worker spawned by the first one, the first one will wait for the second one to finish its work, the second one will wait for the first one to complete the initialization, and we will have a deadlock - but almost always this is not the case; this is just unfortunate timing, and the first thread will complete initialization while the second one is waiting.

It makes no sense to optimize for the case of a bug which causes undefined behavior, so the language assumes that there is no bug and arranges the program to run according to the obvious intended semantics of independent threads simultaneously accessing a lazily initialized variable: the second thread waits for the first to complete the initialization, as if it got there some time later. The same applies to pthread_once and to a mutex.
Obviously the C++ standard sees only the first scenario, a single thread, so it just declares reentering a static local initializer invalid. But there is no conceptual problem with "reentering" the initializer from a different thread - it just has to wait until the first one finishes. A hypothetical multithreaded C++ should state this as the semantics of static locals.

Look at the semantics of Glasgow Haskell, which combines lazy evaluation with threads. When a given lazy variable is reentered, if it is the same thread which started computing its value, we have a cyclic dependency and the runtime detects this as an error; if it is a different thread, the second thread waits for the first one to complete computing the value, because almost always this is just unfortunate timing and this is the obvious way to resolve it.
This comment is relevant regarding the C++ ABI closure from which this idea has been derived: http://groups.google.it/groups?q=g:thl3050439784d&dq=&hl=it&lr=&client=firefox-a&rls=org.mozilla:it-IT:official_s&selm=95e4efda.0502191504.315f1f3%40posting.google.com

That thing must be reopened and fixed ASAP. Also, please read the thread carefully. Locking arbitrarily any section of code, for any reason, outside the control of the user is just wrong, because:

1) it prevents the code from using more efficient schemes if needed.
2) it prevents the code from forgoing any locking if not needed.
3) it may even force different approaches and encapsulations into the final code, which is not what C++ is meant for.

This holds even if the code is inside a static initializer.

Bests,
Giancarlo Niccolai.
What you keep ignoring is that the POSIX standard explicitly declares it a bug to access data in one thread while it may be modified in another. It's not "unfortunate timing", it's failure to use the proper code to ensure correct timing. You say a hypothetical multithreaded C++ should state this as the semantics for static locals, and I don't disagree with you, provided we are not talking about one based on POSIX threads. POSIX specifically made the design decision to always require explicit locks and to always require the programmer to find, document, and lock cases where concurrent accesses might occur.
> You say a hypothetical multithreaded C++ should state this as the semantics for
> static locals, and I don't disagree with you, provided we are not talking about
> one based on POSIX threads. POSIX specifically made the design decision to
> always require explicit locks and to always require the programmer to find,
> document, and lock cases where concurrent accesses might occur.

I think I wrote a rationale that explains and justifies this choice at http://www.niccolai.ws/works/articoli/art-multithreading-en-1a.html

I have not been able to find an official rationale from the Open Group; the nearest thing seems to be http://www.unix.org/whitepapers/reentrant.html

If more precise and authoritative ones are available, please point them out.
The '-pthreads' flag should imply '-fno-threadsafe-statics'. For every other similar flag I can find, the default is to permit the compiler to make the optimizations that the standard allows, and specific flags are needed to disable the optimization.
> The '-pthreads' flag should imply '-fno-threadsafe-statics'. For every other
> similar flag I can find, the default is to permit the compiler to make the
> optimizations that the standard allows, and specific flags are needed to
> disable the optimization.

It's not a question of optimization but of correctness. Having "initialize on first use" semantics which break when multiple threads use it is as wrong as using global variables for passing data between functions and forcing clients to introduce locks to make them reentrant. It is usable when clients are careful, but it's a bad default when we can't expect all calls to be done from a single thread.
I think it is really a waste for this bug to be open, as none of the GCC people commented on it, and when the original bug was filed there was a huge opportunity to talk this over and nothing was done. So I am going to close this as WONTFIX.
> It's not a question of optimization but of correctness.

Exactly, and not locking objects that may be modified from another thread is not correct. Only the programmer knows whether an object may be modified from another thread.

> Having "initialize on first use" semantics which break when multiple
> threads use it is as wrong as using global variables for passing data
> between functions and forcing clients to introduce locks to make them
> reentrant.

It doesn't break when multiple threads use it; it breaks when multiple threads use it in a way where one thread might modify an object while another thread accesses it without locking. POSIX specifies that everything breaks when misused this way.

You seem to have an attitude that it's difficult to track data dependencies and ensure correct locking. You talk about "unfortunate timing". What you're missing is that this is what people who write multithreaded programs do all day long. Those who are good at writing multithreaded programs are good at doing exactly this. And this change does not remove any of the burden -- it simply removes a choice (one can trivially add these locks when one knows they're needed, or doesn't know they're not needed).

> It is usable when clients are careful, but it's a bad default when we
> can't expect all calls to be done from a single thread.

Any function that might be called from multiple threads concurrently will require special coding to deal with this situation. You would have to go out of your way to create a situation where just protecting the initialization is sufficient. You have to be careful when you write multithreaded code, period. Functions may require you to hold certain locks when you call them, prohibit you from holding other locks, or require the callers to impose synchronization. This is not some obscure detail unique to construct-on-first-use (COFU); it's the bread and butter of multithreaded programming.
POSIX puts the responsibility fully on the application programmer to place locks where there might be concurrent access. In exchange for this effort, the programmer gets the performance benefit of there not being locks when they are not needed. Some hypothetical multithreaded C++ standard might choose a different route. But it would not be POSIX.

Again, it is this simple: POSIX prohibits an object from being modified in one thread while it is accessed in another. C++ specifies initialization on first use. Initialization is modification. Thus first use requires a lock if there can be a concurrent access. The language cannot tell when there can and cannot be a concurrent access, and the language cannot tell when a call cannot possibly be a first use.

I could perhaps ignore all of this if the behavior discussed made a reasonably significant category of code "just work" or removed some of the work involved in getting synchronized code correct, but it does not. In the vast majority of cases, the static object may be modified and locks will be needed anyway. In many cases, it will be known that the static object has already been initialized and will not be modified, and the locks added will be wasted.
(In reply to comment #18)
> I think it is really a waste for this bug to be open, as none of the GCC people
> commented on it, and when the original bug was filed there was a huge
> opportunity to talk this over and nothing was done.
>
> So I am going to close this as WONTFIX.

Is any of the GCC people on the POSIX committee working on the PTHREAD sub-standard? Because David Butenhof (a member of the above and author of the book "Programming with POSIX® Threads") has strongly argued against this feature:

<cite>
Obviously, (excessive) locking does NOT allow for simultaneous execution. A lot of people are used to write old good ST code and "add some locks to make it thread-safe" afterwards. The realworld result is always frustrating: the application is terribly slow and doesn't scale. Sure, we need some synchronization primitives (such as mutexes) to implement inter-thread communication. As a rule, thoroughly designed MT application is a set of fully independent threads which communicate via the message queues. The synchronization primitives are used inside these queues and virtually no locks are supposed outside the queues. But what we see in real life is herds of mutexes spread across the code. "MT-aware classes" try to lock almost every method... And even more, some "industrial-strength" PLs have the built-in mutex per every object! IMHO there is no excuse for this madness.
</cite>

I have found no other competent opinion arguing the other way around (that is, for the thing that you want to do); every competent opinion is in this direction. The ABI closure is actually based on a non-standard solution, and on a "lack of interest" (and presumably competence) in this field. Also, a bug cannot be considered closed "because no one has protested yet".

BTW, I am going to see Stallman today and speak with him about this; he is in Milan today.

Bests,
Giancarlo Niccolai
(In reply to comment #20)
Sorry, the correct citation is:

<cite>
"multithreading is defined by an application design that ALLOWS FOR concurrent or simultaneous execution"
</cite>

The rest is cited from Sergey P. Derevyago COMMENTING on this sentence. (I missed the quotation marks in my first reading of his message.)
> "multithreading is defined by an application design that ALLOWS FOR concurrent
> or simultaneous execution"

Initializers of static locals cannot execute concurrently, no matter whether they are automatically locked or not. The only thing which would change when you remove the automatically inserted locking is that some programs which used to work are now broken, and that some other programs which used to deadlock now invoke undefined behavior from entering an initializer recursively.
> The only thing which would change when you remove the automatically
> inserted locking is that some programs which used to work are now
> broken, and that some other programs which used to deadlock now
> invoke undefined behavior from entering an initializer recursively.

First, if we're talking about pthreads programs, which is the only case I'm suggesting removing the locking for, then those programs are already broken. The pthreads standard requires locks when data may be changed in one thread and accessed in another. Entering an initializer recursively has always been undefined behavior; any code that might do that is broken.

In any event, the entire thrust of this argument is bogus. If GCC/G++ are going to have non-portable features that make code work when they're enabled and break when they're disabled, they definitely should not be on by default. That they hide bugs in code that claims standards compliance on one platform, but allow them to fail on another, is an argument *against* the option, not for it. (Or are you seriously arguing that the C++ standard and the POSIX standard *require* this behavior?)
> First, if we're talking about pthreads programs, which is the only case I'm
> suggesting removing the locking for, then those programs are already broken.

They are non-portable no matter how static initializers are done: C++ doesn't include threads and POSIX doesn't include C++.

> If GCC/G++ are going to have non-portable features that make code work
> when they're enabled and break when they're disabled, they definitely
> should not be on by default.

Taking portability aside (as they are already non-portable), this is a wonderful quote when taken out of context. Yeah, if an option makes more code work and its negation makes more code break, let's make the breaking variant the default :-)

> (Or are you seriously arguing that the C++ standard and the
> POSIX standard *require* this behavior?)

Of course not. Not yet, anyhow. For me static locals in C++ are the equivalent of pthread_once in C/POSIX. A hypothetical C++/POSIX should make them MT-safe.
> They are non-portable no matter how static initializers are done:
> C++ doesn't include threads and POSIX doesn't include C++.

That's a bogus argument. There are no conflicts between the two standards.

> Taking portability aside (as they are already non-portable), this is
> a wonderful quote when taken out of context. Yeah, if an option makes
> more code work and its negation makes more code break, let's make
> the breaking variant the default :-)

In any context, I stand by this quote. Code that does not conform to the standards it requests during compilation should break. Compatibility options that make code work are fine; they just should not be on by default, especially when they have a performance cost. Broken code should break.

> For me static locals in C++ are the equivalent of pthread_once in C/POSIX.
> A hypothetical C++/POSIX should make them MT-safe.

They are MT-safe, provided you lock shared data that may be accessed concurrently. POSIX never permits you to modify shared data in one thread when it may be accessed by another without a lock. By your definition of MT-safe, almost no function is MT-safe. Is 'strchr' MT-safe if you call it on a string while another function might modify that string? No, so let's put automatic locks around 'strchr' and any function that might modify a string. A function is MT-safe if it does the same thing in an MT context that it does in a single-threaded context, provided the caller uses locks to prevent access to data in one thread while another thread might access it. Static initialization is already MT-safe, and these locks do not make it more so.
> By your definition of MT-safe, almost no function is MT-safe. Is 'strchr'
> MT-safe if you call it on a string while another function might modify that
> string? No, so let's put automatic locks around 'strchr' and any function
> that might modify a string.

The difference is that almost all uses of strchr are done on data which is not shared between threads, while many static local initializers are used on objects which are accessed from multiple threads (when multiple threads are used at all).

By your reasoning malloc should be unsafe to use from multiple threads if the user did not explicitly put a lock around it, because it modifies shared data (the heap). And errno should not be automatically made thread-local, because it hurts performance in case thread-locality is not needed. After all, "POSIX puts the responsibility fully on the application programmer to place locks where there might be concurrent access. In exchange for this effort, the programmer gets the performance benefit of there not being locks when they are not needed". Right?
> The difference is that almost all uses of strchr are done on data
> which is not shared between threads, while many static local
> initializers are used on objects which are accessed from multiple
> threads (when multiple threads are used at all).

In both cases, some are and some aren't. Some uses of 'strchr' are on data that's used by multiple threads, and some objects initialized statically are in functions that are only called by a single thread. The cases are not as different as you seem to think they are.

> By your reasoning malloc should be unsafe to use from multiple threads
> if the user did not explicitly put a lock around it, because it modifies
> shared data (the heap).

The difference is that 'malloc' is a function supplied by the system, and the user, in principle, has no knowledge of its internals. In principle, the user has no idea that there is a "heap" or that it is a shared structure. To support your argument, you would have to argue that if the user tried to implement his own memory allocation function, the language should somehow detect that it manipulates a shared structure (like the heap) and apply locks around that structure.

> And errno should not be automatically made thread-local, because it
> hurts performance in case thread-locality is not needed.

Again, to make your argument work, you would have to argue that the system should detect that new user-written code accesses some static variable akin to 'errno' and automatically change that variable to be thread-local.

> After all, "POSIX puts the responsibility fully on the application
> programmer to place locks where there might be concurrent access. In
> exchange for this effort, the programmer gets the performance benefit
> of there not being locks when they are not needed".
> Right?

Right. User code should not get automatic locks put around it.
Locks are needed where the programmer couldn't possibly know that shared data is manipulated, and the locks should be coded in the function, not magically added by the language. None of your cases even remotely support the argument that the implementation should detect possible cases where data is modified in one thread while it's used in another thread and automatically insert locks to make this access safe. (In the context of POSIX threads.)