Thread-safe static initialization (the behavior that '-fno-threadsafe-statics' disables) is purely a pessimization for POSIX pthreads code. Since the POSIX standard already requires a mutex around operations that modify shared data, and the C++ standard specifies lazy initialization, POSIX code must already have mutexes that make static initialization thread safe. The code is, therefore, already thread safe and cannot be made more so. The implicit locking does allow non-compliant code to work by relaxing some of the requirements of the POSIX standard. That's a good option to have, but it should not be on by default. Code should, by default, get the benefit of the optimizations the standard allows. In summary, behavior that is a pure pessimization for compliant code, and is only required to make non-compliant code work, should not be on by default when that standard is invoked. This is the case for other options in this category (writable strings, relaxed aliasing, and so on). So '-pthreads' should imply '-fno-threadsafe-statics', and an option like '-fthreadsafe-statics' should be created to request the implicit locking.
Note that -fthreadsafe-statics never made it into the FSF's 3.4.x releases, only 4.0.0. Also note that this behavior is required by the C++ ABI.
Actually, this code was added because it is hard to deal with mutexes in these cases.
Do we agree that this is a pure pessimization for POSIX-compliant code? Do we agree that POSIX already requires a mutex to protect code that might modify an object?
No, see the following code: class A { A(); int t; }; void f() { static A a; } How can someone know that A will modify memory? In fact, the mutex ensures that 'a' is initialized only once.
In your example: class A { A(); int t; }; void f() { static A a; } I don't get it. What's the problem with this? Obviously, if you plan to call 'f()' from multiple threads, you must do it while holding a mutex because it might modify 'a'. This is just like any other function that modifies (or might modify) data.
You say, "how can someone know that A will modify memory"? The answer is, the C++ standard says so, section 7.1.2. They simply read that section of the standard, and they know that the function might modify memory. If they know for sure that it won't in a particular case, they are safe. If they don't know, and it might modify memory, then POSIX requires them to put in a mutex. You cannot create code that works with this option and doesn't work without it except by violating the POSIX standard. So POSIX code should not have this option enabled by default -- it's a pure pessimization.
No, the point is if you expose an API to the user, why should the user of the API know that you modify the memory. Let the compiler do the work for you instead of doing the work in this case.
> You cannot create code that works with this option and doesn't work without it
> except by violating the POSIX standard. So POSIX code should not have this
> option enabled by default -- it's a pure pessimization.

POSIX doesn't say anything about C++, and C++ doesn't say anything about threads - we are already outside the scope of both standards. A reasonable extension of C++ to multithreading should give static locals semantics analogous to pthread_once. This allows one to write code which uses lazy initialization and is MT-safe, yet doesn't use a threading library explicitly - and is therefore portable to C++ implementations without threads. Note that this doesn't break any correct program which doesn't make use of it.
There is certainly the eternal argument whether a class should implement its own locks internally or whether the caller should implement them. The first case simplifies calling at the expense of overhead when you don't need it. The second makes callers need more knowledge of how the functions they're calling work, but makes possible optimizations in cases where the locks aren't needed, and only the caller knows this.

POSIX made this decision. The POSIX memory visibility rules specifically require a mutex to be held in cases where an object may be modified by another concurrent thread. So POSIX code *must* already contain such mutexes. This requirement in POSIX is as clear as anything can be. The rationale is that on some platforms locks may be very expensive, so requiring them implicitly was carefully avoided. POSIX requires shared data to be explicitly locked by application code when data might be modified in one thread and read in another concurrently.

While the C++ standard doesn't say anything about threads, it says a lot about how static objects are initialized. It specifies initialization on first use; thus first use is a modification, and a use that might be a first use might be a modification. There is no other sensible reading of the POSIX standard. There is no other sensible reading of the C++ standard.

The arguments presented are disingenuous. They would equally well defend a decision to initialize all static objects before calling 'main' in multithreaded code. After all, C++ doesn't say anything about multithreaded code and POSIX doesn't say anything about initializing C++ static objects. In fact, the arguments would defend crashing on any multithreaded C++ code, which is obviously not what anyone wants. If there is going to be C++/POSIX code, the C++ standard and the POSIX standard will have to be made to coexist. POSIX requires the application to figure out when objects may be accessed concurrently, and does so for good reason.
In a non-POSIX context, it may make sense to have the compiler try to automatically insert mutexes, but it definitely doesn't in the POSIX case. The compiler should, by default, make the optimizations that the standards permit. This is the way every other similar option has been implemented.
> There is certainly the eternal argument whether a class should implement its own
> locks internally or whether the caller should implement them.

And my guideline is as follows: it should implement its own locks if it provides a single global object, and it should leave locking to the callers if it provides objects created dynamically. The reasoning is that dynamically created objects are often used locally in one thread, in which case locking would be unnecessary, while a singleton is always accessible to all threads.

> POSIX made this decision. The POSIX memory visibility rules specifically require
> a mutex to be held in cases where an object may be modified by another
> concurrent thread. So POSIX code *must* already contain such mutexes.

POSIX provides mechanisms for synchronization. In particular, pthread_once can be used to guard initialization, so that the object doesn't need an additional mutex if it's not modified later. pthread_once provides the semantics "initialize before first use (in an MT-safe way)". Static locals in C++ are an equivalent to pthread_once in C/POSIX.
> The reasoning is that dynamically created objects are often used locally in one
> thread, in which case locking would be unnecessary, while a singleton is always
> accessible to all threads.

Accessible, but not necessarily accessed. In fact, there is no way you can know whether this synchronization overhead is necessary, and for POSIX it's only needed if the code violates the memory visibility rules.

> Static locals in C++ are an equivalent to pthread_once in C/POSIX.

Even in the single-threaded case, C++ leaves it undefined what happens if you reenter a function that invokes a static initializer from that static initializer. To argue that this means it should be defined for the multithreaded case is absurd. C++ requires the initialization to be complete before you are allowed to pass the initializer again. POSIX requires locks when data that might be shared might be modified. Bluntly, this is totally opposite to the entire philosophy of the POSIX standard and the wording of the C++ standard.
> > Static locals in C++ are an equivalent to pthread_once in C/POSIX.
>
> Even in the single-threaded case, C++ leaves it undefined what happens if you
> reenter a function that invokes a static initializer from that static
> initializer. To argue that this means it should be defined for the
> multi-threaded case is absurd.

There is a fundamental difference between a second access made from the same thread and one made from a different thread.

Access from the same thread means that there is a contradiction in the program: the data depends on itself. This is always a bug. The implementation may be helpful and detect it (gcc4 throws the exception recursive_init), but in general undefined behavior is consistent with the philosophy of C/C++. The same applies to recursive pthread_once and to a recursive mutex (except that with mutexes you have the option to allow recursive locking).

Access from a different thread most often means that the timing was unfortunate: two threads happened to try to perform initialization on first use almost simultaneously, closer in time than the time needed for the initialization. There is a remote possibility that the threads were not independent - the second thread was a worker spawned by the first one, the first one will wait for the second one to finish its work, the second one will wait for the first one to complete the initialization, and we will have a deadlock - but almost always this is not the case; this is just unfortunate timing, and the first thread will complete initialization while the second one is waiting.

It makes no sense to optimize for the case of a bug which causes undefined behavior, so the language assumes that there is no bug and arranges the program to run according to the obvious intended semantics of independent threads simultaneously accessing a lazily initialized variable: the second thread waits for the first to complete the initialization, as if it got there some time later. The same applies to pthread_once and to a mutex.
Obviously the C++ standard sees only the first scenario, a single thread, so it just declares reentering a static local initializer invalid. But there is no conceptual problem with "reentering" the initializer from a different thread - it just has to wait until the first one finishes. A hypothetical multithreaded C++ should state this as the semantics of static locals.

Look at the semantics of Glasgow Haskell, which combines lazy evaluation with threads. When a given lazy variable is reentered, if it is the same thread which started computing its value, we have a cyclic dependency and the runtime detects this as an error; if it is a different thread, the second thread waits for the first one to complete computing the value, because almost always this is just unfortunate timing and this is the obvious way to resolve it.
This comment is relevant regarding the C++ ABI closure from which this idea has been derived: http://groups.google.it/groups?q=g:thl3050439784d&dq=&hl=it&lr=&client=firefox-a&rls=org.mozilla:it-IT:official_s&selm=95e4efda.0502191504.315f1f3%40posting.google.com

That thing must be reopened and fixed ASAP. Also, please read the thread carefully. Locking arbitrarily any section of code, for any reason, outside the control of the user is just wrong, because:

1) it prevents the code from using more efficient schemes if needed.
2) it prevents the code from forgoing any locking if not needed.
3) it may even force different approaches and encapsulations into the final code, which is not what C++ is meant for.

This holds even if the code is inside a static initializer.

Bests,
Giancarlo Niccolai.
What you keep ignoring is that the POSIX standard explicitly declares it a bug to access data in one thread while it may be modified in another. It's not "unfortunate timing", it's failure to use the proper code to ensure correct timing. You say a hypothetical multithreaded C++ should state this as the semantics for static locals, and I don't disagree with you, provided we are not talking about one based on POSIX threads. POSIX specifically made the design decision to always require explicit locks and to always require the programmer to find, document, and lock cases where concurrent accesses might occur.
> You say a hypothetical multithreaded C++ should state this as the semantics for
> static locals, and I don't disagree with you, provided we are not talking about
> one based on POSIX threads. POSIX specifically made the design decision to
> always require explicit locks and to always require the programmer to find,
> document, and lock cases where concurrent accesses might occur.

I think I wrote a rationale that explains and justifies this choice at http://www.niccolai.ws/works/articoli/art-multithreading-en-1a.html

I have not been able to find an official rationale from the Open Group; the nearest thing seems to be http://www.unix.org/whitepapers/reentrant.html

If more precise and authoritative ones are available, please point them out.
The '-pthreads' flag should imply '-fno-threadsafe-statics'. For every other similar flag I can find, the default is to permit the compiler to make the optimizations that the standard allows, and specific flags are needed to disable the optimization.
> The '-pthreads' flag should imply '-fno-threadsafe-statics'. For every other
> similar flag I can find, the default is to permit the compiler to make the
> optimizations that the standard allows, and specific flags are needed to
> disable the optimization.

It's not a question of optimization but of correctness. Having "initialize on first use" semantics which break when multiple threads use it is as wrong as using global variables for passing data between functions and forcing clients to introduce locks to make them reentrant. It is usable when clients are careful, but it's a bad default when we can't expect all calls to be done from a single thread.
I think it is really a waste for this bug to be open, as none of the GCC people commented on it, and when the original bug was filed there was a huge opportunity to talk this over and nothing was done. So I am going to close this as WONTFIX.
> It's not a question of optimization but of correctness.

Exactly, and not locking objects that may be modified from another thread is not correct. Only the programmer knows whether an object may be modified from another thread.

> Having "initialize on first use" semantics which break when multiple
> threads use it is as wrong as using global variables for passing data
> between functions and forcing clients to introduce locks to make them
> reentrant.

It doesn't break when multiple threads use it; it breaks when multiple threads use it in a way where one thread might modify an object while another thread accesses it without locking. POSIX specifies that everything breaks when misused this way.

You seem to have an attitude that it's difficult to track data dependencies and ensure correct locking. You talk about "unfortunate timing". What you're missing is that this is what people who write multithreaded programs do all day long. Those who are good at writing multithreaded programs are good at doing exactly this. And this change does not remove any of the burden -- it simply removes a choice (one can trivially add these locks when one knows they're needed, or doesn't know they're not needed).

> It is usable when clients are careful, but it's a bad default when we
> can't expect all calls to be done from a single thread.

Any function that might be called from multiple threads concurrently will require special coding to deal with this situation. You would have to go out of your way to create a situation where just protecting the initialization is sufficient. You have to be careful when you write multithreaded code, period. Functions may require you to hold certain locks when you call them, prohibit you from holding other locks, or require the callers to impose synchronization. This is not some obscure detail unique to construct-on-first-use (COFU); it's the bread and butter of multithreaded programming.
POSIX puts the responsibility fully on the application programmer to place locks where there might be concurrent access. In exchange for this effort, the programmer gets the performance benefit of there not being locks when they are not needed. Some hypothetical multithreaded C++ standard might choose a different route. But it would not be POSIX.

Again, it is this simple: POSIX prohibits an object from being modified in one thread while it is accessed in another. C++ specifies initialization on first use. Initialization is modification. Thus first use requires a lock if there can be a concurrent access. The language cannot tell when there can and cannot be a concurrent access, and the language cannot tell when a call cannot possibly be a first use.

I could perhaps ignore all of this if the behavior discussed made a reasonably significant category of code "just work" or removed some of the work involved in getting synchronized code correct, but it does not. In the vast majority of cases, the static object may be modified and locks will be needed anyway. In many cases, it will be known that the static object has already been initialized and will not be modified, and the locks added will be wasted.
(In reply to comment #18)
> I think it is really a waste for this bug to be open, as none of the GCC people
> commented on it, and when the original bug was filed there was a huge
> opportunity to talk this over and nothing was done.
>
> So I am going to close this as WONTFIX.

Is any of the GCC people on the POSIX committee working on the PTHREAD sub-standard? Because David Butenhof (a member of the above and author of the book "Programming with POSIX® Threads") has strongly argued against this feature:

<cite>
Obviously, (excessive) locking does NOT allow for simultaneous execution. A lot of people are used to write old good ST code and "add some locks to make it thread-safe" afterwards. The realworld result is always frustrating: the application is terribly slow and doesn't scale. Sure, we need some synchronization primitives (such as mutexes) to implement inter-thread communication. As a rule, thoroughly designed MT application is a set of fully independent threads which communicate via the message queues. The synchronization primitives are used inside these queues and virtually no locks are supposed outside the queues. But what we see in real life is herds of mutexes spread across the code. "MT-aware classes" try to lock almost every method... And even more, some "industrial-strength" PLs have the built-in mutex per every object! IMHO there is no excuse for this madness.
</cite>

I have found no other competent opinion arguing the other way around (that is, for the thing that you want to do); every competent opinion is in this direction. The ABI closure is actually based on a non-standard solution, and on a "lack of interest" (and presumably competence) in this field. Also, a bug cannot be considered closed "because no one has protested yet".

BTW, I am going to see Stallman today and speak with him about this; he is in Milan today.

Bests,
Giancarlo Niccolai
(In reply to comment #20)
Sorry, the correct citation is:

<cite>
"multithreading is defined by an application design that ALLOWS FOR concurrent or simultaneous execution"
</cite>

The rest is cited from Sergey P. Derevyago COMMENTING on this sentence. (I missed the quotation marks in my first reading of his message.)
> "multithreading is defined by an application design that ALLOWS FOR concurrent
> or simultaneous execution"

Initializers of static locals cannot execute concurrently, no matter whether they are automatically locked or not. The only thing which would change when you remove the automatically inserted locking is that some programs which used to work are now broken, and that some other programs which used to deadlock now invoke undefined behavior from entering an initializer recursively.
> The only thing which would change when you remove the automatically
> inserted locking is that some programs which used to work are now
> broken, and that some other programs which used to deadlock now
> invoke undefined behavior from entering an initializer recursively.

First, if we're talking about pthreads programs, which is the only case I'm suggesting removing the locking for, then those programs are already broken. The pthreads standard requires locks when data may be changed in one thread and accessed in another. Entering an initializer recursively has always been undefined behavior; any code that might do that is broken.

In any event, the entire thrust of this argument is bogus. If GCC/G++ are going to have non-portable features that make code work when they're enabled and break when they're disabled, they definitely should not be on by default. That they hide bugs in code that claims standards compliance on one platform, but allow them to fail on another, is an argument *against* the option, not for it. (Or are you seriously arguing that the C++ standard and the POSIX standard *require* this behavior?)
> First, if we're talking about pthreads programs, which is the only case I'm
> suggesting removing the locking for, then those programs are already broken.

They are non-portable no matter how static initializers are done: C++ doesn't include threads and POSIX doesn't include C++.

> If GCC/G++ are going to have non-portable features that make code work
> when they're enabled and break when they're disabled, they definitely
> should not be on by default.

Taking portability aside (as they are already non-portable), this is a wonderful quote when taken out of context. Yeah, if an option makes more code work and its negation makes more code break, let's make the breaking variant the default :-)

> (Or are you seriously arguing that the C++ standard and the
> POSIX standard *require* this behavior?)

Of course not. Not yet, anyhow. For me static locals in C++ are the equivalent of pthread_once in C/POSIX. A hypothetical C++/POSIX should make them MT-safe.
> They are non-portable no matter how static initializers are done:
> C++ doesn't include threads and POSIX doesn't include C++.

That's a bogus argument. There are no conflicts between the two standards.

> Taking portability aside (as they are already non-portable), this is
> a wonderful quote when taken out of context. Yeah, if an option makes
> more code work and its negation makes more code break, let's make
> the breaking variant the default :-)

In any context, I stand by this quote. Code that does not conform to the standards it requests during compilation should break. Compatibility options that make code work are fine; they just should not be on by default, especially when they have a performance cost. Broken code should break.

> For me static locals in C++ are the equivalent of pthread_once in C/POSIX.
> A hypothetical C++/POSIX should make them MT-safe.

They are MT-safe, provided you lock shared data that may be accessed concurrently. POSIX never permits you to modify shared data in one thread when it may be accessed by another without a lock. By your definition of MT-safe, almost no function is MT-safe. Is 'strchr' MT-safe if you call it on a string while another function might modify that string? No, so let's put automatic locks around 'strchr' and any function that might modify a string. A function is MT-safe if it does the same thing in an MT context that it does in a single-threaded context, provided the caller uses locks to prevent access to data in one thread while another thread might access it. Static initialization is already MT-safe, and these locks do not make it more so.
> By your definition of MT-safe, almost no function is MT-safe. Is 'strchr'
> MT-safe if you call it on a string while another function might modify that
> string? No, so let's put automatic locks around 'strchr' and any function
> that might modify a string.

The difference is that almost all uses of strchr are done on data which is not shared between threads, while many static local initializers are used on objects which are accessed from multiple threads (when multiple threads are used at all).

By your reasoning malloc should be unsafe to use from multiple threads if the user did not explicitly put a lock around it, because it modifies shared data (the heap). And errno should not be automatically made thread-local, because it hurts performance in case thread-locality is not needed. After all, "POSIX puts the responsibility fully on the application programmer to place locks where there might be concurrent access. In exchange for this effort, the programmer gets the performance benefit of there not being locks when they are not needed". Right?
> The difference is that almost all uses of strchr are done on data
> which is not shared between threads, while many static local
> initializers are used on objects which are accessed from multiple
> threads (when multiple threads are used at all).

In both cases, some are and some aren't. Some uses of 'strchr' are on data that's used by multiple threads, and some objects initialized statically are in functions that are only called by a single thread. The cases are not as different as you seem to think they are.

> By your reasoning malloc should be unsafe to use from multiple threads
> if the user did not explicitly put a lock around it, because it modifies
> shared data (the heap).

The difference is that 'malloc' is a function supplied by the system, and the user, in principle, has no knowledge of its internals. In principle, the user has no idea that there is a "heap" or that it is a shared structure. To support your argument, you would have to argue that if the user tried to implement his own memory allocation function, the language should somehow detect that it manipulates a shared structure (like the heap) and apply locks around that structure.

> And errno should not be automatically made thread-local, because it
> hurts performance in case thread-locality is not needed.

Again, to make your argument work, you would have to argue that the system should detect that new user-written code accesses some static variable akin to 'errno' and automatically change that variable to be thread-local.

> After all, "POSIX puts the responsibility fully on the application
> programmer to place locks where there might be concurrent access. In
> exchange for this effort, the programmer gets the performance benefit
> of there not being locks when they are not needed".
> Right?

Right. User code should not get automatic locks put around it.
Locks are needed where the programmer couldn't possibly know that shared data is manipulated, and the locks should be coded in the function, not magically added by the language. None of your cases even remotely support the argument that the implementation should detect possible cases where data is modified in one thread while it's used in another thread and automatically insert locks to make this access safe. (In the context of POSIX threads.)