[Bug libstdc++/21334] Lack of Posix compliant thread safety in std::basic_string

Wed May 4 09:14:00 GMT 2005

------- Additional Comments From jkanze at cheuvreux dot com  2005-05-04 09:14 -------
Subject: Re:  Lack of Posix compliant thread safety in std::basic_string

|> >|> Secondly, it is clear that your bug report is hypothetical.  The
|> >|> library maintainers do not typically deal in hypotheticals.

|> > I guess it is really a question of whether you want quality or
|> > not.  In general, all code is supposed incorrect until proven
|> > otherwise -- in the case of threading issues, this is
|> > particularly important, because of the random factors involved.

|> Of course, we want quality (and we want performance) which is
|> why we insist upon non-hypothetical bug reports (either way,
|> I've already explained what is wrong with your posted code
|> w.r.t. the cited FAQ entry)

I know what was wrong with my code with regard to the FAQ.  I
know that the code didn't conform to *your* threading model.  My
point is that your threading model is, in a certain sense,
wrong.  It isn't what people expect, at least not people who are
familiar with the Posix threading model.

The documentation is in fact very misleading, with phrases like
"like for any other shared resource", when you don't need
external synchronization for other shared resources, *unless*
someone is actively modifying them.

Now, I'm not saying that the Posix model is the only legitimate
one.  But it is the one I, and most people working on Unix
platforms, are familiar with, and expect.  And it is a standard
on those platforms.  You can choose to ignore it, and to ignore
what SGI has done in this regard.  But in that case, you have to
be aware that you are trying to create an island, in which you
are different from everyone else.

|> and why we insist that library users
|> must write correct locking sequences outside the context of the
|> library.  As you well know, there is nothing in POSIX which
|> dictates that a library internalize the locking vs. implying
|> that the user code must hold a mutex lock.

Posix only defines its own functions (most of which are either
thread-safe, or have a thread-safe variant), and it only defines
a binding for C for them.  Other than that, we are left with
general, language neutral statements like the one I quoted.

Obviously, a library not specified by Posix can do whatever it
wants.  In practice, however, user expectations are based on
what Posix does, and that leaves two alternatives:

 -- Model your library on the actual Posix functions, with
    full internal locking -- I can call write on the same file
    descriptor from two different threads, for example, without
    problems.

    A lot of people (although none of the experts I know) seem
    to expect this.  Rogue Wave tries to do it, for example.
    IMHO, however, it isn't reasonable for a library which lets
    references (in the general sense) escape to the user --
    Posix itself makes an exception for functions which return
    pointers to internal data structures.  (It doesn't use those
    words, but if you look at the list of functions which are
    not guaranteed thread-safe, it includes exactly those
    functions, and no others.)

 -- Base your model on the general, language neutral statements.
    This is what SGI does, for example.  The general statement
    is that I need a lock if more than one thread is accessing
    the "memory", and any thread is modifying it.  Since the
    abstraction model of C++ prevents me from actually seeing
    the memory, one is led to extrapolate "memory" to "object".
    This seems a reasonable interpretation to me.  At least some
    experts agree -- the authors of the SGI threading model,
    for example.  (I consider the people who were working at SGI
    when that model was defined to be among the best threading
    experts in the world.  That's just my opinion, but every
    time I've had a discussion with them, I've come out knowing
    more than when I went in.)
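
As a rough illustration of the general statement in the second
alternative (a lock is needed once any thread modifies the shared
object), here is a minimal sketch.  It uses C++11 std::thread and
std::mutex, which postdate this discussion, and the names SharedLog,
append_line, locked_size and run_writers are hypothetical, not taken
from any library mentioned here.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared state: one string that several threads append to.
struct SharedLog {
    std::mutex guard;   // protects text
    std::string text;
};

// At least one thread modifies text, so every access takes the lock.
void append_line(SharedLog& log, const std::string& line) {
    std::lock_guard<std::mutex> lock(log.guard);
    log.text += line;
    log.text += '\n';
}

// Reads take the same lock, because writers may be running.
std::size_t locked_size(SharedLog& log) {
    std::lock_guard<std::mutex> lock(log.guard);
    return log.text.size();
}

// Starts `writers` threads, each appending `per_thread` lines of "x",
// waits for all of them, and returns the final size of the text.
std::size_t run_writers(int writers, int per_thread) {
    SharedLog log;
    std::vector<std::thread> pool;
    for (int i = 0; i < writers; ++i) {
        pool.emplace_back([&log, per_thread] {
            for (int j = 0; j < per_thread; ++j) append_line(log, "x");
        });
    }
    for (auto& t : pool) t.join();
    return locked_size(log);
}
```

Under this model the lock is the caller's responsibility only
because modification is taking place; nothing here says what should
happen when every thread merely reads.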

You can, obviously, take the position that you have defined the
threading model you wish to use, that you know better than
everyone else, and that your decisions cannot be questioned.
And I can take the position that you're just wrong, you don't
know anything about threading, that g++ is worthless in a
multithreaded environment, and publicize that opinion in various
newsgroups.  Such attitudes won't help either of us, however.

This bug report came about because of a discussion in a news
group.  Basically, I said to watch out for std::string with g++
if you are in a multithreaded environment.  I don't remember my
exact words, but I'm pretty sure that the gist would have been
that the g++ implementation of std::string does not behave as
one might expect.  I said it in a newsgroup, rather than making
a bug report, because I knew of the text in the FAQ (or
something similar), and I was convinced that no one here would
consider it an error.  Gaby suggested otherwise; that if I could
describe a case where the code could fail, although no thread
modified the string, I should report it as a bug.  So we're
here, and I'm getting hounded because my email contains trailers
which I can do nothing about:-).  I was inordinately pleased by
his suggestion -- I sometimes get the impression that the g++
developers live in a world of their own, and do not care about
the expectations of "just users", and was particularly pleased
to find that my impression might be wrong.

I'd like to use g++ professionally.  Most of my development is
on Sparcs, under Solaris, and the "official" compiler is Sun CC.
Which, quite frankly, isn't supported very well.  But how can I
explain to my colleagues that they have to deal with two
different threading models, depending on where the library came
from?  Whatever the theoretical issues, practical considerations
suggest that you align yourself with the conventions of the
platform whenever possible.

|> You observe that we should make string<> act exactly like char[]
|> from the user's perspective.

Not necessarily exactly.  I'm looking for a model to define
reasonable expectations.  If we claim that users should use
std::string instead of char[], we shouldn't make it more
difficult to do so, at least not for the common cases.

|> Neither the cited FAQ (which I
|> just reviewed in detail, esp. the full text of section 5.6) nor
|> POSIX modified as you'd wish implies that at all.  char[] is a
|> raw chunk of memory.  In the cited FAQ, we explain that any
|> method call on any shared object must be mutex-protected unless
|> we specially document the method or the entire class.  We are
|> stating that we do nothing inside the library which would make
|> it impossible for a user to write threaded application with our
|> library.  As I've already observed to you, string<> may or may
|> not be specially documented.  If it is, then you need to cite
|> that documentation in your bug report, not the general FAQ
|> section 5.6.  This is basically my point.  The view expressed in
|> the FAQ section 5.6 is entirely consistent with the POSIX view
|> (as conveyed to me by Butenhof's book).  POSIX doesn't care what
|> portions of the C or C++ application have to apply mutex locks
|> to ensure it happens.

|> The prime issue is visibility of the object.  We take
|> responsibility for locking access to internally hidden shared
|> objects whenever the holding reference was not itself visibly
|> shared in the application code.  When the holding reference was
|> visibly shared in the application code, then the user is
|> responsible for locking access.

This is exactly my point.  The only visible parts of std::string
are the individual char's (which are, by the way, "memory
locations", as per the Posix definition).  As long as I do
nothing which modifies the visible parts of an std::string
object, I should be able to access it from multiple threads,
without external locking.  The fact that there are, internally,
other memory locations which may be modified, should be
transparent to me.
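
To make the visibility argument concrete, here is a minimal sketch
of a class whose visible value never changes, but which writes to a
hidden cache during what the caller sees as a read.  The class
LazyText and the function read_from_two_threads are hypothetical and
are not how any real std::string is implemented; the point is only
that the class, not the caller, guards the memory the caller cannot
see.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical class: the visible text is fixed at construction,
// but a hidden cache is filled in on first use.
class LazyText {
public:
    explicit LazyText(std::string s) : text_(std::move(s)) {}

    // Logically a read.  Internally it may write the cache, so the
    // class takes its own lock instead of requiring callers to know.
    std::size_t vowel_count() const {
        std::lock_guard<std::mutex> lock(cache_guard_);
        if (!cached_) {
            const std::string vowels = "aeiou";
            std::size_t n = 0;
            for (char c : text_) {
                if (vowels.find(c) != std::string::npos) ++n;
            }
            vowels_ = n;
            cached_ = true;
        }
        return vowels_;
    }

private:
    std::string text_;
    mutable std::mutex cache_guard_;   // protects cached_ and vowels_
    mutable bool cached_ = false;
    mutable std::size_t vowels_ = 0;
};

// Two threads call the const member at the same time, with no
// external locking.  Returns the count if both threads agree.
std::size_t read_from_two_threads(const LazyText& t) {
    std::size_t a = 0, b = 0;
    std::thread ta([&] { a = t.vowel_count(); });
    std::thread tb([&] { b = t.vowel_count(); });
    ta.join();
    tb.join();
    return a == b ? a : static_cast<std::size_t>(-1);
}
```

Whether the internal protection is a mutex, an atomic operation, or
simply not modifying anything is a design choice for the library;
the sketch only shows that the choice can be kept out of the user's
sight.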

|> In your posted code, the user
|> has direct visibility to the shared object thus he must hold a
|> lock under our general model (also the SGI STL general model).

From the SGI document: "The SGI implementation of STL is
thread-safe only in the sense that simultaneous accesses to
distinct containers are safe, and simultaneous read accesses to
shared containers are safe."

That sounds pretty clear to me: simultaneous read accesses to
shared containers are safe.  That's exactly what I do in my
program: two threads access simultaneously a shared object,
without modifying it.
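
The pattern in question looks roughly like the following sketch.
It is not the program posted in this bug report; the function names
count_char and concurrent_reads_agree are made up for illustration,
and std::thread comes from C++11, which postdates this discussion.
The string is reached only through a const reference, so only const
member functions are called on it.

```cpp
#include <cstddef>
#include <string>
#include <thread>

// Counts occurrences of `c` using only const member functions.
std::size_t count_char(const std::string& s, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == c) ++n;   // const operator[]: a pure read
    }
    return n;
}

// Two threads read the same string at the same time with no mutex.
// Each thread writes only to its own result variable.
bool concurrent_reads_agree(const std::string& shared, char c) {
    std::size_t first = 0, second = 0;
    std::thread a([&] { first = count_char(shared, c); });
    std::thread b([&] { second = count_char(shared, c); });
    a.join();
    b.join();
    return first == second && first == count_char(shared, c);
}
```

Under the SGI statement quoted above, and under the rules later
adopted in C++11 for const access to standard library objects, this
needs no external synchronization.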

|> If the shared object was invisible to the user (i.e. a shared
|> cache not implied by the standard which is indirectly accessed
|> by two independently declared library objects) and we failed to
|> hold a mutex internal to the library, then I would be more
|> interested in your report...

|> Our general recommendations precisely cover this case.  I will
|> go farther: Any locking model under C++ which fails to account
|> for object model visibility will likely perform like a real dog.
|> Bow Wow.

I've not noticed performance problems in the SGI
implementations.  Which do give the guarantee I expect.

|> BTW, I noticed that you had no response to my point that you are
|> using an incorrect locking idiom in your posted code w.r.t. our
|> general FAQ.

Because I've never disputed the fact.  The FAQ is highly
misleading, but if you ignore the extraneous comments, and
restrict yourself to what is actually guaranteed, you find that
it is NOT what is usually guaranteed under Posix, despite claims
to the contrary.

|> If you would like to object that your posted code fails w.r.t.
|> another section of the FAQ which offers special guarantees, then
|> I will likely not complain if you close this PR and open a new
|> one.

|> > Well, there's certainly some confusion, because Posix says one
|> > thing, and your documentation says another.

|> Huh?  POSIX dictates when code must do or not do something to
|> ensure that the threading model is sound.

Right.  It says that external synchronization is necessary only
if more than one thread is accessing a memory location, and one
or more threads modifies that memory location.  In my
example, nobody modified any memory locations which were visible
to the user, so according to Posix, no external synchronization
should be necessary.

|> We dictate when user
|> code must do something in line with the POSIX model.

You dictate something more.  If I am to apply the Posix model to
your implementation of std::string, then I must know something
about the internal details to know that a memory location is
being modified, even if that location is not visible to me.

|> We tell
|> the user precisely when they must account for locks when working
|> with library objects.  Is it complex (in C++)?  Yes.

The problem isn't the complexity; writing correct multi-threaded
applications requires some thought, regardless of the model.
The problem is having to deal with a different model for certain
types of objects.

|> > When you say "treat library objects like any other shared
|> > resources", you are contradicting yourself for a system using the
|> > Posix model.

|> There is no contradiction.

Posix says that I do not have to use external synchronization if
I do not modify the objects.  You say I have to.  That is a
contradiction.

|> We made it clear what you have to do
|> when you use the library to ensure that your entire program is
|> thread-safe.  If you insist on cutting a corner, what can we do?

|> If you think other words would be better in our FAQ, then I will
|> take no offense as the last author of that section if you
|> propose a change.

Well, I did suggest that if you can't change the code
immediately, the text should be something along the lines
"unlike the normal case under Posix, you must use external
synchronization even when not modifying objects".
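
For what it is worth, the extra precaution that such wording would
describe can be sketched as below: a string that is never modified
after construction, yet every read still goes through a mutex.  The
wrapper GuardedString and the helpers read_all and two_readers_match
are hypothetical names for illustration only, and the sketch uses
C++11 facilities that postdate this discussion.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical wrapper: the value is fixed at construction, but
// readers are still serialized by an external lock.
class GuardedString {
public:
    explicit GuardedString(std::string s) : value_(std::move(s)) {}

    char at(std::size_t i) const {
        std::lock_guard<std::mutex> lock(guard_);
        return value_.at(i);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(guard_);
        return value_.size();
    }

private:
    mutable std::mutex guard_;   // taken even for read-only access
    std::string value_;
};

// Copies the guarded string one character at a time.
std::string read_all(const GuardedString& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) out += s.at(i);
    return out;
}

// Two reader threads share one GuardedString; each builds its own copy.
bool two_readers_match(const GuardedString& s, const std::string& expected) {
    std::string a, b;
    std::thread ta([&] { a = read_all(s); });
    std::thread tb([&] { b = read_all(s); });
    ta.join();
    tb.join();
    return a == expected && b == expected;
}
```

The cost is a lock acquisition on every read, which is exactly the
precaution an experienced Posix user would not expect to need for
an object nobody modifies.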

|> > [...] You can use it, but you have to take additional
|> > precautions that Posix says shouldn't be necessary, and that
|> > experienced Posix users don't expect to need.

|> Humm...  If you would like to fix the library to match your
|> expectations instead of crafting application-level code as we
|> (and SGI) suggest, then please be our guest.  Your patch will
|> only be accepted if you don't kill library performance.

My code conforms to the SGI requirements.  And in fact, if I
replace your std::string with any of the SGI containers, it
works.

|> > An implementation, of course, is free to decide what it wants to
|> > guarantee, and what it doesn't.  If it is decided, however, that
|> > the Posix guarantees do not extend to the library, then it is
|> > important to document this fact; i.e. to indicate somewhere that
|> > it cannot be missed that configuring the compiler to support
|> > pthreads does NOT mean what it seems to mean.

|> I think our guarantee is now well-qualified and surely better
|> than what was stated pre-gcc 3.0.  I think you actually don't
|> like our qualification.

It is, of course, senseless to compare with pre-gcc 3.0.  Before
3.0, there was no effort made to be thread safe, and the
compiler simply couldn't be used in a multithreaded application.
(I know.  Twice, I've had to pick up the pieces.)

|> I think I can safely assure you that if you construct your C++
|> code precisely as we document in section 5.6, you will never
|> violate a requirement of POSIX nor cause the library itself
|> internally to violate a requirement of POSIX.  If you cut a
|> corner, what can we do?

If I conform to your requirements, obviously, I will conform to
those of Posix, since yours are more strict.  The reverse is not
true, and that is, in some contexts, a problem.

If the documentation is reworded to stress the difference, and
the fact that there is an unexpected additional restriction, I
can live with it -- the problem only occurs in corner cases, and
is fairly easy to work around, IF one is aware of it.

(There is a more general problem, in that the documentation is
not always easy to find.  This is at least partially true for
almost all compilers, however, and I suspect that many don't even
mention what they do or don't guarantee in this regard.  But
finding particular information in the gcc documentation seems
particularly difficult.  There is also an even more general
problem that people don't read the documentation, even when they
find it, but I don't think there is anything you can do about
that.)

--
James Kanze

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334