This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Fw: [patch] Make std::tr1::shared_ptr thread-safe.
- From: Alexander Terekhov <alexander dot terekhov at gmail dot com>
- To: Peter Dimov <pdimov at mmltd dot net>
- Cc: Paolo Carlini <pcarlini at suse dot de>, Gabriel Dos Reis <gdr at integrable-solutions dot net>, libstdc++ at gcc dot gnu dot org, Jonathan Wakely <cow at compsoc dot man dot ac dot uk>
- Date: Thu, 31 Mar 2005 15:21:01 +0200
- Subject: Re: Fw: [patch] Make std::tr1::shared_ptr thread-safe.
- References: <OF24DE43AB.98214558-ONC1256FD5.0044D61E-C1256FD5.0044BD18@de.ibm.com>
- Reply-to: Alexander Terekhov <alexander dot terekhov at gmail dot com>
"Peter Dimov" <pdimov@mmltd.net> on 03/30/2005 11:33:56 PM:
[...]
> A separate issue is that there needs to exist proper memory ordering
> between some of the operations, for example, destroy() needs to happen
> after dispose() (but there are other visibility issues, some of them more
> subtle.)
>
> This can be fixed in one of the following ways:
>
> 1. Empty lock/unlock regions on the same mutex as outlined in my and
> Alexander's posts.
>
> 2. Full bidirectional memory barriers at the same two places.
>
> 3. Making __exchange_and_add offer the following memory synchronization
> guarantees:
>
> - acquire memory semantics when the result is not zero;
> - release memory semantics when the result is zero
release memory semantics when the result is not zero;
acquire memory semantics when the result is zero
And "result" means new value of count stored by __exchange_and_add(),
not return value (its old value). BTW, I had a typo in my previous
message when I wrote "result > 1".
>
> or introducing a separate __atomic_decrement primitive with these
> properties.
refcount<> is better. And it shall be based on full bloat atomic<> with
things like validate() for hazard pointers stuff,
http://www.google.de/groups?selm=4242A922.EF87094A%40web.de
and etc.
class sp_counted_base {
/* ... */
typedef refcount<std::size_t, basic> count;
count use_count_, self_count_;
/* ... */
public:
/* ... */
sp_counted_base() : use_count_(1), self_count_(1) { }
void add_ref() throw() {
use_count_.increment(); // naked
}
bool lock() throw() {
return use_count_.increment_if_not_min(); // naked
}
void weak_add_ref() throw() {
self_count_.increment(); // naked
}
void weak_release() throw() {
if (!self_count_.decrement(msync::acq, count::may_not_store_min))
destruct();
}
void release() throw() {
if (!use_count_.decrement()) { // "full" release-acquire protocol
dispose();
if (!self_count_.decrement(msync::rel, count::may_not_store_min))
destruct();
}
}
/* ... */
};
http://www.terekhov.de/pthread_refcount_t/experimental/refcount.cpp
[...]
> The next issue on our list is that reference-counted mutable synchronized
> objects would need an empty lock/unlock region at the start of the
> destructor if (3) is not chosen. I consider this unfortunate but
> acceptable.
I disagree.
void g(shared_ptr<std::something> p); // can modify *p
void f() {
shared_ptr<std::something> p(...);
new_thread(g, p);
}
The problem is that synchronizing deleter must lock *the same* lock
that was used to synchronize access to mutable object. But in the
case above there isn't any client locking at all!
>
> The final issue is that the reference counted objects in libstdc++
> (std::string, std::locale::facet) may also be broken *unless*
> __exchange_and_add has some memory visibility guarantees that I don't
> understand,
It's really simple. See release/acquire requirement above.
For immutable stuff we need
msync::slb (instead of full conventional release) memory semantics
when the result is not zero;
msync::hsb (instead of full conventional acquire) memory semantics
when the result is zero.
slb stands for "sink load barrier".
hsb stands for "hoist store barrier".
Full conventional release is compound msync::slb + msync::ssb
("sink store barrier).
Full conventional acquire is compound msync::hlb ("hoist load
barrier) + msync::hsb.
> but Alexander does. (It needs to establish ordering of prior
> loads WRT following stores IIUC). This will be fixed by (3) and may or may
> not be a problem in practice; I'm not an expert in this area and can't
> review the current __exchange_and_add implementations. In any event,
> shared_ptr with the above fixes applied would be no more broken than the
> rest of the library.
The rest of the library is currently totally broken in this respect
I'm afraid.
It works on IA32 (compiler reordering aside for a moment) because on
IA32 loads have acquire semantics, stores have release semantics, and
interlocked stuff is fully-fenced (compound acquire+release semantics).
regards,
alexander.