Tweaking atomics and memory barriers in shared_ptr


Hi,

This patch addresses a possible correctness issue, two performance
issues, and two minor doxygen fixes.  I'll be sending more patches to
add the N2351 enhancements to shared_ptr, so I've bundled this lot
together first.  They're all pretty small, but touching these atomics
needs a thorough review and I'm no expert on memory barrier placement.

Correctness:  _Sp_counted_base::_M_get_use_count() carries a comment I
left, asking whether it is MT safe. I'm pretty sure the answer is no :-)
It should really be a synchronised atomic load (a load-acquire, I think)
so that updates by other threads are visible. The current code assumes
that reading an _Atomic_word is atomic (which should be OK) but it has
no memory synchronisation at all.  In the absence of an atomic load
builtin in GCC I've used the same hack as Boost's shared_ptr, casting
to volatile. IIUC this isn't really enough, as it only prevents
compiler reorderings, not hardware ones, but it's an improvement over
the status quo.  Better ideas would be welcome.
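
A sketch of the kind of load I mean, assuming a compiler that provides
an atomic load builtin with acquire semantics (the __atomic_load_n
spelling below is only illustrative and is not part of this patch):

      long
      _M_get_use_count() const // nothrow
      {
        // Load-acquire: writes made by another thread before its
        // matching release operation on _M_use_count are visible here.
        return __atomic_load_n(&_M_use_count, __ATOMIC_ACQUIRE);
      }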

The performance issues are:
1) The memory barriers in _Sp_counted_base are suboptimal in the
common case. The __sync builtins are already fully fenced, so the
explicit memory barriers are redundant when the lock policy is
_S_atomic (which is chosen when we know we have a builtin CAS).
This is mentioned in http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
My suggestion is to make the memory barriers depend on the lock policy
rather than on the __GTHREADS macro. The barriers are only needed for
the _S_mutex policy, because that policy is used when we don't have the
__sync_* builtins, which means the replacement atomic ops don't provide
any memory synchronisation guarantees.
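
To illustrate (not part of the patch): with the _S_atomic policy the
dispatch functions boil down to something equivalent to the __sync
builtins, which already act as full barriers, so there is nothing left
for the explicit barriers to do:

    // _S_atomic: the decrement is itself a full barrier, so no
    // _GLIBCXX_*_MEM_BARRIER is needed between dispose() and destroy().
    if (__sync_fetch_and_add(&_M_use_count, -1) == 1)
      {
        _M_dispose();
        if (__sync_fetch_and_add(&_M_weak_count, -1) == 1)
          _M_destroy();
      }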

2) When the lock policy is _S_single, the atomic *_dispatch functions
in ext/atomicity.h are still used. For single-threaded code this
doesn't matter, but if __shared_ptr<T, _S_single> is used by
multithreaded programs then the lock policy is not honoured, since
fully-fenced atomic ops will be used anyway.
Let's pretend GCC supports template aliases and I write this:

template<typename T>
  using shared_ptr_unsafe
      = __gnu_cxx::__shared_ptr<T, __gnu_cxx::_S_single>;
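
We don't have template aliases yet, of course, but the usual
member-typedef workaround gives the same thing, spelt
shared_ptr_unsafe<T>::type instead:

template<typename T>
  struct shared_ptr_unsafe
  { typedef __gnu_cxx::__shared_ptr<T, __gnu_cxx::_S_single> type; };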

I wanted a shared_ptr without atomics and the associated memory
synchronisation, for objects that are never shared between threads.
Unfortunately I find the performance is exactly the same as the
thread-safe std::shared_ptr, because the _S_single policy still calls
__atomic_add_dispatch and __exchange_and_add_dispatch, which means the
locking method actually used is dictated by the result of
__gthread_active_p(), not by the _S_single policy I chose.
For a very artificial benchmark involving several threads manipulating
shared_ptrs (but with no sharing between threads; a rough sketch is
below), the existing _S_single performance is indistinguishable from
_S_atomic. With a few extra specialisations for _S_single I see a
20-25% speed increase on my single-CPU x86-64.  I naively expect
better figures on a multi-processor machine with relaxed memory
ordering.
This change affects all users of _S_single, but only explicit uses of
__shared_ptr<T, _S_single> in MT programs get the performance benefit.
I think that use case is valid and worth supporting.
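
A benchmark of that shape looks something like this (a sketch only:
thread and iteration counts are arbitrary, the header is whichever one
exposes __gnu_cxx::__shared_ptr, and nothing is ever shared across
threads):

#include <pthread.h>
#include <memory>   // assumed; see note above

typedef __gnu_cxx::__shared_ptr<int, __gnu_cxx::_S_single> sp_type;

void* churn(void*)
{
  for (int i = 0; i < 10000000; ++i)
    {
      sp_type p(new int(i));
      sp_type q(p);   // copy bumps the use count
      sp_type r(q);
      // all three drop their counts at the end of each iteration
    }
  return 0;
}

int main()
{
  pthread_t t[4];
  for (int i = 0; i < 4; ++i)
    pthread_create(&t[i], 0, churn, 0);
  for (int i = 0; i < 4; ++i)
    pthread_join(t[i], 0);
}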

Tested x86-64/linux, both --enable-threads and --disable-threads (so
all shared_ptr/weak_ptr tests use _S_single). Do I need to write a test
that exercises the _S_single policy explicitly, so you don't have to
configure with --enable-threads=single to test it? (A sketch of what I
have in mind is just below.)
I'd like to see all these changes in 4.3 if stage 3 isn't too late for
this.  In any case I'm grateful for any feedback, and especially for
any testing on more interesting hardware.
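
The sort of test I mean would just name the policy explicitly, along
these lines (the header include and VERIFY plumbing should follow
whatever the existing shared_ptr tests do; this is a sketch, not the
final test):

// { dg-do run }
// Exercise the _S_single policy even when libstdc++ is built with threads.
#include <memory>             // assumed; match the existing tests
#include <testsuite_hooks.h>

void test01()
{
  typedef __gnu_cxx::__shared_ptr<int, __gnu_cxx::_S_single> sp_type;
  sp_type p(new int(1));
  {
    sp_type q(p);                     // hits the _S_single _M_add_ref_copy
    VERIFY( p.use_count() == 2 );
  }                                   // hits the _S_single _M_release
  VERIFY( p.use_count() == 1 );
  VERIFY( *p == 1 );
}

int main()
{
  test01();
  return 0;
}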

2007-10-27  Jonathan Wakely  <...>

        * include/tr1_impl/boost_shared_ptr.h: Avoid unnecessary atomics
        and memory barriers.

Jon
Index: include/tr1_impl/boost_shared_ptr.h
===================================================================
--- include/tr1_impl/boost_shared_ptr.h	(revision 129664)
+++ include/tr1_impl/boost_shared_ptr.h	(working copy)
@@ -94,12 +94,22 @@
   // Empty helper class except when the template argument is _S_mutex.
   template<_Lock_policy _Lp>
     class _Mutex_base
-    { };
+    {
+    protected:
+      // The atomic policy uses fully-fenced builtins, single doesn't care.
+      enum { _S_need_barriers = 0 };
+    };
 
   template<>
     class _Mutex_base<_S_mutex>
     : public __gnu_cxx::__mutex
-    { };
+    {
+    protected:
+      // This policy is used when atomic builtins are not available.
+      // The replacement atomic operations might not have the necessary
+      // memory barriers.
+      enum { _S_need_barriers = 1 };
+    };
 
   template<_Lock_policy _Lp = __default_lock_policy>
     class _Sp_counted_base
@@ -136,14 +146,19 @@
       void
       _M_release() // nothrow
       {
-	if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count,
-						   -1) == 1)
+	if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
 	  {
 	    _M_dispose();
-#ifdef __GTHREADS
-	    _GLIBCXX_READ_MEM_BARRIER;
-	    _GLIBCXX_WRITE_MEM_BARRIER;
-#endif
+	    // There must be a memory barrier between dispose() and destroy()
+	    // to ensure that the effects of dispose() are observed in the
+	    // thread that runs destroy().
+	    // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
+	    if (_Mutex_base<_Lp>::_S_need_barriers)
+	      {
+	        _GLIBCXX_READ_MEM_BARRIER;
+	        _GLIBCXX_WRITE_MEM_BARRIER;
+	      }
+
 	    if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
 						       -1) == 1)
 	      _M_destroy();
@@ -159,18 +174,25 @@
       {
 	if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count, -1) == 1)
 	  {
-#ifdef __GTHREADS
-	    _GLIBCXX_READ_MEM_BARRIER;
-	    _GLIBCXX_WRITE_MEM_BARRIER;
-#endif
+	    if (_Mutex_base<_Lp>::_S_need_barriers)
+	      {
+	        // See _M_release(),
+	        // destroy() must observe results of dispose()
+	        _GLIBCXX_READ_MEM_BARRIER;
+	        _GLIBCXX_WRITE_MEM_BARRIER;
+	      }
 	    _M_destroy();
 	  }
       }
   
       long
       _M_get_use_count() const // nothrow
-      { return _M_use_count; }  // XXX is this MT safe? 
-      
+      {
+        // We really need an atomic load here, volatile doesn't ensure
+        // writes in other threads synchronize with this read.
+        return const_cast<const volatile _Atomic_word&>(_M_use_count);
+      }
+
     private:  
       _Sp_counted_base(_Sp_counted_base const&);
       _Sp_counted_base& operator=(_Sp_counted_base const&);
@@ -181,18 +203,6 @@
 
   template<>
     inline void
-    _Sp_counted_base<_S_single>::
-    _M_add_ref_lock()
-    {
-      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1) == 0)
-	{
-	  _M_use_count = 0;
-	  __throw_bad_weak_ptr();
-	}
-    }
-
-  template<>
-    inline void
     _Sp_counted_base<_S_mutex>::
     _M_add_ref_lock()
     {
@@ -224,6 +234,56 @@
 					   __count + 1));
     }
 
+  template<>
+    inline void
+    _Sp_counted_base<_S_single>::
+    _M_add_ref_lock()
+    {
+      if (_M_use_count == 0)
+        __throw_bad_weak_ptr();
+      ++_M_use_count;
+    }
+
+  template<>
+    inline void
+    _Sp_counted_base<_S_single>::
+    _M_add_ref_copy()
+    { ++_M_use_count; }
+    
+  template<>
+    inline void
+    _Sp_counted_base<_S_single>::
+    _M_release()
+    {
+      if (_M_use_count-- == 1)
+        {
+          _M_dispose();
+          if (_M_weak_count-- == 1)
+            _M_destroy();
+        }
+    }
+
+  template<>
+    inline void
+    _Sp_counted_base<_S_single>::
+    _M_weak_add_ref()
+    { ++_M_weak_count; }
+
+  template<>
+    inline void
+    _Sp_counted_base<_S_single>::
+    _M_weak_release()
+    {
+      if (_M_weak_count-- == 1)
+        _M_destroy();
+    }
+
+  template<>
+    inline long
+    _Sp_counted_base<_S_single>::
+    _M_get_use_count() const // nothrow
+    { return _M_use_count; }
+
   template<typename _Ptr, typename _Deleter, _Lock_policy _Lp>
     class _Sp_counted_base_impl
     : public _Sp_counted_base<_Lp>
@@ -523,7 +583,7 @@
 	}
 
       //
-      // Requirements: _Deleter' copy constructor and destructor must not throw
+      // Requirements: _Deleter's copy constructor and destructor must not throw
       //
       // __shared_ptr will release __p by calling __d(__p)
       //
@@ -550,7 +610,6 @@
        *          with @a __r.
        *  @param  __r  A %__shared_ptr.
        *  @post   get() == __r.get() && use_count() == __r.use_count()
-       *  @throw  std::bad_alloc, in which case 
        */
       template<typename _Tp1>
         __shared_ptr(const __shared_ptr<_Tp1, _Lp>& __r)
