Memory barriers vs lock/unlock

Peter Dimov pdimov@mmltd.net
Tue Nov 8 19:21:00 GMT 2005


Paolo Carlini wrote:
> Hi,
>
> in our simple port of boost_shared_ptr we have some naively puzzling
> things like:
>
>  void
>  release() // nothrow
>  {
>    if (__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1)
>    {
>      dispose();
>      __glibcxx_mutex_lock(_M_mutex);
>      __glibcxx_mutex_unlock(_M_mutex);
>      weak_release();
>    }
>  }

  void
  weak_release() // nothrow
  {
    if (__gnu_cxx::__exchange_and_add(&_M_weak_count, -1) == 1)
    {
      __glibcxx_mutex_lock(_M_mutex);
      __glibcxx_mutex_unlock(_M_mutex);
      destroy();
    }
  }


> I'm currently investigating that, and I'm not an expert of this area,
> and I'd like to have some help. If I remember correctly some old
> exchanges those "weird" empty lock/unlock stem from the need to add
> memory barriers, nothing more.

Correct. The sequence is thread A invoking release(), dropping the last 
strong reference, then thread B invoking weak_release(), dropping the last 
weak reference. Thread B's destroy() needs to observe the effects of thread 
A's dispose().

It's possible to optimize release() a bit by inlining weak_release() by hand:

  void
  release() // nothrow
  {
    if (__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1)
    {
      dispose();

      __glibcxx_mutex_lock(_M_mutex);
      __glibcxx_mutex_unlock(_M_mutex);

      if (__gnu_cxx::__exchange_and_add(&_M_weak_count, -1) == 1)
      {
        destroy();
      }
    }
  }

There's no need to lock the mutex twice.

> Therefore, I'm wondering whether we
> wouldn't be best off using right away _GLIBCXX_READ_MEM_BARRIER and
> _GLIBCXX_WRITE_MEM_BARRIER like, for instance, libsupc++/guard.cc is
> already doing.

_GLIBCXX_READ_MEM_BARRIER is a #LoadLoad barrier; _GLIBCXX_WRITE_MEM_BARRIER 
is #StoreStore. Neither is a full barrier. However, since __exchange_and_add 
is a read-modify-write operation (both a load and a store), a combination of 
the two can be used instead of the lock/unlock pair: a 
_GLIBCXX_WRITE_MEM_BARRIER before the decrement keeps dispose()'s stores 
from sinking below it, and a _GLIBCXX_READ_MEM_BARRIER after it keeps 
destroy()'s loads from rising above it. I think. :-)

When __exchange_and_add is implemented in terms of __sync_fetch_and_add, 
which seems to guarantee full ordering, there'll be no need for lock+unlock 
or explicit barriers. 
