debug container mutex association

Thu Sep 15 08:51:00 GMT 2016

On 14/09/16 22:17 +0200, FranÃ§ois Dumont wrote:
>On 14/09/2016 11:00, Jonathan Wakely wrote:
>>On 13/09/16 22:37 +0200, FranÃ§ois Dumont wrote:
>>>Hi
>>>
>>>   When I proposed to change std::hash for pointers my plan was to 
>>>use this approach for the debug containers. So here is the patch 
>>>leveraging on this technique to avoid going through _Hash_impl to 
>>>hash address and get mutex index from it. I think it is obious 
>>>that new access is faster so I didn't wrote a performance test to 
>>>demonstrate it.
>>
>>Is this actually a bottleneck that needs to be made faster?
>>
>>I know it's nice if Debug Mode isn't very slow, but you've already
>>made it much better, and performance is not the primary goal of Debug
>>Mode.
>
>Sure but my approach is that if I can make something faster then I 
>just try. I considered that making this mode faster will allow its 
>usage in an even more extended way.
>
>>
>>>diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
>>>b/libstdc++-v3/config/abi/pre/gnu.ver
>>>index 9b5bb23..c9a253e 100644
>>>--- a/libstdc++-v3/config/abi/pre/gnu.ver
>>>+++ b/libstdc++-v3/config/abi/pre/gnu.ver
>>>@@ -1947,6 +1947,21 @@ GLIBCXX_3.4.23 {
>>>    _ZNSsC[12]ERKSs[jmy]RKSaIcE;
>>>    _ZNSbIwSt11char_traitsIwESaIwEEC[12]ERKS2_mRKS1_;
>>>
>>>+    # __gnu_debug::_Safe_sequence_base
>>>+ _ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEm;
>>
>>The 'm' here should be [jmy] because std::size_t mangles differently
>>on different targets.
>>
>>>+ _ZN11__gnu_debug19_Safe_sequence_base22_M_revalidate_singularEm;
>>>+    _ZN11__gnu_debug19_Safe_sequence_base12_M_get_mutexEm;
>>>+    _ZN11__gnu_debug19_Safe_sequence_base7_M_swapERS0_m;
>>>+
>>>+    # __gnu_debug::_Safe_iterator_base
>>>+ _ZN11__gnu_debug19_Safe_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
>>>+    _ZN11__gnu_debug19_Safe_iterator_base9_M_detachEm;
>>>+
>>>+    # __gnu_debug::_Safe_unordered_container_base and 
>>>_Safe_local_iterator_base
>>>+ _ZN11__gnu_debug30_Safe_unordered_container_base7_M_swapERS0_m;
>>>+ _ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
>>>+    _ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEm;
>>>+
>>>} GLIBCXX_3.4.22;
>>
>>It's unfortunate to have to export and maintain new symbols when we
>>don't offer the same level of ABI guarantee for Debug Mode  :-(
>
>Yes, I have been told that Debug ABI compatibility was not guaranteed 
>but at the same time if we don't do so abi-check is failing. Couldn't 
>we have a special section in gnu.ver for unversioned symbols like 
>those ?

I we could, but making them unversioned wouldn't really be useful - we
still need to keep them in the library forever.

>>>@@ -174,6 +191,18 @@ namespace __gnu_debug
>>>            const _Safe_iterator_base* __last)
>>>  { return __first->_M_can_compare(*__last); }
>>>
>>>+  // Compute power of 2 not greater than __n.
>>>+  template<size_t __n>
>>>+    struct _Lowest_power_of_two
>>>+    {
>>>+      static const size_t __val
>>>+        = _Lowest_power_of_two< (__n >> 1) >::__val + 1;
>>>+    };
>>>+
>>>+  template<>
>>>+    struct _Lowest_power_of_two<1>
>>>+    { static const size_t __val = 1; };
>>
>>The lowest power of two not greater than 1 is 2^0, but this gives the
>>result 1.
>>
>>Similarly, the lowest power of two not greater than 2 is 2^1, i.e. 1.
>
>Indeed, when writing the test case I decided to return 1 rather than 0 
>for the _Lowest_power_of_two<1> specialization.It gave a better 
>container instance - mutex mapping. But I forgot to change its name.
>
>>
>>And the name is misleading, it doesn't compute a power of two, it
>>computes the base-2 logarithm (then adds one), so it's a compile-time
>>version of floor(log2(n))+1.
>>
>>>@@ -123,6 +125,36 @@ namespace __gnu_debug
>>>      template<typename _Predicate>
>>>    void
>>>    _M_transfer_from_if(_Safe_sequence& __from, _Predicate __pred);
>>>+
>>>+      static _GLIBCXX_CONSTEXPR std::size_t
>>>+      _S_alignment() _GLIBCXX_NOEXCEPT
>>>+      { return _Lowest_power_of_two<__alignof(_Sequence)>::__val; }
>>
>>This function is called "alignment" but it returns log2(alignof(T))+1
>
>Yes, I wasn't proud about the naming.
>
>>
>>Does it need to be constexpr? Is that for std::array?
>
>No, it is not used in std::array context. I just considered that it 
>could be constexpr.

OK. Sometimes that's useful, I agree. But it's never used in a
constexpr context, and it's only called by the library directly, so
there's no advantage here (it just makes it harder to write because it
can only use code that's valid in a constant expression).

Anyway, you can write it without a template, and still be constexpr:

>>
>>You can get the same result without _Lowest_power_of_two:
>>
>> static _GLIBCXX_CONSTEXPR size_t
>> _S_alignbits() _GLIBCXX_NOEXCEPT
>> { return __builtin_ctz(__alignof(_Sequence)) + 1; }
>>
>>But I'm not convinced the +1 is correct, see below.
>>
>>>@@ -46,15 +47,30 @@ namespace
>>>  /** Returns different instances of __mutex depending on the 
>>>passed address
>>>   *  in order to limit contention without breaking current 
>>>library binary
>>>   *  compatibility. */
>>>+  const size_t mask = 0xf;
>>>+
>>>  __gnu_cxx::__mutex&
>>>-  get_safe_base_mutex(void* __address)
>>>+  get_mutex(size_t index)
>>>  {
>>>-    const size_t mask = 0xf;
>>>    static __gnu_cxx::__mutex safe_base_mutex[mask + 1];
>>>-    const size_t index = _Hash_impl::hash(__address) & mask;
>>>    return safe_base_mutex[index];
>>>  }
>>
>>N.B. https://gcc.gnu.org/PR71312
>Ok, debug mode is also impacted. Shouldn't the alignment be set on the 
>__mutex type directly ?

No, definitely not. Apart from the fact that it would be an ABI
change, it would mean that struct X { __mutex mx; int i; } would not
place the int right next to the mutex, it would force it onto a
different cacheline. We want our arrays of mutexes to be on separate
cachelines, but most uses of __mutex are not in an array.

We probably can't fix this properly yet, because we don't have the
hardware_destructive_interference_size value. We could just make a
conservative estimate of 64 bytes though.

>>
>>
>>>
>>>+  __gnu_cxx::__mutex&
>>>+  get_safe_base_mutex(void* address)
>>>+  {
>>>+    const size_t index = _Hash_impl::hash(address) & mask;
>>>+    return get_mutex(index);
>>>+  }
>>>+
>>>+  __gnu_cxx::__mutex&
>>>+  get_safe_base_mutex(void* address, std::size_t alignment)
>>>+  {
>>>+    const size_t index
>>>+      = (reinterpret_cast<size_t>(address) >> alignment) & mask;
>>>+    return get_mutex(index);
>>>+  }
>>
>>Since the "alignment" value (which is really log2(align)+1) is always
>>a positive value that means we always shift the address. But that
>>means we drop the least significant bit of all pointers, even when the
>>type has alignof(T)==1 and so all bits in the pointer are significant.
>>Is that intentional? Is the idea to only drop bits that are always
>>zero?
>
>Yes, to make sure we get use of all available mutexes.

OK, so the +1 is not needed?

>>
>>By changing the algorithm we use to map an address to a mutex do we
>>allow programs to get two different mutexes for the same object? That
>>can only happen when some objects are compiled with an older GCC and
>>so still use the old _M_get_mutex() function that doesn't take an
>>argument. We require all objects to be rebuilt with the same compiler
>>for Debug Mode (but maybe not if using __gnu_debug::vector etc.
>>explicitly in normal mode?) so that's not a big deal. But it's a pity
>>we have to keep the old symbols around if using them is unsafe.
>
>Yes, it could happen even if very unlikely. I would also prefer to get 
>rid of former symbols but can we ?

No. Objects built against an older libstdc++ would fail to run when
the dynamic linker finds a new libstdc++.so.

We can't remove exported symbols from the DSO.

>>
>>>  void
>>>  swap_its(__gnu_debug::_Safe_sequence_base& __lhs,
>>>       __gnu_debug::_Safe_iterator_base*& __lhs_its,
>>>@@ -70,8 +86,8 @@ namespace
>>>  }
>>>
>>>  void
>>>-  swap_seq(__gnu_debug::_Safe_sequence_base& __lhs,
>>>-       __gnu_debug::_Safe_sequence_base& __rhs)
>>>+  swap_seq_single(__gnu_debug::_Safe_sequence_base& __lhs,
>>>+          __gnu_debug::_Safe_sequence_base& __rhs)
>>>  {
>>>    swap(__lhs._M_version, __rhs._M_version);
>>>    swap_its(__lhs, __lhs._M_iterators,
>>>@@ -81,10 +97,41 @@ namespace
>>>  }
>>>
>>>  void
>>>-  swap_ucont(__gnu_debug::_Safe_unordered_container_base& __lhs,
>>>-        __gnu_debug::_Safe_unordered_container_base& __rhs)
>>>+  lock_and_run(__gnu_cxx::__mutex& lhs_mutex,
>>>+           __gnu_cxx::__mutex& rhs_mutex,
>>>+           std::function<void()> action)
>>
>>Do all calls to this fit into the std::function small-object buffer,
>>so we don't need to allocate memory for it?
>>
>I didn't consider this. I thought it was a nice way to avoid making 
>lock_and_run a template function to take the lambda.
>
>Reading this I realize that we could perhaps acheive the same without 
>touching as much code. __alignof of container types will always give 
>the same value on a give target no ? Does it depends on the type used 
>to instantiate the container ? Isn't there some macro giving the 
>target alignment that I could use ?

Alignment isn't a single value for a target, e.g. for some targets
alignof(void*) != alignof(int), so which "target alignment" do you
want?

It seems likely that all the debug containers will be aligned to at
least alignof(void*), because of the _Safe_base::_M_sequence pointer
member. It should be OK to assume alignof(*this) == alignof(void*).