[PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Jonathan Wakely
jwakely.gcc@gmail.com
Wed Aug 4 15:32:36 GMT 2021
On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>
> On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
> >
> > This is the right patch. The previous one is missing noexcept. Sorry.
> >
> >
> > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com> wrote:
> >>
> >> Please find attached an updated patch after incorporating Jonathan's suggestions.
> >>
> >> Changes from the last patch include:
> >> - Add a TSAN macro to bits/c++config.
> >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment.
> >> - Move the code in the optimized path to a separate function _M_release_double_width_cas.
>
> Thanks for the updated patch. At a quick glance it looks great. I'll
> apply it locally and test it tomorrow.
It occurs to me now that _M_release_double_width_cas is the wrong
name, because we're not doing a DWCAS, just a double-width load. What
do you think about renaming it to _M_release_combined instead? Since
it does a combined load of the two counts.
I'm no longer confident in the alignof suggestion I gave you.
+ constexpr bool __double_word
+ = sizeof(long long) == 2 * sizeof(_Atomic_word);
+ // The ref-count members follow the vptr, so are aligned to
+ // alignof(void*).
+ constexpr bool __aligned = alignof(long long) <= alignof(void*);
For IA32 (i.e. 32-bit x86) this constant will be true, because
alignof(long long) and alignof(void*) are both 4, even though
sizeof(long long) is 8. So in theory the _M_use_count and
_M_weak_count members could be in different cache lines, and the
atomic load will not be atomic. I think we want to use __alignof(long
long) here to get the "natural" alignment, not the smaller 4B
alignment allowed by SysV ABI. That means that we won't do this
optimization on IA32, but I think that's acceptable.
Alternatively, we could keep using alignof, but with an additional
run-time check something like (uintptr_t)&_M_use_count / 64 ==
(uintptr_t)&_M_weak_count / 64 to check they're on the same cache
line. I think for now we should just use __alignof and live without
the optimization on IA32.
Secondly:
+ void
+ __attribute__ ((noinline))
This needs to be __noinline__ because noinline is not a reserved word,
so users can do:
#define noinline 1
#include <memory>
Was it performance profiling, or code-size measurements (or something
else?) that showed this should be non-inline?
I'd like to add a comment explaining why we want it to be noinline.
In that function ...
+ _M_release_last_use() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _M_dispose();
+ if (_Mutex_base<_Lp>::_S_need_barriers)
+ {
+ __atomic_thread_fence (__ATOMIC_ACQ_REL);
+ }
I think we can remove this fence. The _S_need_barriers constant is
only true for the _S_mutex policy, and that just calls
_M_release_orig(), so never uses this function. I'll remove it and add
a comment noting that the lack of barrier is intentional.
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+ -1) == 1)
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_destroy();
+ }
+ }
Alternatively, we could keep the fence in _M_release_last_use() and
refactor the other _M_release functions, so that we have explicit
specializations for each of:
_Sp_counted_base<_S_single>::_M_release() (already present)
_Sp_counted_base<_S_mutex>::_M_release()
_Sp_counted_base<_S_atomic>::_M_release()
The second and third would be new, as currently they both use the
definition in the primary template. The _S_mutex one would just
decrement _M_use_count then call _M_release_last_use() (which still
has the barrier needed for the _S_mutex policy). The _S_atomic one
would have your new optimization. See the attached patch showing what
I mean. I find this version a bit simpler to understand, as we just
have _M_release and _M_release_last_use, without
_M_release_double_width_cas and _M_release_orig. What do you think of
this version? Does it lose any important properties of your version
which I've failed to notice?
-------------- next part --------------
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 32b8957f814..07465f7ecd5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -143,6 +143,15 @@
# define _GLIBCXX_NODISCARD
#endif
+// Macro for TSAN.
+#if __SANITIZE_THREAD__
+# define _GLIBCXX_TSAN 1
+#elif defined __has_feature
+# if __has_feature(thread_sanitizer)
+# define _GLIBCXX_TSAN 1
+# endif
+#endif
+
#if __cplusplus
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 5be935d174d..b2397c8fddb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -143,10 +143,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
virtual void*
_M_get_deleter(const std::type_info&) noexcept = 0;
+ // Increment the use count (used when the count is greater than zero).
void
_M_add_ref_copy()
{ __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
+ // Increment the use count if it is non-zero, throw otherwise.
void
_M_add_ref_lock()
{
@@ -154,15 +156,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__throw_bad_weak_ptr();
}
+ // Increment the use count if it is non-zero.
bool
_M_add_ref_lock_nothrow() noexcept;
+ // Decrement the use count.
void
- _M_release() noexcept
- {
- // Be race-detector-friendly. For more info see bits/c++config.
- _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
- if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ _M_release() noexcept;
+
+ // Called by _M_release() when the use count reaches zero.
+ __attribute__((__noinline__))
+ void
+ _M_release_last_use() noexcept
{
_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
_M_dispose();
@@ -184,12 +189,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_destroy();
}
}
- }
+ // Increment the weak count.
void
_M_weak_add_ref() noexcept
{ __gnu_cxx::__atomic_add_dispatch(&_M_weak_count, 1); }
+ // Decrement the weak count.
void
_M_weak_release() noexcept
{
@@ -286,6 +292,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
+ template<>
+ inline void
+ _Sp_counted_base<_S_mutex>::_M_release() noexcept
+ {
+ // Be race-detector-friendly. For more info see bits/c++config.
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ {
+ _M_release_last_use();
+ }
+ }
+
+ template<>
+ inline void
+ _Sp_counted_base<_S_atomic>::_M_release() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+#if ! _GLIBCXX_TSAN
+ constexpr bool __lock_free
+ = __atomic_always_lock_free(sizeof(long long), 0)
+ && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
+ constexpr bool __double_word
+ = sizeof(long long) == 2 * sizeof(_Atomic_word);
+ // The ref-count members follow the vptr, so are aligned to
+ // alignof(void*).
+ constexpr bool __aligned = __alignof(long long) <= alignof(void*);
+ if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
+ {
+ constexpr long long __unique_ref
+ = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
+ auto __both_counts = reinterpret_cast<long long*>(&_M_use_count);
+
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) == __unique_ref)
+ {
+ // Both counts are 1, so there are no weak references and
+ // we are releasing the last strong reference. No other
+ // threads can observe the effects of this _M_release()
+ // call (e.g. calling use_count()) without a data race.
+ *(long long*)(&_M_use_count) = 0;
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_dispose();
+ _M_destroy();
+ __builtin_puts("double double bye bye");
+ return;
+ }
+ }
+#endif
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ {
+ _M_release_last_use();
+ }
+ }
+
template<>
inline void
_Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
More information about the Libstdc++
mailing list