Consider // https://godbolt.org/z/efTW6MoEh void test_copy(const std::shared_ptr<int> &sp) { auto copy = sp; } vs. // https://godbolt.org/z/3aoGq1f9P void test_copy(const boost::shared_ptr<int> &sp) { auto copy = sp; } In the first cast, over 70 lines of assembler are emitted, in the second, around 30. This seems to be in large part because in _Sp_counted_base::_M_add_ref_copy(), you're using __atomic_add_dispatch() even if _Lp is _S_atomic. It seems to me that a specialisation of this function template for _S_atomic calling just __atomic_add() is missing: https://godbolt.org/z/crPz9hGe7 Probably same for the deref case, too.
Comparison to the other two major standard library implementations: https://godbolt.org/z/crPo44rxo
(In reply to Marc Mutz from comment #0) > Consider > > // https://godbolt.org/z/efTW6MoEh N.B. -DNDEBUG has no effect on anything in libstdc++, it's not allowed to use <assert.h> anywhere. > void test_copy(const std::shared_ptr<int> &sp) > { auto copy = sp; } > > vs. > > // https://godbolt.org/z/3aoGq1f9P > void test_copy(const boost::shared_ptr<int> &sp) > { auto copy = sp; } > > In the first cast, over 70 lines of assembler are emitted, in the second, > around 30. This seems to be in large part because in > _Sp_counted_base::_M_add_ref_copy(), you're using __atomic_add_dispatch() > even if _Lp is _S_atomic. That's by design: https://gcc.gnu.org/onlinedocs/libstdc++/manual/memory.html#shared_ptr.policy "For all three policies, reference count increments and decrements are done via the functions in ext/atomicity.h, which detect if the program is multi-threaded. If only one thread of execution exists in the program then less expensive non-atomic operations are used." > It seems to me that a specialisation of this > function template for _S_atomic calling just __atomic_add() is missing: No, the _S_atomic policy doesn't mean "use atomics unconditionally". We still go through __atomic_add_dispatch so that we avoid the atomics in single-threaded programs. With a new glibc the check for multiple threads is cheaper and should optimize better.
Ok, it doesn't seem to affect execution speed much, but oh my, the code size :/