Bug 101406 - poor codegen for shared_ptr copy & destory, compared to boost::shared_ptr and libc++'s implementation
Summary: poor codegen for shared_ptr copy & destory, compared to boost::shared_ptr and...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 11.1.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-10 20:51 UTC by Marc Mutz
Modified: 2021-07-12 14:21 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Mutz 2021-07-10 20:51:00 UTC
Consider

     // https://godbolt.org/z/efTW6MoEh
     void test_copy(const std::shared_ptr<int> &sp)
     { auto copy = sp; }

vs.

     // https://godbolt.org/z/3aoGq1f9P
     void test_copy(const boost::shared_ptr<int> &sp)
     { auto copy = sp; }

In the first cast, over 70 lines of assembler are emitted, in the second, around 30. This seems to be in large part because in _Sp_counted_base::_M_add_ref_copy(), you're using __atomic_add_dispatch() even if _Lp is _S_atomic. It seems to me that a specialisation of this function template for _S_atomic calling just __atomic_add() is missing: https://godbolt.org/z/crPz9hGe7

Probably same for the deref case, too.
Comment 1 Marc Mutz 2021-07-11 08:52:40 UTC
Comparison to the other two major standard library implementations: https://godbolt.org/z/crPo44rxo
Comment 2 Jonathan Wakely 2021-07-12 10:44:50 UTC
(In reply to Marc Mutz from comment #0)
> Consider
> 
>      // https://godbolt.org/z/efTW6MoEh

N.B. -DNDEBUG has no effect on anything in libstdc++, it's not allowed to use <assert.h> anywhere.

>      void test_copy(const std::shared_ptr<int> &sp)
>      { auto copy = sp; }
> 
> vs.
> 
>      // https://godbolt.org/z/3aoGq1f9P
>      void test_copy(const boost::shared_ptr<int> &sp)
>      { auto copy = sp; }
> 
> In the first cast, over 70 lines of assembler are emitted, in the second,
> around 30. This seems to be in large part because in
> _Sp_counted_base::_M_add_ref_copy(), you're using __atomic_add_dispatch()
> even if _Lp is _S_atomic.

That's by design:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/memory.html#shared_ptr.policy
"For all three policies, reference count increments and decrements are done via the functions in ext/atomicity.h, which detect if the program is multi-threaded. If only one thread of execution exists in the program then less expensive non-atomic operations are used."

> It seems to me that a specialisation of this
> function template for _S_atomic calling just __atomic_add() is missing:

No, the _S_atomic policy doesn't mean "use atomics unconditionally". We still go through __atomic_add_dispatch so that we avoid the atomics in single-threaded programs.

With a new glibc the check for multiple threads is cheaper and should optimize better.
Comment 3 Marc Mutz 2021-07-12 14:21:54 UTC
Ok, it doesn't seem to affect execution speed much, but oh my, the code size :/