[PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

Jonathan Wakely jwakely.gcc@gmail.com
Wed Aug 4 15:52:02 GMT 2021


On Wed, 4 Aug 2021 at 16:47, Maged Michael <maged.michael@gmail.com> wrote:
>
> Thanks, Jonathan!
>
> On Wed, Aug 4, 2021 at 11:32 AM Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>
>> On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>> >
>> > On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
>> > >
>> > > This is the right patch. The previous one is missing noexcept. Sorry.
>> > >
>> > >
>> > > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com> wrote:
>> > >>
>> > >> Please find attached an updated patch after incorporating Jonathan's suggestions.
>> > >>
>> > >> Changes from the last patch include:
>> > >> - Add a TSAN macro to bits/c++config.
>> > >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment.
>> > >> - Move the code in the optimized path to a separate function _M_release_double_width_cas.
>> >
>> > Thanks for the updated patch. At a quick glance it looks great. I'll
>> > apply it locally and test it tomorrow.
>>
>>
>> It occurs to me now that _M_release_double_width_cas is the wrong
>> name, because we're not doing a DWCAS, just a double-width load. What
>> do you think about renaming it to _M_release_combined instead? Since
>> it does a combined load of the two counts.
>
>
> Yes definitely _M_release_combined makes sense. Will do.

See the patch I attached to my previous mail, which has a refactored
version that gets rid of that function entirely.

>
>>
>> I'm no longer confident in the alignof suggestion I gave you.
>>
>> +        constexpr bool __double_word
>> +          = sizeof(long long) == 2 * sizeof(_Atomic_word);
>> +        // The ref-count members follow the vptr, so are aligned to
>> +        // alignof(void*).
>> +        constexpr bool __aligned = alignof(long long) <= alignof(void*);
>>
>> For IA32 (i.e. 32-bit x86) this constant will be true, because
>> alignof(long long) and alignof(void*) are both 4, even though
>> sizeof(long long) is 8. So in theory the _M_use_count and
>> _M_weak_count members could be in different cache lines, and the
>> atomic load will not be atomic. I think we want to use __alignof(long
>> long) here to get the "natural" alignment, not the smaller 4B
>> alignment allowed by SysV ABI. That means that we won't do this
>> optimization on IA32, but I think that's acceptable.
>>
>> Alternatively, we could keep using alignof, but with an additional
>> run-time check something like (uintptr_t)&_M_use_count / 64 ==
>> (uintptr_t)&_M_weak_count / 64 to check they're on the same cache
>> line. I think for now we should just use __alignof and live without
>> the optimization on IA32.
>>
> I'd rather avoid any runtime checks because they may negate the speed rationale for doing this optimization.
> I'd be OK with not doing this optimization for any 32-bit architecture.
>
> Is it OK to change the __align condition to the following?
> constexpr bool __aligned =
>   (alignof(long long) <= alignof(void*))
>   && (sizeof(long long) == sizeof(void*));

Yes, that will work fine.


More information about the Gcc-patches mailing list