Bug 84323 - call_once uses TLS even when once_flag is set
Summary: call_once uses TLS even when once_flag is set
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 7.3.1
Importance: P3 normal
Target Milestone: 11.0
Assignee: Jonathan Wakely
URL:
Keywords: missed-optimization
Depends on:
Blocks: 66146 55394
Reported: 2018-02-11 15:41 UTC by Antony Polukhin
Modified: 2020-11-03 18:46 UTC
CC List: 4 users

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-02-12 00:00:00


Description Antony Polukhin 2018-02-11 15:41:28 UTC
Disassembly of the following code:

#include <mutex>

std::once_flag once;

int* foo() {
    static int* p{};
    std::call_once(once,[](){
        p = 0;
    });

    return p;
}

shows that a lot of work happens on the hot path. TLS is used (twice) and there is a function call:

  mov QWORD PTR [rsp+8], rax
  mov rax, QWORD PTR std::__once_callable@gottpoff[rip]
  mov QWORD PTR fs:[rax], rdx
  mov rax, QWORD PTR std::__once_call@gottpoff[rip]
  mov QWORD PTR fs:[rax], OFFSET FLAT:void std::call_once<foo()::{lambda()#1}>(std::once_flag&, foo()::{lambda()#1}&&)::{lambda()#2}::_FUN()
  mov eax, OFFSET FLAT:__gthrw___pthread_key_create(unsigned int*, void (*)(void*))
  test rax, rax
  je .L6
  mov esi, OFFSET FLAT:__once_proxy
  mov edi, OFFSET FLAT:once
  call __gthrw_pthread_once(int*, void (*)())

This seems suboptimal: double-checked-style locking could avoid both the TLS accesses and the 'call' on the hot path. std::call_once could be implemented just like thread-safe static local variables, resulting in much better disassembly on the hot path:

  movzx eax, BYTE PTR once
  test al, al
  je .L9    ; not called
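
For illustration, the double-checked approach described above might look like the following minimal sketch. The names sketch_once_flag, sketch_call_once and sketch_demo_run_count are hypothetical, and the std::mutex on the cold path is an assumption made for brevity; this is not the libstdc++ implementation or its once_flag layout.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical stand-in for std::once_flag; not libstdc++'s actual layout.
struct sketch_once_flag {
    std::atomic<bool> done{false};
    std::mutex m;
};

// Hypothetical stand-in for std::call_once using double-checked locking.
template <class F>
void sketch_call_once(sketch_once_flag& flag, F&& f) {
    // Hot path: a single acquire load, no TLS access and no function call.
    if (flag.done.load(std::memory_order_acquire))
        return;
    // Cold path: serialize callers and re-check under the lock.
    std::lock_guard<std::mutex> lock(flag.m);
    if (!flag.done.load(std::memory_order_relaxed)) {
        std::forward<F>(f)();  // if this throws, done stays false
        flag.done.store(true, std::memory_order_release);
    }
}

// Small demo: returns how many times the callable actually ran.
int sketch_demo_run_count() {
    sketch_once_flag flag;
    int runs = 0;
    for (int i = 0; i < 3; ++i)
        sketch_call_once(flag, [&] { ++runs; });
    assert(flag.done.load());
    return runs;
}
```

Once the flag is set, every later call returns after the first load, which is the behaviour the three-instruction sequence above checks for.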
Comment 1 Antony Polukhin 2018-02-12 08:58:49 UTC
Fixing this will also resolve Bug 55394, because there will be no need to link with pthread.
Comment 2 Jonathan Wakely 2018-02-12 13:09:44 UTC
More importantly it would fix PR 66146, which requires us to rewrite call_once anyway.
Comment 3 Antony Polukhin 2018-10-31 13:58:39 UTC
Just noticed that libc++ already does this optimization: https://godbolt.org/z/alw1sq

libc++ directly accesses the content of std::once_flag and skips all the thread local accesses if call_once previously succeeded.
Comment 4 GCC Commits 2020-11-03 18:45:06 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:93e79ed391b9c636f087e6eb7e70f14963cd10ad

commit r11-4691-g93e79ed391b9c636f087e6eb7e70f14963cd10ad
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Nov 3 18:44:32 2020 +0000

    libstdc++: Rewrite std::call_once to use futexes [PR 66146]
    
    The current implementation of std::call_once uses pthread_once, which
    only meets the C++ requirements when compiled with support for
    exceptions. For most glibc targets and all non-glibc targets,
    pthread_once does not work correctly if the init_routine exits via an
    exception. The pthread_once_t object is left in the "active" state, and
    any later attempts to run another init_routine will block forever.
    
    This change makes std::call_once work correctly for Linux targets, by
    replacing the use of pthread_once with a futex, based on the code from
    __cxa_guard_acquire. For both glibc and musl, the Linux implementation
    of pthread_once is already based on futexes, and pthread_once_t is just
    a typedef for int, so this change does not alter the layout of
    std::once_flag. By choosing the values for the int appropriately, the
    new code is even ABI compatible. Code that calls the old implementation
    of std::call_once will use pthread_once to manipulate the int, while new
    code will use the new std::once_flag members to manipulate it, but they
    should interoperate correctly. In both cases, the int is initially zero,
    has the lowest bit set when there is an active execution, and equals 2
    after a successful returning execution. The difference with the new code
    is that exceptional executions are correctly detected and the int is
    reset to zero.
    
    The __cxa_guard_acquire code (and musl's pthread_once) use an additional
    state to say there are other threads waiting. This allows the futex wake
    syscall to be skipped if there is no contention. Glibc doesn't use a
    waiter bit, so we have to unconditionally issue the wake in order to be
    compatible with code calling the old std::call_once that uses Glibc's
    pthread_once. If we know that we're using musl (and musl's pthread_once
    doesn't change) it would be possible to set a waiting state and check
    for it in std::once_flag::_M_finish(bool), but this patch doesn't do
    that.
    
    This doesn't fix the bug for non-linux targets. A similar approach could
    be used for targets where we know the definition of pthread_once_t is a
    mutex and an integer. We could make once_flag._M_activate() use
    pthread_mutex_lock on the mutex member within the pthread_once_t, and
    then only set the integer if the execution finishes, and then unlock the
    mutex. That would require careful study of each target's pthread_once
    implementation and that work is left for a later date.
    
    This also fixes PR 55394 because pthread_once is no longer needed, and
    PR 84323 because the fast path is now just an atomic load.
    
    As a consequence of the new implementation that doesn't use
    pthread_once, we can also make std::call_once work for targets with no
    gthreads support. The code for the single-threaded implementation
    follows the same methods as on Linux, but with no need for atomics or
    futexes.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/55394
            PR libstdc++/66146
            PR libstdc++/84323
            * config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Add new symbols.
            * include/std/mutex [!_GLIBCXX_HAS_GTHREADS] (once_flag): Define
            even when gthreads is not supported.
            (once_flag::_M_once) [_GLIBCXX_HAVE_LINUX_FUTEX]: Change type
            from __gthread_once_t to int.
            (once_flag::_M_passive(), once_flag::_M_activate())
            (once_flag::_M_finish(bool), once_flag::_Active_execution):
            Define new members for futex and non-threaded implementation.
            [_GLIBCXX_HAS_GTHREADS] (once_flag::_Prepare_execution): New
            RAII helper type.
            (call_once): Use new members of once_flag.
            * src/c++11/mutex.cc (std::once_flag::_M_activate): Define.
            (std::once_flag::_M_finish): Define.
            * testsuite/30_threads/call_once/39909.cc: Do not require
            gthreads.
            * testsuite/30_threads/call_once/49668.cc: Likewise.
            * testsuite/30_threads/call_once/60497.cc: Likewise.
            * testsuite/30_threads/call_once/call_once1.cc: Likewise.
            * testsuite/30_threads/call_once/dr2442.cc: Likewise.
            * testsuite/30_threads/call_once/once_flag.cc: Add test for
            constexpr constructor.
            * testsuite/30_threads/call_once/66146.cc: New test.
            * testsuite/30_threads/call_once/constexpr.cc: Removed.
            * testsuite/30_threads/once_flag/cons/constexpr.cc: Removed.
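
As a rough illustration of the state values described in the commit message (zero initially, lowest bit set during an active execution, 2 after a returning execution, reset to zero after an exceptional one), here is a minimal sketch. The names tri_once_flag, tri_call_once and tri_demo are hypothetical, and waiters simply yield in a loop where the real code would use a futex wait and wake; it is not the code from the commit.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical names throughout; this is not the libstdc++ source.
struct tri_once_flag {
    // 0 = not yet run, 1 = active execution in progress, 2 = done.
    std::atomic<int> state{0};
};

template <class F>
void tri_call_once(tri_once_flag& flag, F&& f) {
    for (;;) {
        int seen = flag.state.load(std::memory_order_acquire);
        if (seen == 2)
            return;  // fast path: just an atomic load
        if (seen == 0 &&
            flag.state.compare_exchange_strong(seen, 1,
                                               std::memory_order_acq_rel)) {
            try {
                f();
            } catch (...) {
                // Exceptional execution: reset so a later call can retry.
                flag.state.store(0, std::memory_order_release);
                throw;
            }
            flag.state.store(2, std::memory_order_release);
            return;
        }
        // Another thread is active; a futex wait would go here (assumption:
        // yielding is used instead to keep the sketch portable).
        std::this_thread::yield();
    }
}

// Demo: the first callable throws, the second returns, the third is skipped.
int tri_demo() {
    tri_once_flag flag;
    int runs = 0;
    try {
        tri_call_once(flag, [&] { ++runs; throw std::runtime_error("fail"); });
    } catch (const std::runtime_error&) {
        assert(flag.state.load() == 0);  // reset after the exception
    }
    tri_call_once(flag, [&] { ++runs; });
    tri_call_once(flag, [&] { ++runs; });
    assert(flag.state.load() == 2);
    return runs;
}
```

The point of the sketch is the reset to zero in the catch block: that is the case where pthread_once can leave the flag stuck in the active state.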
Comment 5 Jonathan Wakely 2020-11-03 18:46:07 UTC
Fixed