Bug 99341 - [11 Regression] new std::call_once is not backwards compatible
Summary: [11 Regression] new std::call_once is not backwards compatible
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 11.0
: P1 blocker
Target Milestone: 11.0
Assignee: Jonathan Wakely
URL:
Keywords: ABI
Depends on:
Blocks:
 
Reported: 2021-03-02 13:11 UTC by Jonathan Wakely
Modified: 2021-03-16 13:58 UTC (History)
3 users (show)

See Also:
Host:
Target: *-*-*linux*
Build:
Known to work: 10.2.1
Known to fail: 11.0
Last reconfirmed: 2021-03-02 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Wakely 2021-03-02 13:11:03 UTC
Creating a new bug for bug 66146 comment 41 increased visibility, and to deal with it separately from the exception-handling bug that is the topic of bug 66146.

Quoting from there:

The new std::call_once using a futex is not backwards compatible, so I think it needs to be reverted, or hidden behind an ABI-breaking flag.

The new std::once_flag::_M_activate() function sets _M_once=1 when an initialization is in progress.

With glibc, if a call to the new std::call_once happens before a call to the old version of std::call_once, then glibc's pthread_once will find no fork generation value in _M_once and so will think it should run the init_function itself. Both threads will run their init_function, instead of the second one waiting for the first to finish.

With musl, if a call to the old std::call_once happens before a call to the new std::call_once, then the second thread won't set _M_once=3 and so musl's pthread_once won't wake the second thread when the first finishes. The second thread will sleep forever (or until a spurious wake from the futex wait).
Comment 1 Jonathan Wakely 2021-03-02 13:12:34 UTC
More details ...

Although libstdc++ ignores any fork generation bits, glibc doesn't and
it *requires* a (possibly zero) fork generation to be present.

The __pthread_once_slow function uses CAS to set the once_control
variable, and if that succeeds and the IN_PROGRESS bit is set, it
compares the high bits with the current fork generation.

If the IN_PROGRESS bit was set by the new libstdc++ std::call_once
there won't be a fork generation. If the process' fork generation is
not zero then this check in glibc will be false:

          /* Check whether the initializer execution was interrupted by a
             fork.  We know that for both values, __PTHREAD_ONCE_INPROGRESS
             is set and __PTHREAD_ONCE_DONE is not.  */
          if (val == newval)
            {
              /* Same generation, some other thread was faster.  Wait and
                 retry.  */
              futex_wait_simple ((unsigned int *) once_control,
                                 (unsigned int) newval, FUTEX_PRIVATE);
              continue;
            }

If std::call_once in another thread set val=2 but the fork gen is 4,
then newval=4|1 and so val == newval is false. That means glibc
assumes that an in-progress initialization was interrupted by a fork
and so this thread should run it. That means two threads will be
running it at once.

Demo:

#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <iostream>
#include <mutex>

extern std::once_flag once_flag;
extern void old();

#if __GNUC__ >= 11
std::once_flag once_flag;

int main()
{
  // increase for generation:
  switch (fork())
  {
  case -1:
    abort();
  case 0:
    break;
  default:
    wait(nullptr);
    return 0;
  }

  // This is the child process, fork generation is non-zero.

  std::call_once(once_flag, [] {
    std::cout << "Active execution started using new code\n";
    old();
    std::cout << "Active execution finished using new code\n";
  });
}
#else
void old()
{
  std::call_once(once_flag, [] {
    std::cout << "Active execution started using old code\n";
    std::cout << "Active execution finished using old code\n";
  });
}
#endif

Compile this once with GCC 11 and once with GCC 10 and link with GCC
11, and instead of deadlocking (as it should do) you'll get:

Active execution started using new code
Active execution started using old code
Active execution finished using old code
Active execution finished using new code

And using a libstdc++ build with _GLIBCXX_ASSERTIONS you'll get:
/home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/mutex.cc:79: void std::once_flag::_M_finish(bool): Assertion 'prev & _Bits::_Active' failed.

because the "inner" execution changes the state to _Done when it
finishes, and the assertion in the "outer" execution fails.
Comment 2 Jonathan Wakely 2021-03-02 13:23:09 UTC
The new implementation added these new symbols to libstdc++.so:

_ZNSt9once_flag11_M_activateEv
_ZNSt9once_flag9_M_finishEb

I think I'd like to keep them, but have the new implementation disabled by default (except for the ABI-changing gnu-versioned-namespace build).

The new implementation is superior (it fixes PR 66146, and performs better in some cases due to inlining an initial check to see if executions should be passive) so I'd like to retain it for users who explicitly request it. By default we need to be compatible with the old version based on pthread_once.

At some point (maybe when there are further ABI changes queued) maybe we'll do another painful transition like the cxx11-abi one and switch to the new std::once_flag by default.

Maybe the new std::once_flag should be moved to an inline namespace, and then the symbols above defined as aliases for the equivalents using the inline namespace in the mangled name (the aliases are needed if we want to support binaries that are already using them, e.g. packages in the upcoming Fedora 34 which were compiled with gcc-11.0.1 already).
Comment 3 Jakub Jelinek 2021-03-03 16:59:24 UTC
A glibc fix has been posted: https://sourceware.org/pipermail/libc-alpha/2021-March/123167.html
Comment 5 CVS Commits 2021-03-16 12:39:48 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:6ee24638ed0ad51e568c799bacf149ba9bd7628b

commit r11-7688-g6ee24638ed0ad51e568c799bacf149ba9bd7628b
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Mar 12 11:47:20 2021 +0000

    libstdc++: Revert to old std::call_once implementation [PR 99341]
    
    The new std::call_once implementation is not backwards compatible,
    contrary to my intention. Because std::once_flag::_M_active() doesn't
    write glibc's "fork generation" into the pthread_once_t object, it's
    possible for glibc and libstdc++ to run two active executions
    concurrently. This violates the primary invariant of the feature!
    
    This patch reverts std::once_flag and std::call_once to the old
    implementation that uses pthread_once. This means PR 66146 is a problem
    again, but glibc has been changed to solve that. A new API similar to
    pthread_once but supporting failure and resetting the pthread_once_t
    will be proposed for inclusion in glibc and other C libraries.
    
    This change doesn't simply revert r11-4691 because I want to retain the
    new implementation for non-ghtreads targets (which didn't previously
    support std::call_once at all, so there's no backwards compatibility
    concern). This also leaves the new std::call_once::_M_activate() and
    std::call_once::_M_finish(bool) symbols present in libstdc++.so.6 so
    that code already compiled against GCC 11 can still use them. Those
    symbols will be removed in a subsequent commit (which distros can choose
    to temporarily revert if needed).
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/99341
            * include/std/mutex [_GLIBCXX_HAVE_LINUX_FUTEX] (once_flag):
            Revert to pthread_once_t implementation.
            [_GLIBCXX_HAVE_LINUX_FUTEX] (call_once): Likewise.
            * src/c++11/mutex.cc [_GLIBCXX_HAVE_LINUX_FUTEX]
            (struct __once_flag_compat): New type matching the reverted
            implementation of once_flag using futexes.
            (once_flag::_M_activate): Remove, replace with ...
            (_ZNSt9once_flag11_M_activateEv): ... alias symbol.
            (once_flag::_M_finish): Remove, replace with ...
            (_ZNSt9once_flag9_M_finishEb): ... alias symbol.
            * testsuite/30_threads/call_once/66146.cc: Removed.
Comment 6 CVS Commits 2021-03-16 12:39:54 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:995a740cb01a0671a2082cb1ae13d0c356d4b568

commit r11-7689-g995a740cb01a0671a2082cb1ae13d0c356d4b568
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Mar 12 11:47:20 2021 +0000

    libstdc++: Remove symbols for new std::call_once implementation [PR 99341]
    
    This removes the new symbols added for the new futex-based
    std::call_once implementation. These symbols were new on trunk, so not
    in any released version. However, they are already present in some
    beta distro releases (Fedora Linux 34) and in Fedora Linux rawhide. This
    change can be locally reverted by distros that need to keep the symbols
    present until affected packages have been rebuilt.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/99341
            * config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Remove
            std::once_flag symbols.
            * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise.
            * config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Likewise.
            * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
            Likewise.
            * config/abi/pre/gnu.ver: Likewise.
            * src/c++11/mutex.cc [_GLIBCXX_HAVE_LINUX_FUTEX]
            (struct __once_flag_compat): Remove.
            (_ZNSt9once_flag11_M_activateEv): Remove.
            (_ZNSt9once_flag9_M_finishEb): Remove.
Comment 7 Jakub Jelinek 2021-03-16 13:58:04 UTC
Fixed then.