[committed] libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

Christophe Lyon christophe.lyon@linaro.org
Thu Oct 1 07:30:43 GMT 2020


On Wed, 30 Sep 2020 at 22:44, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On 30/09/20 16:03 +0100, Jonathan Wakely wrote:
> >On 29/09/20 13:51 +0200, Christophe Lyon via Libstdc++ wrote:
> >>On Sat, 26 Sep 2020 at 21:42, Jonathan Wakely via Gcc-patches
> >><gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>Glibc 2.32 adds a global variable that says whether the process is
> >>>single-threaded. We can use this to decide whether to elide atomic
> >>>operations, as a more precise and reliable indicator than
> >>>__gthread_active_p.
> >>>
> >>>This means that guard variables for statics and reference counting in
> >>>shared_ptr can use less expensive, non-atomic ops even in processes that
> >>>are linked to libpthread, as long as no threads have been created yet.
> >>>It also means that we switch to using atomics if libpthread gets loaded
> >>>later via dlopen (this still isn't supported in general, for other
> >>>reasons).
> >>>
> >>>We can't use __libc_single_threaded to replace __gthread_active_p
> >>>everywhere. If we replaced the uses of __gthread_active_p in std::mutex
> >>>then we would elide the pthread_mutex_lock in the code below, but not
> >>>the pthread_mutex_unlock:
> >>>
> >>>  std::mutex m;
> >>>  m.lock();            // pthread_mutex_lock
> >>>  std::thread t([]{}); // __libc_single_threaded = false
> >>>  t.join();
> >>>  m.unlock();          // pthread_mutex_unlock
> >>>
> >>>We need the lock and unlock to use the same "is threading enabled"
> >>>predicate, and similarly for init/destroy pairs for mutexes and
> >>>condition variables, so that we don't try to release resources that were
> >>>never acquired.
> >>>
> >>>There are other places that could use __libc_single_threaded, such as
> >>>_Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
> >>>they can be changed later.
> >>>
> >>>libstdc++-v3/ChangeLog:
> >>>
> >>>        PR libstdc++/96817
> >>>        * include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()):
> >>>        New function wrapping __libc_single_threaded if available.
> >>>        (__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
> >>>        * libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
> >>>        (__cxa_guard_release): Likewise.
> >>>        * testsuite/18_support/96817.cc: New test.
> >>>
> >>>Tested powerpc64le-linux, with glibc 2.31 and 2.32. Committed to trunk.
> >>
> >>Hi,
> >>
> >>This patch introduced regressions on armeb-linux-gnueabhf:
> >>--target armeb-none-linux-gnueabihf --with-cpu cortex-a9
> >>   g++.dg/compat/init/init-ref2 cp_compat_x_tst.o-cp_compat_y_tst.o execute
> >>   g++.dg/cpp2a/decomp1.C  -std=gnu++14 execution test
> >>   g++.dg/cpp2a/decomp1.C  -std=gnu++17 execution test
> >>   g++.dg/cpp2a/decomp1.C  -std=gnu++2a execution test
> >>   g++.dg/init/init-ref2.C  -std=c++14 execution test
> >>   g++.dg/init/init-ref2.C  -std=c++17 execution test
> >>   g++.dg/init/init-ref2.C  -std=c++2a execution test
> >>   g++.dg/init/init-ref2.C  -std=c++98 execution test
> >>   g++.dg/init/ref15.C  -std=c++14 execution test
> >>   g++.dg/init/ref15.C  -std=c++17 execution test
> >>   g++.dg/init/ref15.C  -std=c++2a execution test
> >>   g++.dg/init/ref15.C  -std=c++98 execution test
> >>   g++.old-deja/g++.jason/pmf7.C  -std=c++98 execution test
> >>   g++.old-deja/g++.mike/leak1.C  -std=c++14 execution test
> >>   g++.old-deja/g++.mike/leak1.C  -std=c++17 execution test
> >>   g++.old-deja/g++.mike/leak1.C  -std=c++2a execution test
> >>   g++.old-deja/g++.mike/leak1.C  -std=c++98 execution test
> >>   g++.old-deja/g++.other/init19.C  -std=c++14 execution test
> >>   g++.old-deja/g++.other/init19.C  -std=c++17 execution test
> >>   g++.old-deja/g++.other/init19.C  -std=c++2a execution test
> >>   g++.old-deja/g++.other/init19.C  -std=c++98 execution test
> >>
> >>and probably some (280) in libstdc++ tests: (I didn't bisect those):
> >>   19_diagnostics/error_category/generic_category.cc execution test
> >>   19_diagnostics/error_category/system_category.cc execution test
> >>   20_util/scoped_allocator/1.cc execution test
> >>   20_util/scoped_allocator/2.cc execution test
> >>   20_util/scoped_allocator/construct_pair_c++2a.cc execution test
> >>   20_util/to_address/debug.cc execution test
> >>   20_util/variant/run.cc execution test
> >
> >I think this is a latent bug in the static initialization code for
> >EABI that affects big endian. In libstdc++-v3/libsupc++/guard.cc we
> >have:
> >
> ># ifndef _GLIBCXX_GUARD_TEST_AND_ACQUIRE
> >
> >// Test the guard variable with a memory load with
> >// acquire semantics.
> >
> >inline bool
> >__test_and_acquire (__cxxabiv1::__guard *g)
> >{
> >  unsigned char __c;
> >  unsigned char *__p = reinterpret_cast<unsigned char *>(g);
> >  __atomic_load (__p, &__c,  __ATOMIC_ACQUIRE);
> >  (void) __p;
> >  return _GLIBCXX_GUARD_TEST(&__c);
> >}
> >#  define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __test_and_acquire (G)
> ># endif
> >
> >That inspects the first byte of the guard variable. But for EABI the
> >"is initialized" bit is the least significant bit of the guard
> >variable. For little endian that's fine, the least significant bit is
> >in the first byte. But for big endian, it's not in the first byte, so
> >we are looking in the wrong place. This means that the initial check
> >in __cxa_guard_acquire is wrong:
> >
> >  extern "C"
> >  int __cxa_guard_acquire (__guard *g)
> >  {
> >#ifdef __GTHREADS
> >    // If the target can reorder loads, we need to insert a read memory
> >    // barrier so that accesses to the guarded variable happen after the
> >    // guard test.
> >    if (_GLIBCXX_GUARD_TEST_AND_ACQUIRE (g))
> >      return 0;
> >
> >This will always be false for big endian EABI, which means we run the
> >rest of the function even when the variable is initialized. Previously
> >that still gave the right answer, but inefficiently. After my change
> >it gives the wrong answer, because the new code assumes that the check
> >right at the start of __cxa_guard_acquire ACTUALLY WORKS. Silly me.
> >
> >I think the attached should fix it. This should be backported to fix
> >inefficient code for static init on big endian EABI targets.
>
>
> Committed as r11-3572, but I haven't been able to test it on a big
> endian EABI system, only little endian EABI.
>
Hi Jonathan,

unfortunately, this breaks the build even for little-endian for me:
9980570_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include
-I/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++
-D_GLIBCXX_SHARED -fno-implicit-templates -Wall -Wextra
-Wwrite-strings -Wcast-qual -Wabi=2 -fdiagnostics-show-location=once
-ffunction-sections -fdata-sections -frandom-seed=guard.lo -g -O2
-D_GNU_SOURCE -c
/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++/guard.cc
 -fPIC -DPIC -D_GLIBCXX_SHARED -o guard.o
In file included from
/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++/cxxabi.h:50,
                 from
/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++/guard.cc:28:
/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++/guard.cc:
In function 'int
__cxxabiv1::__cxa_guard_acquire(__cxxabiv1::__guard*)':
/tmp/9980570_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++/guard.cc:249:9:
error: invalid type argument of unary '*' (have 'int')
  249 |     if (_GLIBCXX_GUARD_TEST_AND_ACQUIRE (g))
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[4]: *** [Makefile:763: guard.lo] Error 1

How did you test it? Maybe I'm missing something?

Thanks,

Christophe


More information about the Gcc-patches mailing list