[PATCH] libstdc++: Add support for C++20 barriers

Jonathan Wakely jwakely@redhat.com
Wed Nov 4 18:50:08 GMT 2020


On 04/11/20 09:29 -0800, Thomas Rodgers wrote:
>From: Thomas Rodgers <trodgers@redhat.com>
>
>Adds <barrier>
>
>libstdc++/ChangeLog:
>
>	* include/Makefile.am (std_headers): Add new header.
>	* include/Makefile.in: Regenerate.
>	* include/std/barrier: New file.
>	* testsuite/30_thread/barrier/1.cc: New test.
>	* testsuite/30_thread/barrier/2.cc: Likewise.
>	* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>	* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>	* testsuite/30_thread/barrier/arrive.cc: Likewise.
>	* testsuite/30_thread/barrier/completion.cc: Likewise.
>	* testsuite/30_thread/barrier/max.cc: Likewise.
>---
> libstdc++-v3/include/std/barrier              | 248 ++++++++++++++++++
> .../testsuite/30_threads/barrier/1.cc         |  27 ++
> .../testsuite/30_threads/barrier/2.cc         |  27 ++
> .../testsuite/30_threads/barrier/arrive.cc    |  51 ++++
> .../30_threads/barrier/arrive_and_drop.cc     |  49 ++++
> .../30_threads/barrier/arrive_and_wait.cc     |  51 ++++
> .../30_threads/barrier/completion.cc          |  54 ++++
> .../testsuite/30_threads/barrier/max.cc       |  44 ++++
> 8 files changed, 551 insertions(+)
> create mode 100644 libstdc++-v3/include/std/barrier
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>
>diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
>new file mode 100644
>index 00000000000..80e6d668cf5
>--- /dev/null
>+++ b/libstdc++-v3/include/std/barrier
>@@ -0,0 +1,248 @@
>+// <barrier> -*- C++ -*-
>+
>+// Copyright (C) 2020 Free Software Foundation, Inc.
>+//
>+// This file is part of the GNU ISO C++ Library.  This library is free
>+// software; you can redistribute it and/or modify it under the
>+// terms of the GNU General Public License as published by the
>+// Free Software Foundation; either version 3, or (at your option)
>+// any later version.
>+
>+// This library is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+// GNU General Public License for more details.
>+
>+// You should have received a copy of the GNU General Public License along
>+// with this library; see the file COPYING3.  If not see
>+// <http://www.gnu.org/licenses/>.
>+
>+// This implementation is based on libcxx/include/barrier
>+//===-- barrier.h --------------------------------------------------===//
>+//
>+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
>+// See https://llvm.org/LICENSE.txt for license information.
>+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>+//
>+//===---------------------------------------------------------------===//
>+
>+#ifndef _GLIBCXX_BARRIER
>+#define _GLIBCXX_BARRIER 1
>+
>+#pragma GCC system_header
>+
>+#if __cplusplus > 201703L
>+#define __cpp_lib_barrier 201907L

This feature test macro will be defined unconditionally, even if
_GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
for gthreads.

You're also missing an edit to <version> (which should depend on the
same conditions).


>+#include <bits/c++config.h>
>+
>+#if defined(_GLIBCXX_HAS_GTHREADS)
>+#include <bits/atomic_base.h>
>+#include <bits/atomic_wait.h>
>+#include <bits/functional_hash.h>
>+#include <ext/numeric_traits.h>
>+
>+#include <memory>
>+
>+namespace std _GLIBCXX_VISIBILITY(default)
>+{
>+_GLIBCXX_BEGIN_NAMESPACE_VERSION
>+
>+  struct __empty_completion
>+  {
>+    _GLIBCXX_ALWAYS_INLINE void
>+    operator()() noexcept
>+    { }
>+  };
>+
>+/*
>+
>+The default implementation of __barrier_base is a classic tree barrier.
>+
>+It looks different from literature pseudocode for two main reasons:
>+ 1. Threads that call into std::barrier functions do not provide indices,
>+    so a numbering step is added before the actual barrier algorithm,
>+    appearing as an N+1 round to the N rounds of the tree barrier.
>+ 2. A great deal of attention has been paid to avoid cache line thrashing
>+    by flattening the tree structure into cache-line sized arrays, that
>+    are indexed in an efficient way.
>+
>+*/
>+
>+  using __barrier_phase_t = uint8_t;

Please add <stdint.h> or <cstdint> since you're using uint8_t
(it's currently included by <bits/atomic_base.h> but that could
change).

Would it work to use a scoped enumeration type here instead? That
would prevent people accidentally doing arithmetic on it, or passing
it to functions taking an integer (and prevent it promoting to int in
arithmetic).

e.g. define it similar to std::byte:

enum class __barrier_phase_t : unsigned char { };

and then cast to an integer on the way in and the way out, so that the
implementation works with its numeric value, but users have a
non-arithmetic type that they can't do anything with.

>+
>+  template<class _CompletionF>

s/typename/class/

>+    class __barrier_base
>+    {
>+      struct alignas(64) /* naturally-align the heap state */ __state_t
>+      {
>+	struct
>+	{
>+	  __atomic_base<__barrier_phase_t> __phase = ATOMIC_VAR_INIT(0);

No need to use ATOMIC_VAR_INIT, we're not trying to be compatible with
C, it can be just = {0} or similar.

>+	} __tickets[64];
>+      };
>+
>+      ptrdiff_t _M_expected;
>+      unique_ptr<char[]> _M_state_allocation;
>+      __state_t* 		_M_state;
>+      __atomic_base<ptrdiff_t> _M_expected_adjustment;
>+      _CompletionF _M_completion;
>+      __atomic_base<__barrier_phase_t> _M_phase;

Can these just use atomic instead of __atomic_base?

The order of the members looks suboptimal. If alignof(_CompletionF) is
less than alignof(atomic<__barrier_phase_t>) then you'll get padding
bytes between them (or is that intentional, to avoid false sharing?)

I would put _M_completion last, and add [[no_unique_address]] so it
doesn't need to waste space if it's an empty type.

>+
>+      static __gthread_t
>+      _S_get_tid() noexcept
>+      {
>+#ifdef __GLIBC__
>+	// For the GNU C library pthread_self() is usable without linking to
>+	// libpthread.so but returns 0, so we cannot use it in single-threaded
>+	// programs, because this_thread::get_id() != thread::id{} must be true.
>+	// We know that pthread_t is an integral type in the GNU C library.
>+	if (!__gthread_active_p())
>+	  return 1;
>+#endif
>+	return __gthread_self();

This is the third place we're doing this (and all of them need to be
changed for recent versions of glibc) so we should put it in one
place.

>+      }
>+
>+      bool
>+      _M_arrive(__barrier_phase_t __old_phase)
>+      {
>+	__barrier_phase_t const __half_step = __old_phase + 1,
>+				__full_step = __old_phase + 2;
>+	size_t __current_expected = _M_expected;
>+	size_t __current = _Hash_impl::hash(_S_get_tid())
>+				% ((_M_expected + 1) >> 1);
>+	for (int __round = 0;; ++__round)
>+	  {
>+	    if (__current_expected <= 1)
>+		return true;
>+	    size_t const __end_node = ((__current_expected + 1) >> 1),
>+			 __last_node = __end_node - 1;
>+	    for (;;++__current)

A space after the semi-colons please.

>+	      {
>+		if (__current == __end_node)
>+		  __current = 0;
>+		__barrier_phase_t __expect = __old_phase;
>+		if (__current == __last_node && (__current_expected & 1))
>+		  {
>+		    if (_M_state[__current].__tickets[__round].__phase
>+			.compare_exchange_strong(__expect, __full_step,
>+						 memory_order_acq_rel))
>+		      break;     // I'm 1 in 1, go to next __round
>+		  }
>+		else if (_M_state[__current].__tickets[__round].__phase
>+			 .compare_exchange_strong(__expect, __half_step,
>+						  memory_order_acq_rel))
>+		  {
>+		    return false; // I'm 1 in 2, done with arrival
>+		  }
>+		else if (__expect == __half_step)
>+		  {
>+		    if (_M_state[__current].__tickets[__round].__phase
>+			.compare_exchange_strong(__expect, __full_step,
>+						 memory_order_acq_rel))
>+		      break;    // I'm 2 in 2, go to next __round
>+		  }
>+	      }
>+	    __current_expected = __last_node + 1;
>+	    __current >>= 1;
>+	  }
>+      }
>+
>+    public:
>+      using arrival_token = __barrier_phase_t;
>+
>+      static constexpr ptrdiff_t
>+      max() noexcept
>+      { return __gnu_cxx::__numeric_traits<ptrdiff_t>::__max; }

s/__numeric_traits/__int_traits/

Or just use __PTRDIFF_MAX__ which doesn't require any template
instantiation and is pre-defined by the compiler.

>+
>+      __barrier_base(ptrdiff_t __expected, _CompletionF __completion = _CompletionF())

This line is too long.

As the ctor can be called with a single argument it should be
'explicit', although I don't think it needs to be callable with a
single argument. It's always going to be passed two arguments by the
derived class, so it doesn't need the default argument for
__completion (which solves the line length problem!)


>+	  : _M_expected(__expected), _M_expected_adjustment(0),
>+	    _M_completion(move(__completion)), _M_phase(0)
>+      {
>+	size_t const __count = (_M_expected + 1) >> 1;
>+	size_t const __size = sizeof(__state_t) * __count;
>+	size_t __allocation_size = __size + alignof(__state_t);
>+	_M_state_allocation = unique_ptr<char[]>(new char[__allocation_size]);
>+	void* __allocation = _M_state_allocation.get();
>+	void* const __state = align(alignof(__state_t), __size, __allocation,
>+				      __allocation_size);
>+	_M_state = new (__state) __state_t[__count];

This looks like a workaround for not having aligned-new. I think you
can get rid of _M_state_allocation and just do:

#ifndef __cpp_aligned_new
# error NO BARRIER FOR YOU
#endif
         _M_state = std::make_unique<__state_t[]>(__count);

Probably a bit more user-friendly to add a check for __cpp_aligned_new
to the #if at the top of the file though.

Aligned new is enabled by default in C++20. If anybody wants to
compile with -fno-aligned-new then they don't get nice things.


>+      }
>+
>+      [[nodiscard]] arrival_token
>+      arrive(ptrdiff_t __update)
>+      {
>+	auto const __old_phase = _M_phase.load(memory_order_relaxed);
>+	for(; __update; --__update)
>+	  {
>+	    if(_M_arrive(__old_phase))
>+	      {
>+		_M_completion();
>+		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
>+		_M_expected_adjustment.store(0, memory_order_relaxed);
>+		_M_phase.store(__old_phase + 2, memory_order_release);
>+		_M_phase.notify_all();
>+	      }
>+	  }
>+	return __old_phase;
>+      }
>+
>+      void
>+      wait(arrival_token&& __old_phase) const
>+      {
>+	auto const __test_fn = [=, this]
>+	  {
>+	    return _M_phase.load(memory_order_acquire) != __old_phase;
>+	  };
>+	_M_phase._M_wait(__old_phase, __test_fn, memory_order_acquire);
>+      }
>+
>+      void
>+      arrive_and_drop()
>+      {
>+	_M_expected_adjustment.fetch_sub(1, memory_order_relaxed);
>+	(void)arrive(1);
>+      }
>+    };
>+
>+  template<class _CompletionF = __empty_completion>

s/class/typename/

>+    class barrier
>+    {
>+      __barrier_base<_CompletionF> _M_b;

New line after this member declaration.

Why is it called _base if it's a member? Why is the impl in a separate
class anyway? It just seems to add a level of indirection for no real
benefit.

>+    public:
>+      using arrival_token = typename __barrier_base<_CompletionF>::arrival_token;
>+
>+      static constexpr ptrdiff_t
>+      max() noexcept
>+      { return __barrier_base<_CompletionF>::max(); }
>+
>+      barrier(ptrdiff_t __count, _CompletionF __completion = _CompletionF())

This is missing 'explicit'

>+	  : _M_b(__count, std::move(__completion))
>+      { }
>+
>+      barrier(barrier const&) = delete;
>+      barrier& operator=(barrier const&) = delete;
>+
>+      [[nodiscard]] arrival_token
>+      arrive(ptrdiff_t __update = 1)
>+      { return _M_b.arrive(__update); }
>+
>+      void
>+      wait(arrival_token&& __phase) const
>+      { _M_b.wait(std::move(__phase)); }
>+
>+      void
>+      arrive_and_wait()
>+      { wait(arrive()); }
>+
>+      void
>+      arrive_and_drop()
>+      { _M_b.arrive_and_drop(); }
>+    };
>+
>+_GLIBCXX_END_NAMESPACE_VERSION
>+} // namespace
>+#endif // _GLIBCXX_HAS_GTHREADS
>+#endif // __cplusplus > 201703L
>+#endif // _GLIBCXX_BARRIER
>+
>diff --git a/libstdc++-v3/testsuite/30_threads/barrier/1.cc b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>new file mode 100644
>index 00000000000..0b38160a58b
>--- /dev/null
>+++ b/libstdc++-v3/testsuite/30_threads/barrier/1.cc
>@@ -0,0 +1,27 @@
>+// Copyright (C) 2020 Free Software Foundation, Inc.
>+//
>+// This file is part of the GNU ISO C++ Library.  This library is free
>+// software; you can redistribute it and/or modify it under the
>+// terms of the GNU General Public License as published by the
>+// Free Software Foundation; either version 3, or (at your option)
>+// any later version.
>+
>+// This library is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+// GNU General Public License for more details.
>+
>+// You should have received a copy of the GNU General Public License along
>+// with this library; see the file COPYING3.  If not see
>+// <http://www.gnu.org/licenses/>.
>+
>+// { dg-options "-std=gnu++2a" }
>+// { dg-do compile { target c++2a } }
>+
>+#include <barrier>
>+
>+#ifndef __cpp_lib_barrier
>+# error "Feature-test macro for barrier missing in <barrier>"
>+#elif __cpp_lib_barrier != 201907L
>+# error "Feature-test macro for barrier has wrong value in <barrier>"
>+#endif
>diff --git a/libstdc++-v3/testsuite/30_threads/barrier/2.cc b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>new file mode 100644
>index 00000000000..1d8d83639e0
>--- /dev/null
>+++ b/libstdc++-v3/testsuite/30_threads/barrier/2.cc
>@@ -0,0 +1,27 @@
>+// Copyright (C) 2019-2020 Free Software Foundation, Inc.

Just 2020 here

>+// This file is part of the GNU ISO C++ Library.  This library is free
>+// software; you can redistribute it and/or modify it under the
>+// terms of the GNU General Public License as published by the
>+// Free Software Foundation; either version 3, or (at your option)
>+// any later version.
>+
>+// This library is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+// GNU General Public License for more details.
>+
>+// You should have received a copy of the GNU General Public License along
>+// with this library; see the file COPYING3.  If not see
>+// <http://www.gnu.org/licenses/>.
>+
>+// { dg-options "-std=gnu++2a" }
>+// { dg-do compile { target c++2a } }

Once the macro is only defined inside the _GLIBCXX_HAS_GTHREADS guard
this test should { dg-require-gthreads "" }.

Now that I think about it, we should probably have negative tests that
check that the feature test macro *isn't* defined when we don't have a
gthreads target. But we aren't doing the anywhere else either.


>+
>+#include <version>
>+
>+#ifndef __cpp_lib_barrier
>+# error "Feature-test macro for barrier missing in <version>"
>+#elif __cpp_lib_barrier != 201907L
>+# error "Feature-test macro for barrier has wrong value in <version>"
>+#endif

This is going to currently fail, right? The patch doesn't add anything
to <version>.

>diff --git a/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>new file mode 100644
>index 00000000000..c128d5ea794
>--- /dev/null
>+++ b/libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>@@ -0,0 +1,51 @@
>+// { dg-options "-std=gnu++2a -pthread" }
>+// { dg-do run { target c++2a } }
>+// { dg-require-effective-target pthread }
>+// { dg-require-gthreads "" }

The directives should be:

// { dg-options "-std=gnu++2a" }
// { dg-do run { target c++2a } }
// { dg-require-gthreads "" }
// { dg-additional-options "-pthread" { target pthread } }

i.e. don't require pthreads, only add -pthread for pthread targets.

This allows it to work for targets that have a non-pthreads
implementation of the gthreads abstraction (e.g. vxworks).

Similarly for the remaining tests.

>+
>+// Copyright (C) 2020 Free Software Foundation, Inc.
>+//
>+// This file is part of the GNU ISO C++ Library.  This library is free
>+// software; you can redistribute it and/or modify it under the
>+// terms of the GNU General Public License as published by the
>+// Free Software Foundation; either version 3, or (at your option)
>+// any later version.
>+
>+// This library is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+// GNU General Public License for more details.
>+
>+// You should have received a copy of the GNU General Public License along
>+// with this library; see the file COPYING3.  If not see
>+// <http://www.gnu.org/licenses/>.
>+
>+// This implementation is based on libcxx/test/std/thread/thread.barrier/arrive.pass.cpp
>+//===----------------------------------------------------------------------===//
>+//
>+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
>+// See https://llvm.org/LICENSE.txt for license information.
>+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>+//
>+//===----------------------------------------------------------------------===//
>+
>+#include <barrier>
>+#include <thread>
>+
>+#include <testsuite_hooks.h>

No need for this header if you don't use VERIFY.

>+int main(int, char**)

int main()

>+{
>+  std::barrier<> b(2);
>+
>+  auto tok = b.arrive();
>+  std::thread t([&](){
>+    (void)b.arrive();
>+  });
>+  b.wait(std::move(tok));
>+  t.join();
>+
>+  auto tok2 = b.arrive(2);
>+  b.wait(std::move(tok2));
>+  return 0;

No need for the return statement.

And similarly for the remaining tests.



More information about the Libstdc++ mailing list