[PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock

Fri Sep 11 09:47:30 GMT 2020

I'm finally getting round to merging this series!

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>If std::future::wait_until is passed a time point measured against a clock
>that is neither std::chrono::steady_clock nor std::chrono::system_clock
>then the generic implementation of
>__atomic_futex_unsigned::_M_load_when_equal_until is called which
>calculates the timeout based on __clock_t and calls the
>_M_load_when_equal_until method for that clock to perform the actual wait.
>
>There's no guarantee that __clock_t is running at the same speed as the
>caller's clock, so if the underlying wait times out timeout we need to
>check the timeout against the caller's clock again before potentially
>looping.
>
>Also add two extra tests to the testsuite's async.cc:
>
>* run test03 with steady_clock_copy, which behaves identically to
>  std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
>  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
>  that takes an arbitrary clock to be called.
>
>* invent test04 which uses a deliberately slow running clock in order to
>  exercise the looping behaviour o
>  __atomic_futex_unsigned::_M_load_when_equal_until described above.
>
>	* libstdc++-v3/include/bits/atomic_futex.h:
>	(__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
>	generic _Clock to check the timeout against _Clock again after
>	_M_load_when_equal_until returns indicating a timeout.
>
>	* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
>	slow_clock that runs at an eleventh of steady_clock's speed. Use it
>	to test the user-supplied-clock variant of
>	__atomic_futex_unsigned::_M_load_when_equal_until works generally
>	with test03 and loops correctly when the timeout time hasn't been
>	reached in test04.
>---
> libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
> libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +++++++++++++++++-
> 2 files changed, 80 insertions(+), 5 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>index 4375129..5f95ade 100644
>--- a/libstdc++-v3/include/bits/atomic_futex.h
>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _M_load_when_equal_until(unsigned __val, memory_order __mo,
> 	  const chrono::time_point<_Clock, _Duration>& __atime)
>       {
>-	const typename _Clock::time_point __c_entry = _Clock::now();
>-	const __clock_t::time_point __s_entry = __clock_t::now();
>-	const auto __delta = __atime - __c_entry;
>-	const auto __s_atime = __s_entry + __delta;
>-	return _M_load_when_equal_until(__val, __mo, __s_atime);
>+	typename _Clock::time_point __c_entry = _Clock::now();
>+	do {
>+	  const __clock_t::time_point __s_entry = __clock_t::now();
>+	  const auto __delta = __atime - __c_entry;
>+	  const auto __s_atime = __s_entry + __delta;
>+	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
>+	    return true;

I wonder if we can avoid looping if _Clock::is_steady is true i.e.

           if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
             return false

If _Clock is steady then it can't be adjusted to move backwards. I am
not 100% sure, but I don't think a steady clock can run "slow" either.
But I'm checking with LWG about that. We should keep the loop for now.

The reason for wanting to return early is that _Clock::now() can be
quite expensive, so avoiding the second call if it's not needed would
be significant.

Related to this, I've been experimenting with a change to wait_for
that checks __rtime <= 0 and if so, uses __clock_t::time_point::min()
for the wait_until call, so that we don't bother to call
__clock_t::now(). For a non-positive duration we don't need to
determine the precise time_point in the past, because it's in the past
anyway. Using time_point::min() is as good as __clock_t::now() - rtime
(as long as we don't overflow anything when converting it to a struct
timespec, or course!)

Using future.wait_for(0s) to poll for a future becoming ready takes a
long time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
Using wait_until(C::time_point::min()) is faster, because it doesn't
call C::now().

This probably isn't important for most timed waiting functions in the
library, because I don't see any reason to use wait_for(0s) to poll a
mutex, condition_variable or atomic. But it's reasonable to do for
futures.

I already added (__rtime <= __rtime.zero()) to this_thread::sleep_for
in d1a74a287ee1a84b90e5675904dac7f945cffca1.

The extra branch to check rtime <= rtime.zero() or atime < C::now() is
probably insignificant compared to the cost of unnecessary calls to
now() on one or more clocks.

>+	  __c_entry = _Clock::now();
>+	} while (__c_entry < __atime);
>+	return false;
>       }
>