Created attachment 46727 [details] demonstration code
The same incorrect logic that was fixed in bug 68519 is present in the implementations of future::wait_for and shared_timed_mutex::try_lock_for/try_lock_shared_for. The fix should be the same: explicitly duration_cast the __rtime argument to __clock::duration before adding it to __clock::now(). Fixes are required in shared_mutex and bits/atomic_futex.h.

The attached code demonstrates the problem: wait_for either waits too long or returns too early. Output is:

drdws0134$ garden with -m gcc/8.1.0-01c7/bin g++ -std=c++14 futurewait.cpp -pthread && ./a.out
ERROR - future::wait_for took too short: t1 looping: 0 milliseconds
0 second
t2 looping: 0 second
ERROR - shared_timed_mutex::try_lock_shared took too short: 0 milliseconds
t2 looping: 1 second
t1 looping: 1 second
t2 looping: 2 second
t1 looping: 2 second
t2 looping: 3 second
t1 looping: 3 second
t2 looping: 4 second
t1 looping: 4 second
I grep'ed the latest devel source tree (git sha: afadff66) for occurrences of now\(\). The same bug appears several times in include/experimental/io_context and include/experimental/timer.

The underlying problem is that operator+(time_point, duration) has well-defined but surprising and error-prone semantics when duration's Rep is float. Maybe it would be better to define a less error-prone helper function, e.g.,

    template<class Clk, class Dur, class Rep, class Period>
    time_point<Clk, Dur> __timepoint_plus_duration(const time_point<Clk, Dur>&,
                                                   duration<Rep, Period>&);

and to use it consistently whenever adding a time_point to a duration?
Yes that probably makes sense to do.
C++17 already has the needed helper function: ceil(duration). So just change all instances of:

    __clock_t::now() + __reltime

to:

    using __dur = typename __clock_t::duration;
    __clock_t::now() + __chrono_detail::ceil<__dur>(__reltime)

and make the C++17 implementation of ceil(duration) visible in all versions as __chrono_detail::__ceil.
Ah yes, of course. Thanks!
I think that https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01934.html fixes this for std::future::wait_for but without the __chrono_detail part mentioned in comment 5. std::shared_timed_mutex was fixed in r278903/ab40695a46c6649bac40f4251e37993d73fa7a13.
I think that https://gcc.gnu.org/pipermail/libstdc++/2020-May/050437.html fixes this for std::future::wait_for including the __chrono_detail part mentioned in comment 5.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f9ddb696a289cc48d24d3d23c0b324cb88de9573

commit r11-3157-gf9ddb696a289cc48d24d3d23c0b324cb88de9573
Author: Mike Crowe <mac@mcrowe.com>
Date:   Fri Sep 11 14:25:00 2020 +0100

    libstdc++: Avoid rounding errors in std::future::wait_* [PR 91486]

    Convert the specified duration to the target clock's duration type
    before adding it to the current time in
    __atomic_futex_unsigned::_M_load_when_equal_for and
    _M_load_when_equal_until. This removes the risk of the timeout being
    rounded down to the current time, resulting in there being no wait at
    all when the duration type lacks sufficient precision to hold the
    steady_clock current time.

    Rather than using the style of fix from PR68519, let's expose the
    C++17 std::chrono::ceil function as std::chrono::__detail::ceil so
    that it can be used in code compiled with earlier standards versions
    and simplify the fix.

    This was suggested by John Salmon in
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

    This problem has become considerably less likely to trigger since I
    switched the __atomic_futex_unsigned::__clock_t reference clock from
    system_clock to steady_clock and added the loop, but the consequences
    of triggering it have changed too. By my calculations it takes just
    over 194 days from the epoch for the current time not to be
    representable in a float. This means that system_clock is always
    subject to the problem (with the standard 1970 epoch) whereas
    steady_clock with float duration only runs out of resolution if the
    machine has been running for that long (assuming the Linux
    implementation of CLOCK_MONOTONIC).

    The recently-added loop in
    __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
    into a busy wait. Unfortunately the combination of both of these
    things means that it's not possible to write a test case for this
    occurring in _M_load_when_equal_until as it stands.
libstdc++-v3/ChangeLog:

	PR libstdc++/91486
	* include/bits/atomic_futex.h
	(__atomic_futex_unsigned::_M_load_when_equal_for)
	(__atomic_futex_unsigned::_M_load_when_equal_until): Use
	__detail::ceil to convert delta to the reference clock duration
	type to avoid resolution problems.
	* include/std/chrono (__detail::ceil): Move implementation of
	std::chrono::ceil into private namespace so that it's available
	to pre-C++17 code.
	* testsuite/30_threads/async/async.cc (test_pr91486): Test
	__atomic_futex_unsigned::_M_load_when_equal_for.
Unfortunately https://gcc.gnu.org/g:f9ddb696a289cc48d24d3d23c0b324cb88de9573 mentioned in comment 9 contains a small bug (all my fault) that Jonathan pointed out in https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html . I've posted a fix in https://gcc.gnu.org/pipermail/libstdc++/2020-September/051025.html and I'll attach the patch here too.
Created attachment 49305 [details] Small fix to f9ddb696a289cc48d24d3d23c0b324cb88de9573
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f33a43f9f7eab7482837662821abb7fd02cb4350

commit r11-3653-gf33a43f9f7eab7482837662821abb7fd02cb4350
Author: Mike Crowe <mac@mcrowe.com>
Date:   Mon Oct 5 11:12:38 2020 +0100

    libstdc++: Use correct duration for atomic_futex wait on custom clock
    [PR 91486]

    As Jonathan Wakely pointed out[1], my change in commit
    f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
    the target clock duration type rather than the input clock duration
    type in __atomic_futex_unsigned::_M_load_when_equal_until, just as
    (e.g.) condition_variable does.

    As well as fixing this, let's create a rather contrived test that
    fails with the previous code, but unfortunately only when run on a
    machine with an uptime of over 208.5 days, and even then not always.

    [1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html

    libstdc++-v3/ChangeLog:

	PR libstdc++/91486
	* include/bits/atomic_futex.h
	(__atomic_futex_unsigned::_M_load_when_equal_until): Use target
	clock duration type when rounding.
	* testsuite/30_threads/async/async.cc (test_pr91486_wait_for):
	Rename from test_pr91486.
	(float_steady_clock): New class for test.
	(test_pr91486_wait_until): New test.
(In reply to John Salmon from comment #3)
> I grep'ed the latest devel source tree (git sha: afadff66) for occurrences
> of now\(\). The same bug appears several times in
> include/experimental/io_context and include/experimental/timer.

Those are doing exactly what the Networking TS requires. I'll create a library issue to track that.

I think all other cases have been fixed by Mike's patches. Thanks for the report.
(In reply to Jonathan Wakely from comment #13)
> Those are doing exactly what the Networking TS requires.

For that matter, so was our implementation of condition_variable::wait_for that was changed for bug 68519.