91486 – future::wait_for and shared_timed_mutex::wait_for do not work properly with float duration

Bug 91486 - future::wait_for and shared_timed_mutex::wait_for do not work properly with float duration

Summary: future::wait_for and shared_timed_mutex::wait_for do not work properly with f...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libstdc++ (show other bugs)
Version:	unknown

Importance:	P3 normal
Target Milestone:	11.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-08-18 22:48 UTC by John Salmon
Modified:	2020-11-16 14:36 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2019-08-19 00:00:00

Attachments
demonstration code (633 bytes, text/x-csrc) 2019-08-18 22:54 UTC, John Salmon	Details
Small fix to f9ddb696a289cc48d24d3d23c0b324cb88de9573 (1.92 KB, patch) 2020-10-03 16:18 UTC, Mike Crowe	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description John Salmon 2019-08-18 22:48:01 UTC

Comment 1 John Salmon 2019-08-18 22:54:48 UTC

Created attachment 46727 [details]
demonstration code

Comment 2 John Salmon 2019-08-18 22:59:28 UTC

The same incorrect logic that was fixed in bug 68519 is present in the implementations of future::wait_for and shared_timed_mutex::wait_for.  The fix should be the same:  explicitly duration_cast the __rtime argument to __clock::duration before adding to __clock::now().  Fixes are required in shared_mutex and bits/atomic_futex.h.

The attached code demonstrates the problem.  wait_for either waits too long or too short.  Output is:

drdws0134$ garden with -m gcc/8.1.0-01c7/bin g++ -std=c++14 futurewait.cpp -pthread && ./a.out
ERROR - future::wait_for took too short: t1 looping: 0 milliseconds
0 second
t2 looping: 0 second
ERROR - shared_timed_mutex::try_lock_shared took too short: 0 milliseconds
t2 looping: 1 second
t1 looping: 1 second
t2 looping: 2 second
t1 looping: 2 second
t2 looping: 3 second
t1 looping: 3 second
t2 looping: 4 second
t1 looping: 4 second

Comment 3 John Salmon 2019-08-19 13:25:38 UTC

I grep'ed the latest devel source tree (git sha: afadff66) for occurrences of now\(\).  The same bug appears several times in include/experimental/io_context and include/experimental/timer.

The underlying problem is that operator+(time_point, duration) has well-defined but surprising and error-prone semantics when duration's Rep is float.  Maybe it would be better to define a less error-prone helper function, e.g.,

  template<class Clk, class Dur, class Rep, class Period>
  time_point<Clk, Dur> __timepoint_plus_duration(const time_point<Clk, Dur>&, duration<Rep, Period>&);

and to use it consistently whenever adding a time_point to a duration?

Comment 4 Jonathan Wakely 2019-08-19 20:09:51 UTC

Yes that probably makes sense to do.

Comment 5 John Salmon 2019-08-23 12:38:52 UTC

C++17 already has the needed helper function:  ceil(duration).  

So just change all instances of:

     __clock_t::now() + __reltime

to

    using __dur = typename __clock_t::duration;
    __clock_t::now() + __chrono_detail::ceil<__dur>(__reltime)

and make the C++17 implementation of ceil(duration) visible in all versions as __chrono_detail::__ceil.

Comment 6 Jonathan Wakely 2019-08-23 12:47:04 UTC

Ah yes, of course. Thanks!

Comment 7 Mike Crowe 2020-02-02 17:40:28 UTC

I think that https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01934.html fixes this for std::future::wait_for but without the __chrono_detail part mentioned in comment 5.

std::shared_timed_mutex was fixed in r278903/ab40695a46c6649bac40f4251e37993d73fa7a13.

Comment 8 Mike Crowe 2020-06-05 11:50:23 UTC

I think that https://gcc.gnu.org/pipermail/libstdc++/2020-May/050437.html fixes this for std::future::wait_for including the __chrono_detail part mentioned in comment 5.

Comment 9 GCC Commits 2020-09-11 13:29:39 UTC

The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f9ddb696a289cc48d24d3d23c0b324cb88de9573

commit r11-3157-gf9ddb696a289cc48d24d3d23c0b324cb88de9573
Author: Mike Crowe <mac@mcrowe.com>
Date:   Fri Sep 11 14:25:00 2020 +0100

    libstdc++: Avoid rounding errors in std::future::wait_* [PR 91486]
    
    Convert the specified duration to the target clock's duration type
    before adding it to the current time in
    __atomic_futex_unsigned::_M_load_when_equal_for and
    _M_load_when_equal_until.  This removes the risk of the timeout being
    rounded down to the current time resulting in there being no wait at all
    when the duration type lacks sufficient precision to hold the
    steady_clock current time.
    
    Rather than using the style of fix from PR68519, let's expose the C++17
    std::chrono::ceil function as std::chrono::__detail::ceil so that it can
    be used in code compiled with earlier standards versions and simplify
    the fix. This was suggested by John Salmon in
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .
    
    This problem has become considerably less likely to trigger since I
    switched the __atomic__futex_unsigned::__clock_t reference clock from
    system_clock to steady_clock and added the loop, but the consequences of
    triggering it have changed too.
    
    By my calculations it takes just over 194 days from the epoch for the
    current time not to be representable in a float. This means that
    system_clock is always subject to the problem (with the standard 1970
    epoch) whereas steady_clock with float duration only runs out of
    resolution machine has been running for that long (assuming the Linux
    implementation of CLOCK_MONOTONIC.)
    
    The recently-added loop in
    __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
    into a busy wait.
    
    Unfortunately the combination of both of these things means that it's
    not possible to write a test case for this occurring in
    _M_load_when_equal_until as it stands.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/91486
            * include/bits/atomic_futex.h
            (__atomic_futex_unsigned::_M_load_when_equal_for)
            (__atomic_futex_unsigned::_M_load_when_equal_until): Use
            __detail::ceil to convert delta to the reference clock
            duration type to avoid resolution problems.
            * include/std/chrono (__detail::ceil): Move implementation
            of std::chrono::ceil into private namespace so that it's
            available to pre-C++17 code.
            * testsuite/30_threads/async/async.cc (test_pr91486):
            Test __atomic_futex_unsigned::_M_load_when_equal_for.

Comment 10 Mike Crowe 2020-10-03 16:17:16 UTC

Unfortunately https://gcc.gnu.org/g:f9ddb696a289cc48d24d3d23c0b324cb88de9573 mentioned in comment 9 contains a small bug (all my fault) that Jonathan pointed out in https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html . I've posted a fix in https://gcc.gnu.org/pipermail/libstdc++/2020-September/051025.html and I'll attach the patch here too.

Comment 11 Mike Crowe 2020-10-03 16:18:01 UTC

Created attachment 49305 [details]
Small fix to f9ddb696a289cc48d24d3d23c0b324cb88de9573

Comment 12 GCC Commits 2020-10-05 10:32:40 UTC

The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f33a43f9f7eab7482837662821abb7fd02cb4350

commit r11-3653-gf33a43f9f7eab7482837662821abb7fd02cb4350
Author: Mike Crowe <mac@mcrowe.com>
Date:   Mon Oct 5 11:12:38 2020 +0100

    libstdc++: Use correct duration for atomic_futex wait on custom clock [PR 91486]
    
    As Jonathan Wakely pointed out[1], my change in commit
    f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
    the target clock duration type rather than the input clock duration type
    in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
    condition_variable does.
    
    As well as fixing this, let's create a rather contrived test that fails
    with the previous code, but unfortunately only when run on a machine
    with an uptime of over 208.5 days, and even then not always.
    
    [1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/91486
            * include/bits/atomic_futex.h:
            (__atomic_futex_unsigned::_M_load_when_equal_until): Use target
            clock duration type when rounding.
            * testsuite/30_threads/async/async.cc (test_pr91486_wait_for):
            Rename from test_pr91486.
            (float_steady_clock): New class for test.
            (test_pr91486_wait_until): New test.

Comment 13 Jonathan Wakely 2020-11-16 12:58:15 UTC

(In reply to John Salmon from comment #3)
> I grep'ed the latest devel source tree (git sha: afadff66) for occurrences
> of now\(\).  The same bug appears several times in
> include/experimental/io_context and include/experimental/timer.

Those are doing exactly what the Networking TS requires. I'll create a library issue to track that.

I think all other cases have been fixed by Mike's patches.

Thanks for the report.

Comment 14 Jonathan Wakely 2020-11-16 14:36:02 UTC

(In reply to Jonathan Wakely from comment #13)
> Those are doing exactly what the Networking TS requires.

For that matter, so was our implementation of condition_variable::wait_for that was changed for bug 68519.