[PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

Jonathan Wakely jwakely@redhat.com
Fri Nov 13 22:22:27 GMT 2020


On 13/11/20 21:58 +0000, Mike Crowe via Libstdc++ wrote:
>On Thursday 12 November 2020 at 23:07:47 +0000, Jonathan Wakely wrote:
>> On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>> > The futex system call supports waiting for an absolute time if
>> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
>> > benefits:
>> >
>> > 1. The call to gettimeofday is not required in order to calculate a
>> >   relative timeout.
>> >
>> > 2. If someone changes the system clock during the wait then the futex
>> >   timeout will correctly expire earlier or later.  Currently that only
>> >   happens if the clock is changed prior to the call to gettimeofday.
>> >
>> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
>> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
>> > that the code still works correctly with earlier kernel versions, an ENOSYS
>> > error from futex[1] results in the futex_clock_realtime_unavailable flag
>> > being set.  This flag is used to avoid the unnecessary unsupported futex
>> > call in the future and to fall back to the previous gettimeofday and
>> > relative time implementation.
>> >
>> > glibc applied an equivalent switch in pthread_cond_timedwait to use
>> > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
>> > glibc-2.10 back in 2009.  See
>> > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
>> >
>> > The futex_clock_realtime_unavailable flag is accessed using
>> > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
>> > two calls to _M_futex_wait_until happen simultaneously then the
>> > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
>> > risk discovering that it doesn't work and, if so, both set the flag.
>> >
>> > [1] This is how glibc's nptl-init.c determines whether these flags are
>> >    supported.
>> >
>> > 	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
>> > 	futex flags.  Add futex_clock_realtime_unavailable flag to store
>> > 	result of trying to use
>> > 	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
>> > 	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
>> > 	fall back to using gettimeofday and FUTEX_WAIT if that's not
>> > 	supported.
>>
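The approach described in the patch summary above can be sketched roughly as
follows. This is only an illustration and not the code in futex.cc: the
function name sketch_futex_wait_until and its bool return convention are
made up for the example, and it assumes a Linux target where SYS_futex takes
a struct timespec. Only the flag name and the use of relaxed ordering come
from the description. The early return for times before the epoch reflects
the EINVAL problem mentioned later in this thread.

```cpp
#include <atomic>
#include <cerrno>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

// Set once the kernel has rejected FUTEX_CLOCK_REALTIME with ENOSYS.
// Relaxed ordering is enough: a race only costs one extra failed syscall.
static std::atomic<bool> futex_clock_realtime_unavailable{false};

// Hypothetical helper: returns false on timeout, true otherwise (woken,
// value mismatch, or interrupted; the caller is expected to re-check).
bool sketch_futex_wait_until(unsigned* addr, unsigned expected, timespec abs)
{
  // The kernel rejects negative absolute times with EINVAL, so treat
  // anything before the epoch as already expired.
  if (abs.tv_sec < 0)
    return false;

  if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
    {
      // Absolute wait against CLOCK_REALTIME; no clock read needed here.
      long r = syscall(SYS_futex, addr,
                       FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                       expected, &abs, nullptr, FUTEX_BITSET_MATCH_ANY);
      if (r == 0)
        return true;
      if (errno == ETIMEDOUT)
        return false;
      if (errno != ENOSYS)
        return true;
      // Old kernel: remember that, and fall through to the relative wait.
      futex_clock_realtime_unavailable.store(true, std::memory_order_relaxed);
    }

  // Fallback: convert the absolute time to a relative timeout.
  timeval now;
  gettimeofday(&now, nullptr);
  timespec rel;
  rel.tv_sec = abs.tv_sec - now.tv_sec;
  rel.tv_nsec = abs.tv_nsec - now.tv_usec * 1000;
  if (rel.tv_nsec < 0)
    {
      rel.tv_nsec += 1000000000;
      --rel.tv_sec;
    }
  if (rel.tv_sec < 0)
    return false; // deadline already passed

  long r = syscall(SYS_futex, addr, FUTEX_WAIT, expected, &rel);
  return !(r == -1 && errno == ETIMEDOUT);
}
```

Calling it with a deadline in the past and a matching futex word should
report a timeout; calling it with a word that does not match "expected"
should return straight away without sleeping.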
>> Mike,
>>
>> I've been doing some performance comparisons and this patch seems to
>> make quite a big difference to code that polls a future by calling
>> fut.wait_until(t) using any t < now() as the timeout. For example,
>> fut.wait_until(chrono::system_clock::time_point{}) to wait until the
>> UNIX epoch.
>>
>> With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...))
>> check commented out) I see that polling takes < 100ns. With the change, it
>> takes 3000ns or more.
>>
>> Now this is still far better than polling using fut.wait_for(0s) which
>> takes around 50000ns due to the clock_gettime call, but I'm about to
>> fix that.
>>
>> I'm not sure how important it is for wait_until(past) to be fast, but
>> the difference from 100ns to 3000ns seems significant. Do you see the
>> same kind of numbers? Is this just a property of the futex wait with
>> an absolute time?
>>
>> N.B. using wait_until(system_clock::time_point::min()) or any other
>> time before the epoch doesn't work. The futex syscall returns EINVAL
>> which we don't check for. I'm about to fix that too.
>
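For anyone wanting to reproduce the polling measurement described above, a
minimal timing helper might look like the following. The name
average_poll_ns and the choice of steady_clock for the measurement are
assumptions made for this sketch; it is not the benchmark that produced the
numbers quoted in this thread, and results will vary by kernel and machine.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: average cost in nanoseconds of polling a future
// with a timeout that has already passed (the UNIX epoch).
double average_poll_ns(std::future<void>& fut, int iterations)
{
  using namespace std::chrono;
  const auto start = steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    fut.wait_until(system_clock::time_point{}); // always in the past
  const auto elapsed = steady_clock::now() - start;
  return duration<double, std::nano>(elapsed).count() / iterations;
}
```

A future obtained from a std::promise<void> that is never satisfied is
enough to drive it, since every wait_until call then reports a timeout.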
>I see similar behaviour. I suppose this is because the
>gettimeofday/clock_gettime system calls are in the VDSO and therefore
>usually much cheaper to call than the real system call SYS_futex.
>
>If rather than bailing out early when the relative timeout is negative, I
>call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
>wait_until call takes about ten times longer than when using the absolute
>SYS_futex. I can't really explain that.
>
>Calling these functions with a time in the past is probably quite common if
>you calculate a single timeout for several operations in sequence. What's
>less clear is whether the performance matters that much when the return
>value indicates a timeout anyway.
>
>If gettimeofday/clock_gettime are cheap enough then I suppose we can call
>them even in the absolute timeout case (losing benefit 1 above, which
>appears not to really exist) to get the improved performance for timeouts
>in the past, whilst retaining the correct behaviour when the clock is
>warped, which is what this patch addressed (benefit 2 above).
>
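The idea in the paragraph above (read the clock cheaply first, and only make
the absolute futex call if the deadline is still ahead) could be sketched
like this. The names deadline_passed and wait_until_checked are hypothetical,
the ENOSYS fallback is left out for brevity, and this is not a proposed
patch, just one way the suggestion might look on Linux.

```cpp
#include <cerrno>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: true if CLOCK_REALTIME has reached "abs".
// clock_gettime is usually served from the vDSO, so this is cheap.
bool deadline_passed(const timespec& abs)
{
  timespec now;
  clock_gettime(CLOCK_REALTIME, &now);
  return now.tv_sec > abs.tv_sec
         || (now.tv_sec == abs.tv_sec && now.tv_nsec >= abs.tv_nsec);
}

// Hypothetical helper: returns false on timeout, true otherwise.
bool wait_until_checked(unsigned* addr, unsigned expected, const timespec& abs)
{
  // Fast path for timeouts in the past: no real system call at all.
  // This also covers times before the epoch, which the kernel rejects.
  if (deadline_passed(abs))
    return false;

  // Still an absolute wait, so a clock change during the wait is honoured.
  long r = syscall(SYS_futex, addr,
                   FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                   expected, &abs, nullptr, FUTEX_BITSET_MATCH_ANY);
  return !(r == -1 && errno == ETIMEDOUT);
}
```

The trade-off is one extra clock read on every call in exchange for
skipping SYS_futex entirely when the deadline has already gone by.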
>I'll try to come up with some standalone test cases with results for
>further discussion. I suspect that the glibc people will be interested too.

Thanks, that would be great. I have about twenty things on my plate
already.


