[PATCH] libstdc++: More efficient std::chrono::year::leap.
Jonathan Wakely
jwakely@redhat.com
Wed Jun 23 11:45:12 GMT 2021
On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
>I've checked the generated code and the compiler doesn't figure out
>the logic. I added a comment to explain.
>(Revised patch below and attached.)
>Best wishes,
>Cassio.
>---
>Simple change to std::chrono::year::is_leap. If a year is a multiple of 100,
>then it's divisible by 400 if and only if it's divisible by 16. The latter
>allows for better code generation.
>Tested on x86_64-pc-linux-gnu.
>
>libstdc++-v3/ChangeLog:
> * include/std/chrono:
>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>index 4631a727d73..85aa0379432 100644
>--- a/libstdc++-v3/include/std/chrono
>+++ b/libstdc++-v3/include/std/chrono
>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100;
> const bool __is_multiple_of_100
> = __multiplier * (_M_y + __offset) < __bound;
>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
>+ // Usually we test _M_y % 400 == 0 but, when it's already known that
>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
^^
N.B. this comment should say !=
>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
If y % 16 == 0 then y % 4 == 0 too. So we could write that as:
return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;
This seems to perform even better over a wide range of inputs. Can you
confirm that result with your own tests?
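
As a sanity check on the rearrangement, here is a minimal sketch that compares the textbook rule, the expression from the patch, and the rearranged expression. The free functions and their names are hypothetical stand-ins for year::is_leap(), and they use a plain y % 100 == 0 where the library computes __is_multiple_of_100, so this only checks the logic and says nothing about code generation.

```cpp
#include <cassert>

// Textbook Gregorian rule, used as the reference.
constexpr bool is_leap_reference(int y)
{ return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); }

// Expression from the patch under review.
constexpr bool is_leap_patch(int y)
{
  const bool multiple_of_100 = y % 100 == 0;
  return (!multiple_of_100 || y % 16 == 0) && y % 4 == 0;
}

// Rearranged expression: y % 16 == 0 already implies y % 4 == 0.
constexpr bool is_leap_rearranged(int y)
{
  const bool multiple_of_100 = y % 100 == 0;
  return (!multiple_of_100 && y % 4 == 0) || y % 16 == 0;
}

// Asserts that all three forms agree for every year in [lo, hi] and
// returns the number of leap years found in that range.
int count_leap_years_checked(int lo, int hi)
{
  int count = 0;
  for (int y = lo; y <= hi; ++y)
    {
      const bool expected = is_leap_reference(y);
      assert(is_leap_patch(y) == expected);
      assert(is_leap_rearranged(y) == expected);
      count += expected;
    }
  return count;
}
```

Calling count_leap_years_checked(-32767, 32767) covers the range of values a std::chrono::year can hold.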
However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.
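
For anyone who wants to repeat the comparison, here is a rough sketch of the kind of microbenchmark meant here, not the one actually used. The constant values are an assumption (the diff context above only shows their names); they are chosen so that multiplier * n < bound holds exactly when n is a multiple of 100 for the inputs used. The harness, its names, and the sequential input pattern are all hypothetical, and sequential years are very predictable, so real measurements should also use random inputs.

```cpp
#include <chrono>
#include <cstdint>

// Assumed constants, see the note above.
constexpr std::uint32_t multiplier = 42949673;
constexpr std::uint32_t bound      = 42949669;
constexpr std::uint32_t offset     = 536870800;

// Divisibility by 100 via one multiplication and one comparison.
constexpr bool multiple_of_100_by_multiply(int y)
{ return multiplier * (static_cast<std::uint32_t>(y) + offset) < bound; }

// Rearranged expression using a plain modulus.
constexpr bool is_leap_mod(int y)
{ return (y % 100 != 0 && y % 4 == 0) || y % 16 == 0; }

// Rearranged expression using the multiplication trick.
constexpr bool is_leap_mul(int y)
{ return (!multiple_of_100_by_multiply(y) && y % 4 == 0) || y % 16 == 0; }

struct bench_result
{
  long leap_count;                  // checksum, keeps the loop observable
  std::chrono::nanoseconds elapsed; // wall-clock time for the whole run
};

// Times `repeats` passes over every year in [-32767, 32767].
template<typename F>
bench_result run_bench(F is_leap, int repeats)
{
  const auto start = std::chrono::steady_clock::now();
  long count = 0;
  for (int r = 0; r < repeats; ++r)
    for (int y = -32767; y <= 32767; ++y)
      count += is_leap(y);
  const auto stop = std::chrono::steady_clock::now();
  return { count,
           std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start) };
}
```

Comparing run_bench(is_leap_mod, n).elapsed with run_bench(is_leap_mul, n).elapsed gives a first impression only; the two leap_count values should always match.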
My rearranged expression above is equivalent to:
return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;
which can be written without branches:
return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;
However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC
(see https://gcc.gnu.org/PR101179 regarding that). But neither of
these seems to improve on my first rearrangement above.
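
To make the two single-modulus forms concrete, here is a small sketch of both. The function names are hypothetical, and a plain y % 100 == 0 stands in for __is_multiple_of_100. In the shift form the bool converts to 0 or 1, so the divisor is 4 << 0 == 4 or 4 << 2 == 16.

```cpp
// Selects the divisor with a conditional expression.
constexpr bool is_leap_ternary(int y)
{ return y % (y % 100 == 0 ? 16 : 4) == 0; }

// Selects the divisor with a shift instead of a conditional.
constexpr bool is_leap_shift(int y)
{ return y % (4 << (2 * (y % 100 == 0))) == 0; }

// Spot checks at compile time.
static_assert(is_leap_ternary(2000) && is_leap_shift(2000), "multiple of 400");
static_assert(!is_leap_ternary(1900) && !is_leap_shift(1900), "multiple of 100 only");
static_assert(is_leap_ternary(2024) && is_leap_shift(2024), "multiple of 4");
static_assert(!is_leap_ternary(2023) && !is_leap_shift(2023), "not a multiple of 4");
```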
