[PATCH] libstdc++: More efficient std::chrono::year::leap.

Jonathan Wakely jwakely@redhat.com
Wed Jun 23 11:45:12 GMT 2021


On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
>I've checked the generated code and the compiler doesn't figure out
>the logic. I added a comment to explain.
>
>(Revised patch below and attached.)
>
>Best wishes,
>Cassio.
>
>---
>
>Simple change to std::chrono::year::is_leap. If a year is a multiple of 100,
>then it's divisible by 400 if and only if it's divisible by 16. The latter
>allows for better code generation.
>
>Tested on x86_64-pc-linux-gnu.
>
>libstdc++-v3/ChangeLog:
>
>    * include/std/chrono:
>
>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>index 4631a727d73..85aa0379432 100644
>--- a/libstdc++-v3/include/std/chrono
>+++ b/libstdc++-v3/include/std/chrono
>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     constexpr uint32_t __offset       = __max_dividend / 2 / 100 * 100;
>     const bool __is_multiple_of_100
>       = __multiplier * (_M_y + __offset) < __bound;
>-    return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
>+    // Usually we test _M_y % 400 == 0 but, when it's already known that
>+    // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
                   ^^
                   N.B. == is correct here; with != the equivalence would fail
                   (e.g. 16 % 16 == 0 but 16 % 400 != 0).

>+    return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
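A standalone sketch of the patched logic follows, written as a free function rather than the real member (the names is_leap_patched, is_leap_reference, count_mismatches_patched and check_is_leap_patched are hypothetical). The 16-for-400 swap holds because, for y = 100k, 16 | 100k iff 4 | 25k iff 4 | k iff 400 | 100k. The four constants do not appear in the quoted hunk; the values below are assumptions chosen so the multiply-and-compare test is exact (multiplier = ceil(2^32/100), bound = 2^32 - 99 * multiplier), and it is the exhaustive comparison against the textbook rule over std::chrono::year's range [-32767, 32767] that actually establishes correctness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for year::is_leap(): takes the year as a parameter
// instead of reading _M_y. Constant values are assumptions (see lead-in).
constexpr bool is_leap_patched(short y)
{
  constexpr std::uint32_t multiplier   = 42949673;   // ceil(2^32 / 100)
  constexpr std::uint32_t bound        = 42949669;   // 2^32 - 99 * multiplier
  constexpr std::uint32_t max_dividend = 1073741799; // largest exact dividend
  // A multiple of 100, so divisibility by 100 is unchanged, and large
  // enough that y + offset is non-negative for every year.
  constexpr std::uint32_t offset       = max_dividend / 2 / 100 * 100;
  // Unsigned arithmetic wraps modulo 2^32, which the test relies on.
  const bool is_multiple_of_100 = multiplier * (y + offset) < bound;
  // When 100 | y, write y = 100k: 400 | y <=> 4 | k <=> 16 | y.
  return (!is_multiple_of_100 || y % 16 == 0) && y % 4 == 0;
}

// Textbook Gregorian rule, used as the reference.
constexpr bool is_leap_reference(int y)
{
  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

// Number of years in [-32767, 32767] on which the two functions disagree.
int count_mismatches_patched()
{
  int mismatches = 0;
  for (int y = -32767; y <= 32767; ++y)
    if (is_leap_patched(static_cast<short>(y)) != is_leap_reference(y))
      ++mismatches;
  return mismatches;
}

// Usage: a few spot checks plus the exhaustive comparison.
void check_is_leap_patched()
{
  assert(is_leap_patched(2000) && !is_leap_patched(1900));
  assert(is_leap_patched(2024) && !is_leap_patched(2023));
  assert(count_mismatches_patched() == 0);
}
```

The check covers correctness only; it says nothing about which form generates better code.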

If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

   return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;
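The rearrangement relies only on the fact that 16 | y implies 4 | y. As a sketch (the helper names are hypothetical), treat the three tests as independent booleans and compare both forms on every combination at compile time. Dropping the implication leaves exactly two disagreements, the impossible "divisible by 16 but not by 4" cases, which shows that fact is what makes the rewrite valid.

```cpp
// Each year test is abstracted as a bool:
//   m100: the year is a multiple of 100
//   d4:   year % 4 == 0
//   d16:  year % 16 == 0
constexpr bool patch_form(bool m100, bool d4, bool d16)
{
  return (!m100 || d16) && d4;
}

constexpr bool rearranged_form(bool m100, bool d4, bool d16)
{
  return (!m100 && d4) || d16;
}

// Counts the combinations on which the two forms disagree. With
// respect_implication set, combinations where d16 holds but d4 does not
// are skipped, because no integer is divisible by 16 but not by 4.
constexpr int count_disagreements(bool respect_implication)
{
  int n = 0;
  for (int bits = 0; bits < 8; ++bits)
    {
      const bool m100 = bits & 1;
      const bool d4 = bits & 2;
      const bool d16 = bits & 4;
      if (respect_implication && d16 && !d4)
        continue;
      if (patch_form(m100, d4, d16) != rearranged_form(m100, d4, d16))
        ++n;
    }
  return n;
}

static_assert(count_disagreements(true) == 0,
              "equivalent whenever 16 | y implies 4 | y");
static_assert(count_disagreements(false) == 2,
              "only the impossible d16 && !d4 cases differ");
```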

This seems to perform even better over a wide range of inputs. Can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code, which computes
__is_multiple_of_100 precisely to avoid the modulus operation. So maybe
my tests are bad.

My rearranged expression above is equivalent to:

   return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

   return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;
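A sketch checking that the rearranged expression, the conditional divisor and the shifted divisor agree for every year in [-32767, 32767]. For brevity it computes the multiple-of-100 flag with a plain y % 100, which is exactly the operation the real code avoids; the function names are hypothetical. It checks correctness only, not code size or speed.

```cpp
#include <cassert>

// Three spellings of the same test; m100 means "y is a multiple of 100".
constexpr bool rearranged(int y, bool m100)
{
  return (!m100 && y % 4 == 0) || y % 16 == 0;
}

constexpr bool conditional_divisor(int y, bool m100)
{
  return y % (m100 ? 16 : 4) == 0;
}

constexpr bool shifted_divisor(int y, bool m100)
{
  // bool promotes to 0 or 1, so the divisor is 4 << 0 == 4 or 4 << 2 == 16.
  return y % (4 << (2 * m100)) == 0;
}

// Number of years on which the three forms do not all agree.
int count_form_mismatches()
{
  int mismatches = 0;
  for (int y = -32767; y <= 32767; ++y)
    {
      const bool m100 = y % 100 == 0;
      const bool r = rearranged(y, m100);
      if (r != conditional_divisor(y, m100) || r != shifted_divisor(y, m100))
        ++mismatches;
    }
  return mismatches;
}

// Usage: spot checks plus the exhaustive comparison.
void check_forms()
{
  assert(shifted_divisor(2000, true) && !shifted_divisor(1900, true));
  assert(conditional_divisor(2024, false) && !conditional_divisor(2023, false));
  assert(count_form_mismatches() == 0);
}
```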

However, both Clang and GCC already remove the branch for (x ? 16 : 4),
and the conditional expression produces slightly smaller code with GCC
(see https://gcc.gnu.org/PR101179 regarding that). But neither of
these seems to be an improvement over my first rearrangement above.




More information about the Gcc-patches mailing list