[PATCH] libstdc++: More efficient std::chrono::year::leap.

Jonathan Wakely jwakely@redhat.com
Wed Jun 23 13:16:31 GMT 2021


On 23/06/21 12:45 +0100, Jonathan Wakely wrote:
>On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
>>I've checked the generated code and the compiler doesn't figure out
>>the logic. I added a comment to explain.
>>
>>(Revised patch below and attached.)
>>
>>Best wishes,
>>Cassio.
>>
>>---
>>
>>Simple change to std::chrono::year::is_leap. If a year is a multiple of 100,
>>then it's divisible by 400 if and only if it's divisible by 16. The latter
>>allows for better code generation.
>>
>>Tested on x86_64-pc-linux-gnu.
>>
>>libstdc++-v3/ChangeLog:
>>
>>   * include/std/chrono:
>>
>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>>index 4631a727d73..85aa0379432 100644
>>--- a/libstdc++-v3/include/std/chrono
>>+++ b/libstdc++-v3/include/std/chrono
>>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>    constexpr uint32_t __offset       = __max_dividend / 2 / 100 * 100;
>>    const bool __is_multiple_of_100
>>      = __multiplier * (_M_y + __offset) < __bound;
>>-    return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
>>+    // Usually we test _M_y % 400 == 0 but, when it's already known that
>>+    // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
>                  ^^
>                  N.B. this comment should say !=
>
>>+    return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
>
>If y % 16 == 0 then y % 4 == 0 too. So we could write that as:
>
>  return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;
>
>This seems to perform even better over a wide range of inputs, can you
>confirm that result with your own tests?
>
>However, my microbenchmark also shows that the simplistic code using
>y%100 often performs even better than the current code calculating
>__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
>are bad.
>
>My rearranged expression above is equivalent to:
>
>  return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;
>
>which can be written without branches:
>
>  return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;
>
>However, both Clang and GCC already remove the branch for (x ? 16 : 4)
>and the conditional expression produces slightly smaller code with GCC
>(see https://gcc.gnu.org/PR101179 regarding that). But neither of
>these seems to improve on my first rearrangement above.
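The rewrites quoted above can be checked exhaustively, because std::chrono::year only represents years in [-32767, 32767]. The equivalence behind the patch follows from writing y = 100k: 400 divides y iff 4 divides k, and 16 divides y = 4*25k iff 4 divides 25k, i.e. iff 4 divides k (4 and 25 are coprime). The sketch below is a hypothetical test harness, not libstdc++ code: all function names are made up for illustration, and multiple_of_100() uses a plain y % 100 == 0 in place of the multiply-and-compare computation of __is_multiple_of_100.

```cpp
#include <cassert>

// Hypothetical reference: the textbook Gregorian rule.
constexpr bool leap_reference(int y)
{ return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); }

// Hypothetical stand-in for __is_multiple_of_100. libstdc++ avoids the
// modulus here; a plain modulus is enough for a correctness check.
constexpr bool multiple_of_100(int y)
{ return y % 100 == 0; }

// Expression before the patch.
constexpr bool leap_400(int y)
{ return (!multiple_of_100(y) || y % 400 == 0) && y % 4 == 0; }

// Patched expression: for a multiple of 100, y % 400 == 0 <=> y % 16 == 0.
constexpr bool leap_16(int y)
{ return (!multiple_of_100(y) || y % 16 == 0) && y % 4 == 0; }

// Rearranged form: y % 16 == 0 already implies y % 4 == 0.
constexpr bool leap_rearranged(int y)
{ return (!multiple_of_100(y) && y % 4 == 0) || y % 16 == 0; }

// Pick the divisor instead: 16 for multiples of 100, otherwise 4.
constexpr bool leap_conditional(int y)
{ return y % (multiple_of_100(y) ? 16 : 4) == 0; }

// Branch-free divisor: 4 << 0 == 4 and 4 << 2 == 16.
constexpr bool leap_shift(int y)
{ return y % (4 << (2 * multiple_of_100(y))) == 0; }

// Walk the whole range of std::chrono::year and assert that every
// variant agrees with the reference rule.
void check_all_years()
{
  for (int y = -32767; y <= 32767; ++y)
    {
      const bool expected = leap_reference(y);
      assert(leap_400(y) == expected);
      assert(leap_16(y) == expected);
      assert(leap_rearranged(y) == expected);
      assert(leap_conditional(y) == expected);
      assert(leap_shift(y) == expected);
    }
}
```

Calling check_all_years() from main() visits all 65,535 representable years, and an assert fires on the first variant that disagrees with the reference rule. This only establishes that the forms are equivalent; it says nothing about which one is fastest.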

This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

   return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;
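The mask form relies on two's complement representation, which C++20 guarantees: the low four bits of y are all zero exactly when 16 divides y, and the low two bits exactly when 4 divides y, negative years included. Below is a minimal sketch that checks this against the Gregorian rule. It assumes a plain y % 100 == 0 as a stand-in for __is_multiple_of_100, and the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical reference: the textbook Gregorian rule.
constexpr bool gregorian_leap(int y)
{ return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); }

// Mask form: for a multiple of 100 the low four bits must be zero
// (divisible by 16, hence by 400), otherwise the low two bits
// (divisible by 4).
constexpr bool leap_mask(int y)
{
  const bool is_multiple_of_100 = y % 100 == 0; // stand-in, see lead-in
  return (y & (is_multiple_of_100 ? 15 : 3)) == 0;
}

// Spot checks, including negative years.
static_assert(leap_mask(2000) && leap_mask(2024) && leap_mask(-400),
              "leap years");
static_assert(!leap_mask(1900) && !leap_mask(2023) && !leap_mask(-100),
              "common years");

// Exhaustive check over the range of std::chrono::year.
void check_mask_all_years()
{
  for (int y = -32767; y <= 32767; ++y)
    assert(leap_mask(y) == gregorian_leap(y));
}
```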