This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Optimize to_chars


On 8/30/19 8:27 AM, Antony Polukhin wrote:
A bunch of micro-optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with
arithmetic computations costs the same number of CPU cycles per loop
iteration (it exchanges two movzx for 3 bit ops,
https://godbolt.org/z/RTui7m ). However, this saves 129 bytes of data
and completely avoids the chance of cache misses on __digits.
* For base == 16, replacing the lookup in the __digits table with
arithmetic computations adds a few instructions, but completely
avoids the chance of cache misses on __digits (about 9 fewer cache
misses in the worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with __first[1] and
__first[0] on final iterations saves ~2% of code size.
* Removing the trailing '\0' from the arrays of digits allows the
linker to merge the symbols (so that
"0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could
share the same address). This improves data locality and reduces
binary size.
* Using __detail::__to_chars_len_2 instead of the generic
__detail::__to_chars_len makes the operation O(1) instead of O(N). It
also makes the code half as long ( https://godbolt.org/z/Peq_PG ).
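
To make the base-8 and base-16 points above concrete, here is a minimal
sketch of computing digits arithmetically instead of reading them from a
table. It is not the code from the patch: the names octal_digit,
hex_digit, to_octal and to_hex are hypothetical, and the sketch returns a
std::string for readability, whereas the real std::to_chars writes into a
caller-supplied buffer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: map the low 3 bits of v to '0'..'7' with no table.
constexpr char octal_digit(unsigned v)
{
  return static_cast<char>('0' + (v & 7u));
}

// Hypothetical helper: map the low 4 bits of v to '0'..'9' or 'a'..'f'.
constexpr char hex_digit(unsigned v)
{
  v &= 0xFu;
  return static_cast<char>(v < 10 ? '0' + v : 'a' + (v - 10));
}

// Illustrative only: convert a 32-bit value to octal text.
std::string to_octal(std::uint32_t val)
{
  char buf[11];            // 32 bits need at most 11 octal digits
  int pos = 11;
  do {
    buf[--pos] = octal_digit(val);
    val >>= 3;             // consume one octal digit
  } while (val != 0);
  return std::string(buf + pos, buf + 11);
}

// Illustrative only: convert a 32-bit value to lowercase hex text.
std::string to_hex(std::uint32_t val)
{
  char buf[8];             // 32 bits need at most 8 hex digits
  int pos = 8;
  do {
    buf[--pos] = hex_digit(val);
    val >>= 4;             // consume one hex digit
  } while (val != 0);
  return std::string(buf + pos, buf + 8);
}
```

The trade-off is the one described above: the loop does a little more
arithmetic per digit, but never touches a lookup table in memory.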

In sum: this significantly reduces the size of the binary (by about
4 KB for base-8 conversion alone, https://godbolt.org/z/WPKijS ) and
reduces latency (CPU cache misses) without changing the iteration
count and without adding costly instructions to the loops.

Would it make sense to move some of this code into GCC as
a built-in so that it could also be used by GCC to expand
some strtol and sprintf calls?

Martin


Changelog:
     * include/std/charconv (__detail::__to_chars_8,
     __detail::__to_chars_16): Replace array of precomputed digits
     with arithmetic operations to avoid CPU cache misses. Remove
     zero termination from array of digits to allow symbol merge with
     generic implementation of __detail::__to_chars. Replace final
     offsets with constants. Use __detail::__to_chars_len_2 instead
     of a generic __detail::__to_chars_len.
     * include/std/charconv (__detail::__to_chars): Remove
     zero termination from array of digits.
     * include/std/charconv (__detail::__to_chars_2): Leading digit
     is always '1'.
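
The length computation mentioned in the ChangeLog can be illustrated with
a rough sketch. For a power-of-two base, the digit count follows directly
from the number of significant bits, so no division loop is needed. The
names significant_bits and len_pow2 are hypothetical and not taken from
libstdc++; the sketch assumes a 32-bit unsigned int and the GCC/Clang
builtin __builtin_clz.

```cpp
#include <cstdint>

// Hypothetical helper: number of significant bits in val (0 for val == 0).
// Assumes unsigned int is 32 bits wide and __builtin_clz is available.
inline unsigned significant_bits(std::uint32_t val)
{
  return val == 0 ? 0u : 32u - static_cast<unsigned>(__builtin_clz(val));
}

// Hypothetical helper: digits needed to print val in base 2^bits_per_digit,
// i.e. ceil(significant_bits / bits_per_digit), with "0" taking one digit.
inline unsigned len_pow2(std::uint32_t val, unsigned bits_per_digit)
{
  if (val == 0)
    return 1;
  return (significant_bits(val) + bits_per_digit - 1) / bits_per_digit;
}
```

For base 2 the length equals the bit width, which is also why the leading
digit of any nonzero value is always '1': the most significant counted bit
is set by definition.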


