Re: [PATCH] Optimize to_chars
- From: Jonathan Wakely <jwakely at redhat dot com>
- To: Martin Sebor <msebor at gmail dot com>
- Cc: Antony Polukhin <antoshkka at gmail dot com>, libstdc++ <libstdc++ at gcc dot gnu dot org>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 30 Aug 2019 20:24:49 +0100
- Subject: Re: [PATCH] Optimize to_chars
- References: <CAKqmYPZh+KXP3J_5X3xmz=-D0h013_w201Z=E5e56sfvCtAr1A@mail.gmail.com> <email@example.com>
On 30/08/19 11:03 -0600, Martin Sebor wrote:
On 8/30/19 8:27 AM, Antony Polukhin wrote:
Bunch of micro optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with
arithmetic computations costs the same number of CPU cycles per loop
iteration (it exchanges two movzx for 3 bit ops,
https://godbolt.org/z/RTui7m ). However, this saves 129 bytes of data
and completely avoids the chance of cache misses on __digits.
* For base == 16, replacing the lookup in the __digits table with
arithmetic computations adds a few instructions, but completely avoids
the chance of cache misses on __digits (about 9 fewer cache misses in
the worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with constant-index
accesses to __first on the final iterations saves ~2% of code size.
* Removing the trailing '\0' from the arrays of digits allows the linker
to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and
"0123456789abcdef" can share the same address). This improves data
locality and reduces binary size.
* Using __detail::__to_chars_len_2 instead of the generic
__detail::__to_chars_len makes the operation O(1) instead of O(N). It
also halves the code size ( https://godbolt.org/z/Peq_PG ).
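To illustrate the last point, here is a minimal sketch of why the base-2 length can be O(1): the number of binary digits is just the bit width of the value. The names to_chars_len_generic and to_chars_len_2 below are hypothetical stand-ins, not the libstdc++ internals, and the sketch assumes a GCC/Clang-style compiler that provides __builtin_clzll.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper, for illustration only.
// Generic length: one division per digit, so O(N) in the number of digits.
template <typename T>
unsigned to_chars_len_generic(T value, unsigned base) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    unsigned n = 1;
    while (value >= base) {
        value /= base;
        ++n;
    }
    return n;
}

// Hypothetical helper, for illustration only.
// Base-2 length: the digit count equals the bit width of the value.
// Assumes __builtin_clzll is available (GCC and Clang provide it).
inline unsigned to_chars_len_2(unsigned long long value) {
    if (value == 0)
        return 1; // "0" still needs one character; clz(0) is undefined
    const int width = std::numeric_limits<unsigned long long>::digits;
    return static_cast<unsigned>(width - __builtin_clzll(value));
}
```

Both functions should agree for base 2, e.g. to_chars_len_2(255) and to_chars_len_generic(255u, 2) both give 8, but the second one loops once per digit.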
In sum, this significantly reduces binary size (by about 4 KB for
base-8 conversion alone, https://godbolt.org/z/WPKijS ) and reduces
latency from CPU cache misses, without changing the iteration count
and without adding costly instructions to the loops.
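For readers who want to see the base-8 and base-16 idea concretely, the following is a rough sketch of computing digit characters arithmetically instead of reading them from a lookup table. It is not the patch itself: octal_digit, hex_digit and to_pow2_string are hypothetical names, the std::string return type is chosen only for brevity, and the hex case assumes an ASCII-compatible character set where 'a'..'f' are contiguous.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, for illustration only; not the libstdc++ code.

// Base 8: every digit is in 0..7, so '0' + digit is always the right character.
inline char octal_digit(unsigned d) {
    return static_cast<char>('0' + (d & 7u));
}

// Base 16: 0..9 map to '0'..'9', 10..15 map to 'a'..'f'.
// Assumes 'a'..'f' are contiguous, as in ASCII.
inline char hex_digit(unsigned d) {
    d &= 0xFu;
    return static_cast<char>(d < 10 ? '0' + d : 'a' + (d - 10));
}

// Peel off `bits` bits at a time (3 for octal, 4 for hex), lowest digit first,
// and prepend each character so the result reads most significant digit first.
inline std::string to_pow2_string(unsigned long long value, unsigned bits) {
    assert(bits == 3 || bits == 4);
    const unsigned mask = (1u << bits) - 1;
    std::string out;
    do {
        const unsigned d = static_cast<unsigned>(value & mask);
        out.insert(out.begin(), bits == 3 ? octal_digit(d) : hex_digit(d));
        value >>= bits;
    } while (value != 0);
    return out;
}
```

No digit table is touched here, so there is no const data to keep in cache; the trade-off is a compare or a couple of extra bit operations per digit in the hex case.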
Would it make sense to move some of this code into GCC as
a built-in so that it could also be used by GCC to expand
some strtol and sprintf calls?
Makes sense, although we'd still need it in libstdc++ until Clang and
EDG implement the same built-in.