Created attachment 52021 [details] Proposed implementation Current implementation of basic_ios::init(), through invocation of _M_cache_locale(), calls has_facet() and use_facet() on three facet types (ctype, num_get, num_put). As each one of them does a dynamic_cast under the hood, a mere construction of an iostream results in six (6) dynamic_casts, resulting in poor performance in cases of short-lived iostreams. However, one can note that these {has,use}_facet() only reference facets which present in classic() locale and therefore are guaranteed to be there in any locale (with the matching type) and therefore don't require any bounds checks or dynamic_casts in the locale impl object. So we can do some TMP to establish a subset of facets which are always present in any locale, and instruct has_facet() and use_facet() to bypass any checks for those. (It may also be worth it to add __try_use_facet() function, providing more efficient version of (has_facet<T>(loc) ? &use_facet<T>(loc) : 0)). I've drafted a patch implementing changes described above. On my hardware (i9-9900k @ 5 GHz) it reduces iostream construction time from 100 ns down to 36 ns).
That's a nice speedup, and seems reasonable. We might be able to do it more simply though, without the full generality of the new policies.
Created attachment 52024 [details] Alternative implementation This seems like a much simpler approach. This causes 22_locale/ctype/is/string/89728_neg.cc to fail for C++98/11/14 modes, because without 'if constexpr' the body of __try_use_facet creates dozens of errors, rather than just one from the current definition of use_facet. That isn't the end of the world, but it's a regression in diagnostic quality.
Created attachment 52025 [details] Alternative implementation v2 The diagnostic regression is easy to fix with a static assertion before calling __try_use_facet.
(In reply to Jonathan Wakely from comment #3) > Created attachment 52025 [details] > Alternative implementation v2 > > The diagnostic regression is easy to fix with a static assertion before > calling __try_use_facet. Although that causes errors for certain uses of streams without including <locale>, specifically when building the library. Also I forgot to say that I can still reproduce approx. 3x speedup using my patch. Before: BM_sstream_construct 288 ns 287 ns 2525913 BM_fstream_construct 319 ns 318 ns 2204065 After: BM_sstream_construct 99.5 ns 99.1 ns 6512679 BM_fstream_construct 108 ns 107 ns 6536463
Created attachment 52029 [details] Build fix for alternative implementation v2.
That would also work, I suppose (it even outperforms my original approach by a tiny bit -- 33 ns for v2 vs 36 for my original implementation). There are a few build errors in the alternative implementations, however: is_convertible<locale::facet*, _Facet*>::value is false, as it tries to downcast the facet -- did you mean is_base_of<>, which would also better align with the static assert message? Also, despite the always_inline attribute of __try_use_facet, my attempts to build the library with -O0 -ggdb and link my benchmark yielded a bunch of errors like ld: /home/av/prgs/gcc/debug/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so: undefined reference to `std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > const* std::__try_use_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&)' Suggest explicitly instantiating __try_use_facet where has_facet() and use_facet() is instantiated. See attached fix for build problems I discovered.
(In reply to Dmitry Prokoptsev from comment #6) > That would also work, I suppose (it even outperforms my original approach by > a tiny bit -- 33 ns for v2 vs 36 for my original implementation). > > There are a few build errors in the alternative implementations, however: > is_convertible<locale::facet*, _Facet*>::value is false, as it tries to > downcast the facet -- did you mean is_base_of<>, which would also better > align with the static assert message? Yes, I've already changed it to use that in my local branch. > Also, despite the always_inline attribute of __try_use_facet, my attempts to > build the library with -O0 -ggdb and link my benchmark yielded a bunch of > errors like > > ld: > /home/av/prgs/gcc/debug/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++. > so: undefined reference to `std::num_get<char, > std::istreambuf_iterator<char, std::char_traits<char> > > const* > std::__try_use_facet<std::num_get<char, std::istreambuf_iterator<char, > std::char_traits<char> > > >(std::locale const&)' Yes, that's what happens for the debug libs built with --enable-libstdcxx-debug > Suggest explicitly instantiating __try_use_facet where has_facet() and > use_facet() is instantiated. Yes, I wanted to avoid that :-( > See attached fix for build problems I discovered. Thanks, I'll compare it with the fixes I've already got locally and finish retesting.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:b3ac43a3c05744d62a963d656bed782fc867ad79 commit r13-3888-gb3ac43a3c05744d62a963d656bed782fc867ad79 Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Nov 9 21:44:31 2022 +0000 libstdc++: Avoid redundant checks in std::use_facet [PR103755] We do not need to do bounds checks or a runtime dynamic_cast when using std::has_facet and std::use_facet to access the default facets that are guaranteed to be present in every std::locale object. We can just index straight into the array and use a static_cast for the conversion. This patch adds a new std::__try_use_facet function that is like std::use_facet but returns a pointer, so can be used to implement both std::has_facet and std::use_facet. We can then do the necessary metaprogramming to skip the redundant checks in std::__try_use_facet. To avoid having to export (or hide) instantiations of the new function from libstdc++.so the instantiations are given hidden visibility. This allows them to be used in the library, but user code will instantiate it again using the definition in the header. That would happen anyway, because there are no explicit instantiation declarations for any of std::has_facet, std::use_facet, or the new std::__try_use_facet. libstdc++-v3/ChangeLog: PR libstdc++/103755 * config/abi/pre/gnu.ver: Tighten patterns for facets in the base version. Add exports for __try_use_facet. * include/bits/basic_ios.tcc (basic_ios::_M_cache_locale): Use __try_use_facet instead of has_facet and use_facet. * include/bits/fstream.tcc (basic_filebuf::basic_filebuf()): Likewise. (basic_filebuf::imbue): Likewise. * include/bits/locale_classes.h (locale, locale::id) (locale::_Impl): Declare __try_use_facet as a friend. * include/bits/locale_classes.tcc (__try_use_facet): Define new function template with special cases for default facets. (has_facet, use_facet): Call __try_use_facet. * include/bits/locale_facets.tcc (__try_use_facet): Declare explicit instantiations. * include/bits/locale_facets_nonio.tcc (__try_use_facet): Likewise. * src/c++11/locale-inst-monetary.h (INSTANTIATE_FACET_ACCESSORS): Use new macro for facet accessor instantiations. * src/c++11/locale-inst-numeric.h (INSTANTIATE_FACET_ACCESSORS): Likewise. * src/c++11/locale-inst.cc (INSTANTIATE_USE_FACET): Define new macro for instantiating __try_use_facet and use_facet. (INSTANTIATE_FACET_ACCESSORS): Define new macro for also defining has_facet. * src/c++98/compatibility-ldbl.cc (__try_use_facet): Instantiate. * testsuite/22_locale/ctype/is/string/89728_neg.cc: Adjust expected errors.
This causes an ABI regression on powerpc64le: FAIL: libstdc++-abi/abi_check
It also breaks the build on one of our powerpc64 le machines (just the one). It fails in stage 1. g:b3ac43a3c05744d62a963d656bed782fc867ad79, r13-3888-gb3ac43a3c05744 When building gcc on one of our systems (just the one) the build fails in stage 1. Ubuntu 22.04.1 LTS gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1) Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35 make . . . libtool: compile: /home/seurer/gcc/git/build/gcc-trunk-bootstrap/./gcc/xgcc -shared-libgcc -B/home/seurer/gcc/git/build/gcc-trunk-bootstrap/./gcc -nostdinc++ -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/src -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -B/home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/bin/ -B/home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/lib/ -isystem /home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/include -isystem /home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/sys-include -fno-checking -I/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu -I/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/include -I/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/libsupc++ -std=gnu++98 -fPIC -DPIC -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi=2 -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=compatibility-ldbl-alt128.lo -mno-gnu-attribute -g -O2 -D_GNU_SOURCE -mabi=ieeelongdouble -mno-gnu-attribute -Wno-psabi -std=gnu++11 -c /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc -fPIC -DPIC -D_GLIBCXX_SHARED -o .libs/compatibility-ldbl-alt128.o In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:36: /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:33:40: error: expected constructor, destructor, or type conversion before ';' token 33 | INSTANTIATE_FACET_ACCESSORS(num_get<C>); | ^ /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:34:40: error: expected constructor, destructor, or type conversion before ';' token 34 | INSTANTIATE_FACET_ACCESSORS(num_put<C>); | ^ In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:37: /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:36:42: error: expected constructor, destructor, or type conversion before ';' token 36 | INSTANTIATE_FACET_ACCESSORS(money_put<C>); | ^ /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:37:42: error: expected constructor, destructor, or type conversion before ';' token 37 | INSTANTIATE_FACET_ACCESSORS(money_get<C>); | ^ In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:44: /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:33:40: error: expected constructor, destructor, or type conversion before ';' token 33 | INSTANTIATE_FACET_ACCESSORS(num_get<C>); | ^ /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:34:40: error: expected constructor, destructor, or type conversion before ';' token 34 | INSTANTIATE_FACET_ACCESSORS(num_put<C>); | ^ In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:45: /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:36:42: error: expected constructor, destructor, or type conversion before ';' token 36 | INSTANTIATE_FACET_ACCESSORS(money_put<C>); | ^ /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:37:42: error: expected constructor, destructor, or type conversion before ';' token 37 | INSTANTIATE_FACET_ACCESSORS(money_get<C>); | ^ make[6]: *** [Makefile:1027: compatibility-ldbl-alt128.lo] Error 1 commit b3ac43a3c05744d62a963d656bed782fc867ad79 (HEAD) Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Nov 9 21:44:31 2022 +0000 libstdc++: Avoid redundant checks in std::use_facet [PR103755]
As I said in Bug 107632 comment 2: I'm kinda tempted to just disable the new optimization on these targets, the handling of compat facets for different float ABIs is impossible to get right.
I see the problem here though, and can fix it easily enough.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:a7f51059fb009dcd7d491d6b2164bce75dbd9975 commit r13-3917-ga7f51059fb009dcd7d491d6b2164bce75dbd9975 Author: Jonathan Wakely <jwakely@redhat.com> Date: Fri Nov 11 22:36:01 2022 +0000 libstdc++: Define INSTANTIATE_FACET_ACCESSORS macro in compat source [PR103755] compatibility-ldbl-alt128.cc re-includes locale-inst-numeric.h and locale-inst-monetary.h but wasn't defining the macros added in r13-3888-gb3ac43a3c05744. Put those macros in a new internal header that can be included everywhere they're used. libstdc++-v3/ChangeLog: PR libstdc++/103755 * src/c++11/locale-inst-monetary.h: Include new header. * src/c++11/locale-inst-numeric.h: Likewise. * src/c++11/locale-inst.cc: Likewise. (INSTANTIATE_USE_FACET, INSTANTIATE_FACET_ACCESSORS): Move macro definitions to ... * src/c++11/facet_inst_macros.h: New file.
Should be fixed now, I think
Fixed then.
Unfortunately this change causes a regression for libs that were statically linked to libstdc++.a before the PR 91057 fix. Any object which has the buggy std::locale::id::_M_id() code linked into it can get corrupted locale::_Impl::_M_facet arrays, where the facets are at the wrong indices. Before the introduction of __try_use_facet those corrupted _M_facet arrays would result in a failed dynamic_cast and so has_facet would be false and use_facet would throw. With the new code in GCC 13 the static_cast succeeds, but with undefined behaviour. So to avoid a regression from detecting the bug and throwing an exception to crashing with a segfault, I think we need to change __try_use_facet to use dynamic_cast, unfortunately. We will still retain the use of __try_use_facet in std::basic_ios::_M_cache_locale, so we'll still only do three dynamic_casts not six, so that's still a bit better than it was before.
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:8caf5805ad76125b84430b8653003f4776489d46 commit r12-9462-g8caf5805ad76125b84430b8653003f4776489d46 Author: Jonathan Wakely <jwakely@redhat.com> Date: Thu Apr 20 21:02:22 2023 +0100 libstdc++: Optimize std::try_facet and std::use_facet [PR103755] The std::try_facet and std::use_facet functions were optimized in r13-3888-gb3ac43a3c05744 to avoid redundant checking for all facets that are required to always be present in every locale. This performs a simpler version of the optimization that only applies to std::ctype<char>, std::num_get<char>, std::num_put<char>, and the wchar_t specializations of those facets. Those are the facets that are cached by std::basic_ios, which means they're used on construction for every iostream object. This smaller change is suitable for the gcc-12 branch, and mitigates the performance loss for powerpc64le-linux on the gcc-12 branch caused by r12-9454-g24cf9f4c6f45f7 for PR 103387. It also greatly improves the performance of constructing iostreams objects, for all targets. libstdc++-v3/ChangeLog: PR libstdc++/103755 * include/bits/locale_classes.tcc (try_facet, use_facet): Do not check array index or dynamic type when accessing required specializations of std::ctype, std::num_get, or std::num_put. * testsuite/22_locale/ctype/is/string/89728_neg.cc: Adjust expected errors.
(In reply to Jonathan Wakely from comment #16) > So to avoid a regression from detecting the bug and throwing an exception to > crashing with a segfault, I think we need to change __try_use_facet to use > dynamic_cast, unfortunately. Ugh, I completely forgot about this.