Bug 103755 - {has,use}_facet() and iostream constructor performance
Summary: {has,use}_facet() and iostream constructor performance
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 11.2.1
: P3 enhancement
Target Milestone: 12.3
Assignee: Jonathan Wakely
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-17 03:49 UTC by Dmitry Prokoptsev
Modified: 2023-12-20 23:52 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-17 00:00:00


Attachments
Proposed implementation (4.42 KB, text/plain)
2021-12-17 03:49 UTC, Dmitry Prokoptsev
Details
Alternative implementation (2.86 KB, patch)
2021-12-17 17:22 UTC, Jonathan Wakely
Details | Diff
Alternative implementation v2 (3.16 KB, patch)
2021-12-17 17:52 UTC, Jonathan Wakely
Details | Diff
Build fix for alternative implementation v2. (1.31 KB, text/plain)
2021-12-19 03:41 UTC, Dmitry Prokoptsev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Prokoptsev 2021-12-17 03:49:28 UTC
Created attachment 52021 [details]
Proposed implementation

Current implementation of basic_ios::init(), through invocation of _M_cache_locale(), calls has_facet() and use_facet() on three facet types (ctype, num_get, num_put). As each one of them does a dynamic_cast under the hood, a mere construction of an iostream results in six (6) dynamic_casts, resulting in poor performance in cases of short-lived iostreams.

However, one can note that these {has,use}_facet() only reference facets which present in classic() locale and therefore are guaranteed to be there in any locale (with the matching type) and therefore don't require any bounds checks or dynamic_casts in the locale impl object. So we can do some TMP to establish a subset of facets which are always present in any locale, and instruct has_facet() and use_facet() to bypass any checks for those.

(It may also be worth it to add __try_use_facet() function, providing more efficient version of (has_facet<T>(loc) ? &use_facet<T>(loc) : 0)).

I've drafted a patch implementing changes described above. On my hardware (i9-9900k @ 5 GHz) it reduces iostream construction time from 100 ns down to 36 ns).
Comment 1 Jonathan Wakely 2021-12-17 10:31:24 UTC
That's a nice speedup, and seems reasonable. We might be able to do it more simply though, without the full generality of the new policies.
Comment 2 Jonathan Wakely 2021-12-17 17:22:41 UTC
Created attachment 52024 [details]
Alternative implementation

This seems like a much simpler approach.

This causes 22_locale/ctype/is/string/89728_neg.cc to fail for C++98/11/14 modes, because without 'if constexpr' the body of __try_use_facet creates dozens of errors, rather than just one from the current definition of use_facet. That isn't the end of the world, but it's a regression in diagnostic quality.
Comment 3 Jonathan Wakely 2021-12-17 17:52:46 UTC
Created attachment 52025 [details]
Alternative implementation v2

The diagnostic regression is easy to fix with a static assertion before calling __try_use_facet.
Comment 4 Jonathan Wakely 2021-12-17 21:05:24 UTC
(In reply to Jonathan Wakely from comment #3)
> Created attachment 52025 [details]
> Alternative implementation v2
> 
> The diagnostic regression is easy to fix with a static assertion before
> calling __try_use_facet.

Although that causes errors for certain uses of streams without including <locale>, specifically when building the library.

Also I forgot to say that I can still reproduce approx. 3x speedup using my patch.

Before:

BM_sstream_construct        288 ns          287 ns      2525913
BM_fstream_construct        319 ns          318 ns      2204065

After:

BM_sstream_construct       99.5 ns         99.1 ns      6512679
BM_fstream_construct        108 ns          107 ns      6536463
Comment 5 Dmitry Prokoptsev 2021-12-19 03:41:04 UTC
Created attachment 52029 [details]
Build fix for alternative implementation v2.
Comment 6 Dmitry Prokoptsev 2021-12-19 03:44:50 UTC
That would also work, I suppose (it even outperforms my original approach by a tiny bit -- 33 ns for v2 vs 36 for my original implementation).

There are a few build errors in the alternative implementations, however: is_convertible<locale::facet*, _Facet*>::value is false, as it tries to downcast the facet -- did you mean is_base_of<>, which would also better align with the static assert message?

Also, despite the always_inline attribute of __try_use_facet, my attempts to build the library with -O0 -ggdb and link my benchmark yielded a bunch of errors like 

    ld: /home/av/prgs/gcc/debug/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so: undefined reference to `std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > const* std::__try_use_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&)'

Suggest explicitly instantiating __try_use_facet where has_facet() and use_facet() is instantiated.

See attached fix for build problems I discovered.
Comment 7 Jonathan Wakely 2021-12-19 13:04:33 UTC
(In reply to Dmitry Prokoptsev from comment #6)
> That would also work, I suppose (it even outperforms my original approach by
> a tiny bit -- 33 ns for v2 vs 36 for my original implementation).
> 
> There are a few build errors in the alternative implementations, however:
> is_convertible<locale::facet*, _Facet*>::value is false, as it tries to
> downcast the facet -- did you mean is_base_of<>, which would also better
> align with the static assert message?

Yes, I've already changed it to use that in my local branch.

> Also, despite the always_inline attribute of __try_use_facet, my attempts to
> build the library with -O0 -ggdb and link my benchmark yielded a bunch of
> errors like 
> 
>     ld:
> /home/av/prgs/gcc/debug/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.
> so: undefined reference to `std::num_get<char,
> std::istreambuf_iterator<char, std::char_traits<char> > > const*
> std::__try_use_facet<std::num_get<char, std::istreambuf_iterator<char,
> std::char_traits<char> > > >(std::locale const&)'

Yes, that's what happens for the debug libs built with --enable-libstdcxx-debug 

> Suggest explicitly instantiating __try_use_facet where has_facet() and
> use_facet() is instantiated.

Yes, I wanted to avoid that :-(

> See attached fix for build problems I discovered.

Thanks, I'll compare it with the fixes I've already got locally and finish retesting.
Comment 8 GCC Commits 2022-11-11 05:29:45 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:b3ac43a3c05744d62a963d656bed782fc867ad79

commit r13-3888-gb3ac43a3c05744d62a963d656bed782fc867ad79
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 9 21:44:31 2022 +0000

    libstdc++: Avoid redundant checks in std::use_facet [PR103755]
    
    We do not need to do bounds checks or a runtime dynamic_cast when using
    std::has_facet and std::use_facet to access the default facets that are
    guaranteed to be present in every std::locale object. We can just index
    straight into the array and use a static_cast for the conversion.
    
    This patch adds a new std::__try_use_facet function that is like
    std::use_facet but returns a pointer, so can be used to implement both
    std::has_facet and std::use_facet. We can then do the necessary
    metaprogramming to skip the redundant checks in std::__try_use_facet.
    
    To avoid having to export (or hide) instantiations of the new function
    from libstdc++.so the instantiations are given hidden visibility. This
    allows them to be used in the library, but user code will instantiate it
    again using the definition in the header. That would happen anyway,
    because there are no explicit instantiation declarations for any of
    std::has_facet, std::use_facet, or the new std::__try_use_facet.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/103755
            * config/abi/pre/gnu.ver: Tighten patterns for facets in the
            base version. Add exports for __try_use_facet.
            * include/bits/basic_ios.tcc (basic_ios::_M_cache_locale): Use
            __try_use_facet instead of has_facet and use_facet.
            * include/bits/fstream.tcc (basic_filebuf::basic_filebuf()):
            Likewise.
            (basic_filebuf::imbue): Likewise.
            * include/bits/locale_classes.h (locale, locale::id)
            (locale::_Impl): Declare __try_use_facet as a friend.
            * include/bits/locale_classes.tcc (__try_use_facet): Define new
            function template with special cases for default facets.
            (has_facet, use_facet): Call __try_use_facet.
            * include/bits/locale_facets.tcc (__try_use_facet): Declare
            explicit instantiations.
            * include/bits/locale_facets_nonio.tcc (__try_use_facet):
            Likewise.
            * src/c++11/locale-inst-monetary.h (INSTANTIATE_FACET_ACCESSORS):
            Use new macro for facet accessor instantiations.
            * src/c++11/locale-inst-numeric.h (INSTANTIATE_FACET_ACCESSORS):
            Likewise.
            * src/c++11/locale-inst.cc (INSTANTIATE_USE_FACET): Define new
            macro for instantiating __try_use_facet and use_facet.
            (INSTANTIATE_FACET_ACCESSORS): Define new macro for also
            defining has_facet.
            * src/c++98/compatibility-ldbl.cc (__try_use_facet):
            Instantiate.
            * testsuite/22_locale/ctype/is/string/89728_neg.cc: Adjust
            expected errors.
Comment 9 Jonathan Wakely 2022-11-11 13:27:48 UTC
This causes an ABI regression on powerpc64le:

FAIL: libstdc++-abi/abi_check
Comment 10 seurer 2022-11-11 21:54:07 UTC
It also breaks the build on one of our powerpc64 le machines (just the one).  It fails in stage 1.

g:b3ac43a3c05744d62a963d656bed782fc867ad79, r13-3888-gb3ac43a3c05744

When building gcc on one of our systems (just the one) the build fails in stage 1.

Ubuntu 22.04.1 LTS
gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1) 
Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35


make
. . .
libtool: compile:  /home/seurer/gcc/git/build/gcc-trunk-bootstrap/./gcc/xgcc -shared-libgcc -B/home/seurer/gcc/git/build/gcc-trunk-bootstrap/./gcc -nostdinc++ -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/src -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs -L/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -B/home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/bin/ -B/home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/lib/ -isystem /home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/include -isystem /home/seurer/gcc/git/install/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/sys-include -fno-checking -I/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu -I/home/seurer/gcc/git/build/gcc-trunk-bootstrap/powerpc64le-unknown-linux-gnu/libstdc++-v3/include -I/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/libsupc++ -std=gnu++98 -fPIC -DPIC -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi=2 -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=compatibility-ldbl-alt128.lo -mno-gnu-attribute -g -O2 -D_GNU_SOURCE -mabi=ieeelongdouble -mno-gnu-attribute -Wno-psabi -std=gnu++11 -c /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc  -fPIC -DPIC -D_GLIBCXX_SHARED -o .libs/compatibility-ldbl-alt128.o
In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:36:
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:33:40: error: expected constructor, destructor, or type conversion before ';' token
   33 | INSTANTIATE_FACET_ACCESSORS(num_get<C>);
      |                                        ^
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:34:40: error: expected constructor, destructor, or type conversion before ';' token
   34 | INSTANTIATE_FACET_ACCESSORS(num_put<C>);
      |                                        ^
In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:37:
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:36:42: error: expected constructor, destructor, or type conversion before ';' token
   36 | INSTANTIATE_FACET_ACCESSORS(money_put<C>);
      |                                          ^
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:37:42: error: expected constructor, destructor, or type conversion before ';' token
   37 | INSTANTIATE_FACET_ACCESSORS(money_get<C>);
      |                                          ^
In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:44:
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:33:40: error: expected constructor, destructor, or type conversion before ';' token
   33 | INSTANTIATE_FACET_ACCESSORS(num_get<C>);
      |                                        ^
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-numeric.h:34:40: error: expected constructor, destructor, or type conversion before ';' token
   34 | INSTANTIATE_FACET_ACCESSORS(num_put<C>);
      |                                        ^
In file included from /home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/compatibility-ldbl-alt128.cc:45:
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:36:42: error: expected constructor, destructor, or type conversion before ';' token
   36 | INSTANTIATE_FACET_ACCESSORS(money_put<C>);
      |                                          ^
/home/seurer/gcc/git/gcc-trunk-bootstrap/libstdc++-v3/src/c++11/locale-inst-monetary.h:37:42: error: expected constructor, destructor, or type conversion before ';' token
   37 | INSTANTIATE_FACET_ACCESSORS(money_get<C>);
      |                                          ^
make[6]: *** [Makefile:1027: compatibility-ldbl-alt128.lo] Error 1


commit b3ac43a3c05744d62a963d656bed782fc867ad79 (HEAD)
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 9 21:44:31 2022 +0000

    libstdc++: Avoid redundant checks in std::use_facet [PR103755]
Comment 11 Jonathan Wakely 2022-11-11 21:55:55 UTC
As I said in Bug 107632 comment 2:

I'm kinda tempted to just disable the new optimization on these targets, the handling of compat facets for different float ABIs is impossible to get right.
Comment 12 Jonathan Wakely 2022-11-11 22:39:08 UTC
I see the problem here though, and can fix it easily enough.
Comment 13 GCC Commits 2022-11-12 01:30:02 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:a7f51059fb009dcd7d491d6b2164bce75dbd9975

commit r13-3917-ga7f51059fb009dcd7d491d6b2164bce75dbd9975
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 11 22:36:01 2022 +0000

    libstdc++: Define INSTANTIATE_FACET_ACCESSORS macro in compat source [PR103755]
    
    compatibility-ldbl-alt128.cc re-includes locale-inst-numeric.h and
    locale-inst-monetary.h but wasn't defining the macros added in
    r13-3888-gb3ac43a3c05744.
    
    Put those macros in a new internal header that can be included everywhere
    they're used.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/103755
            * src/c++11/locale-inst-monetary.h: Include new header.
            * src/c++11/locale-inst-numeric.h: Likewise.
            * src/c++11/locale-inst.cc: Likewise.
            (INSTANTIATE_USE_FACET, INSTANTIATE_FACET_ACCESSORS): Move
            macro definitions to ...
            * src/c++11/facet_inst_macros.h: New file.
Comment 14 Jonathan Wakely 2022-11-12 02:00:50 UTC
Should be fixed now, I think
Comment 15 Jonathan Wakely 2023-01-06 12:12:53 UTC
Fixed then.
Comment 16 Jonathan Wakely 2023-01-11 14:07:08 UTC
Unfortunately this change causes a regression for libs that were statically linked to libstdc++.a before the PR 91057 fix. Any object which has the buggy std::locale::id::_M_id() code linked into it can get corrupted locale::_Impl::_M_facet arrays, where the facets are at the wrong indices.

Before the introduction of __try_use_facet those corrupted _M_facet arrays would result in a failed dynamic_cast and so has_facet would be false and use_facet would throw. With the new code in GCC 13 the static_cast succeeds, but with undefined behaviour.

So to avoid a regression from detecting the bug and throwing an exception to crashing with a segfault, I think we need to change __try_use_facet to use dynamic_cast, unfortunately.

We will still retain the use of __try_use_facet in std::basic_ios::_M_cache_locale, so we'll still only do three dynamic_casts not six, so that's still a bit better than it was before.
Comment 17 GCC Commits 2023-04-21 15:01:29 UTC
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:8caf5805ad76125b84430b8653003f4776489d46

commit r12-9462-g8caf5805ad76125b84430b8653003f4776489d46
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Apr 20 21:02:22 2023 +0100

    libstdc++: Optimize std::try_facet and std::use_facet [PR103755]
    
    The std::try_facet and std::use_facet functions were optimized in
    r13-3888-gb3ac43a3c05744 to avoid redundant checking for all facets that
    are required to always be present in every locale.
    
    This performs a simpler version of the optimization that only applies to
    std::ctype<char>, std::num_get<char>, std::num_put<char>, and the
    wchar_t specializations of those facets. Those are the facets that are
    cached by std::basic_ios, which means they're used on construction for
    every iostream object. This smaller change is suitable for the gcc-12
    branch, and mitigates the performance loss for powerpc64le-linux on the
    gcc-12 branch caused by r12-9454-g24cf9f4c6f45f7 for PR 103387. It also
    greatly improves the performance of constructing iostreams objects, for
    all targets.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/103755
            * include/bits/locale_classes.tcc (try_facet, use_facet): Do not
            check array index or dynamic type when accessing required
            specializations of std::ctype, std::num_get, or std::num_put.
            * testsuite/22_locale/ctype/is/string/89728_neg.cc: Adjust
            expected errors.
Comment 18 Jonathan Wakely 2023-04-21 15:06:44 UTC
(In reply to Jonathan Wakely from comment #16)
> So to avoid a regression from detecting the bug and throwing an exception to
> crashing with a segfault, I think we need to change __try_use_facet to use
> dynamic_cast, unfortunately.

Ugh, I completely forgot about this.