#include <array> int f[262144]; auto g(void) { return std::to_array(f); } (Thanks to Andrew for help reducing!) Baseline run: $ time g++ test.cc -std=gnu++20 -O0 -c real 0m17.274s user 0m16.806s sys 0m0.119s at -O1, 2, 3, or fast, it takes too long to ever finish. I killed it after 15 minutes. For the curious, that array size is 512*512. I didn't bisect to see where it starts to blow up.
Reduced: template<unsigned long N> struct array { int data[N]; }; template<unsigned long...> struct seq { }; template<unsigned long... Idx> array<sizeof...(Idx)> to_array_impl(int (&a)[sizeof...(Idx)], seq<Idx...>) { return {{a[Idx]...}}; } template<unsigned long N> array<N> to_array(int (&a)[N]) { return to_array_impl(a, seq<__integer_pack(N)...>{}); } int f[262144]; auto g(void) { return to_array(f); }
The "obvious" alternative impl using a lambda expression is even slower: template<unsigned long N> struct array { int data[N]; }; template<unsigned long...> struct seq { }; #ifndef USE_LAMBDA template<unsigned long... Idx> array<sizeof...(Idx)> to_array_impl(int (&a)[sizeof...(Idx)], seq<Idx...>) { return {{a[Idx]...}}; } #endif template<unsigned long N> array<N> to_array(int (&a)[N]) { #ifndef USE_LAMBDA return to_array_impl(a, seq<__integer_pack(N)...>{}); #else return [&a]<unsigned long... I>(seq<I...>) { return array<N>{{ a[I]... }}; }(seq<__integer_pack(N)...>{}); #endif } constexpr int S = 2000; int f[S]; array<S> g(void) { return to_array(f); }
$ time ~/gcc/9/bin/g++ 110167.cc -c -O3 real 0m5.568s user 0m5.497s sys 0m0.035s $ time ~/gcc/10/bin/g++ 110167.cc -c -O3 real 0m5.393s user 0m5.355s sys 0m0.028s $ time ~/gcc/11/bin/g++ 110167.cc -c -O3 real 0m10.543s user 0m10.205s sys 0m0.317s $ time ~/gcc/12/bin/g++ 110167.cc -c -O3 real 0m14.042s user 0m13.747s sys 0m0.270s $ time ~/gcc/13/bin/g++ 110167.cc -c -O3 real 0m9.422s user 0m9.232s sys 0m0.170s
(In reply to Jonathan Wakely from comment #2) > The "obvious" alternative impl using a lambda expression is even slower: Actually for GCC 13 it's faster: $ time ~/gcc/13/bin/g++ 110167.cc -c -O3 real 0m9.422s user 0m9.232s sys 0m0.170s $ time ~/gcc/13/bin/g++ 110167.cc -c -O3 -DUSE_LAMBDA real 0m6.403s user 0m6.342s sys 0m0.048s So that's nice.
It's hard to bisect where it got slower between 10 and 11, and between 11 and 12, because for --enable-checking builds the extra assertions slow everything down and the difference between "fast with slow assertions" and "slow with slow assertions" is much less significant than "fast" and "slow".
The bottom line, however, is that this use of std::to_array is not really reasonable. You're asking the compiler to create a variadic pack of 262144 integers, and then create a pack expansion for std::array<int, 262144>{arr[i]...} where i... has 262144 elements in the pack. That's going to be insanely large. Maybe we can do this though: template<typename _Tp, size_t _Nm> [[nodiscard]] constexpr array<remove_cv_t<_Tp>, _Nm> to_array(_Tp (&__a)[_Nm]) noexcept(is_nothrow_constructible_v<_Tp, _Tp&>) { static_assert(!is_array_v<_Tp>); static_assert(is_constructible_v<_Tp, _Tp&>); if constexpr (_Nm > 1024 && is_trivially_default_constructible_v<_Tp> && is_trivially_assignable_v<_Tp&, _Tp&>) { array<_Tp, _Nm> __arr; for (size_t __i = 0; __i < _Nm; ++i) __arr[__i] = __a[__i]; return __arr; } else if constexpr (is_constructible_v<_Tp, _Tp&>) return [&__a]<size_t... _Idx>(index_sequence<_Idx...>) { return array<remove_cv_t<_Tp>, _Nm>{{ __a[_Idx]... }}; }(make_index_sequence<_Nm>{}); __builtin_unreachable(); // FIXME: see PR c++/91388 }
For some value of 1024
Changing back to component=libstdc++, since I don't think the compiler can really help here, but libstdc++ can avoid asking the compiler to perform miracles.
Adjusting the misleading title though, as optimization makes it worse, but the problem exists even at -O0
(In reply to Jonathan Wakely from comment #6) > if constexpr (_Nm > 1024 && is_trivially_default_constructible_v<_Tp> > && is_trivially_assignable_v<_Tp&, _Tp&>) > { > array<_Tp, _Nm> __arr; > for (size_t __i = 0; __i < _Nm; ++i) > __arr[__i] = __a[__i]; > return __arr; I wonder if we should do this for any N, and rely on the compiler to DTRT for the loop.
It's also very bad for code size, so yes, a loopy implementation is very much prefered even for a much smaller threshold than 1024!
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:960de5dd886572711ef86fa1e15e30d3810eccb9 commit r14-1647-g960de5dd886572711ef86fa1e15e30d3810eccb9 Author: Jonathan Wakely <jwakely@redhat.com> Date: Thu Jun 8 12:24:43 2023 +0100 libstdc++: Optimize std::to_array for trivial types [PR110167] As reported in PR libstdc++/110167, std::to_array compiles extremely slowly for very large arrays. It needs to instantiate a very large specialization of std::index_sequence<N...> and then create a very large aggregate initializer from the pack expansion. For trivial types we can simply default-initialize the std::array and then use memcpy to copy the values. For non-trivial types we need to use the existing implementation, despite the compilation cost. As also noted in the PR, using a generic lambda instead of the __to_array helper compiles faster since gcc-13. It also produces slightly smaller code at -O1, due to additional inlining. The code at -Os, -O2 and -O3 seems to be the same. This new implementation requires __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1). libstdc++-v3/ChangeLog: PR libstdc++/110167 * include/std/array (to_array): Initialize arrays of trivial types using memcpy. For non-trivial types, use lambda expressions instead of a separate helper function. (__to_array): Remove. * testsuite/23_containers/array/creation/110167.cc: New test.
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:4c6bb36e88d5c8e510b10d12c01e3461c2aa4259 commit r13-8421-g4c6bb36e88d5c8e510b10d12c01e3461c2aa4259 Author: Jonathan Wakely <jwakely@redhat.com> Date: Thu Jun 8 12:24:43 2023 +0100 libstdc++: Optimize std::to_array for trivial types [PR110167] As reported in PR libstdc++/110167, std::to_array compiles extremely slowly for very large arrays. It needs to instantiate a very large specialization of std::index_sequence<N...> and then create a very large aggregate initializer from the pack expansion. For trivial types we can simply default-initialize the std::array and then use memcpy to copy the values. For non-trivial types we need to use the existing implementation, despite the compilation cost. As also noted in the PR, using a generic lambda instead of the __to_array helper compiles faster since gcc-13. It also produces slightly smaller code at -O1, due to additional inlining. The code at -Os, -O2 and -O3 seems to be the same. This new implementation requires __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1). libstdc++-v3/ChangeLog: PR libstdc++/110167 * include/std/array (to_array): Initialize arrays of trivial types using memcpy. For non-trivial types, use lambda expressions instead of a separate helper function. (__to_array): Remove. * testsuite/23_containers/array/creation/110167.cc: New test. (cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:ec5da76ad33dcba7858525fdb6b39288631fcd8a commit r12-10206-gec5da76ad33dcba7858525fdb6b39288631fcd8a Author: Jonathan Wakely <jwakely@redhat.com> Date: Thu Jun 8 12:24:43 2023 +0100 libstdc++: Optimize std::to_array for trivial types [PR110167] As reported in PR libstdc++/110167, std::to_array compiles extremely slowly for very large arrays. It needs to instantiate a very large specialization of std::index_sequence<N...> and then create a very large aggregate initializer from the pack expansion. For trivial types we can simply default-initialize the std::array and then use memcpy to copy the values. For non-trivial types we need to use the existing implementation, despite the compilation cost. As also noted in the PR, using a generic lambda instead of the __to_array helper compiles faster since gcc-13. It also produces slightly smaller code at -O1, due to additional inlining. The code at -Os, -O2 and -O3 seems to be the same. This new implementation requires __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1). libstdc++-v3/ChangeLog: PR libstdc++/110167 * include/std/array (to_array): Initialize arrays of trivial types using memcpy. For non-trivial types, use lambda expressions instead of a separate helper function. (__to_array): Remove. * testsuite/23_containers/array/creation/110167.cc: New test. (cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)
Fixed for 13.3 and 12.4