Bug 110167 - excessive compile time for std::to_array with huge arrays
Summary: excessive compile time for std::to_array with huge arrays
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 13.1.0
: P3 normal
Target Milestone: 12.4
Assignee: Jonathan Wakely
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2023-06-08 03:18 UTC by nightstrike
Modified: 2024-06-17 15:15 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-06-08 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description nightstrike 2023-06-08 03:18:30 UTC
#include <array>
int f[262144]; auto g(void) { return std::to_array(f); }

(Thanks to Andrew for help reducing!)

Baseline run:

$ time g++ test.cc -std=gnu++20 -O0 -c
real	0m17.274s
user	0m16.806s
sys	0m0.119s


at -O1, 2, 3, or fast, it takes too long to ever finish.  I killed it after 15 minutes.

For the curious, that array size is 512*512.  I didn't bisect to see where it starts to blow up.
Comment 1 Jonathan Wakely 2023-06-08 09:51:59 UTC
Reduced:

template<unsigned long N>
struct array
{
  int data[N];
};

template<unsigned long...> struct seq { };


template<unsigned long... Idx>
array<sizeof...(Idx)>
to_array_impl(int (&a)[sizeof...(Idx)], seq<Idx...>)
{
  return {{a[Idx]...}};
}


template<unsigned long N>
array<N>
to_array(int (&a)[N])
{
  return to_array_impl(a, seq<__integer_pack(N)...>{});
}

int f[262144];

auto g(void) { return to_array(f); }
Comment 2 Jonathan Wakely 2023-06-08 10:17:34 UTC
The "obvious" alternative impl using a lambda expression is even slower:

template<unsigned long N>
struct array
{
  int data[N];
};

template<unsigned long...> struct seq { };

#ifndef USE_LAMBDA
template<unsigned long... Idx>
array<sizeof...(Idx)>
to_array_impl(int (&a)[sizeof...(Idx)], seq<Idx...>)
{
  return {{a[Idx]...}};
}
#endif

template<unsigned long N>
array<N>
to_array(int (&a)[N])
{
#ifndef USE_LAMBDA
  return to_array_impl(a, seq<__integer_pack(N)...>{});
#else
  return [&a]<unsigned long... I>(seq<I...>) {
    return array<N>{{ a[I]... }};
  }(seq<__integer_pack(N)...>{});
#endif
}

constexpr int S = 2000;
int f[S];

array<S> g(void) { return to_array(f); }
Comment 3 Jonathan Wakely 2023-06-08 10:21:23 UTC
$ time ~/gcc/9/bin/g++ 110167.cc -c -O3 

real    0m5.568s
user    0m5.497s
sys     0m0.035s

$ time ~/gcc/10/bin/g++ 110167.cc -c -O3 

real    0m5.393s
user    0m5.355s
sys     0m0.028s

$ time ~/gcc/11/bin/g++ 110167.cc -c -O3 

real    0m10.543s
user    0m10.205s
sys     0m0.317s

$ time ~/gcc/12/bin/g++ 110167.cc -c -O3 

real    0m14.042s
user    0m13.747s
sys     0m0.270s

$ time ~/gcc/13/bin/g++ 110167.cc -c -O3 

real    0m9.422s
user    0m9.232s
sys     0m0.170s
Comment 4 Jonathan Wakely 2023-06-08 10:22:49 UTC
(In reply to Jonathan Wakely from comment #2)
> The "obvious" alternative impl using a lambda expression is even slower:

Actually for GCC 13 it's faster:

$ time ~/gcc/13/bin/g++ 110167.cc -c -O3 

real    0m9.422s
user    0m9.232s
sys     0m0.170s

$ time ~/gcc/13/bin/g++ 110167.cc -c -O3  -DUSE_LAMBDA

real    0m6.403s
user    0m6.342s
sys     0m0.048s

So that's nice.
Comment 5 Jonathan Wakely 2023-06-08 11:05:13 UTC
It's hard to bisect where it got slower between 10 and 11, and between 11 and 12, because for --enable-checking builds the extra assertions slow everything down and the difference between "fast with slow assertions" and "slow with slow assertions" is much less significant than "fast" and "slow".
Comment 6 Jonathan Wakely 2023-06-08 15:01:14 UTC
The bottom line, however, is that this use of std::to_array is not really reasonable.

You're asking the compiler to create a variadic pack of 262144 integers, and then create a pack expansion for std::array<int, 262144>{arr[i]...} where i... has 262144 elements in the pack.

That's going to be insanely large.

Maybe we can do this though:

  template<typename _Tp, size_t _Nm>
    [[nodiscard]]
    constexpr array<remove_cv_t<_Tp>, _Nm>
    to_array(_Tp (&__a)[_Nm])
    noexcept(is_nothrow_constructible_v<_Tp, _Tp&>)
    {
      static_assert(!is_array_v<_Tp>);
      static_assert(is_constructible_v<_Tp, _Tp&>);
      if constexpr (_Nm > 1024 && is_trivially_default_constructible_v<_Tp>
		      && is_trivially_assignable_v<_Tp&, _Tp&>)
	{
	  array<_Tp, _Nm> __arr;
	  for (size_t __i = 0; __i < _Nm; ++i)
	    __arr[__i] = __a[__i];
	  return __arr;
	}
      else if constexpr (is_constructible_v<_Tp, _Tp&>)
	return [&__a]<size_t... _Idx>(index_sequence<_Idx...>) {
	  return array<remove_cv_t<_Tp>, _Nm>{{ __a[_Idx]... }};
	}(make_index_sequence<_Nm>{});
      __builtin_unreachable(); // FIXME: see PR c++/91388
    }
Comment 7 Jonathan Wakely 2023-06-08 15:09:58 UTC
For some value of 1024
Comment 8 Jonathan Wakely 2023-06-08 15:28:38 UTC
Changing back to component=libstdc++, since I don't think the compiler can really help here, but libstdc++ can avoid asking the compiler to perform miracles.
Comment 9 Jonathan Wakely 2023-06-08 15:30:37 UTC
Adjusting the misleading title though, as optimization makes it worse, but the problem exists even at -O0
Comment 10 Jonathan Wakely 2023-06-08 15:44:34 UTC
(In reply to Jonathan Wakely from comment #6)
>       if constexpr (_Nm > 1024 && is_trivially_default_constructible_v<_Tp>
> 		      && is_trivially_assignable_v<_Tp&, _Tp&>)
> 	{
> 	  array<_Tp, _Nm> __arr;
> 	  for (size_t __i = 0; __i < _Nm; ++i)
> 	    __arr[__i] = __a[__i];
> 	  return __arr;

I wonder if we should do this for any N, and rely on the compiler to DTRT for the loop.
Comment 11 Richard Biener 2023-06-09 07:00:44 UTC
It's also very bad for code size, so yes, a loopy implementation is very
much prefered even for a much smaller threshold than 1024!
Comment 12 GCC Commits 2023-06-09 12:08:45 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:960de5dd886572711ef86fa1e15e30d3810eccb9

commit r14-1647-g960de5dd886572711ef86fa1e15e30d3810eccb9
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jun 8 12:24:43 2023 +0100

    libstdc++: Optimize std::to_array for trivial types [PR110167]
    
    As reported in PR libstdc++/110167, std::to_array compiles extremely
    slowly for very large arrays. It needs to instantiate a very large
    specialization of std::index_sequence<N...> and then create a very large
    aggregate initializer from the pack expansion. For trivial types we can
    simply default-initialize the std::array and then use memcpy to copy the
    values. For non-trivial types we need to use the existing
    implementation, despite the compilation cost.
    
    As also noted in the PR, using a generic lambda instead of the
    __to_array helper compiles faster since gcc-13. It also produces
    slightly smaller code at -O1, due to additional inlining. The code at
    -Os, -O2 and -O3 seems to be the same. This new implementation requires
    __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
    since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110167
            * include/std/array (to_array): Initialize arrays of trivial
            types using memcpy. For non-trivial types, use lambda
            expressions instead of a separate helper function.
            (__to_array): Remove.
            * testsuite/23_containers/array/creation/110167.cc: New test.
Comment 13 GCC Commits 2024-03-12 14:17:16 UTC
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:4c6bb36e88d5c8e510b10d12c01e3461c2aa4259

commit r13-8421-g4c6bb36e88d5c8e510b10d12c01e3461c2aa4259
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jun 8 12:24:43 2023 +0100

    libstdc++: Optimize std::to_array for trivial types [PR110167]
    
    As reported in PR libstdc++/110167, std::to_array compiles extremely
    slowly for very large arrays. It needs to instantiate a very large
    specialization of std::index_sequence<N...> and then create a very large
    aggregate initializer from the pack expansion. For trivial types we can
    simply default-initialize the std::array and then use memcpy to copy the
    values. For non-trivial types we need to use the existing
    implementation, despite the compilation cost.
    
    As also noted in the PR, using a generic lambda instead of the
    __to_array helper compiles faster since gcc-13. It also produces
    slightly smaller code at -O1, due to additional inlining. The code at
    -Os, -O2 and -O3 seems to be the same. This new implementation requires
    __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
    since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110167
            * include/std/array (to_array): Initialize arrays of trivial
            types using memcpy. For non-trivial types, use lambda
            expressions instead of a separate helper function.
            (__to_array): Remove.
            * testsuite/23_containers/array/creation/110167.cc: New test.
    
    (cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)
Comment 14 GCC Commits 2024-03-13 09:52:39 UTC
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:ec5da76ad33dcba7858525fdb6b39288631fcd8a

commit r12-10206-gec5da76ad33dcba7858525fdb6b39288631fcd8a
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jun 8 12:24:43 2023 +0100

    libstdc++: Optimize std::to_array for trivial types [PR110167]
    
    As reported in PR libstdc++/110167, std::to_array compiles extremely
    slowly for very large arrays. It needs to instantiate a very large
    specialization of std::index_sequence<N...> and then create a very large
    aggregate initializer from the pack expansion. For trivial types we can
    simply default-initialize the std::array and then use memcpy to copy the
    values. For non-trivial types we need to use the existing
    implementation, despite the compilation cost.
    
    As also noted in the PR, using a generic lambda instead of the
    __to_array helper compiles faster since gcc-13. It also produces
    slightly smaller code at -O1, due to additional inlining. The code at
    -Os, -O2 and -O3 seems to be the same. This new implementation requires
    __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
    since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110167
            * include/std/array (to_array): Initialize arrays of trivial
            types using memcpy. For non-trivial types, use lambda
            expressions instead of a separate helper function.
            (__to_array): Remove.
            * testsuite/23_containers/array/creation/110167.cc: New test.
    
    (cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)
Comment 15 Jonathan Wakely 2024-03-13 10:00:33 UTC
Fixed for 13.3 and 12.4