The following program fails to compile for g++ 4.9.0 and further #include <utility> int main() { constexpr auto N = 1024; using T = std::make_integer_sequence<int, N>; static_assert(T::size() == N, ""); } fatal error: template instantiation depth exceeds maximum of 900 (use -ftemplate-depth= to increase the maximum) Note that Clang with -stdlib=libc++ can compile integer sequences of up to one million for machines with enough memory, and around 65K for online compilers such as wandbox.com. Clang + libstdc++ requires -ftemplate-depth=N for N beyond 256. Changing g++ default template instantiation depth beyond 900 is not a viable solution. It simply masks the problem of excessive template instantiations. Instead, the implementation of make_integer_sequence<T, N> should be made to scale as log(N). See e.g. http://stackoverflow.com/a/17426611/819272
Yep, been meaning to do this for some time.
Apparently VC and Clang have compiler hooks for this: https://www.reddit.com/r/cpp/comments/3t0nrc/true_story_efficient_packing_tales_of_c/cx26s02
A better O(log N) library-only solution than the linked one is available at https://github.com/taocpp/sequences/blob/master/include/tao/seq/make_integer_sequence.hpp Jonathan, we already exchanged some mails a few weeks ago about std::tuple improvements. The above implementation can be used if the compiler-intrinsic still needs time to materialize, the rest should be independently implementable. If you are interested, let me know how to proceed.
I already have a better make_integer_sequence, but have been trying to write an intrinsic.
Fixed on trunk
Author: redi Date: Tue Nov 17 19:54:33 2015 New Revision: 230496 URL: https://gcc.gnu.org/viewcvs?rev=230496&root=gcc&view=rev Log: PR libstdc++/66059 optimise _Build_index_tuple PR libstdc++/66059 * include/std/utility (_Build_index_tuple): Optimise. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/std/utility
(In reply to Daniel Frey from comment #3) > A better O(log N) library-only solution than the linked one is available at > https://github.com/taocpp/sequences/blob/master/include/tao/seq/ > make_integer_sequence.hpp The version I came up with is very close to Xeo's at stackoverflow. I tried something more like yours and it used a LOT more memory. Here's what I tested: #include <stddef.h> namespace std { template<size_t... _Indexes> struct _Index_tuple { }; #if DUP template<typename _ITup, int _Odd> struct _Itup_dup; template<size_t... _Ind> struct _Itup_dup<_Index_tuple<_Ind...>, 0> { static constexpr size_t _Nm = sizeof...(_Ind); using __type = _Index_tuple<_Ind..., _Nm + _Ind...>; }; template<size_t... _Ind> struct _Itup_dup<_Index_tuple<_Ind...>, 1> { static constexpr size_t _Nm = sizeof...(_Ind); using __type = _Index_tuple<_Ind..., _Nm + _Ind..., 2 * _Nm>; }; // Builds an _Index_tuple<0, 1, 2, ..., _Num-1>. template<size_t _Num> struct _Build_index_tuple { using __type = typename _Itup_dup< typename _Build_index_tuple<_Num / 2>::__type, _Num % 2>::__type; }; #else template<typename _ITup, typename _Jtup> struct _Itup_cat; template<size_t... _Ind1, size_t... _Ind2> struct _Itup_cat<_Index_tuple<_Ind1...>, _Index_tuple<_Ind2...>> { using __type = _Index_tuple<_Ind1..., (_Ind2 + sizeof...(_Ind1))...>; }; template<size_t _Num> struct _Build_index_tuple : _Itup_cat<typename _Build_index_tuple<_Num / 2>::__type, typename _Build_index_tuple<_Num - _Num / 2>::__type> { }; #endif template<> struct _Build_index_tuple<1> { typedef _Index_tuple<0> __type; }; template<> struct _Build_index_tuple<0> { typedef _Index_tuple<> __type; }; } template<typename T, typename U> struct is_same { static const bool value = false; }; template<typename T> struct is_same<T, T> { static const bool value = true; }; int main() { constexpr auto N = 1024; using T = std::_Build_index_tuple<N>::__type; static_assert(sizeof(T), ""); static_assert( is_same<std::_Build_index_tuple<10>::__type, std::_Index_tuple<0, 1, 2, 3, 4, 5, 6, 7, 8, 9>>::value, ""); static_assert( is_same<std::_Build_index_tuple<9>::__type, std::_Index_tuple<0, 1, 2, 3, 4, 5, 6, 7, 8>>::value, ""); } tmp$ g++11 it.cc -ftime-report Execution times (seconds) phase setup : 0.01 (11%) usr 0.00 ( 0%) sys 0.02 (20%) wall 1396 kB (15%) ggc phase parsing : 0.07 (78%) usr 0.01 (100%) sys 0.08 (80%) wall 7235 kB (80%) ggc phase opt and generate : 0.01 (11%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 421 kB ( 5%) ggc |name lookup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 (10%) wall 111 kB ( 1%) ggc |overload resolution : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 (10%) wall 66 kB ( 1%) ggc garbage collection : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 (10%) wall 0 kB ( 0%) ggc template instantiation : 0.07 (78%) usr 0.01 (100%) sys 0.07 (70%) wall 6763 kB (75%) ggc initialize rtl : 0.01 (11%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 8 kB ( 0%) ggc TOTAL : 0.09 0.01 0.10 9066 kB Extra diagnostic checks enabled; compiler may run slowly. Configure with --enable-checking=release to disable checks. tmp$ g++11 it.cc -ftime-report -DDUP Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.01 (33%) sys 0.02 ( 2%) wall 1396 kB ( 3%) ggc phase parsing : 0.91 (99%) usr 0.02 (67%) sys 0.93 (98%) wall 45729 kB (96%) ggc phase finalize : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc garbage collection : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc preprocessing : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 30 kB ( 0%) ggc parser struct body : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 22 kB ( 0%) ggc template instantiation : 0.89 (97%) usr 0.02 (67%) sys 0.91 (96%) wall 45262 kB (95%) ggc TOTAL : 0.92 0.03 0.95 47577 kB Extra diagnostic checks enabled; compiler may run slowly. Configure with --enable-checking=release to disable checks. As you can see, defining DUP makes it 10 times slower and using 5 times as much memory. Looking at your implementation I see only one relevant difference, you pass N/2 as a template argument instead of calculating it with sizeof... i.e. template<typename _ITup, size_t N, int _Odd> struct _Itup_dup; template<size_t... _Ind, size_t _Nm> struct _Itup_dup<_Index_tuple<_Ind...>, _Nm, 0> { using __type = _Index_tuple<_Ind..., _Nm + _Ind...>; }; template<size_t... _Ind, size_t _Nm> struct _Itup_dup<_Index_tuple<_Ind...>, _Nm, 1> { using __type = _Index_tuple<_Ind..., _Nm + _Ind..., 2 * _Nm>; }; // Builds an _Index_tuple<0, 1, 2, ..., _Num-1>. template<size_t _Num> struct _Build_index_tuple : typename _Itup_dup< typename _Build_index_tuple<_Num / 2>::__type, _Num / 2, _Num % 2> { }; and indeed this is even faster than what I just committed. I didn't realise sizeof... would have such an impact. My one can be optimised to be almost as fast by avoiding sizeof... in _Itup_cat.
(In reply to Jonathan Wakely from comment #7) > (In reply to Daniel Frey from comment #3) > > A better O(log N) library-only solution than the linked one is available at > > https://github.com/taocpp/sequences/blob/master/include/tao/seq/ > > make_integer_sequence.hpp > > The version I came up with is very close to Xeo's at stackoverflow. I tried > something more like yours and it used a LOT more memory. > > Here's what I tested: > > [...snip...] > > As you can see, defining DUP makes it 10 times slower and using 5 times as > much memory. Looking at your implementation I see only one relevant > difference, you pass N/2 as a template argument instead of calculating it > with sizeof... i.e. Yeah, avoiding sizeof... is crucial. I don't really know why, I just know it is. And only for GCC, Clang doesn't care if you use sizeof... or pass in the size explicitly. This behavior from GCC might be worth a look from the compiler people, as there is no reason why it should use so much more memory. > and indeed this is even faster than what I just committed. I didn't realise > sizeof... would have such an impact. > > My one can be optimised to be almost as fast by avoiding sizeof... in > _Itup_cat. I guess my version (without the sizeof...) is still better the larger the numbers get, but it won't make a real difference in practice as it's kind of unrealistic to generate index sequences with 100000 or more elements.
I did a simple comparison of the improved libstdc++ implementation and Daniel's one - the difference in time and resource usage is still significant. $ cat with_stdlib.cpp #include <utility> std::make_index_sequence<5000> thing; $ cat with_tao.cpp #include <make_integer_sequence.hpp> tao::seq::make_index_sequence<5000> thing; $ ./g++trunk with_stdlib.cpp -c -ftime-report Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 1386 kB ( 1%) ggc phase parsing : 0.97 (100%) usr 0.01 (100%) sys 0.97 (98%) wall 118230 kB (99%) ggc phase finalize : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc |name lookup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 440 kB ( 0%) ggc |overload resolution : 0.03 ( 3%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall 335 kB ( 0%) ggc parser (global) : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 1459 kB ( 1%) ggc parser struct body : 0.02 ( 2%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 1049 kB ( 1%) ggc parser inl. func. body : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 37 kB ( 0%) ggc template instantiation : 0.95 (98%) usr 0.01 (100%) sys 0.94 (95%) wall 115432 kB (96%) ggc TOTAL : 0.97 0.01 0.99 119627 kB Internal checks disabled; compiler is not suited for release. Configure with --enable-checking=release to enable checks. $ ./g++trunk -I sequences/include/tao/seq/ with_tao.cpp -c -ftime-report Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1386 kB (16%) ggc phase parsing : 0.10 (100%) usr 0.01 (100%) sys 0.11 (100%) wall 7413 kB (84%) ggc |name lookup : 0.01 (10%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 450 kB ( 5%) ggc preprocessing : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 9%) wall 192 kB ( 2%) ggc parser (global) : 0.01 (10%) usr 0.00 ( 0%) sys 0.01 ( 9%) wall 1483 kB (17%) ggc parser struct body : 0.02 (20%) usr 0.00 ( 0%) sys 0.02 (18%) wall 1063 kB (12%) ggc template instantiation : 0.07 (70%) usr 0.01 (100%) sys 0.07 (64%) wall 4563 kB (52%) ggc TOTAL : 0.10 0.01 0.11 8811 kB Internal checks disabled; compiler is not suited for release. Configure with --enable-checking=release to enable checks.
(In reply to Adrian Wielgosik from comment #9) > TOTAL : 0.97 0.01 0.99 > 119627 kB By passing in the length of the first sequence instead of using sizeof... the std::make_integer_sequence result comes down to: TOTAL : 0.23 0.01 0.25 15114 kB Which is very reasonable, and within an order of magnitude of Daniel's, which I think is good enough (especially if we're going to get a compiler intrinsic eventually anyway). But rather than making that change I think the right thing to do is to fix the front-end so that sizeof... doesn't make such a big difference and we don't have to jump through hoops to avoid it. I'll file a PR for that tomorrow.
(In reply to Jonathan Wakely from comment #7) > The version I came up with is very close to Xeo's at stackoverflow. I tried > something more like yours and it used a LOT more memory. > > Here's what I tested: > > #include <stddef.h> > > namespace std > { > template<size_t... _Indexes> struct _Index_tuple { }; > > #if DUP > template<typename _ITup, int _Odd> struct _Itup_dup; > > template<size_t... _Ind> > struct _Itup_dup<_Index_tuple<_Ind...>, 0> > { > static constexpr size_t _Nm = sizeof...(_Ind); > using __type = _Index_tuple<_Ind..., _Nm + _Ind...>; > }; It turns out that the huge cost of this version is not due to sizeof... it's due to the static variable, _Nm Avoiding sizeof... helps too, but not as much as eliminating the variable and writing it as: using __type = _Index_tuple<_Ind..., sizeof...(_Ind) + _Ind...>;
I've opened PR 68422 for the sizeof... inefficiency.
*** Bug 68642 has been marked as a duplicate of this bug. ***
Ping to get this merged into the upcoming 5.4 release.
Fixed for 5.5 too.
Author: redi Date: Wed May 17 17:18:07 2017 New Revision: 248163 URL: https://gcc.gnu.org/viewcvs?rev=248163&root=gcc&view=rev Log: PR libstdc++/66059 optimise _Build_index_tuple Backport from mainline 2015-11-17 Jonathan Wakely <jwakely@redhat.com> PR libstdc++/66059 * include/std/utility (_Build_index_tuple): Optimise. Modified: branches/gcc-5-branch/libstdc++-v3/ChangeLog branches/gcc-5-branch/libstdc++-v3/include/std/utility