Created attachment 41107 [details]
Source file demonstrating bad behaviour

g++ uses an unreasonable amount of memory (greater than 5GB) when compiling
the attached code. A source archive is attached, and the test can be run with
`make`.

Testing suggests this issue occurs with x86_64 g++ 5.4 and 6.2 from Ubuntu
Yakkety, and with both 6.3 and 7.0.1 built from source. The bug was
originally encountered under a Yocto/Poky build cross-compiling from x86_64
to ARMv6 (targeting an ARM1176 core). Alan Modra's tests indicate g++ 4.8
performs better than any of the previously mentioned versions. Behaviour in
clang 3.8 also appears reasonable.

GCCs built from source were configured with a configure line derived from
Ubuntu Yakkety's g++ builds, with minor modifications to restrict the
supported languages and architectures and to change prefixes.

The test case is built with `g++ -std=c++11 fru-gen.ii -o fru-gen.o`.

$ ~/gcc/master/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/andrew/gcc/master/bin/g++
COLLECT_LTO_WRAPPER=/home/andrew/gcc/master/libexec/gcc/x86_64-linux-gnu/7.0.1/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../configure -v --enable-languages=c,c++ --prefix=/home/andrew/gcc/master --enable-shared --enable-linker-build-id --without-included-gettext --enable-threads=posix --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-gtk-cairo --with-arch-directory=amd64 --disable-werror --disable-multilib --with-abi=m64 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --enable-bootstrap
Thread model: posix
gcc version 7.0.1 20170401 (experimental) (GCC)
Confirmed. The initializer is not unreasonably large IMHO but is certainly
the problem. clang++ uses 500MB for me when not optimizing; g++ 4.8 and
g++ 5 use around 200MB. Starting with g++ 6 we use gigabytes.

This seems to be caused by libstdc++ changes, given that preprocessed source
from GCC 7's headers uses as much memory when compiled with g++ 5 as with
g++ 7 (even worse: 12GB and still compiling...).
Created attachment 41116 [details] unincluded testcase
All memory is allocated during

Run till exit from #84 0x000000000080a9f7 in check_initializer (
    decl=<var_decl 0x7ffff4ecdea0 frus>, init=<constructor 0x7ffff4dc2678>,
    flags=5, cleanups=0x7fffffffd7f0)
    at /space/rguenther/src/svn/gcc-7-branch/gcc/cp/decl.c:6318
...
Run till exit from #104 0x00000000009d4b2f in expand_default_init (
    binfo=<tree_binfo 0x7ffff4ece960>, true_exp=<var_decl 0x7ffff4ecdea0 frus>,
    exp=<var_decl 0x7ffff4ecdea0 frus>, init=<constructor 0x7ffff4dc2678>,
    flags=5, complain=3)
    at /space/rguenther/src/svn/gcc-7-branch/gcc/cp/init.c:1787

where it does template instantiation and overload resolution all the time.
I wonder if some trivial caching of the "last element's overload set" during
initializer processing could speed things up dramatically and avoid
generating so much garbage (we don't GC during a single check_initializer
call).
For an unoptimized compiler the profile points at conversion_obstack_alloc,
called 16 million times when the testcase is simplified to contain a single
initializer element.

Note we essentially have a

  std::map<unsigned, std::map<std::string, std::map<std::string,
      std::map<std::string, std::map<std::string, std::string> > > > >

initializer. Simplifying the initializer down to

extern const FruMap frus = {
  {1,{
    {"/system/chassis/motherboard/cpu0",{
      {"xyz.openbmc_project.Inventory.Decorator.Asset",{
        {"PartNumber",{
          {"IPMIFruSection","Board"},
        }},
      }},
    }},
  }},
};

shows that adding another element to the innermost map increases memory use
from 107MB to 124MB. So even individual elements are quite costly. For the
above simplest case we still have 1.4 _million_ calls to
conversion_obstack_alloc.

The C++ front end is also a heavy allocator of tree vectors and tree lists
during overload resolution / template instantiation and matching.
(In reply to Richard Biener from comment #1)
> Seems to be caused by libstdc++ changes given g++ 7 preprocessed source uses
> as much memory with g++ 5 as with 7 (even worse, at 12GB and still
> compiling...)

I doubt there's anything we can do to reduce it without losing standards
conformance in some way, but I'll take a look at what changed.
Seems we build many functions (instantiate templates?) and then unify them
via fn_type_unification? I believe the root cause is somewhere deeply
embedded in the C++ FE ...

But again, using preprocessed source with the GCC 5 libstdc++, GCC 7 compiles
the thing in just 250MB of RAM... (Note this is all with the old ABI classes,
but quick testing shows the new std::string doesn't make a difference here.)
Created attachment 41120 [details] testcase just using <map> Reduced testcase.
The drastic slowdown started with r225189.

We can't realistically revert that, so we'll have to find a way to make the
FE faster. I don't really know what I'm talking about, but I like this idea:

(In reply to Richard Biener from comment #3)
> I wonder if some trivial caching of "last elements overload set" during
> initializer processing could speed up things dramatically and avoid
> generating so much garbage
> (we don't GC during a single check_initializer call).
Reduced:

#include <utility>

typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
    std::pair<const char *, std::pair<const char *, const char *> > > > >
    FruMap;

#define INIT { 1, \
  { "/system/chassis/motherboard/cpu0", \
    { "xyz.openbmc_project.Inventory.Decorator.Asset", \
      { "PartNumber", \
        {"IPMIFruSection","Board"}, \
      }, \
    } \
  }, \
}

extern const FruMap frus[] = {
  INIT,
  INIT,
  INIT,
  INIT,
  INIT,
};

GCC 5:

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 (17%) wall    1323 kB (15%) ggc
 phase parsing           :   0.04 (100%) usr  0.01 (100%) sys  0.05 (83%) wall    7520 kB (84%) ggc
 |name lookup            :   0.01 (25%) usr   0.00 ( 0%) sys   0.03 (50%) wall     581 kB ( 7%) ggc
 |overload resolution    :   0.01 (25%) usr   0.00 ( 0%) sys   0.01 (17%) wall    1867 kB (21%) ggc
 parser (global)         :   0.02 (50%) usr   0.00 ( 0%) sys   0.02 (33%) wall    2692 kB (30%) ggc
 parser struct body      :   0.01 (25%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     911 kB (10%) ggc
 template instantiation  :   0.01 (25%) usr   0.01 (100%) sys  0.03 (50%) wall    3239 kB (36%) ggc
 TOTAL                   :   0.04          0.01          0.06               8941 kB

GCC 6:

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall    1397 kB ( 1%) ggc
 phase parsing           :   0.62 (98%) usr   0.19 (100%) sys  0.81 (96%) wall  125355 kB (99%) ggc
 phase opt and generate  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     231 kB ( 0%) ggc
 |name lookup            :   0.01 ( 2%) usr   0.01 ( 5%) sys   0.00 ( 0%) wall    1155 kB ( 1%) ggc
 |overload resolution    :   0.24 (38%) usr   0.04 (21%) sys   0.27 (32%) wall   68949 kB (54%) ggc
 callgraph construction  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     231 kB ( 0%) ggc
 callgraph optimization  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     187 kB ( 0%) ggc
 parser (global)         :   0.36 (57%) usr   0.10 (53%) sys   0.55 (65%) wall  103307 kB (81%) ggc
 parser struct body      :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall    1227 kB ( 1%) ggc
 parser inl. meth. body  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      95 kB ( 0%) ggc
 template instantiation  :   0.21 (33%) usr   0.09 (47%) sys   0.24 (29%) wall   19154 kB (15%) ggc
 symout                  :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1387 kB ( 1%) ggc
 TOTAL                   :   0.63          0.19          0.84             127016 kB
Or even just:

#include <utility>

typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
    std::pair<const char *, std::pair<const char *, const char *> > > > >
    FruMap;

#define INIT { 1, { "", { "", { "", {"",""}, }, } }, }

extern const FruMap frus[] = {
  INIT,
  INIT,
  INIT,
  INIT,
  INIT,
};

All the time is spent in implicit_conversion().
With gcc-6, every extra member of the init-list increases memory usage
dramatically:

 TOTAL                 :   0.64          0.17          0.82             126289 kB
 TOTAL                 :   0.76          0.23          1.00             147632 kB
 TOTAL                 :   0.88          0.26          1.15             168975 kB

but with gcc-5:

 TOTAL                 :   0.03          0.01          0.06               8390 kB
 TOTAL                 :   0.03          0.01          0.05               8727 kB
 TOTAL                 :   0.05          0.02          0.08               9064 kB
switching offender to the C++ FE
typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
    std::pair<const char *, const char *> > > > FruMap;

extern const FruMap frus[] = {
  { 1, { "", { "", {"",""}, }, }, },
};

is twice as fast as

typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
    std::pair<const char *, std::pair<const char *, const char *> > > > >
    FruMap;

extern const FruMap frus[] = {
  { 1, { "", { "", { "", {"",""}, }, } }, },
};

the latter with 70713 implicit_conversion calls, the former with "only"
14331. Removing another std::pair<> level reduces it to 4389 calls (still
10689 tsubst_copy_and_build calls!).

typedef std::pair<unsigned, std::pair<const char *,
    std::pair<const char *, const char *> > > FruMap;

extern const FruMap frus[] = {
  { 1, { "", {"",""}, }, },
};

constexpr evaluation seems to cause quite a few bits of it. The constexpr
hash is quite big and we arrive with the following exprs to evaluate/cache:

{}
0 || 0
std::_PCC<false, const char*, const char*>::_ConstructiblePair<const char*, const char*> ()
!std::_PCC<false, const char*, const char*>::_ImplicitlyConvertiblePair<const char*, const char*> ()
std::_PCC<false, const char*, const char*>::_ConstructiblePair<const char*, const char*> () && !std::_PCC<false, const char*, const char*>::_ImplicitlyConvertiblePair<const char*, const char*> ()

and for the && we don't use the cached LHS we computed above, because in
cxx_eval_logical_expression, for example, we don't use maybe_constant_value
but recurse into cxx_eval_constant_expression! A simple patch like

Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c  (revision 246678)
+++ gcc/cp/constexpr.c  (working copy)
@@ -3860,6 +3860,9 @@ lookup_placeholder (const constexpr_ctx
   return ob;
 }
 
+static GTY((deletable)) hash_map<tree, tree> *cv_cache;
+
+
 /* Attempt to reduce the expression T to a constant value.
    On failure, issue diagnostic and return error_mark_node.  */
 /* FIXME unify with c_fully_fold */
@@ -3874,6 +3877,11 @@ cxx_eval_constant_expression (const cons
   constexpr_ctx new_ctx;
   tree r = t;
 
+  tree *cached;
+  if (cv_cache
+      && (cached = cv_cache->get (t)))
+    return *cached;
+
   if (jump_target && *jump_target)
     {
       /* If we are jumping, ignore all statements/expressions except those
@@ -4824,8 +4832,6 @@ fold_simple (tree t)
    Otherwise, if T does not have TREE_CONSTANT set, returns T.
    Otherwise, returns a version of T without TREE_CONSTANT.  */
 
-static GTY((deletable)) hash_map<tree, tree> *cv_cache;
-
 tree
 maybe_constant_value (tree t, tree decl)
 {

should improve this, but while it catches some cases it doesn't result in an
overall improvement. The patch is likely incorrect anyway. But I think it
shows the issue (and likely the patch doesn't help because "equivalent" calls
are not detected as such by the cv_cache map, which ends up with pointer
equivalence AFAICS).

Oh, and there are constexpr evaluations that do not go through the cache
anyway (potential_constant_expression, cxx_constant_value).
(In reply to Richard Biener from comment #13)
> typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
> std::pair<const char *, const char *> > > > FruMap;
> extern const FruMap frus[] = {
>   { 1, { "", { "", {"",""}, }, }, },
> };
>
> is twice as fast as
>
> typedef std::pair<unsigned, std::pair<const char *, std::pair<const char *,
> std::pair<const char *, std::pair<const char *, const char *> > > > > FruMap;
>
> extern const FruMap frus[] = {
>   { 1, { "", { "", { "", {"",""}, }, } }, },
> };
>
> the latter with 70713 implicit_conversion calls while the former with "only"
> 14331 implicit_conversion calls. Removing another std::pair<> level reduces
> it to 4389 calls (still 10689 tsubst_copy_and_build calls(!)).

There is a similar number of cxx_eval_call_expression calls, so it boils down
to constexpr handling...
Adding a timevar to cxx_eval_outermost_constant_expr shows minimal time spent
in constexpr evaluation; it also doesn't show up in -fmem-report. So constexpr
isn't the problem.

It seems to me that the problem is that we added a bunch of constructor
template overloads, and made all of them rely on doing further overload
resolution in order to evaluate the SFINAE condition, and that's a lot more
work for the compiler. And then there are a few places, notably
push_tinst_level, that allocate memory that isn't freed until the next GC, so
in this case it keeps building up.

We ought to be able to be smarter about caching previous overload resolution
results, but that would be new machinery that doesn't seem suitable for
stage 4.

I'll keep poking.
On April 19, 2017 8:10:58 AM GMT+02:00, "jason at gcc dot gnu.org"
<gcc-bugzilla@gcc.gnu.org> wrote:
> Adding a timevar to cxx_eval_outermost_constant_expr shows minimal time
> spent in constexpr evaluation; it also doesn't show up in -fmem-report.
> So constexpr isn't the problem.

The number of calls suggests otherwise. Are you sure the timevar isn't pushed
away here? (Overload resolution seemed to be triggered by constexpr
evaluation, so better constexpr caching should help.)

> It seems to me that the problem is that we added a bunch of constructor
> template overloads, and made all of them rely on doing further overload
> resolution in order to evaluate the SFINAE condition, and that's a lot
> more work for the compiler. And then there are a few places, notably
> push_tinst_level, that allocate memory that isn't freed until the next
> GC, so in this case it keeps building up.

I suppose adding GC points isn't easily possible either?

> We ought to be able to be smarter about caching previous overload
> resolution results, but that would be new machinery that doesn't seem
> suitable for stage 4.
>
> I'll keep poking.
GCC 6.4 is being released, adjusting target milestone.
Not currently working on this.
Mine
Patch posted https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00709.html
Author: aoliva
Date: Wed Apr 18 05:17:26 2018
New Revision: 259457

URL: https://gcc.gnu.org/viewcvs?rev=259457&root=gcc&view=rev
Log:
[PR c++/80290] recycle tinst garbage sooner

tinst_level objects are created during template instantiation, and
they're most often quite short-lived, but since there's no intervening
garbage collection, they accumulate throughout the pass while most by
far could be recycled after a short while.  The original testcase in
PR80290, for example, creates almost 55 million tinst_level objects,
all but 10 thousand of them without intervening GC, but we don't need
more than 284 of them live at a time.

Furthermore, in many cases, TREE_LIST objects are created to stand for
the decl in tinst_level.  In most cases, those can be recycled as soon
as the tinst_level object is recycled; in some relatively common cases,
they are modified and reused in up to 3 tinst_level objects.  In the
same testcase, TREE_LISTs are used in all but 3 thousand of the
tinst_level objects, and we don't need more than 89 tinst_level objects
with TREE_LISTs live at a time.  Furthermore, all but 2 thousand of
those are created without intervening GC.

This patch arranges for tinst_level objects to be refcounted (I've seen
as many as 20 live references to a single tinst_level object in my
testing), and for pending_template, tinst_level and TREE_LIST objects
that can be recycled to be added to freelists; that's much faster than
ggc_free()ing them and reallocating them shortly thereafter.  In fact,
the introduction of such freelists is what makes this entire patch
lighter-weight, when it comes to memory use, and faster.  With
refcounting alone, the total memory footprint was still about the same,
and we spent more time.

In order to further reduce memory use, I've arranged for us to create
TREE_LISTs lazily, only at points that really need them (when printing
error messages).  This grows tinst_level by an additional pointer, but
since a TREE_LIST holds a lot more than an extra pointer, and
tinst_levels with TREE_LISTs used to be allocated tens of thousands of
times more often than plain decl ones, we still save memory overall.

I was still not quite happy about growing memory use in cases that used
template classes but not template overload resolution, so I changed the
type of the errors field to unsigned short, from int.  With that
change, in_system_header_p and refcount move into the same word or
half-word that used to hold errors, releasing a full word, bringing
tinst_level back to its original size, now without any padding.  The
errors field is only used to test whether we've had any errors since
the expansion of some template, to skip the expansion of further
templates.  If we get 2^16 errors or more, it is probably reasonable to
refrain from expanding further templates, even if we would expand them
before this change.

With these changes, compile time for the original testcase at -O0, with
default checking enabled, is cut down by some 3.7%, total GCable memory
allocation is cut down by almost 20%, and total memory use (as reported
by GNU time as maxresident) is cut down by slightly over 15%.

for gcc/cp/ChangeLog

	PR c++/80290
	* cp-tree.h (struct tinst_level): Split decl into tldcl and
	targs.  Add private split_list_p, tree_list_p, and not_list_p
	inline const predicates; to_list private member function
	declaration; free public member function declaration; list_p,
	get_node and maybe_get_node accessors, and refcount data
	member.  Narrow errors to unsigned short.
	* error.c (print_instantiation_full_context): Use new
	accessors.
	(print_instantiation_partial_context_line): Likewise.  Drop
	const from tinst_level-typed parameter.
	* mangle.c (mangle_decl_string): Likewise.
	* pt.c (freelist): New template class.
	(tree_list_freelist_head): New var.
	(tree_list_freelist): New fn, along with specializations.
	(tinst_level_freelist_head): New var.
	(pending_template_freelist_head): Likewise.
	(tinst_level_freelist, pending_template_freelist): New fns.
	(tinst_level::to_list, tinst_level::free): Define.
	(inc_refcount_use, dec_refcount_use): New fns for tinst_level.
	(set_refcount_ptr): New template fn.
	(add_pending_template): Adjust for refcounting, freelists and
	new accessors.
	(neglectable_inst_p): Take a NULL d as a non-DECL.
	(limit_bad_template_recursion): Use new accessors.
	(push_tinst_level): New overload to create the list.
	(push_tinst_level_loc): Make it static, split decl into two
	args, adjust tests and initialization to cope with split
	lists, use freelist, adjust for refcounting.
	(push_tinst_level_loc): New wrapper with the old interface.
	(pop_tinst_level): Adjust for refcounting.
	(record_last_problematic_instantiation): Likewise.
	(reopen_tinst_level): Likewise.  Use new accessors.
	(instantiate_alias_template): Adjust for split list.
	(fn_type_unification): Likewise.
	(get_partial_spec_bindings): Likewise.
	(instantiate_pending_templates): Use new accessors.  Adjust
	for refcount.  Release pending_template to freelist.
	(instantiating_current_function_p): Use new accessors.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/error.c
    trunk/gcc/cp/mangle.c
    trunk/gcc/cp/pt.c
The patch is in. I suppose there's still room for improvement, but this was as far as I could get for now.
> The patch is in. I suppose there's still room for improvement, but this was
> as far as I could get for now.

Thanks for your effort! I'll take it for a test drive when I get a moment; we
still hit this case with one of our codebases. Is there a chance the patch
will be backported? I'd love to apply it to the 7.3.0 release (which we're
using via Yocto).
While memory use improved, the testcase in comment #10 still shows a
substantial increase in memory and time for each added init element:

1.24user 0.04system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 170772maxresident)k
1.49user 0.07system 0:01.56elapsed 99%CPU (0avgtext+0avgdata 206396maxresident)k
0inputs+0outputs (0major+50914minor)pagefaults 0swaps
1.70user 0.05system 0:01.76elapsed 100%CPU (0avgtext+0avgdata 233904maxresident)k

so we improved some constant factor but didn't really address the underlying
issue. Numbers from before the patch (on a loaded system, sorry) were:

3.87user 0.10system 0:04.33elapsed 91%CPU (0avgtext+0avgdata 235760maxresident)k
0inputs+0outputs (0major+54479minor)pagefaults 0swaps
5.23user 0.16system 0:06.06elapsed 88%CPU (0avgtext+0avgdata 283860maxresident)k
0inputs+0outputs (0major+64971minor)pagefaults 0swaps
6.17user 0.15system 0:06.90elapsed 91%CPU (0avgtext+0avgdata 323796maxresident)k
0inputs+0outputs (0major+74942minor)pagefaults 0swaps
I built and ran gcc master over the project that was causing us pain, and it
still does. Here's the output:

$ /usr/bin/time make libapphandler_la-sensor-gen.lo
  CXX      libapphandler_la-sensor-gen.lo
TINST: ctor 2 dtor 2 max 1 end 0 long 2 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 1 reused: 1
TINST: ctor 178 dtor 178 max 1 end 0 long 176 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 1 reused: 2
TINST: ctor 589 dtor 589 max 12 end 0 long 411 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 2 reused: 14
TINST: ctor 876 dtor 844 max 33 end 32 long 411 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 2 reused: 47
TINST: ctor 1105 dtor 1055 max 52 end 50 long 411 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 2 reused: 67
TINST: ctor 5179 dtor 5057 max 124 end 122 long 4074 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 9 reused: 141
TINST: ctor 7298 dtor 7143 max 163 end 155 long 4074 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 9 reused: 182
TINST: ctor 10737 dtor 10391 max 347 end 346 long 4074 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 9 reused: 374
TINST: ctor 20890547 dtor 20889843 max 704 end 704 long 20879810 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 9 reused: 732
TINST: ctor 20912080 dtor 20911687 max 791 end 393 long 20879810 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 23 reused: 819
TINST: ctor 20917617 dtor 20917349 max 791 end 268 long 20879810 LST: ctor 0 dtor 0 max 0 end 0 long 0 REFS: 23 reused: 834
96.24user 2.97system 1:39.31elapsed 99%CPU (0avgtext+0avgdata 5058160maxresident)k
37592inputs+23512outputs (192major+1376999minor)pagefaults 0swaps
I don't know that the so-called underlying problem is fixable. Several template constructors are now involved that weren't in earlier versions of the standard, and so we have to instantiate them. Even if we recover some of the temporary memory with the patch that went in, Jason's comment 15 still (and again) applies. Which means this should now be marked as deferred.
Author: aoliva
Date: Thu Apr 19 06:39:06 2018
New Revision: 259486

URL: https://gcc.gnu.org/viewcvs?rev=259486&root=gcc&view=rev
Log:
PR c++/80290
	* cp-tree.h (tinst_level::free): Fix whitespace.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-tree.h
This is definitely an algorithmic complexity issue. With 4 levels of pair, we
do 3886 substitutions of default arguments. With 5 levels, we do 23326, a bit
more than 5x as many.

The basic problem seems to be that for function parameters that have no
deducible template parameters, we check convertibility as part of type
deduction, and then we check convertibility again once we've actually formed
the candidate. And many of the pair constructor templates use the pair
template parameters in their parameter types, which are non-deducible in this
context.

With a simple example like

template <class T, class U>
struct A
{
  template <class U2 = U>
  A(T, U) { }
};

using A1 = A<int, int>;
using A2 = A<int, A1>;
using A3 = A<int, A2>;

A1 a1 = { 1, 1 };
A2 a2 = { 1, { 1, 1 } };
A3 a3 = { 1, { 1, { 1, 1 } } };

we end up considering the constructor template once for a1, three times
(rather than twice) for a2, and seven times (rather than three times) for a3.
If the second parameter is U2, and therefore deduced, the number of
constructor calls scales linearly as expected.
Author: jason
Date: Wed Jun 27 02:59:38 2018
New Revision: 262172

URL: https://gcc.gnu.org/viewcvs?rev=262172&root=gcc&view=rev
Log:
PR c++/80290 - memory-hog with std::pair.

	* pt.c (fn_type_unification): Add convs parameter.
	(check_non_deducible_conversion): Remember conversion.
	(check_non_deducible_conversions): New.  Do checks here.
	(type_unification_real): Not here.  Remove flags parm.
	* call.c (add_function_candidate): Make convs a parameter.
	Don't recalculate the conversion if it's already set.
	(add_template_candidate_real): Allocate convs here.
	(good_conversion, conv_flags): New.

When the std::pair constructors got more complex to handle, it
aggravated a preexisting algorithmic problem in template overload
resolution: As part of template argument deduction in a call, once
we've deduced all the template arguments we can but before we
substitute them to form an actual declaration, for any function
parameters that don't involve template parameters we need to check that
it's possible to convert the argument to the parameter type
(wg21.link/cwg1391).  As a result, we end up calculating the conversion
twice: once here, and then again in add_function_candidate as part of
normal overload resolution.  Normally this isn't a big deal, but when
the argument is a multiply-nested initializer list, doubling the
conversion processing at each level leads to combinatorial explosion.

The patch for trunk avoids the duplication by remembering the
conversion we calculate at deduction time and then reusing it in
overload resolution rather than calculating it again.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c
    trunk/gcc/cp/class.c
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/pt.c
Author: jason
Date: Wed Jun 27 03:05:17 2018
New Revision: 262174

URL: https://gcc.gnu.org/viewcvs?rev=262174&root=gcc&view=rev
Log:
PR c++/80290 - memory-hog with std::pair.

	* pt.c (type_unification_real): Skip non-dependent conversion
	check for a nested list argument.
	(braced_init_depth): New.

Modified:
    branches/gcc-8-branch/gcc/cp/ChangeLog
    branches/gcc-8-branch/gcc/cp/pt.c
Author: jason
Date: Thu Jun 28 00:25:21 2018
New Revision: 262204

URL: https://gcc.gnu.org/viewcvs?rev=262204&root=gcc&view=rev
Log:
PR c++/80290 - memory-hog with std::pair.

	* pt.c (type_unification_real): Skip non-dependent conversion
	check for a nested list argument.
	(braced_init_depth): New.

Modified:
    branches/gcc-7-branch/gcc/cp/ChangeLog
    branches/gcc-7-branch/gcc/cp/pt.c
Fixed for 7.4/8.2/9.