#define A(n) \
  struct S##n { int i; }; \
  S##n v##n;\
  extern int foo (S##n, S##n);\
  extern void bar (S##n);
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#ifndef N
#define N 10000
#endif
#if N == 1000
#define E(n) D(n##0)
#elif N == 2000
#define E(n) D(n##0) D(n##1)
#elif N == 3000
#define E(n) D(n##0) D(n##1) D(n##2)
#else
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
#endif
E(0)
void
foo ()
{
#undef A
#define A(n) if (foo (v##n, v##n)) bar (v##n);
E(0)
}

This testcase eats huge amounts of memory. With -DN=1000 -O0 it compiles quickly, using 432768 kB of memory (852MB in top), with -DN=2000 -O0 it already uses 1672544 kB (3.5GB in top). With -DN=3000 it took around 8GB in top.

In --enable-gather-detailed-mem-stats -fmem-report, for N=1000 I see:

source location                              Garbage           Freed     Leak           Overhead  Times
cp/tree.c:1447 (ovl_cons)                    64032000:15.1%    0: 0.0%   256032: 1.8%   0: 0.0%   2009001
c-family/c-common.c:9556 (make_tree_vector)  159840120:37.8%   0: 0.0%   0: 0.0%        0: 0.0%   3996003
cp/search.c:1135 (build_baselink)            191952000:45.4%   0: 0.0%   0: 0.0%        0: 0.0%   3999000
Total                                        422660566         8986384   14294815       2724053   10281399

and for N=2000:

source location                              Garbage           Freed     Leak           Overhead  Times
cp/tree.c:1447 (ovl_cons)                    256064000:15.3%   0: 0.0%   512032: 1.9%   0: 0.0%   8018001
c-family/c-common.c:9556 (make_tree_vector)  639680120:38.1%   0: 0.0%   0: 0.0%        0: 0.0%   15992003
cp/search.c:1135 (build_baselink)            767904000:45.8%   0: 0.0%   0: 0.0%        0: 0.0%   15998000
Total                                        1677099246        12464328  27061439       3876781   40545425
During perform_overload_resolution, add_candidates allocates lots of GGC memory which splice_viable immediately throws away, and we don't ggc_collect during parsing.
Created attachment 23906
Patch

This patch should avoid much of the baselink and tree vector garbage. Jakub, can you give it a spin?
I haven't bootstrapped/regtested it, but it is definitely an improvement. With N=1000 and N=2000 the generated assembly is identical. For N=1000 the reported TOTAL went down from 432768 kB to 89362 kB, for N=2000 from 1672544 kB to 298232 kB, and on a box with 8GB of RAM I can compile even the N=5000 case, which takes 1685817 kB of reported TOTAL memory. N=10000 already requires too much RAM, though.

In the -DN=5000 -fmem-report dump the only interesting allocations are:

source location            Garbage           Freed     Leak           Overhead  Times
cp/tree.c:1447 (ovl_cons)  1600160000:97.9%  0: 0.0%   1280032: 2.0%  0: 0.0%   50045001
Total                      1634296366        38329920  65115511       11401989  51377483

so if even that garbage could be freed, this would be fixed completely. Even for N=1000 ovl_cons is the only one that really matters:

source location            Garbage           Freed     Leak           Overhead  Times
cp/tree.c:1447 (ovl_cons)  64032000:90.2%    0: 0.0%   256032: 1.8%   0: 0.0%   2009001
Total                      71012606          8986384   14294815       2724053   2289400

Those ovl_cons calls are from lookup_arg_dependent -> ... -> add_function -> build_overload. Is it guaranteed that, if perform_koenig_lookup returns a chain of OVERLOADs, all of those OVERLOADs have been freshly created by make_node and aren't shared with anything else? If yes, perhaps we could ggc_free the chain afterwards, or move it to some cache of OVERLOAD nodes and make ovl_cons start from that cache.
Created attachment 23920
additional patch

This ought to help with the OVERLOAD garbage.
Author: jason
Date: Fri Apr 8 06:08:04 2011
New Revision: 172162

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172162
Log:
    PR c++/48481
    * call.c (build_user_type_conversion_1): Use lookup_fnfields_slot.
    Release unused vector.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c
Author: jason
Date: Fri Apr 8 06:08:13 2011
New Revision: 172163

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172163
Log:
    PR c++/48481
    * cp-tree.h (OVL_ARG_DEPENDENT): New.
    * name-lookup.c (add_function): Set it.
    * semantics.c (finish_call_expr): Free OVERLOADs if it's set.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/cp/semantics.c
Author: jason
Date: Fri Apr 8 06:08:21 2011
New Revision: 172164

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172164
Log:
    PR c++/48481
    * tree.c (build_overload): Allow an unwrapped FUNCTION_DECL
    at the end of the chain.
    * pt.c (dependent_template_p): Use OVL_CURRENT/NEXT.
    (iterative_hash_template_arg): Likewise.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/cp/pt.c
    trunk/gcc/cp/tree.c
Should be fixed on the trunk.
The largest source of garbage at N=2000 is now emit_insn_raw at 9.1%, so I'm closing this as fixed.
Author: jason
Date: Thu Jun 30 21:10:03 2011
New Revision: 175732

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175732
Log:
    PR c++/48481
    * name-lookup.c (struct arg_lookup): Add fn_set.
    (add_function): Check it.
    (lookup_arg_dependent_1): Initialize it.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/testsuite/g++.dg/template/crash37.C
    trunk/gcc/testsuite/g++.dg/template/ptrmem4.C
    trunk/gcc/testsuite/g++.old-deja/g++.other/pmf3.C