When compiling the attached program with g++ 3.3, the compiler takes about 80 MB of main memory on Intel/x86. When compiling it with g++ 3.4, the compiler takes more than 400 MB and eventually crashes (possibly because the Linux kernel kills the process when it runs out of memory).

Since the standard libraries differ between 3.3 and 3.4, I provide two preprocessed files. (This is random_test.cpp from the Boost random number library.)

g++ -v rt-3.3.ii
[...]
Configured with: ../gcc-3.3/configure --prefix=/usr/local --enable-threads --enable-shared
Thread model: posix
gcc version 3.3
(ok)

/opt/exp/gcc-3.4/bin/g++ -v rt-3.4.ii
[...]
g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
Created attachment 5021 [details] preprocessed random_test.cpp for g++ 3.3
Created attachment 5022 [details] preprocessed random_test.cpp for g++ 3.4
My 3.3.1 (20030707) takes about 260M of memory while 3.4 (20031030) takes about 420M of memory.
It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.
My patches for saving space in C++ help, but they do not fix the problem.
I think the problem is that 3.4 is not able to collect garbage while instantiating the templates. Calling ggc_collect while instantiating the templates and at the right level, I get {GC 95280k -> 45466k} which shows that it gets rid of half of the memory but it crashes right after doing that.
I think I have a patch for this, I just call ggc_collect in instantiate_decl if it is okay to do so.
Mine, I think.
The patch I had in mind did not work; there is too much stored on the stack for this to work correctly.
interesting...
Created attachment 5425 [details] broken patch

This is broken, but it is not really the patch itself that is at fault: the C++ front end keeps references to variables on the stack or in registers without corresponding references in variables visible to the GC.
I am testing a patch that peaks at 28MB with unit-at-a-time. It seems to be possible to defer instantiation of all templates to the very last pass, where we can ggc_collect.
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-01/msg00772.html>.
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Changes by: hubicka@gcc.gnu.org 2004-01-13 23:59:20 Modified files: gcc : ChangeLog cgraphunit.c gcc/cp : ChangeLog decl2.c optimize.c Log message: Partial fix PR c++/12850 * cgraphunit.c (cgraph_finalize_function): Always ggc_collect when at zero nest level. * decl2.c (mark_used): Do not proactively instantiate templates when compiling in unit-at-a-time or not optimizing. * optimize.c (maybe_clone_body): Do not increase function depth. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2270&r2=2.2271 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&r1=1.44&r2=1.45 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.3875&r2=1.3876 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl2.c.diff?cvsroot=gcc&r1=1.694&r2=1.695 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/optimize.c.diff?cvsroot=gcc&r1=1.102&r2=1.103
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Changes by: hubicka@gcc.gnu.org 2004-01-14 11:34:38 Modified files: gcc/cp : ChangeLog pt.c Log message: PR c++/12850 * pt.c (instantiate_decl): Do not increase function_depth. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.3879&r2=1.3880 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/pt.c.diff?cvsroot=gcc&r1=1.813&r2=1.814
Subject: Re: [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2003-12-19 08:43 -------
> It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.

I can get about 30MB at -O0; with unit-at-a-time, however, we still need 250MB, which is the size of all templates instantiated together. I don't think we can reduce this further for 3.4, and it is no longer a regression; in the future we may make trees more compact. This testcase also has interesting runtime properties; Mark may want to look at the for_each_template_param_r problem.

Honza
Subject: Re: [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2003-12-19 08:43 -------
> It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.

GGC memory is still only about 100MB, so perhaps this testcase reproduces a 150MB memory leak in non-GGC memory.
Subject: Re: [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

For the record, here is a profile of the run. A lot of the overhead comes from quadratic behaviour in templates and friends.

Honza
Created attachment 5479 [details] profi
I will take a look for the leak but most likely it is not really a leak.
The only memory leak I had was from shorten_branches in final.c which I have a fix for now but that does account for the 60M difference between GC and real allocated memory (even though I suspect there are large amounts of pages still allocated because the GC is spread all over them). Also malloc only accounts for 20M.
Subject: Re: [3.4/3.5 Regression] memory consumption for heavy template instantiations tripled since 3.3

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-01-27 16:35 -------
> The only memory leak I had was from shorten_branches in final.c which I have a fix for
> now but that does account for the 60M difference between GC and real allocated
> memory (even though I suspect there are large amounts of pages still allocated because
> the GC is spread all over them). Also malloc only accounts for 20M.

I have additional patches in testing that cut this to roughly 118MB. There is still room for improvement, since we really should be decreasing the amount of memory during the compilation stage, and we don't (the parsed program after template instantiation is slightly over 60MB of GGC memory). We also burn a lot of unnecessary memory in the C++ parser during name lookup; I am probably not going to address this, as I simply don't understand the issue at all.

Honza
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Changes by: hubicka@gcc.gnu.org 2004-01-29 00:34:09 Modified files: gcc : ChangeLog cgraph.c cgraphunit.c tree-optimize.c Log message: PR c++/12850 * cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and initial pointers. * cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS for functions that will be only inlined. (cgraph_mark_function_to_output): Likewise. (cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear; do not clear function body. * tree-optimize.c (clear_decl_rtl): Use decl_function_context. (tree_rest_of_compilation): Reorganize the logic releasing function body to use callgraph datastructure. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2535&r2=2.2536 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&r1=1.42&r2=1.43 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&r1=1.48&r2=1.49 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&r1=2.8&r2=2.9
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: hubicka@gcc.gnu.org 2004-01-30 11:46:28 Modified files: gcc : ChangeLog cgraph.c cgraphunit.c tree-optimize.c Log message: PR c++/12850 * cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and initial pointers. * cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS for functions that will be only inlined. (cgraph_mark_function_to_output): Likewise. (cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear; do not clear function body. * tree-optimize.c (clear_decl_rtl): Use decl_function_context. (tree_rest_of_compilation): Reorganize the logic releasing function body to use callgraph datastructure. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.110&r2=2.2326.2.111 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.1&r2=1.41.2.2 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.1&r2=1.46.2.2 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8&r2=2.8.8.1
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: hubicka@gcc.gnu.org 2004-01-31 12:01:25 Modified files: gcc : cgraph.c cgraphunit.c tree-optimize.c ChangeLog Log message: Revert the following patch until after AIX linker bug is fixed: PR c++/12850 * cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and initial pointers. * cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS for functions that will be only inlined. (cgraph_mark_function_to_output): Likewise. (cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear; do not clear function body. * tree-optimize.c (clear_decl_rtl): Use decl_function_context. (tree_rest_of_compilation): Reorganize the logic releasing function body to use callgraph datastructure. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.2&r2=1.41.2.3 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.2&r2=1.46.2.3 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8.8.2&r2=2.8.8.3 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.121&r2=2.2326.2.122
Subject: Bug 12850 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: hubicka@gcc.gnu.org 2004-02-01 13:01:15 Modified files: gcc : ChangeLog cgraph.c cgraphunit.c tree-optimize.c gcc/cp : ChangeLog semantics.c Log message: PR c++/12850 * cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and initial pointers. * cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS for functions that will be only inlined. (cgraph_mark_function_to_output): Likewise. (cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear; do not clear function body. * tree-optimize.c (clear_decl_rtl): Use decl_function_context. (tree_rest_of_compilation): Reorganize the logic releasing function body to use callgraph datastructure. * semantics.c (expand_body) Do emit_associated_thunks before expansion. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.127&r2=2.2326.2.128 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.3&r2=1.41.2.4 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.3&r2=1.46.2.4 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8.8.3&r2=2.8.8.4 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3892.2.23&r2=1.3892.2.24 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/semantics.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.381.4.4&r2=1.381.4.5
Memory usage is now at 104MB. That is still more than 3.3 used, but given that this code is an almost perfect testcase for where unit-at-a-time loses, I think the score is not bad.

Mark's patches helped a lot with the amount of garbage produced by the C++ frontend on mainline (reducing it from 2GB to 700MB), but I think we can still do significantly better.

One problem is the large memory consumption of struct function (about 10% of the memory surviving from the frontend). Many of these struct functions belong to functions that never went through cgraph_finalize_function (either templates or unused functions). I think these should be freed, but I don't know how.

The C++ frontend also still produces a lot of garbage (only 39MB of the 700MB is needed). Major producers are:

varray.c:161 (varray_grow)  20496  1473380  401588:1.600%  210256:0.532%
cp/call.c:2181 (add_template_candidate_real)  96047  2339908  63560:2.051%  0:0.000%
cp/name-lookup.c:1719 (set_identifier_type_value_with_scope)  151575  3031500  0:2.587%  0:0.000%
tree.c:3962 (build_method_type_directly)  24117  2604636  482340:2.634%  1411456:3.574%
cp/lex.c:773 (copy_decl)  31558  3408264  0:2.909%  2979720:7.545%
cp/name-lookup.c:2800 (push_class_level_binding)  170972  3419440  0:2.918%  140:0.000%
tree-inline.c:1970 (copy_tree_r)  176657  3606700  4084:3.081%  551820:1.397%
cp/search.c:1200 (build_baselink)  168114  4034736  0:3.443%  552:0.001%
cp/pt.c:6252 (tsubst_decl)  38668  4176144  0:3.564%  2390580:6.053%
cp/name-lookup.c:4720 (store_bindings)  221688  4433760  0:3.784%  0:0.000%
cp/pt.c:5738 (tsubst_template_args)  248319  5966576  74480:5.155%  570592:1.445%
function.c:6397 (allocate_struct_function)  9158  4688896  1978128:5.689%  4087720:10.351%
cp/pt.c:3814 (coerce_template_parms)  282818  6655016  65496:5.735%  46156:0.117%
tree.c:3908 (build_function_type)  57993  6263244  1159860:6.335%  332672:0.842%

(first percentage is garbage allocated, second percentage is amount of memory surviving to cgraph_optimize)

The backend looks better now; it produces about 300MB of additional garbage. About 10-20% can be saved by better aliasing and moving log links into a separate structure. Overall we went from 4GB of garbage to 900MB.

I don't have enough knowledge of templates and name lookup to make things significantly better.
Subject: Re: [3.4/3.5 Regression] memory consumption for heavy template instantiations tripled since 3.3

> ------- Additional Comments From hubicka at gcc dot gnu dot org 2004-02-14 14:27 -------
> [...]
> Backend looks better now, produce about 300MB of additional garbage. About 10-20% can be saved by better aliasing and moving log links into separate structure. Overall we went from 4GB garbage to 900MB.

Actually I messed up the numbers. We produce 1.1GB of garbage in the frontend and 1.2GB in the backend. I have about a 30% reduction of backend memory from a mixture of retiring line number notes, moving log links away and fixing some of the cselib data structures.

One big problem is that inlined bodies somehow remain reachable. Partly this is because of ABSTRACT_ORIGIN pointers after subsequent inlining, but I am not sure what is really causing the rest. The amount of memory used by trees grows from 39MB to 100MB during the compilation stage.

Honza
There's nothing more to be fixed here for 3.4.x, so I've retargeted this at 3.5.
We no longer have major memory consumption regression here. I don't want to see it red ;)
Could someone test this again? (I think Jan's memory tester has the numbers for the mainline, but I could be wrong.)
Removing the patch keyword since all the patches referenced here have been applied.
I should note that 4.0.0 is like 3x faster than 3.3.2 at -O1 on this test.
On the mainline at -O1 (since I cannot compile at -O0, but that is a different bug which I have already filed):

cp/lex.c:716 (copy_decl)  910284: 0.1%  0: 0.0%  6083812:11.3%  0: 0.0%  56404
ggc-common.c:193 (ggc_calloc)  3299632: 0.4%  11884228: 2.5%  1207832: 2.2%  2736868: 1.6%  22853
tree.c:4530 (build_method_type_directly)  1173592: 0.1%  0: 0.0%  2903048: 5.4%  750960: 0.4%  26820
tree.c:4266 (build_reference_type_for_mode)  456: 0.0%  0: 0.0%  602376: 1.1%  111048: 0.1%  3966
cp/class.c:2455 (maybe_add_class_template_decl_l  0: 0.0%  0: 0.0%  1061232: 2.0%  0: 0.0%  44218
tree.c:472 (copy_list)  2232: 0.0%  0: 0.0%  1083164: 2.0%  0: 0.0%  8854

Those are the ones which still leak.
Here are the results for -O0, now that PR 18683 is fixed:

cp/lex.c:716 (copy_decl)  1087604: 0.3%  0: 0.0%  5906492:10.6%  0: 0.0%  56404
cp/pt.c:3978 (coerce_template_parms)  41586524: 9.6%  0: 0.0%  136540: 0.2%  3865680: 7.4%  1138236

Though we do create a lot:

cp/parser.c:278 (cp_lexer_new_main)  0: 0.0%  22585856:36.1%  0: 0.0%  6332928:12.1%  5

This is mostly a ggc_realloc of a buffer of all the tokens; maybe there is a better way of allocating this buffer, as it seems we create a lot of overhead because of it.
The initial CP lexer buffer size is 10000:

#define CP_LEXER_BUFFER_SIZE 10000

That came in with the lex-all-ahead patch from Matt and Zack, on 2004-09-20 (parser.c rev. 1.250 for the CVS history diggers), but it seems a bit low to me if you're going to lex the whole file up front. I would not be surprised if the average C++ code with lots of templates has several 100,000 tokens... Let me see:

- preprocessed sources for generate.ii from PR8361, blank and pound lines stripped: 36200 lines
- an average of 7 tokens per line in the first 500 lines; let's assume that's a reasonable average for the whole file (it's easy to instrument g++ to get the exact number of tokens, if you want more accurate numbers ;-)

That makes it >250,000 tokens for this file. Since we double the buffer, we have:

10,000 + 20,000 + 40,000 + 80,000 + 160,000 + 320,000 = 630,000

That is the number of tokens we have to allocate room for, with no ggc_collect in the middle. With ggc-page, which has power-of-2 based page sizes, it's safe to assume that each previous buffer is too small to be reallocated, so a full new buffer is allocated and the old one is memcpy-ed to the new one. With checking off, we ggc_free the old buffer, but with checking enabled we don't, so after finishing the whole lexing process we keep around a buffer of ~380,000*sizeof(cp_token). That's roughly 10MB of memory we can't reclaim until the first ggc_collect call.

Maybe the buffer should not be in GC memory at all? We know the exact lifetime of the buffer, and as far as I can tell we never ggc_collect while it is live. According to the comments for cp_lexer, "Tokens are never added to the cp_lexer after it is created." So it may be cheaper to have the buffer xmalloced, and memcpy-ed to a buffer in GC space just before saving it in the new cp_lexer object.

So two suggestions for a person who wants to make g++ a little faster here:

- make CP_LEXER_BUFFER_SIZE larger. To make it use pages more efficiently, look for some ratio of pagesize/(sizeof (cp_token))
- see if the buffer in parser.c:cp_lexer_new_main can be moved out of GC space as suggested above.
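The doubling arithmetic in this comment can be checked with a small sketch. This is only an illustration, not code from GCC: the function names are hypothetical, and the page size and token size passed to tokens_filling_pages are parameters the caller must supply, since the real sizeof (cp_token) is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of token slots allocated over the life of a buffer that
   starts at INITIAL slots and doubles until it can hold NEEDED tokens.
   Every intermediate buffer is counted, as in the sum in the comment.  */
size_t
total_slots_allocated (size_t initial, size_t needed)
{
  assert (initial > 0);
  size_t size = initial;
  size_t total = size;
  while (size < needed)
    {
      size *= 2;
      total += size;
    }
  return total;
}

/* Number of times the buffer has to be grown before NEEDED tokens fit.  */
size_t
growth_steps (size_t initial, size_t needed)
{
  assert (initial > 0);
  size_t size = initial;
  size_t steps = 0;
  while (size < needed)
    {
      size *= 2;
      steps++;
    }
  return steps;
}

/* Largest token count that fits in the whole pages required to hold at
   least MIN_TOKENS tokens.  One possible reading of the
   "pagesize / sizeof (cp_token)" suggestion, not the only one.  */
size_t
tokens_filling_pages (size_t page_size, size_t token_size, size_t min_tokens)
{
  assert (page_size > 0 && token_size > 0);
  size_t bytes = min_tokens * token_size;
  size_t pages = (bytes + page_size - 1) / page_size;
  return pages * page_size / token_size;
}
```

With an initial size of 10000 and 250001 tokens needed, total_slots_allocated returns 630000 after five growth steps, which matches the sum in the comment.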
Subject: Re: memory consumption for heavy template instantiations tripled since 3.3 "steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: [...] | Maybe buffer should not be in GC memory at all? We know the | exact live time of buffer, and as far as I can tell we never | ggc_collect while it is live. According to the comments for | cp_lexer, "Tokens are never added to the cp_lexer after it is | created." So it may be cheaper to have the buffer xmalloced, | and memcpy-ed to a buffer in GC space just before saving it | in the new cp_lexer object. Your analysis makes sense to me. I never quite understood the addiction to GC-allocated memory throughout the compiler. -- Gaby
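The quoted suggestion (grow the buffer outside GC space, then copy it once into its final home) might look roughly like the sketch below. Everything named here is an assumption made for illustration: token_t stands in for cp_token, final_alloc stands in for a GC allocation routine and is plain malloc in this sketch, and demo_token is a made-up token source. None of this is the actual parser.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cp_token.  */
typedef struct { int type; int value; } token_t;

/* Hypothetical stand-in for a GC allocator; here it is plain malloc.  */
void *
final_alloc (size_t bytes)
{
  void *p = malloc (bytes ? bytes : 1);
  if (p == NULL)
    abort ();
  return p;
}

/* Made-up token source used only to exercise the sketch.  */
token_t
demo_token (size_t i)
{
  token_t t = { 1, (int) i * 3 };
  return t;
}

/* Read N_TOKENS tokens into a scratch buffer that grows by doubling with
   realloc, then copy them once into an exactly sized final buffer.  The
   intermediate buffers never touch the final allocator.  */
token_t *
collect_tokens (token_t (*next_token) (size_t), size_t n_tokens)
{
  assert (next_token != NULL);
  size_t cap = 1024, len = 0;
  token_t *scratch = malloc (cap * sizeof *scratch);
  if (scratch == NULL)
    abort ();
  while (len < n_tokens)
    {
      if (len == cap)
        {
          cap *= 2;
          token_t *grown = realloc (scratch, cap * sizeof *grown);
          if (grown == NULL)
            abort ();
          scratch = grown;
        }
      scratch[len] = next_token (len);
      len++;
    }
  token_t *result = final_alloc (len * sizeof *result);
  memcpy (result, scratch, len * sizeof *result);
  free (scratch);
  return result;
}
```

The point of the design is that only one allocation of the exact final size reaches the final allocator, so the doubling leftovers are released immediately instead of waiting for a collection.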
(In reply to comment #36)
> The initial CP lexer buffer size is 10000:

The same amount of garbage is also produced for PR 8361. Also note that I could not compile this source again because of the use of long double, which causes an ICE on ppc-darwin, but that has already been fixed.
I've been looking at a bunch of C++ code; 160000 or 320000 seems like a reasonable value for CP_LEXER_BUFFER_SIZE.
Trivial 6MB win:

Index: parser.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/parser.c,v
retrieving revision 1.298
diff -u -r1.298 parser.c
--- parser.c 23 Dec 2004 22:07:01 -0000 1.298
+++ parser.c 29 Dec 2004 13:06:30 -0000
@@ -190,7 +190,7 @@
 (cp_token *, cp_token *);
 
 /* Manifest constants. */
-#define CP_LEXER_BUFFER_SIZE 10000
+#define CP_LEXER_BUFFER_SIZE 160000
 #define CP_SAVED_TOKEN_STACK 5
 
 /* A token type for keywords, as opposed to ordinary identifiers. */

This does not fix the underlying problem that the buffer resizing in GC space gives a quadratic behavior in storage allocation, but it avoids it for most files, and it gives a ~2% speedup at -O0 on my box.

Stats for cp_lexer_new_main for the test case from PR8361 (-O0):

Before:
source location  Freed  Leak  Overhead  Times
cp/parser.c:263  728576: 1.2%  0: 0.0%  204288: 0.5%  1
cp/parser.c:278  45171712:71.7%  0: 0.0%  12665856:29.2%  5
cp/parser.c:253  72: 0.0%  0: 0.0%  8: 0.0%  1

After:
source location  Freed  Leak  Overhead  Times
cp/parser.c:263  11657216:22.4%  0: 0.0%  3268608: 8.1%  1
cp/parser.c:278  23314432:44.8%  0: 0.0%  6537216:16.2%  1
cp/parser.c:253  72: 0.0%  0: 0.0%  8: 0.0%  1

Perhaps we should look for an altogether different data structure for the token buffer - some kind of vector of smaller buffers perhaps.
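The "vector of smaller buffers" idea could be sketched as below. This is a hypothetical illustration rather than a proposal for the real cp_lexer: token_t stands in for cp_token, CHUNK_TOKENS is an arbitrary untuned value, and plain malloc/realloc are used instead of any GC allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for cp_token.  */
typedef struct { int type; int value; } token_t;

#define CHUNK_TOKENS 4096  /* illustrative chunk size, not a tuned value */

typedef struct
{
  token_t **chunks;   /* table of pointers to fixed-size chunks */
  size_t num_chunks;  /* chunks currently allocated */
  size_t chunk_cap;   /* capacity of the pointer table */
  size_t count;       /* tokens stored so far */
} token_vec;

void
token_vec_init (token_vec *v)
{
  v->chunks = NULL;
  v->num_chunks = 0;
  v->chunk_cap = 0;
  v->count = 0;
}

/* Append one token.  Only the small pointer table is ever reallocated;
   tokens that are already stored are never copied.  */
void
token_vec_push (token_vec *v, token_t tok)
{
  size_t chunk = v->count / CHUNK_TOKENS;
  size_t slot = v->count % CHUNK_TOKENS;
  if (chunk == v->num_chunks)
    {
      if (v->num_chunks == v->chunk_cap)
        {
          size_t new_cap = v->chunk_cap ? v->chunk_cap * 2 : 8;
          token_t **table = realloc (v->chunks, new_cap * sizeof *table);
          if (table == NULL)
            abort ();
          v->chunks = table;
          v->chunk_cap = new_cap;
        }
      token_t *fresh = malloc (CHUNK_TOKENS * sizeof *fresh);
      if (fresh == NULL)
        abort ();
      v->chunks[v->num_chunks++] = fresh;
    }
  v->chunks[chunk][slot] = tok;
  v->count++;
}

/* Return a pointer to token I.  */
token_t *
token_vec_at (token_vec *v, size_t i)
{
  assert (i < v->count);
  return &v->chunks[i / CHUNK_TOKENS][i % CHUNK_TOKENS];
}

void
token_vec_free (token_vec *v)
{
  for (size_t i = 0; i < v->num_chunks; i++)
    free (v->chunks[i]);
  free (v->chunks);
  token_vec_init (v);
}
```

The trade-off is that tokens are no longer contiguous, so code that walks the buffer with a plain cp_token pointer would need an index or iterator instead; in exchange, total allocation stays within one chunk of the number of tokens actually lexed.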
cp/tree.c:827 (ovl_cons)  11464712: 3.2%  0: 0.0%  660240: 1.4%  1732136: 5.2%  433034

Hmm, OVERLOAD trees account for 3% of the garbage, which seems too much, though I don't know how big or long the OVERLOAD trees are; I might add something to count that.
Subject: Re: memory consumption for heavy template instantiations tripled since 3.3 "pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: | cp/tree.c:827 (ovl_cons) 11464712: 3.2% 0: 0.0% 660240: 1.4% 1732136: | 5.2% 433034 | | Hmm OVERLOAD tree takes 3% of the Garbage which seems like too big, | though I don't know how big | long the OVERLOAD trees are, I might add something to count that. It is not uncommon to have large overload sets in C++ -- that is what people do when they discover that they can overload in the literal sense ;-) -- Gaby
Mainline with release checking again uses 520MB of RAM on the testcase with -O0 on x86_64, and 650MB with -O2. The time report with -O2 shows

 df live regs  :  4.84 ( 5%) usr  0.06 ( 1%) sys  4.76 ( 5%) wall       0 kB ( 0%) ggc
 parser        :  3.97 ( 4%) usr  0.49 (10%) sys  4.82 ( 5%) wall  246991 kB (15%) ggc
 expand        :  7.94 ( 8%) usr  0.22 ( 4%) sys  8.30 ( 8%) wall  169886 kB (10%) ggc
 CSE           :  3.72 ( 4%) usr  0.03 ( 1%) sys  3.94 ( 4%) wall    5135 kB ( 0%) ggc
 global alloc  :  4.91 ( 5%) usr  0.07 ( 1%) sys  5.02 ( 5%) wall   43268 kB ( 3%) ggc
 scheduling 2  :  4.00 ( 4%) usr  0.08 ( 2%) sys  4.07 ( 4%) wall    3173 kB ( 0%) ggc
 TOTAL         : 95.64           5.07           101.29           1657755 kB

that is, nothing really outstanding.
Created attachment 14232 [details] unincluded testcase
Memory footprint in TOP is about 430MB (64bit machine). On current mainline we need 191MB before IPA. Top consumers:

cfg.c:226 (connect_dest)  598696: 0.2%  180224: 0.5%  3484960: 1.8%  594504: 1.5%  73663
gimple-low.c:806 (record_vars_into)  0: 0.0%  0: 0.0%  3825552: 2.0%  0: 0.0%  79699
cp/pt.c:8316 (tsubst_decl)  2244888: 0.9%  0: 0.0%  4552704: 2.4%  357768: 0.9%  44721
tree.c:6061 (build_method_type_directly)  1946600: 0.8%  0: 0.0%  4703200: 2.5%  265992: 0.7%  33249
tree-inline.c:3589 (copy_tree_r)  9450136: 3.6%  0: 0.0%  4820840: 2.5%  1248128: 3.2%  187483
cfg.c:142 (alloc_block)  1046016: 0.4%  0: 0.0%  4988448: 2.6%  0: 0.0%  62859
cgraph.c:638 (cgraph_create_edge)  0: 0.0%  0: 0.0%  5183328: 2.7%  0: 0.0%  53993
gimplify.c:4314 (gimplify_modify_expr)  1185040: 0.5%  0: 0.0%  5570160: 2.9%  304112: 0.8%  57599
gimple-iterator.c:446 (gsi_insert_after_without_  4904480: 1.9%  0: 0.0%  5843840: 3.1%  2149664: 5.5%  268708
cfg.c:280 (unchecked_make_edge)  0: 0.0%  783288: 2.2%  5930352: 3.1%  745960: 1.9%  93245
gimple.c:287 (gimple_build_call_1)  871144: 0.3%  0: 0.0%  6066056: 3.2%  247408: 0.6%  51874
tree.c:962 (build_int_cst_wide)  6096: 0.0%  0: 0.0%  9716432: 5.1%  3187680: 8.1%  2221
gimplify.c:521 (create_tmp_var_raw)  452760: 0.2%  0: 0.0%  10597944: 5.5%  526224: 1.3%  65778
cp/lex.c:590 (copy_decl)  26304: 0.0%  0: 0.0%  13586520: 7.1%  1326296: 3.4%  56894
Total  258936448  34882576  191255621  39440157  5928571
source location  Garbage  Freed  Leak  Overhead  Times

Apparently largest are the gimple temporaries after IPA:

cp/lex.c:573 (cxx_dup_lang_specific_decl)  384: 0.0%  896: 0.0%  2770736: 0.9%  2992: 0.0%  43453
cp/lex.c:510 (build_lang_decl)  805432: 0.2%  209648: 0.2%  3196488: 1.1%  349552: 0.5%  18896
stringpool.c:74 (alloc_node)  1994400: 0.4%  0: 0.0%  3287712: 1.1%  0: 0.0%  55022
cfg.c:142 (alloc_block)  10005792: 2.1%  0: 0.0%  3966048: 1.4%  0: 0.0%  145540
cfg.c:280 (unchecked_make_edge)  4507272: 0.9%  7134984: 5.8%  4456296: 1.5%  1788728: 2.5%  223591
cgraph.c:408 (cgraph_create_node)  7802208: 1.6%  0: 0.0%  4477248: 1.5%  1364384: 1.9%  42637
cp/pt.c:8316 (tsubst_decl)  2244888: 0.5%  0: 0.0%  4552704: 1.6%  357768: 0.5%  44721
tree.c:6061 (build_method_type_directly)  1946600: 0.4%  0: 0.0%  4703200: 1.6%  265992: 0.4%  33249
cgraph.c:638 (cgraph_create_edge)  17196288: 3.5%  0: 0.0%  5254848: 1.8%  0: 0.0%  233866
tree-inline.c:4045 (copy_decl_to_var)  145488: 0.0%  0: 0.0%  5593392: 1.9%  273280: 0.4%  34160
gimple-iterator.c:446 (gsi_insert_after_without_  14342320: 2.9%  0: 0.0%  5650120: 1.9%  3998488: 5.7%  499811
ggc-common.c:187 (ggc_calloc)  16950080: 3.5%  3025816: 2.4%  6151656: 2.1%  459072: 0.6%  69247
tree-ssanames.c:141 (make_ssa_name_fn)  16930080: 3.5%  0: 0.0%  8363760: 2.9%  1686256: 2.4%  210782
gimplify.c:521 (create_tmp_var_raw)  5453784: 1.1%  0: 0.0%  9020256: 3.1%  689240: 1.0%  86155
tree.c:962 (build_int_cst_wide)  6096: 0.0%  0: 0.0%  10131688: 3.5%  3323928: 4.7%  2299
tree-inline.c:3589 (copy_tree_r)  49631032:10.2%  0: 0.0%  12553384: 4.3%  5797568: 8.2%  800223
tree-dfa.c:150 (create_var_ann)  0: 0.0%  27303320:22.0%  12672616: 4.3%  3634176: 5.1%  454272
gimple.c:2106 (gimple_copy)  11226992: 2.3%  0: 0.0%  13146032: 4.5%  1196784: 1.7%  209491
cp/lex.c:590 (copy_decl)  64104: 0.0%  0: 0.0%  13548720: 4.6%  1326296: 1.9%  56894
tree-inline.c:484 (remap_block)  1928264: 0.4%  0: 0.0%  14843088: 5.1%  1290104: 1.8%  161263
tree-ssa-operands.c:499 (ssa_operand_alloc)  0: 0.0%  34199342:27.6%  18090837: 6.2%  3566211: 5.0%  11251
tree-inline.c:4088 (copy_decl_no_change)  11756840: 2.4%  0: 0.0%  40988416:14.0%  2425144: 3.4%  317455
Total  487237966  123888014  293018044  70674672  9669114
source location  Garbage  Freed  Leak  Overhead  Times

So debug info and declarations are quite near the top. This is with my DECL_IGNORED_P fix, which I plan to commit to mainline soon. 5MB are also bitmaps:

tree-ssa-operands.c:2381 (add_to_addressa  73585  9052240  5946000  4181320  173579

I suspect most of the rest are operand caches, since they are so ineffective for small functions.

At the end of compilation:

tree-inline.c:484 (remap_block)  29218176: 2.1%  0: 0.0%  104: 0.0%  2247560: 1.3%  280945
cselib.c:1155 (cselib_subst_to_values)  31320504: 2.3%  0: 0.0%  0: 0.0%  5958648: 3.4%  838942
cp/call.c:2346 (add_template_candidate_real)  31457040: 2.3%  0: 0.0%  0: 0.0%  3096816: 1.8%  457682
gimple-iterator.c:446 (gsi_insert_after_without_  32515440: 2.3%  0: 0.0%  0: 0.0%  6503088: 3.7%  812886
tree-phinodes.c:157 (allocate_phi_node)  33375352: 2.4%  0: 0.0%  0: 0.0%  1120888: 0.6%  108792
ggc-common.c:187 (ggc_calloc)  34614992: 2.5%  9072016: 2.6%  1895328: 2.0%  671680: 0.4%  102129
rtl.c:269 (copy_rtx)  42322896: 3.1%  0: 0.0%  0: 0.0%  8318000: 4.7%  1083689
emit-rtl.c:3348 (make_insn_raw)  42838312: 3.1%  0: 0.0%  88: 0.0%  3894400: 2.2%  486800
gimple.c:2106 (gimple_copy)  43173352: 3.1%  0: 0.0%  0: 0.0%  2063688: 1.2%  368502
tree-ssanames.c:141 (make_ssa_name_fn)  73506000: 5.3%  0: 0.0%  26640: 0.0%  4902176: 2.8%  612772
tree-inline.c:4088 (copy_decl_no_change)  93714848: 6.8%  0: 0.0%  176464: 0.2%  4370928: 2.5%  562896
tree-inline.c:3589 (copy_tree_r)  98165464: 7.1%  0: 0.0%  2352: 0.0%  9178696: 5.2%  1250363
Total  1385145407  354509964  93434594  175468533  23822336
source location  Garbage  Freed  Leak  Overhead  Times

The positive thing is that there are no leaked gimple statements at all. Most of the allocation at the end is:

cp/lex.c:590 (copy_decl)  1532928: 0.1%  0: 0.0%  12079896:12.9%  1326296: 0.8%  56894
tree.c:962 (build_int_cst_wide)  6096: 0.0%  0: 0.0%  10359544:11.1%  3388072: 1.9%  3011
tree.c:6061 (build_method_type_directly)  1947800: 0.1%  0: 0.0%  4703200: 5.0%  266040: 0.2%  33255
cp/pt.c:8316 (tsubst_decl)  2244888: 0.2%  0: 0.0%  4552704: 4.9%  357768: 0.2%  44721

DF and PRE allocate some giant bitmaps:

df-problems.c:308 (df_rd_alloc)  145581  12612800  11870840  11870840  597073
df-problems.c:309 (df_rd_alloc)  145581  8655600  8293080  8293080  108099
df-problems.c:310 (df_rd_alloc)  145581  15585520  14869800  14869800  1391724
tree-ssa-pre.c:584 (bitmap_set_new)  987262  68922080  53349440  53349440  2631124
tree-ssa-pre.c:585 (bitmap_set_new)  987262  69386800  53918200  53918200  3978100
df-problems.c:311 (df_rd_alloc)  145581  74605440  73361600  73361600  0
df-problems.c:539 (df_rd_transfer_functio  100011  63125520  42433280  42433280  148378

My guess is that ssa-operands can be easiest to track if I am right about their memory usage.

Honza
So with the brand new tuplified world, we need new statistics ;)

After parsing we are still the same:

cfg.c:216 (connect_src) 608608: 0.2% 520: 0.0% 3028808: 1.6% 519680: 1.3% 64954
cp/lex.c:511 (build_lang_decl) 805432: 0.3% 209648: 0.6% 3196488: 1.7% 349552: 0.9% 18896
stringpool.c:73 (alloc_node) 65088: 0.0% 0: 0.0% 3208992: 1.7% 0: 0.0% 34105
fold-const.c:7969 (build_fold_addr_expr_with_typ 530352: 0.2% 0: 0.0% 3440880: 1.8% 441248: 1.1% 55156
cfg.c:226 (connect_dest) 598696: 0.2% 180224: 0.5% 3484960: 1.8% 594504: 1.5% 73663
cgraph.c:432 (cgraph_create_node) 0: 0.0% 0: 0.0% 3712320: 1.9% 412480: 1.1% 12890
gimple-low.c:888 (record_vars_into) 0: 0.0% 0: 0.0% 3808032: 2.0% 0: 0.0% 79334
cp/pt.c:8398 (tsubst_decl) 2244888: 0.9% 0: 0.0% 4552704: 2.4% 357768: 0.9% 44721
tree.c:6101 (build_method_type_directly) 1946800: 0.8% 0: 0.0% 4704000: 2.5% 266032: 0.7% 33254
tree-inline.c:3595 (copy_tree_r) 9428112: 3.7% 0: 0.0% 4793248: 2.5% 1243776: 3.2% 186815
cfg.c:142 (alloc_block) 1046016: 0.4% 0: 0.0% 4988448: 2.6% 0: 0.0% 62859
cgraph.c:681 (cgraph_create_edge) 0: 0.0% 0: 0.0% 5183328: 2.7% 0: 0.0% 53993
cfg.c:280 (unchecked_make_edge) 0: 0.0% 696256: 1.9% 5271424: 2.8% 0: 0.0% 93245
gimplify.c:4295 (gimplify_modify_expr) 1183600: 0.5% 0: 0.0% 5519400: 2.9% 300632: 0.8% 57164
gimple-iterator.c:446 (gsi_insert_after_without_ 4903960: 1.9% 0: 0.0% 5826440: 3.1% 2146080: 5.5% 268260
gimple.c:287 (gimple_build_call_1) 871144: 0.3% 0: 0.0% 6066056: 3.2% 247408: 0.6% 51874
tree.c:964 (build_int_cst_wide) 6096: 0.0% 0: 0.0% 10089256: 5.3% 3310168: 8.6% 2292
gimplify.c:522 (create_tmp_var_raw) 453768: 0.2% 0: 0.0% 10537632: 5.5% 523400: 1.4% 65425
cp/lex.c:591 (copy_decl) 26304: 0.0% 0: 0.0% 13586520: 7.1% 1326296: 3.4% 56894
Total 254844605 37217584 190405469 38695562 5886026
source location Garbage Freed Leak Overhead Times

After early optimizations:

tree-inline.c:4051 (copy_decl_to_var) 145488: 0.0% 0: 0.0% 5595072: 2.0% 273360: 0.4% 34170
gimple-iterator.c:446 (gsi_insert_after_without_ 14165480: 3.2% 0: 0.0% 5633680: 2.0% 3959832: 5.9% 494979
ggc-common.c:187 (ggc_calloc) 565872: 0.1% 17553976:12.6% 5692736: 2.1% 423928: 0.6% 67925
cgraph.c:681 (cgraph_create_edge) 0: 0.0% 0: 0.0% 5958624: 2.2% 0: 0.0% 62069
tree-ssanames.c:141 (make_ssa_name_fn) 16722480: 3.8% 0: 0.0% 8318040: 3.0% 1669368: 2.5% 208671
gimplify.c:522 (create_tmp_var_raw) 5067048: 1.2% 0: 0.0% 8897280: 3.2% 664968: 1.0% 83121
tree.c:964 (build_int_cst_wide) 6096: 0.0% 0: 0.0% 10327480: 3.8% 3388168: 5.1% 2341
tree-dfa.c:150 (create_var_ann) 0: 0.0% 23279784:16.7% 10578392: 3.8% 3078016: 4.6% 384752
tree-inline.c:3595 (copy_tree_r) 49340008:11.2% 0: 0.0% 12558984: 4.6% 5762144: 8.6% 796535
gimple.c:2071 (gimple_copy) 11219128: 2.6% 0: 0.0% 13100280: 4.8% 1193408: 1.8% 209046
cp/lex.c:591 (copy_decl) 62424: 0.0% 0: 0.0% 13550400: 4.9% 1326296: 2.0% 56894
tree-inline.c:484 (remap_block) 1924416: 0.4% 0: 0.0% 14794832: 5.4% 1286096: 1.9% 160762
tree-ssa-operands.c:499 (ssa_operand_alloc) 0: 0.0% 33737566:24.3% 17976728: 6.5% 3547382: 5.3% 11222
tree-inline.c:4094 (copy_decl_no_change) 12372728: 2.8% 0: 0.0% 29185744:10.6% 1892440: 2.8% 250867
Total 439703478 139072686 274931123 66651575 9262384
source location Garbage Freed Leak Overhead Times

Declarations and debug info are the major consumers.
We improved 293->274, and for the final compilation:

cp/call.c:2348 (add_template_candidate_real) 31457040: 2.5% 0: 0.0% 0: 0.0% 3096816: 1.8% 457682
gimple-iterator.c:446 (gsi_insert_after_without_ 32206200: 2.5% 0: 0.0% 0: 0.0% 6441240: 3.8% 805155
tree-phinodes.c:157 (allocate_phi_node) 33346952: 2.6% 0: 0.0% 0: 0.0% 1121800: 0.7% 108439
rtl.c:269 (copy_rtx) 41145992: 3.2% 0: 0.0% 0: 0.0% 8084728: 4.7% 1053375
emit-rtl.c:3502 (make_insn_raw) 41827016: 3.3% 0: 0.0% 88: 0.0% 3802464: 2.2% 475308
gimple.c:2071 (gimple_copy) 43033384: 3.4% 0: 0.0% 0: 0.0% 2056248: 1.2% 367296
tree-ssanames.c:141 (make_ssa_name_fn) 72524280: 5.7% 0: 0.0% 148440: 0.1% 4844848: 2.8% 605606
tree-inline.c:4094 (copy_decl_no_change) 74338752: 5.8% 0: 0.0% 226240: 0.2% 3451120: 2.0% 447839
tree-inline.c:3595 (copy_tree_r) 97555488: 7.7% 0: 0.0% 2792: 0.0% 9116248: 5.4% 1242196
Total 1271569773 404722617 103497642 170223016 23365837
source location Garbage Freed Leak Overhead Times

1.38GB to 1.27GB... so not much change, but some progress ;)
Honza, if you have some time, it'd be interesting to see where things stand today.
I've added the testcase to http://gcc.opensuse.org/c++bench/random/
Current -fmem-report (r262156, checking build):

GGC memory Garbage Freed Leak Overhead Times
--------------------------------------------------------------------------------
[trim too-long comment]
cp/pt.c:8348 (coerce_template_parms) 6817576: 2.4% 0: 0.0% 937528: 0.3% 0: 0.0% 202820
cp/parser.c:651 (cp_lexer_new_main) 0: 0.0% 7340056: 2.6% 8388616: 2.8% 32: 0.0% 4
emit-rtl.c:4116 (make_note_raw) 8686160: 3.0% 0: 0.0% 8736: 0.0% 0: 0.0% 155266
dwarf2cfi.c:437 (copy_cfi_row) 9461232: 3.3% 0: 0.0% 0: 0.0% 0: 0.0% 65703
emit-rtl.c:384 (set_mem_attrs) 9613240: 3.3% 0: 0.0% 49440: 0.0% 0: 0.0% 241567
cfg.c:125 (alloc_block) 3142880: 1.1% 0: 0.0% 7364448: 2.5% 0: 0.0% 101032
cp/pt.c:12459 (tsubst_template_args) 9494008: 3.3% 0: 0.0% 1141672: 0.4% 0: 0.0% 277002
tree-ssanames.c:83 (init_ssanames) 0: 0.0% 125952: 0.0% 11877888: 4.0% 4001280: 22.2% 7815
emit-rtl.c:845 (gen_rtx_MEM) 12411408: 4.3% 0: 0.0% 270192: 0.1% 0: 0.0% 528400
toplev.c:918 (realloc_for_line_map) 0: 0.0% 625728: 0.2% 12963872: 4.4% 2319240: 12.9% 14979
tree-ssanames.c:295 (make_ssa_name_fn) 959544: 0.3% 0: 0.0% 12208680: 4.1% 0: 0.0% 182892
emit-rtl.c:473 (gen_raw_REG) 14950248: 5.2% 24: 0.0% 30624: 0.0% 0: 0.0% 624204
emit-rtl.c:488 (gen_rtx_EXPR_LIST) 15043416: 5.2% 0: 0.0% 14112: 0.0% 0: 0.0% 627397
emit-rtl.c:5846 (init_emit) 18473000: 6.4% 1757600: 0.6% 2600: 0.0% 4295664: 23.8% 7782
tree-ssa-operands.c:265 (ssa_operand_alloc) 0: 0.0% 1232896: 0.4% 20814848: 7.1% 0: 0.0% 9657
emit-rtl.c:4021 (make_insn_raw) 29981632: 10.3% 0: 0.0% 60480: 0.0% 0: 0.0% 469408
--------------------------------------------------------------------------------
Total 289790159:100.0% 279280216:100.0% 294500915:100.0% 18021250:100.0% 10562404
--------------------------------------------------------------------------------

89.26user 0.61system 1:30.15elapsed 99%CPU (0avgtext+0avgdata 773444maxresident)k
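For anyone comparing these dumps between revisions, here is a rough sketch of a script that pulls the per-location rows out of a pasted report and ranks them by one column. It is not part of GCC. The column order (Garbage, Freed, Leak, Overhead, Times) is taken from the header lines quoted in this thread, and the row pattern is an assumption based on the rows shown here; `parse_rows`, `top_by` and `SAMPLE` are made-up names for illustration.

```python
import re

# Assumed row shape: "file.c:LINE (function) G: p% F: p% L: p% O: p% TIMES".
# The closing parenthesis is optional because long function names are
# truncated in the dumps (e.g. "(gsi_insert_after_without_").
ROW = re.compile(
    r"(?P<loc>\S+:\d+) \((?P<func>[^)\s]+)\)?\s+"
    r"(?P<garbage>\d+):\s*[\d.]+%\s+"
    r"(?P<freed>\d+):\s*[\d.]+%\s+"
    r"(?P<leak>\d+):\s*[\d.]+%\s+"
    r"(?P<overhead>\d+):\s*[\d.]+%\s+"
    r"(?P<times>\d+)"
)

FIELDS = ("garbage", "freed", "leak", "overhead", "times")


def parse_rows(text):
    """Return one dict per allocation-site row found in the pasted report."""
    rows = []
    for match in ROW.finditer(text):
        row = {"loc": match.group("loc"), "func": match.group("func")}
        for name in FIELDS:
            row[name] = int(match.group(name))
        rows.append(row)
    return rows


def top_by(rows, field, n=5):
    """Return the n rows with the largest value in the given column."""
    return sorted(rows, key=lambda row: row[field], reverse=True)[:n]


# Three rows copied from the reports in this thread, joined the way the
# bug tracker flattens them (single spaces, no newlines).
SAMPLE = (
    "tree-ssa-operands.c:265 (ssa_operand_alloc) 0: 0.0% 1232896: 0.4% "
    "20814848: 7.1% 0: 0.0% 9657 "
    "emit-rtl.c:4021 (make_insn_raw) 29981632: 10.3% 0: 0.0% "
    "60480: 0.0% 0: 0.0% 469408 "
    "gimple-iterator.c:446 (gsi_insert_after_without_ 4903960: 1.9% 0: 0.0% "
    "5826440: 3.1% 2146080: 5.5% 268260"
)

if __name__ == "__main__":
    for row in top_by(parse_rows(SAMPLE), "leak", n=3):
        print(row["loc"], row["func"], row["leak"])
```

Ranking by "leak" shows what is still live at the end of compilation, while ranking by "garbage" shows allocation churn. The "Total" lines and the bitmap statistics use a different layout and are skipped by this pattern.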