Bug 12850 - memory consumption for heavy template instantiations tripled since 3.3
Status: NEW
Product: gcc
Classification: Unclassified
Component: c++
Version: 3.4.0
Importance: P2 normal
Target Milestone: ---
Assigned To: Not yet assigned to anyone
Keywords: compile-time-hog, memory-hog
Depends on: 18683 22635
Blocks: 47344
 
Reported: 2003-10-30 23:40 UTC by Jens Maurer
Modified: 2013-03-06 10:53 UTC (History)

See Also:
Host:
Target:
Build:
Known to work: 3.3
Known to fail:
Last reconfirmed: 2008-09-06 14:12:03


Attachments
preprocessed random_test.cpp for g++ 3.3 (147.85 KB, application/x-gunzip)
2003-10-30 23:42 UTC, Jens Maurer
preprocessed random_test.cpp for g++ 3.4 (150.52 KB, application/x-gunzip)
2003-10-30 23:43 UTC, Jens Maurer
broken patch (368 bytes, patch)
2004-01-07 08:06 UTC, Andrew Pinski
profi (240.33 KB, text/plain)
2004-01-14 14:47 UTC, Jan Hubicka
unincluded testcase (35.56 KB, application/octet-stream)
2007-09-20 14:10 UTC, Richard Biener

Description Jens Maurer 2003-10-30 23:40:03 UTC
When compiling the attached program with g++ 3.3, the compiler takes about 80 MB
of main memory on Intel/x86. When compiling it with g++ 3.4, the compiler takes
> 400 MB and eventually crashes (potentially due to the Linux kernel killing
processes due to out-of-memory).
Since standard libraries are different between 3.3 and 3.4, I provide two
preprocessed files.  (This is boost random number library random_test.cpp.)

g++ -v rt-3.3.ii
[...]
Configured with: ../gcc-3.3/configure --prefix=/usr/local --enable-threads
--enable-shared
Thread model: posix
gcc version 3.3
(ok)

/opt/exp/gcc-3.4/bin/g++ -v rt-3.4.ii
[...]
g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
Comment 1 Jens Maurer 2003-10-30 23:42:40 UTC
Created attachment 5021 [details]
preprocessed random_test.cpp for g++ 3.3
Comment 2 Jens Maurer 2003-10-30 23:43:24 UTC
Created attachment 5022 [details]
preprocessed random_test.cpp for g++ 3.4
Comment 3 Andrew Pinski 2003-10-31 01:13:03 UTC
My 3.3.1 (20030707) takes about 260M of memory while 3.4 (20031030) takes about 420M of 
memory.
Comment 4 Andrew Pinski 2003-12-19 08:43:59 UTC
It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.
Comment 5 Andrew Pinski 2004-01-05 06:43:19 UTC
My patches for saving space in C++ help, but they do not fix the problem.
Comment 6 Andrew Pinski 2004-01-05 06:58:30 UTC
I think the problem is that 3.4 is not able to collect garbage while instantiating the 
templates.  Calling ggc_collect while instantiating the templates and at the right level, I 
get
{GC 95280k -> 45466k}
which shows that it gets rid of half of the memory but it crashes right after doing that.
Comment 7 Andrew Pinski 2004-01-05 07:29:41 UTC
I think I have a patch for this, I just call ggc_collect in instantiate_decl if it is okay to do so.
Comment 8 Andrew Pinski 2004-01-06 23:18:35 UTC
Mine, I think.
Comment 9 Andrew Pinski 2004-01-07 07:27:55 UTC
The patch I had in mind did not work; there is too much stored on the stack for this 
to work correctly.
Comment 10 Steven Bosscher 2004-01-07 08:03:00 UTC
interesting... 
Comment 11 Andrew Pinski 2004-01-07 08:06:41 UTC
Created attachment 5425 [details]
broken patch

This patch is broken, but the real problem is not the patch itself: the C++
front-end keeps references on the stack or in registers to data that is not
referenced from any variable visible to the GC.
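
The failure mode described here can be illustrated with a toy collector. This is only a sketch under stated assumptions, not GCC's GGC: every `toy_` name is hypothetical, the "heap" is a fixed array, and roots must be registered by hand. An object that is reachable only through a local pointer is invisible to the mark phase and gets reclaimed, so collection is gated on a depth counter in the spirit of the function_depth checks mentioned later in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HEAP_SIZE 8

/* Hypothetical toy object; "live" means allocated, "marked" means reachable. */
struct toy_obj { bool live; bool marked; };

static struct toy_obj toy_heap[TOY_HEAP_SIZE];
static struct toy_obj *toy_roots[TOY_HEAP_SIZE]; /* roots the collector can see */
static size_t toy_nroots;
static int toy_depth; /* > 0 while a caller may hold unregistered pointers */

/* Hand out the first free slot, or NULL when the toy heap is full. */
static struct toy_obj *
toy_alloc (void)
{
  for (size_t i = 0; i < TOY_HEAP_SIZE; i++)
    if (!toy_heap[i].live)
      {
        toy_heap[i].live = true;
        toy_heap[i].marked = false;
        return &toy_heap[i];
      }
  return NULL;
}

/* Mark the registered roots, then reclaim every live object that was not
   marked.  Pointers held only in locals are not consulted.  Returns the
   number of objects reclaimed.  */
static size_t
toy_collect (void)
{
  size_t freed = 0;
  for (size_t i = 0; i < toy_nroots; i++)
    toy_roots[i]->marked = true;
  for (size_t i = 0; i < TOY_HEAP_SIZE; i++)
    {
      if (toy_heap[i].live && !toy_heap[i].marked)
        {
          toy_heap[i].live = false;
          freed++;
        }
      toy_heap[i].marked = false;
    }
  return freed;
}

/* Collect only when no nested work is in progress.  */
static size_t
toy_maybe_collect (void)
{
  return toy_depth == 0 ? toy_collect () : 0;
}
```

With `toy_depth` set to 1, `toy_maybe_collect` leaves an unregistered object alone; calling `toy_collect` directly reclaims it even though a local pointer still refers to it, which is the kind of crash the broken patch ran into.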
Comment 12 Jan Hubicka 2004-01-10 13:36:13 UTC
I am testing a patch that peaks at 28MB with unit-at-a-time.  It seems to be
possible to defer instantiation of all templates to the very last pass, where we
can ggc_collect.
Comment 13 Andrew Pinski 2004-01-10 17:12:25 UTC
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-01/msg00772.html>.
Comment 14 CVS Commits 2004-01-13 23:59:24 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	hubicka@gcc.gnu.org	2004-01-13 23:59:20

Modified files:
	gcc            : ChangeLog cgraphunit.c 
	gcc/cp         : ChangeLog decl2.c optimize.c 

Log message:
	Partial fix PR c++/12850
	* cgraphunit.c (cgraph_finalize_function): Always ggc_collect when
	at zero nest level.
	
	* decl2.c (mark_used): Do not proactively instantiate templates
	when compiling in unit-at-a-time or not optimizing.
	* optimize.c (maybe_clone_body): Do not increase function depth.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2270&r2=2.2271
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&r1=1.44&r2=1.45
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.3875&r2=1.3876
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl2.c.diff?cvsroot=gcc&r1=1.694&r2=1.695
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/optimize.c.diff?cvsroot=gcc&r1=1.102&r2=1.103

Comment 15 CVS Commits 2004-01-14 11:34:45 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	hubicka@gcc.gnu.org	2004-01-14 11:34:38

Modified files:
	gcc/cp         : ChangeLog pt.c 

Log message:
	PR c++/12850
	* pt.c (instantiate_decl):  Do not increase function_depth.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.3879&r2=1.3880
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/pt.c.diff?cvsroot=gcc&r1=1.813&r2=1.814

Comment 16 Jan Hubicka 2004-01-14 11:37:27 UTC
Subject: Re:  [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2003-12-19 08:43 -------
> It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.

I can get about 30MB at -O0; for unit-at-a-time, however, we still need
250MB, which is the size of all templates instantiated together.  I don't
think we can reduce this further for 3.4, and it is no longer a regression;
in the future we may make trees more compact.  This testcase also has
interesting runtime properties; Mark may want to look at the
for_each_template_param_r problem.

Honza
Comment 17 Jan Hubicka 2004-01-14 11:38:48 UTC
Subject: Re:  [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2003-12-19 08:43 -------
> It goes up to more than 480MB on powerpc-apple-darwin, then drops to around 250MB.
Still, GGC memory is only about 100MB, so perhaps we have a 150MB memory leak in
non-GGC memory reproduced by this.
Comment 18 Jan Hubicka 2004-01-14 14:47:21 UTC
Subject: Re:  [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3

For the record, here is a profile of the run.  Lots of overhead is coming
from quadratic behaviour in templates and friends.
Honza
Comment 19 Jan Hubicka 2004-01-14 14:47:25 UTC
Created attachment 5479 [details]
profi
Comment 20 Andrew Pinski 2004-01-14 18:12:12 UTC
I will take a look for the leak but most likely it is not really a leak.
Comment 21 Andrew Pinski 2004-01-27 16:35:17 UTC
The only memory leak I had was from shorten_branches in final.c which I have a fix for 
now but that does account for the 60M difference between GC and real allocated 
memory (even though I suspect there are large amounts of pages still allocated because 
the GC is spread all over them).  Also malloc only accounts for 20M.
Comment 22 Jan Hubicka 2004-01-27 16:37:43 UTC
Subject: Re:  [3.4/3.5 Regression] memory consumption for heavy template instantiations tripled since 3.3

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-01-27 16:35 -------
> The only memory leak I had was from shorten_branches in final.c which I have a fix for 
> now but that does account for the 60M difference between GC and real allocated 
> memory (even though I suspect there are large amounts of pages still allocated because 
> the GC is spread all over them).  Also malloc only accounts for 20M.

I have additional patches in testing that cut this to roughly 118MB.
There is still room for improvement, as we really should be decreasing the
amount of memory during the compilation stage, which we don't (the parsed
program after template instantiation is slightly over 60MB of GGC memory).
We also burn a lot of unnecessary memory in the C++ parser during name
lookup; I am probably not going to address this, as I simply don't
understand the issue at all.

Honza
Comment 23 CVS Commits 2004-01-29 00:34:15 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	hubicka@gcc.gnu.org	2004-01-29 00:34:09

Modified files:
	gcc            : ChangeLog cgraph.c cgraphunit.c tree-optimize.c 

Log message:
	PR c++/12850
	* cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and
	initial pointers.
	* cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS
	for functions that will be only inlined.
	(cgraph_mark_function_to_output): Likewise.
	(cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear;
	do not clear function body.
	* tree-optimize.c (clear_decl_rtl): Use decl_function_context.
	(tree_rest_of_compilation): Reorganize the logic releasing function
	body to use callgraph datastructure.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2535&r2=2.2536
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&r1=1.42&r2=1.43
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&r1=1.48&r2=1.49
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&r1=2.8&r2=2.9

Comment 24 CVS Commits 2004-01-30 11:46:33 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	hubicka@gcc.gnu.org	2004-01-30 11:46:28

Modified files:
	gcc            : ChangeLog cgraph.c cgraphunit.c tree-optimize.c 

Log message:
	PR c++/12850
	* cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and
	initial pointers.
	* cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS
	for functions that will be only inlined.
	(cgraph_mark_function_to_output): Likewise.
	(cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear;
	do not clear function body.
	* tree-optimize.c (clear_decl_rtl): Use decl_function_context.
	(tree_rest_of_compilation): Reorganize the logic releasing function
	body to use callgraph datastructure.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.110&r2=2.2326.2.111
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.1&r2=1.41.2.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.1&r2=1.46.2.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8&r2=2.8.8.1

Comment 25 CVS Commits 2004-01-31 12:01:28 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	hubicka@gcc.gnu.org	2004-01-31 12:01:25

Modified files:
	gcc            : cgraph.c cgraphunit.c tree-optimize.c ChangeLog 

Log message:
	Revert the following patch until after AIX linker bug is fixed:
	PR c++/12850
	* cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and
	initial pointers.
	* cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS
	for functions that will be only inlined.
	(cgraph_mark_function_to_output): Likewise.
	(cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear;
	do not clear function body.
	* tree-optimize.c (clear_decl_rtl): Use decl_function_context.
	(tree_rest_of_compilation): Reorganize the logic releasing function
	body to use callgraph datastructure.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.2&r2=1.41.2.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.2&r2=1.46.2.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8.8.2&r2=2.8.8.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.121&r2=2.2326.2.122

Comment 26 CVS Commits 2004-02-01 13:01:18 UTC
Subject: Bug 12850

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	hubicka@gcc.gnu.org	2004-02-01 13:01:15

Modified files:
	gcc            : ChangeLog cgraph.c cgraphunit.c tree-optimize.c 
	gcc/cp         : ChangeLog semantics.c 

Log message:
	PR c++/12850
	* cgraph.c (cgraph_remove_node): Clear out saved/insns/arguments and
	initial pointers.
	* cgraphunit.c (cgraph_finalize_function): Clear out DECL_SAVED_INSNS
	for functions that will be only inlined.
	(cgraph_mark_function_to_output): Likewise.
	(cgraph_expand_function): Sanity check that DECL_DEFER_OUTPUT is clear;
	do not clear function body.
	* tree-optimize.c (clear_decl_rtl): Use decl_function_context.
	(tree_rest_of_compilation): Reorganize the logic releasing function
	body to use callgraph datastructure.
	
	* semantics.c (expand_body)  Do emit_associated_thunks before
	expansion.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.127&r2=2.2326.2.128
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraph.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.2.3&r2=1.41.2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cgraphunit.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.2.3&r2=1.46.2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.8.8.3&r2=2.8.8.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3892.2.23&r2=1.3892.2.24
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/semantics.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.381.4.4&r2=1.381.4.5

Comment 27 Jan Hubicka 2004-02-14 14:27:07 UTC
Memory usage is now at 104MB, which is still more than 3.3 used, but given that this code is an almost perfect testcase for where unit-at-a-time loses, I think the score is not bad.
Mark's patches helped a lot with the amount of garbage produced by the C++ frontend on mainline (reducing it from 2GB to 700MB), but I think we can still do significantly better.
One problem is the large consumption of struct function (about 10% of the memory surviving from the frontend).  Many of these struct functions are for functions that never reached cgraph_finalize_function (either templates or unused functions).  I think these should be freed but I don't know how.
Also, the C++ frontend still produces a lot of garbage (only 39MB of the 700MB is needed).  Major producers are:
varray.c:161 (varray_grow)                                        20496    1473380     401588:1.600%     210256:0.532%
cp/call.c:2181 (add_template_candidate_real)                      96047    2339908      63560:2.051%          0:0.000%
cp/name-lookup.c:1719 (set_identifier_type_value_with_scope)     151575    3031500          0:2.587%          0:0.000%
tree.c:3962 (build_method_type_directly)                          24117    2604636     482340:2.634%    1411456:3.574%
cp/lex.c:773 (copy_decl)                                          31558    3408264          0:2.909%    2979720:7.545%
cp/name-lookup.c:2800 (push_class_level_binding)                 170972    3419440          0:2.918%        140:0.000%
tree-inline.c:1970 (copy_tree_r)                                 176657    3606700       4084:3.081%     551820:1.397%
cp/search.c:1200 (build_baselink)                                168114    4034736          0:3.443%        552:0.001%
cp/pt.c:6252 (tsubst_decl)                                        38668    4176144          0:3.564%    2390580:6.053%
cp/name-lookup.c:4720 (store_bindings)                           221688    4433760          0:3.784%          0:0.000%
cp/pt.c:5738 (tsubst_template_args)                              248319    5966576      74480:5.155%     570592:1.445%
function.c:6397 (allocate_struct_function)                         9158    4688896    1978128:5.689%    4087720:10.351%
cp/pt.c:3814 (coerce_template_parms)                             282818    6655016      65496:5.735%      46156:0.117%
tree.c:3908 (build_function_type)                                 57993    6263244    1159860:6.335%     332672:0.842%
(The first percentage is garbage allocated; the second is the amount of memory surviving to cgraph_optimize.)
The backend looks better now; it produces about 300MB of additional garbage.  About 10-20% can be saved by better aliasing and by moving log links into a separate structure.  Overall we went from 4GB of garbage to 900MB.
I don't have enough knowledge of templates and name lookup to make things significantly better.
Comment 28 Jan Hubicka 2004-02-14 14:38:32 UTC
Subject: Re:  [3.4/3.5 Regression] memory consumption for heavy template instantiations tripled since 3.3

Actually I messed up the numbers.  We produce 1.1GB of garbage in the
frontend and 1.2GB in the backend.  I have about a 30% reduction of backend
memory from a mixture of retiring line number notes, moving log links away
and fixing some of the cselib datastructures.

One big problem is that inlined bodies remain reachable somehow.  Partly
it is because of ABSTRACT_ORIGIN pointers after subsequent inlining but
I am not sure what is really causing the rest.

The amount of memory used by trees grows from 39MB to 100MB during
compilation stage.

Honza
Comment 29 Mark Mitchell 2004-03-21 18:58:28 UTC
There's nothing more to be fixed here for 3.4.x, so I've retargeted this at 3.5.
Comment 30 Jan Hubicka 2004-05-05 23:31:07 UTC
We no longer have major memory consumption regression here.  I don't want to 
see it red ;) 
Comment 31 Andrew Pinski 2004-09-23 18:17:35 UTC
Could someone test this again?  (I think Jan's memory tester has the numbers for the mainline, but I could 
be wrong.)
Comment 32 Andrew Pinski 2004-09-27 04:02:26 UTC
Removing the patch keyword since all the patches referenced here have been applied.
Comment 33 Andrew Pinski 2004-12-03 03:02:50 UTC
I should note that 4.0.0 is like 3x faster than 3.3.2 at -O1 on this test.
Comment 34 Andrew Pinski 2004-12-11 17:42:59 UTC
On the mainline at -O1 (since I cannot compile at -O0 but that is a different bug which I already filed):
cp/lex.c:716 (copy_decl)                             910284: 0.1%          0: 0.0%    6083812:11.3%          0: 0.0%      56404
ggc-common.c:193 (ggc_calloc)                       3299632: 0.4%   11884228: 2.5%    1207832: 2.2%    2736868: 1.6%      22853
tree.c:4530 (build_method_type_directly)            1173592: 0.1%          0: 0.0%    2903048: 5.4%     750960: 0.4%      26820
tree.c:4266 (build_reference_type_for_mode)             456: 0.0%          0: 0.0%     602376: 1.1%     111048: 0.1%       3966
cp/class.c:2455 (maybe_add_class_template_decl_l          0: 0.0%          0: 0.0%    1061232: 2.0%          0: 0.0%      44218
tree.c:472 (copy_list)                                 2232: 0.0%          0: 0.0%    1083164: 2.0%          0: 0.0%       8854
Those are the ones which still leak.
Comment 35 Andrew Pinski 2004-12-21 05:53:35 UTC
Here are the results for -O0, now that PR 18683 is fixed:
cp/lex.c:716 (copy_decl)                            1087604: 0.3%          0: 0.0%    5906492:10.6%          0: 0.0%      56404
cp/pt.c:3978 (coerce_template_parms)               41586524: 9.6%          0: 0.0%     136540: 0.2%    3865680: 7.4%    1138236


Though we do create a lot:
cp/parser.c:278 (cp_lexer_new_main)                       0: 0.0%   22585856:36.1%          0: 0.0%    6332928:12.1%          5

This is mostly a ggc_realloc of a buffer of all the tokens; maybe there is a better way of allocating this 
buffer, as it seems we create a lot of overhead because of it.
Comment 36 Steven Bosscher 2004-12-23 12:19:57 UTC
The initial CP lexer buffer size is 10000:

#define CP_LEXER_BUFFER_SIZE 10000

That came in with the lex-all-ahead patch from Matt and Zack,
on 2004-09-20 (parser.c rev. 1.250 for the CVS history diggers)
but it seems a bit low to me if you're going to lex the whole
file up front.  I would not be surprised if the average C++
code with lots of templates has several 100,000 tokens...  Let
me see:
- preprocessed sources for generate.ii from PR8361, blank and
  pound lines stripped:  36200 lines
- an average of 7 tokens per line in the first 500 lines, let's
  assume that's a reasonable average for the whole file (it's
  easy to instrument g++ to get the exact number of tokens, if
  you want more accurate numbers ;-)

That makes it >250,000 tokens for this file.

Since we double the buffer, we have:
10,000 + 20,000 + 40,000 + 80,000 + 160,000 + 320,000 = 630,000

That is the number of tokens we have allocated room for, with no
ggc_collect in the middle.  With ggc-page, which has power-of-2
based page sizes, it's safe to assume that each previous buffer
is too small to be reallocated in place, so a full new buffer is
allocated and the old one is memcpy-ed to the new one.  With checking
off, we ggc_free the old buffer, but with checking enabled we don't,
so after finishing the whole lexing process we keep around
~380,000*sizeof(cp_token) of buffer space, so that's roughly 10MB
of memory we can't reclaim until the first ggc_collect call.
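
The arithmetic above can be checked with a small simulation. This is a sketch, not code from parser.c: `simulate_doubling` and `struct growth_stats` are hypothetical names, and the 250,000 token count is the rough estimate from this comment. One reading of the ~380,000 figure is the total number of slots allocated minus the slots actually needed.

```c
#include <assert.h>
#include <stddef.h>

/* Result of simulating a buffer that grows by doubling.  */
struct growth_stats {
  size_t final_capacity;  /* slots in the last buffer allocated */
  size_t total_allocated; /* slots summed over every buffer ever allocated */
  size_t superseded;      /* slots in the old buffers that were replaced */
  size_t unused;          /* total_allocated minus the slots actually needed */
};

/* Start with INITIAL slots and double until NEEDED slots fit.  */
static struct growth_stats
simulate_doubling (size_t initial, size_t needed)
{
  struct growth_stats s;
  size_t cap = initial;

  assert (initial > 0);
  s.total_allocated = cap;
  while (cap < needed)
    {
      cap *= 2;
      s.total_allocated += cap;
    }
  s.final_capacity = cap;
  s.superseded = s.total_allocated - cap;
  s.unused = s.total_allocated - needed;
  return s;
}
```

Calling `simulate_doubling (10000, 250000)` gives a final capacity of 320,000 slots, 630,000 slots allocated in total, 310,000 of them in superseded buffers, and 380,000 not holding any token.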

Maybe buffer should not be in GC memory at all?  We know the
exact live time of buffer, and as far as I can tell we never
ggc_collect while it is live.  According to the comments for
cp_lexer, "Tokens are never added to the cp_lexer after it is
created."  So it may be cheaper to have the buffer xmalloced,
and memcpy-ed to a buffer in GC space just before saving it
in the new cp_lexer object.

So two suggestions for a person who wants to make g++ a little
faster here:
- make CP_LEXER_BUFFER_SIZE larger.  To make it use pages more
  efficiently, look for some ratio of pagesize/(sizeof (cp_token))
- see whether buffer in parser.c:cp_lexer_new_main can be moved out of GC
  space as suggested above.
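
The second suggestion could look roughly like the following. It is a minimal sketch under loud assumptions: `toy_token` and `toy_lexer` are hypothetical stand-ins for cp_token and cp_lexer, tokens are synthesized instead of lexed, and plain malloc stands in for the GC allocator that would hold the final buffer in the compiler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cp_token; the real struct has other fields.  */
struct toy_token { int type; int value; };

/* Hypothetical stand-in for cp_lexer.  */
struct toy_lexer {
  struct toy_token *tokens; /* exact-size, long-lived buffer */
  size_t count;
};

/* Produce N tokens into a malloc'ed staging buffer that grows by doubling,
   then copy them once into an exact-size buffer.  In the compiler the final
   buffer would live in GC space; malloc is used here as a placeholder.
   Returns 0 on success, -1 on allocation failure.  */
static int
toy_lexer_new (struct toy_lexer *lexer, size_t n)
{
  size_t cap = 16, used = 0;
  struct toy_token *staging = malloc (cap * sizeof *staging);
  if (!staging)
    return -1;

  for (size_t i = 0; i < n; i++)
    {
      if (used == cap)
        {
          struct toy_token *grown = realloc (staging, 2 * cap * sizeof *grown);
          if (!grown)
            {
              free (staging);
              return -1;
            }
          staging = grown;
          cap *= 2;
        }
      staging[used].type = 0;
      staging[used].value = (int) i;
      used++;
    }

  lexer->tokens = malloc ((used ? used : 1) * sizeof *lexer->tokens);
  if (!lexer->tokens)
    {
      free (staging);
      return -1;
    }
  memcpy (lexer->tokens, staging, used * sizeof *staging);
  lexer->count = used;
  free (staging); /* released immediately; no collector is involved */
  return 0;
}
```

The staging buffer's intermediate copies never become garbage for a collector to find, at the cost of one extra memcpy of the final token array.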

Comment 37 Gabriel Dos Reis 2004-12-23 14:42:36 UTC
Subject: Re:  memory consumption for heavy template instantiations tripled since 3.3

"steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

[...]

| Maybe buffer should not be in GC memory at all?  We know the
| exact live time of buffer, and as far as I can tell we never
| ggc_collect while it is live.  According to the comments for
| cp_lexer, "Tokens are never added to the cp_lexer after it is
| created."  So it may be cheaper to have the buffer xmalloced,
| and memcpy-ed to a buffer in GC space just before saving it
| in the new cp_lexer object.

Your analysis makes sense to me.  I never quite understood the
addiction to GC-allocated memory throughout the compiler.

-- Gaby
Comment 38 Andrew Pinski 2004-12-28 16:43:25 UTC
(In reply to comment #36)
> The initial CP lexer buffer size is 10000:

The same amount of garbage is also done for PR 8361.


Also note that I could not compile this source again because of the use of long double, which causes an ICE 
for ppc-darwin, but that has been fixed already.
Comment 39 Steven Bosscher 2004-12-29 12:57:42 UTC
I've been looking at a bunch of C++ codes, 160000 or 320000 seems like 
a reasonable value for CP_LEXER_BUFFER_SIZE. 
 
Comment 40 Steven Bosscher 2004-12-29 13:09:46 UTC
Trivial 6MB win: 
 
Index: parser.c 
=================================================================== 
RCS file: /cvs/gcc/gcc/gcc/cp/parser.c,v 
retrieving revision 1.298 
diff -u -r1.298 parser.c 
--- parser.c    23 Dec 2004 22:07:01 -0000      1.298 
+++ parser.c    29 Dec 2004 13:06:30 -0000 
@@ -190,7 +190,7 @@ 
   (cp_token *, cp_token *); 
 
 /* Manifest constants.  */ 
-#define CP_LEXER_BUFFER_SIZE 10000 
+#define CP_LEXER_BUFFER_SIZE 160000 
 #define CP_SAVED_TOKEN_STACK 5 
 
 /* A token type for keywords, as opposed to ordinary identifiers.  */ 
 
This does not fix the underlying problem that the buffer resizing in GC 
space gives a quadratic behavior in storage allocation, but it avoids it 
for most files, and it gives a ~2% speedup at -O0 on my box. 
 
Stats for cp_lexer_new_main for the test case from PR8361 (-O0): 
 
Before: 
source location     Freed        Leak         Overhead       Times 
cp/parser.c:263    728576: 1.2%     0: 0.0%     204288: 0.5%     1 
cp/parser.c:278  45171712:71.7%     0: 0.0%   12665856:29.2%     5 
cp/parser.c:253        72: 0.0%     0: 0.0%          8: 0.0%     1 
 
After: 
source location     Freed        Leak         Overhead       Times 
cp/parser.c:263  11657216:22.4%     0: 0.0%    3268608: 8.1%     1 
cp/parser.c:278  23314432:44.8%     0: 0.0%    6537216:16.2%     1 
cp/parser.c:253        72: 0.0%     0: 0.0%          8: 0.0%     1 
 
Perhaps we should look for an altogether different data structure for the 
token buffer - some kind of vector of smaller buffers perhaps. 
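
One possible shape for such a "vector of smaller buffers" is sketched below. Nothing here comes from the GCC sources: the `toy_` names and the chunk size are hypothetical, and malloc is used where the compiler would pick its own allocator. The point is only that appending never copies or reallocates tokens that were stored earlier.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CHUNK_TOKENS 4096 /* hypothetical chunk size */

/* Hypothetical stand-in for cp_token.  */
struct toy_token { int type; int value; };

struct toy_chunk {
  struct toy_chunk *next;
  size_t used;
  struct toy_token tokens[TOY_CHUNK_TOKENS];
};

struct toy_token_vec {
  struct toy_chunk *head, *tail;
  size_t count;
};

/* Append one token, allocating a new fixed-size chunk when the tail is
   full.  Existing chunks are left untouched.  Returns 0 or -1.  */
static int
toy_vec_push (struct toy_token_vec *v, struct toy_token t)
{
  if (!v->tail || v->tail->used == TOY_CHUNK_TOKENS)
    {
      struct toy_chunk *c = malloc (sizeof *c);
      if (!c)
        return -1;
      c->next = NULL;
      c->used = 0;
      if (v->tail)
        v->tail->next = c;
      else
        v->head = c;
      v->tail = c;
    }
  v->tail->tokens[v->tail->used++] = t;
  v->count++;
  return 0;
}

/* Return the token at INDEX.  Every chunk before the tail is full, so the
   walk can skip TOY_CHUNK_TOKENS entries per chunk.  */
static struct toy_token *
toy_vec_at (struct toy_token_vec *v, size_t index)
{
  struct toy_chunk *c = v->head;
  assert (index < v->count);
  while (index >= TOY_CHUNK_TOKENS)
    {
      c = c->next;
      index -= TOY_CHUNK_TOKENS;
    }
  return &c->tokens[index];
}

/* Release every chunk and reset the vector.  */
static void
toy_vec_free (struct toy_token_vec *v)
{
  struct toy_chunk *c = v->head;
  while (c)
    {
      struct toy_chunk *next = c->next;
      free (c);
      c = next;
    }
  v->head = v->tail = NULL;
  v->count = 0;
}
```

Total allocation stays within one chunk of the number of tokens stored, so the doubling overhead disappears; the trade-off in this sketch is that indexed access walks the chunk list instead of being a single array lookup.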
 
Comment 41 Andrew Pinski 2005-07-23 22:14:51 UTC
cp/tree.c:827 (ovl_cons)                           11464712: 3.2%          0: 0.0%     660240: 1.4%    1732136: 5.2%     433034

Hmm, OVERLOAD trees take 3% of the garbage, which seems like too much, though I don't know how big 
the OVERLOAD trees are; I might add something to count that.
Comment 42 Gabriel Dos Reis 2005-07-24 03:36:20 UTC
Subject: Re:  memory consumption for heavy template instantiations tripled since 3.3

"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:


| cp/tree.c:827 (ovl_cons)                           11464712: 3.2%          0: 0.0%     660240: 1.4%    1732136: 5.2%     433034
| 
| Hmm, OVERLOAD trees take 3% of the garbage, which seems like too much,
| though I don't know how big the OVERLOAD trees are; I might add
| something to count that.

It is not uncommon to have large overload sets in C++ -- that is what
people do when they discover that they can overload in the literal
sense ;-)

-- Gaby
Comment 43 Richard Biener 2007-09-20 14:09:05 UTC
Mainline with release checking uses 520MB ram again on the testcase with -O0
on x86_64 and 650MB with -O2.  time-report with -O2 shows

 df live regs          :   4.84 ( 5%) usr   0.06 ( 1%) sys   4.76 ( 5%) wall       0 kB ( 0%) ggc
 parser                :   3.97 ( 4%) usr   0.49 (10%) sys   4.82 ( 5%) wall  246991 kB (15%) ggc
 expand                :   7.94 ( 8%) usr   0.22 ( 4%) sys   8.30 ( 8%) wall  169886 kB (10%) ggc
 CSE                   :   3.72 ( 4%) usr   0.03 ( 1%) sys   3.94 ( 4%) wall    5135 kB ( 0%) ggc
 global alloc          :   4.91 ( 5%) usr   0.07 ( 1%) sys   5.02 ( 5%) wall   43268 kB ( 3%) ggc
 scheduling 2          :   4.00 ( 4%) usr   0.08 ( 2%) sys   4.07 ( 4%) wall    3173 kB ( 0%) ggc
 TOTAL                 :  95.64             5.07           101.29            1657755 kB

that is, nothing really outstanding.
Comment 44 Richard Biener 2007-09-20 14:10:29 UTC
Created attachment 14232 [details]
unincluded testcase
Comment 45 Jan Hubicka 2008-09-06 14:12:01 UTC
Memory footprint in TOP is about 430MB (64bit machine).

On current mainline we need 191MB before IPA.  Top consumers:
cfg.c:226 (connect_dest)                             598696: 0.2%     180224: 0.5%    3484960: 1.8%     594504: 1.5%      73663
gimple-low.c:806 (record_vars_into)                       0: 0.0%          0: 0.0%    3825552: 2.0%          0: 0.0%      79699
cp/pt.c:8316 (tsubst_decl)                          2244888: 0.9%          0: 0.0%    4552704: 2.4%     357768: 0.9%      44721
tree.c:6061 (build_method_type_directly)            1946600: 0.8%          0: 0.0%    4703200: 2.5%     265992: 0.7%      33249
tree-inline.c:3589 (copy_tree_r)                    9450136: 3.6%          0: 0.0%    4820840: 2.5%    1248128: 3.2%     187483
cfg.c:142 (alloc_block)                             1046016: 0.4%          0: 0.0%    4988448: 2.6%          0: 0.0%      62859
cgraph.c:638 (cgraph_create_edge)                         0: 0.0%          0: 0.0%    5183328: 2.7%          0: 0.0%      53993
gimplify.c:4314 (gimplify_modify_expr)              1185040: 0.5%          0: 0.0%    5570160: 2.9%     304112: 0.8%      57599
gimple-iterator.c:446 (gsi_insert_after_without_    4904480: 1.9%          0: 0.0%    5843840: 3.1%    2149664: 5.5%     268708
cfg.c:280 (unchecked_make_edge)                           0: 0.0%     783288: 2.2%    5930352: 3.1%     745960: 1.9%      93245
gimple.c:287 (gimple_build_call_1)                   871144: 0.3%          0: 0.0%    6066056: 3.2%     247408: 0.6%      51874
tree.c:962 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%    9716432: 5.1%    3187680: 8.1%       2221
gimplify.c:521 (create_tmp_var_raw)                  452760: 0.2%          0: 0.0%   10597944: 5.5%     526224: 1.3%      65778
cp/lex.c:590 (copy_decl)                              26304: 0.0%          0: 0.0%   13586520: 7.1%    1326296: 3.4%      56894
Total                                             258936448         34882576        191255621         39440157          5928571
source location                                     Garbage            Freed             Leak         Overhead            Times
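
A report like the one above can be ranked by its Leak column with a short script. This is only a sketch: the regular expression is inferred from the rows quoted in this comment (including rows whose function name is cut off before the closing parenthesis), and the names ROW, top_leaks and SAMPLE are made up for illustration; none of this is part of GCC.

```python
import re

# One row of the allocation report. Column order follows the header row:
# Garbage, Freed, Leak, Overhead, Times. The closing parenthesis is optional
# because long function names are truncated in the report.
ROW = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+) \((?P<func>[^)\s]+)\)?\s+"
    r"(?P<garbage>\d+):\s*[\d.]+%\s+"
    r"(?P<freed>\d+):\s*[\d.]+%\s+"
    r"(?P<leak>\d+):\s*[\d.]+%\s+"
    r"(?P<overhead>\d+):\s*[\d.]+%\s+"
    r"(?P<times>\d+)\s*$"
)


def top_leaks(report, n=3):
    """Return the n rows with the largest Leak value as (leak, func, location)."""
    rows = []
    for text in report.splitlines():
        m = ROW.match(text.strip())
        if m:  # "Total" and header rows do not match and are skipped
            rows.append((int(m["leak"]), m["func"], f'{m["file"]}:{m["line"]}'))
    return sorted(rows, reverse=True)[:n]


# A few rows copied from the report above.
SAMPLE = """\
tree.c:962 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%    9716432: 5.1%    3187680: 8.1%       2221
gimplify.c:521 (create_tmp_var_raw)                  452760: 0.2%          0: 0.0%   10597944: 5.5%     526224: 1.3%      65778
cp/lex.c:590 (copy_decl)                              26304: 0.0%          0: 0.0%   13586520: 7.1%    1326296: 3.4%      56894
gimple-iterator.c:446 (gsi_insert_after_without_    4904480: 1.9%          0: 0.0%    5843840: 3.1%    2149664: 5.5%     268708
Total                                             258936448         34882576        191255621         39440157          5928571
"""

for leak, func, where in top_leaks(SAMPLE):
    print(f"{leak:>10}  {func}  ({where})")
```

Sorting on the Leak column rather than Garbage matches how the reports are read in this thread: Leak is what is still live at the point the report is taken.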

Apparently the largest are the gimple temporaries.

After IPA:

cp/lex.c:573 (cxx_dup_lang_specific_decl)               384: 0.0%        896: 0.0%    2770736: 0.9%       2992: 0.0%      43453
cp/lex.c:510 (build_lang_decl)                       805432: 0.2%     209648: 0.2%    3196488: 1.1%     349552: 0.5%      18896
stringpool.c:74 (alloc_node)                        1994400: 0.4%          0: 0.0%    3287712: 1.1%          0: 0.0%      55022
cfg.c:142 (alloc_block)                            10005792: 2.1%          0: 0.0%    3966048: 1.4%          0: 0.0%     145540
cfg.c:280 (unchecked_make_edge)                     4507272: 0.9%    7134984: 5.8%    4456296: 1.5%    1788728: 2.5%     223591
cgraph.c:408 (cgraph_create_node)                   7802208: 1.6%          0: 0.0%    4477248: 1.5%    1364384: 1.9%      42637
cp/pt.c:8316 (tsubst_decl)                          2244888: 0.5%          0: 0.0%    4552704: 1.6%     357768: 0.5%      44721
tree.c:6061 (build_method_type_directly)            1946600: 0.4%          0: 0.0%    4703200: 1.6%     265992: 0.4%      33249
cgraph.c:638 (cgraph_create_edge)                  17196288: 3.5%          0: 0.0%    5254848: 1.8%          0: 0.0%     233866
tree-inline.c:4045 (copy_decl_to_var)                145488: 0.0%          0: 0.0%    5593392: 1.9%     273280: 0.4%      34160
gimple-iterator.c:446 (gsi_insert_after_without_   14342320: 2.9%          0: 0.0%    5650120: 1.9%    3998488: 5.7%     499811
ggc-common.c:187 (ggc_calloc)                      16950080: 3.5%    3025816: 2.4%    6151656: 2.1%     459072: 0.6%      69247
tree-ssanames.c:141 (make_ssa_name_fn)             16930080: 3.5%          0: 0.0%    8363760: 2.9%    1686256: 2.4%     210782
gimplify.c:521 (create_tmp_var_raw)                 5453784: 1.1%          0: 0.0%    9020256: 3.1%     689240: 1.0%      86155
tree.c:962 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%   10131688: 3.5%    3323928: 4.7%       2299
tree-inline.c:3589 (copy_tree_r)                   49631032:10.2%          0: 0.0%   12553384: 4.3%    5797568: 8.2%     800223
tree-dfa.c:150 (create_var_ann)                           0: 0.0%   27303320:22.0%   12672616: 4.3%    3634176: 5.1%     454272
gimple.c:2106 (gimple_copy)                        11226992: 2.3%          0: 0.0%   13146032: 4.5%    1196784: 1.7%     209491
cp/lex.c:590 (copy_decl)                              64104: 0.0%          0: 0.0%   13548720: 4.6%    1326296: 1.9%      56894
tree-inline.c:484 (remap_block)                     1928264: 0.4%          0: 0.0%   14843088: 5.1%    1290104: 1.8%     161263
tree-ssa-operands.c:499 (ssa_operand_alloc)               0: 0.0%   34199342:27.6%   18090837: 6.2%    3566211: 5.0%      11251
tree-inline.c:4088 (copy_decl_no_change)           11756840: 2.4%          0: 0.0%   40988416:14.0%    2425144: 3.4%     317455
Total                                             487237966        123888014        293018044         70674672          9669114
source location                                     Garbage            Freed             Leak         Overhead            Times

So debug info and declarations are near the top.  This is with my DECL_IGNORED_P fix, which I plan to commit to mainline soon.

Another 5MB are bitmaps:
tree-ssa-operands.c:2381 (add_to_addressa  73585    9052240    5946000    4181320     173579

I suspect most of the rest are operand caches, since they are so ineffective for small functions.

At the end of compilation:
tree-inline.c:484 (remap_block)                    29218176: 2.1%          0: 0.0%        104: 0.0%    2247560: 1.3%     280945
cselib.c:1155 (cselib_subst_to_values)             31320504: 2.3%          0: 0.0%          0: 0.0%    5958648: 3.4%     838942
cp/call.c:2346 (add_template_candidate_real)       31457040: 2.3%          0: 0.0%          0: 0.0%    3096816: 1.8%     457682
gimple-iterator.c:446 (gsi_insert_after_without_   32515440: 2.3%          0: 0.0%          0: 0.0%    6503088: 3.7%     812886
tree-phinodes.c:157 (allocate_phi_node)            33375352: 2.4%          0: 0.0%          0: 0.0%    1120888: 0.6%     108792
ggc-common.c:187 (ggc_calloc)                      34614992: 2.5%    9072016: 2.6%    1895328: 2.0%     671680: 0.4%     102129
rtl.c:269 (copy_rtx)                               42322896: 3.1%          0: 0.0%          0: 0.0%    8318000: 4.7%    1083689
emit-rtl.c:3348 (make_insn_raw)                    42838312: 3.1%          0: 0.0%         88: 0.0%    3894400: 2.2%     486800
gimple.c:2106 (gimple_copy)                        43173352: 3.1%          0: 0.0%          0: 0.0%    2063688: 1.2%     368502
tree-ssanames.c:141 (make_ssa_name_fn)             73506000: 5.3%          0: 0.0%      26640: 0.0%    4902176: 2.8%     612772
tree-inline.c:4088 (copy_decl_no_change)           93714848: 6.8%          0: 0.0%     176464: 0.2%    4370928: 2.5%     562896
tree-inline.c:3589 (copy_tree_r)                   98165464: 7.1%          0: 0.0%       2352: 0.0%    9178696: 5.2%    1250363
Total                                            1385145407        354509964         93434594        175468533         23822336
source location                                     Garbage            Freed             Leak         Overhead            Times

The positive thing is that there are no leaked gimple statements at all.  Most of the allocation at the end is:
cp/lex.c:590 (copy_decl)                            1532928: 0.1%          0: 0.0%   12079896:12.9%    1326296: 0.8%      56894
tree.c:962 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%   10359544:11.1%    3388072: 1.9%       3011
tree.c:6061 (build_method_type_directly)            1947800: 0.1%          0: 0.0%    4703200: 5.0%     266040: 0.2%      33255
cp/pt.c:8316 (tsubst_decl)                          2244888: 0.2%          0: 0.0%    4552704: 4.9%     357768: 0.2%      44721

DF and PRE allocate some giant bitmaps:
df-problems.c:308 (df_rd_alloc)           145581   12612800   11870840   11870840     597073
df-problems.c:309 (df_rd_alloc)           145581    8655600    8293080    8293080     108099
df-problems.c:310 (df_rd_alloc)           145581   15585520   14869800   14869800    1391724
tree-ssa-pre.c:584 (bitmap_set_new)       987262   68922080   53349440   53349440    2631124
tree-ssa-pre.c:585 (bitmap_set_new)       987262   69386800   53918200   53918200    3978100
df-problems.c:311 (df_rd_alloc)           145581   74605440   73361600   73361600          0
df-problems.c:539 (df_rd_transfer_functio 100011   63125520   42433280   42433280     148378

My guess is that ssa-operands would be the easiest to track down, if I am right about their memory usage.

Honza
Comment 46 Jan Hubicka 2009-02-23 16:29:30 UTC
So with brand new tuplified world, we need new statistics ;)

After parsing we are still the same:
cfg.c:216 (connect_src)                              608608: 0.2%        520: 0.0%    3028808: 1.6%     519680: 1.3%      64954
cp/lex.c:511 (build_lang_decl)                       805432: 0.3%     209648: 0.6%    3196488: 1.7%     349552: 0.9%      18896
stringpool.c:73 (alloc_node)                          65088: 0.0%          0: 0.0%    3208992: 1.7%          0: 0.0%      34105
fold-const.c:7969 (build_fold_addr_expr_with_typ     530352: 0.2%          0: 0.0%    3440880: 1.8%     441248: 1.1%      55156
cfg.c:226 (connect_dest)                             598696: 0.2%     180224: 0.5%    3484960: 1.8%     594504: 1.5%      73663
cgraph.c:432 (cgraph_create_node)                         0: 0.0%          0: 0.0%    3712320: 1.9%     412480: 1.1%      12890
gimple-low.c:888 (record_vars_into)                       0: 0.0%          0: 0.0%    3808032: 2.0%          0: 0.0%      79334
cp/pt.c:8398 (tsubst_decl)                          2244888: 0.9%          0: 0.0%    4552704: 2.4%     357768: 0.9%      44721
tree.c:6101 (build_method_type_directly)            1946800: 0.8%          0: 0.0%    4704000: 2.5%     266032: 0.7%      33254
tree-inline.c:3595 (copy_tree_r)                    9428112: 3.7%          0: 0.0%    4793248: 2.5%    1243776: 3.2%     186815
cfg.c:142 (alloc_block)                             1046016: 0.4%          0: 0.0%    4988448: 2.6%          0: 0.0%      62859
cgraph.c:681 (cgraph_create_edge)                         0: 0.0%          0: 0.0%    5183328: 2.7%          0: 0.0%      53993
cfg.c:280 (unchecked_make_edge)                           0: 0.0%     696256: 1.9%    5271424: 2.8%          0: 0.0%      93245
gimplify.c:4295 (gimplify_modify_expr)              1183600: 0.5%          0: 0.0%    5519400: 2.9%     300632: 0.8%      57164
gimple-iterator.c:446 (gsi_insert_after_without_    4903960: 1.9%          0: 0.0%    5826440: 3.1%    2146080: 5.5%     268260
gimple.c:287 (gimple_build_call_1)                   871144: 0.3%          0: 0.0%    6066056: 3.2%     247408: 0.6%      51874
tree.c:964 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%   10089256: 5.3%    3310168: 8.6%       2292
gimplify.c:522 (create_tmp_var_raw)                  453768: 0.2%          0: 0.0%   10537632: 5.5%     523400: 1.4%      65425
cp/lex.c:591 (copy_decl)                              26304: 0.0%          0: 0.0%   13586520: 7.1%    1326296: 3.4%      56894
Total                                             254844605         37217584        190405469         38695562          5886026
source location                                     Garbage            Freed             Leak         Overhead            Times

After early optimizations:

tree-inline.c:4051 (copy_decl_to_var)                145488: 0.0%          0: 0.0%    5595072: 2.0%     273360: 0.4%      34170
gimple-iterator.c:446 (gsi_insert_after_without_   14165480: 3.2%          0: 0.0%    5633680: 2.0%    3959832: 5.9%     494979
ggc-common.c:187 (ggc_calloc)                        565872: 0.1%   17553976:12.6%    5692736: 2.1%     423928: 0.6%      67925 
cgraph.c:681 (cgraph_create_edge)                         0: 0.0%          0: 0.0%    5958624: 2.2%          0: 0.0%      62069
tree-ssanames.c:141 (make_ssa_name_fn)             16722480: 3.8%          0: 0.0%    8318040: 3.0%    1669368: 2.5%     208671
gimplify.c:522 (create_tmp_var_raw)                 5067048: 1.2%          0: 0.0%    8897280: 3.2%     664968: 1.0%      83121
tree.c:964 (build_int_cst_wide)                        6096: 0.0%          0: 0.0%   10327480: 3.8%    3388168: 5.1%       2341
tree-dfa.c:150 (create_var_ann)                           0: 0.0%   23279784:16.7%   10578392: 3.8%    3078016: 4.6%     384752
tree-inline.c:3595 (copy_tree_r)                   49340008:11.2%          0: 0.0%   12558984: 4.6%    5762144: 8.6%     796535
gimple.c:2071 (gimple_copy)                        11219128: 2.6%          0: 0.0%   13100280: 4.8%    1193408: 1.8%     209046
cp/lex.c:591 (copy_decl)                              62424: 0.0%          0: 0.0%   13550400: 4.9%    1326296: 2.0%      56894 
tree-inline.c:484 (remap_block)                     1924416: 0.4%          0: 0.0%   14794832: 5.4%    1286096: 1.9%     160762
tree-ssa-operands.c:499 (ssa_operand_alloc)               0: 0.0%   33737566:24.3%   17976728: 6.5%    3547382: 5.3%      11222
tree-inline.c:4094 (copy_decl_no_change)           12372728: 2.8%          0: 0.0%   29185744:10.6%    1892440: 2.8%     250867
Total                                             439703478        139072686        274931123         66651575          9262384
source location                                     Garbage            Freed             Leak         Overhead            Times 

Declarations and debug info are the major consumers. We improved from 293MB to 274MB.

And at the end of compilation:
cp/call.c:2348 (add_template_candidate_real)       31457040: 2.5%          0: 0.0%          0: 0.0%    3096816: 1.8%     457682
gimple-iterator.c:446 (gsi_insert_after_without_   32206200: 2.5%          0: 0.0%          0: 0.0%    6441240: 3.8%     805155
tree-phinodes.c:157 (allocate_phi_node)            33346952: 2.6%          0: 0.0%          0: 0.0%    1121800: 0.7%     108439
rtl.c:269 (copy_rtx)                               41145992: 3.2%          0: 0.0%          0: 0.0%    8084728: 4.7%    1053375
emit-rtl.c:3502 (make_insn_raw)                    41827016: 3.3%          0: 0.0%         88: 0.0%    3802464: 2.2%     475308
gimple.c:2071 (gimple_copy)                        43033384: 3.4%          0: 0.0%          0: 0.0%    2056248: 1.2%     367296
tree-ssanames.c:141 (make_ssa_name_fn)             72524280: 5.7%          0: 0.0%     148440: 0.1%    4844848: 2.8%     605606
tree-inline.c:4094 (copy_decl_no_change)           74338752: 5.8%          0: 0.0%     226240: 0.2%    3451120: 2.0%     447839
tree-inline.c:3595 (copy_tree_r)                   97555488: 7.7%          0: 0.0%       2792: 0.0%    9116248: 5.4%    1242196
Total                                            1271569773        404722617        103497642        170223016         23365837
source location                                     Garbage            Freed             Leak         Overhead            Times

1.38GB to 1.27GB...
so not much change, but some progress ;)
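
For reference, the relative size of the two improvements can be worked out from the "Total" rows. A minimal sketch: the byte counts are copied from the reports in comments 45 and 46, and the helper name reduction_percent is made up for illustration.

```python
def reduction_percent(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# "Total" rows from the reports (bytes).
# Leak column: after IPA (comment 45) vs. after early optimizations (comment 46).
leak_mid = (293018044, 274931123)
# Garbage column at the end of compilation: comment 45 vs. comment 46.
garbage_end = (1385145407, 1271569773)

print(f"leak:    {reduction_percent(*leak_mid):.1f}% smaller")
print(f"garbage: {reduction_percent(*garbage_end):.1f}% smaller")
```

That works out to roughly 6% less leaked memory at the midpoint and roughly 8% less garbage over the whole compilation.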
Comment 47 Steven Bosscher 2012-08-28 08:12:10 UTC
Honza, if you have some time, it'd be interesting to see where things stand today.
Comment 48 Richard Biener 2013-03-06 10:53:50 UTC
I've added the testcase to http://gcc.opensuse.org/c++bench/random/