Bug 48481 - C++ overloading memory hog
Summary: C++ overloading memory hog
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 4.6.0
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Jason Merrill
URL:
Keywords: memory-hog
Depends on:
Blocks:
 
Reported: 2011-04-06 15:32 UTC by Jakub Jelinek
Modified: 2011-06-30 21:10 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-04-08 15:17:30


Attachments
Patch (687 bytes, patch)
2011-04-07 02:03 UTC, Jason Merrill
additional patch (1.57 KB, patch)
2011-04-07 20:47 UTC, Jason Merrill

Description Jakub Jelinek 2011-04-06 15:32:08 UTC
#define A(n) \
struct S##n { int i; }; \
S##n v##n;\
extern int foo (S##n, S##n);\
extern void bar (S##n);
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#ifndef N
#define N 10000
#endif
#if N == 1000
#define E(n) D(n##0)
#elif N == 2000
#define E(n) D(n##0) D(n##1)
#elif N == 3000
#define E(n) D(n##0) D(n##1) D(n##2)
#else
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
#endif
E(0)

void
foo ()
{
#undef A
#define A(n) if (foo (v##n, v##n)) bar (v##n);
  E(0)
}

This testcase eats huge amounts of memory.  With -DN=1000 -O0 it compiles quickly,
using 432768 kB of memory (852MB in top); with -DN=2000 -O0 it already uses 1672544 kB (3.5GB in top).  With -DN=3000 it took around 8GB in top.
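
The two kB figures above give a rough idea of how memory grows with N. As a minimal sketch (the helper name `growth_exponent` is made up for this illustration and is not part of GCC), one can fit memory ~ N^k to two (N, memory) samples:

```cpp
#include <cassert>
#include <cmath>

// Estimate the exponent k in "memory ~ N^k" from two (N, memory) samples.
// Hypothetical helper for illustration only; two points cannot confirm
// that a power law is the right model.
double growth_exponent(double n1, double mem1, double n2, double mem2) {
    assert(n1 > 0 && n2 > 0 && n1 != n2 && mem1 > 0 && mem2 > 0);
    return std::log(mem2 / mem1) / std::log(n2 / n1);
}
```

Calling `growth_exponent(1000, 432768, 2000, 1672544)` yields a value of roughly 1.95, which is consistent with close to quadratic growth in N.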

In the -fmem-report output of a compiler configured with --enable-gather-detailed-mem-stats, for N=1000 I see:
cp/tree.c:1447 (ovl_cons)                          64032000:15.1%          0: 0.0%     256032: 1.8%          0: 0.0%    2009001
c-family/c-common.c:9556 (make_tree_vector)       159840120:37.8%          0: 0.0%          0: 0.0%          0: 0.0%    3996003
cp/search.c:1135 (build_baselink)                 191952000:45.4%          0: 0.0%          0: 0.0%          0: 0.0%    3999000
Total                                             422660566          8986384         14294815          2724053         10281399
source location                                     Garbage            Freed             Leak         Overhead            Times
and for N=2000:
cp/tree.c:1447 (ovl_cons)                         256064000:15.3%          0: 0.0%     512032: 1.9%          0: 0.0%    8018001
c-family/c-common.c:9556 (make_tree_vector)       639680120:38.1%          0: 0.0%          0: 0.0%          0: 0.0%   15992003
cp/search.c:1135 (build_baselink)                 767904000:45.8%          0: 0.0%          0: 0.0%          0: 0.0%   15998000
Total                                            1677099246         12464328         27061439          3876781         40545425
source location                                     Garbage            Freed             Leak         Overhead            Times
Comment 1 Jakub Jelinek 2011-04-06 15:48:35 UTC
During perform_overload_resolution, add_candidates allocates lots of ggc memory,
which splice_viable immediately throws away, and we don't ggc_collect during parsing.
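
To make the effect concrete, here is a toy model, not GCC code: the types `Candidate` and `Arena` and both functions are hypothetical. It assumes every call site allocates one record per visible overload, keeps only the viable one, and that nothing is collected until parsing ends. With N overloads and N calls, as in the testcase, the arena ends up holding N * N records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-candidate record; not a GCC type.
struct Candidate {
    int overload_index;
    bool viable;
};

// Toy arena that is never collected while "parsing" is in progress.
struct Arena {
    std::vector<Candidate> records;  // everything ever allocated
};

// Resolve one call among n overloads: allocate a record for each overload,
// but only the one matching `target` is viable. Returns the chosen index.
int resolve_call(Arena& arena, int n, int target) {
    assert(n > 0 && target >= 0 && target < n);
    int chosen = -1;
    for (int i = 0; i < n; ++i) {
        arena.records.push_back({i, i == target});
        if (i == target) chosen = i;
    }
    return chosen;
}

// Model a translation unit with n overloads and n calls, one per overload.
// Returns how many records are still sitting in the arena afterwards.
std::size_t records_after_parsing(int n) {
    Arena arena;
    for (int call = 0; call < n; ++call) {
        int chosen = resolve_call(arena, n, call);
        assert(chosen == call);
        (void)chosen;  // silence unused warning when asserts are disabled
    }
    return arena.records.size();
}
```

In this model only n of the n * n records are ever needed again, so nearly all of the arena is garbage that stays resident until a collection runs.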
Comment 2 Jason Merrill 2011-04-07 02:03:19 UTC
Created attachment 23906
Patch

This patch should avoid much of the baselink and tree vector garbage.  Jakub, can you give it a spin?
Comment 3 Jakub Jelinek 2011-04-07 12:16:23 UTC
I haven't bootstrapped/regtested it, but it is definitely an improvement.
With N=1000 and N=2000 the generated assembly is identical.  For N=1000 the
reported TOTAL went down from 432768 kB to 89362 kB, for N=2000
from 1672544 kB to 298232 kB, and on a box with 8GB of RAM I can compile even
the N=5000 case, which takes 1685817 kB of reported TOTAL memory.  N=10000 already requires too much RAM, though.

In the -DN=5000 -fmem-report dump the only interesting allocations are:
cp/tree.c:1447 (ovl_cons)                        1600160000:97.9%          0: 0.0%    1280032: 2.0%          0: 0.0%   50045001
Total                                            1634296366         38329920         65115511         11401989         51377483
source location                                     Garbage            Freed             Leak         Overhead            Times
so if even that garbage could be freed, this would be fixed completely.
Even for N=1000 ovl_cons is the only one that really matters:
cp/tree.c:1447 (ovl_cons)                          64032000:90.2%          0: 0.0%     256032: 1.8%          0: 0.0%    2009001
Total                                              71012606          8986384         14294815          2724053          2289400
source location                                     Garbage            Freed             Leak         Overhead            Times

Those ovl_cons calls are from lookup_arg_dependent -> ... -> add_function -> build_overload.  Is it guaranteed that, if perform_koenig_lookup returns a chain of OVERLOADs, all of those OVERLOADs have been freshly created by make_node and aren't shared with anything else?  If yes, perhaps we could ggc_free the chain afterwards, or move it to some cache of OVERLOAD nodes and make ovl_cons start from that cache.
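
The node-cache idea can be sketched as a simple free list. This is an illustration only: `OvlNode` and `OvlCache` are hypothetical names, not GCC's tree types or API, and the sketch assumes exactly what the question asks about, namely that a released chain is not shared with anything else.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an OVERLOAD node; not GCC's tree type.
struct OvlNode {
    int function_id;
    OvlNode* next;
};

// Free-list cache: released chains are reused before new nodes are allocated.
class OvlCache {
public:
    OvlCache() = default;
    OvlCache(const OvlCache&) = delete;
    OvlCache& operator=(const OvlCache&) = delete;

    ~OvlCache() {
        while (free_) {
            OvlNode* n = free_;
            free_ = n->next;
            delete n;
        }
    }

    // Prepend a node to `chain`, taking it from the cache when possible.
    OvlNode* cons(int function_id, OvlNode* chain) {
        OvlNode* n = free_;
        if (n) {
            assert(cached_ > 0);
            free_ = n->next;
            --cached_;
        } else {
            n = new OvlNode;
            ++fresh_;
        }
        n->function_id = function_id;
        n->next = chain;
        return n;
    }

    // Move every node of `chain` into the cache.
    // Only safe if no other structure still points into the chain.
    void release_chain(OvlNode* chain) {
        while (chain) {
            OvlNode* next = chain->next;
            chain->next = free_;
            free_ = chain;
            ++cached_;
            chain = next;
        }
    }

    std::size_t fresh_allocations() const { return fresh_; }
    std::size_t cached() const { return cached_; }

private:
    OvlNode* free_ = nullptr;
    std::size_t fresh_ = 0;
    std::size_t cached_ = 0;
};
```

If each lookup releases its chain before the next one starts, the number of fresh allocations is bounded by the longest chain instead of growing with the number of calls.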
Comment 4 Jason Merrill 2011-04-07 20:47:14 UTC
Created attachment 23920
additional patch

This ought to help with the OVERLOAD garbage.
Comment 5 Jason Merrill 2011-04-08 06:08:09 UTC
Author: jason
Date: Fri Apr  8 06:08:04 2011
New Revision: 172162

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172162
Log:
	PR c++/48481
	* call.c (build_user_type_conversion_1): Use lookup_fnfields_slot.
	Release unused vector.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c
Comment 6 Jason Merrill 2011-04-08 06:08:16 UTC
Author: jason
Date: Fri Apr  8 06:08:13 2011
New Revision: 172163

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172163
Log:
	PR c++/48481
	* cp-tree.h (OVL_ARG_DEPENDENT): New.
	* name-lookup.c (add_function): Set it.
	* semantics.c (finish_call_expr): Free OVERLOADs if it's set.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/cp/semantics.c
Comment 7 Jason Merrill 2011-04-08 06:08:27 UTC
Author: jason
Date: Fri Apr  8 06:08:21 2011
New Revision: 172164

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172164
Log:
	PR c++/48481
	* tree.c (build_overload): Allow an unwrapped FUNCTION_DECL
	at the end of the chain.
	* pt.c (dependent_template_p): Use OVL_CURRENT/NEXT.
	(iterative_hash_template_arg): Likewise.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/cp/pt.c
    trunk/gcc/cp/tree.c
Comment 8 Jason Merrill 2011-04-08 15:17:30 UTC
Should be fixed on the trunk.
Comment 9 Jason Merrill 2011-04-28 15:41:02 UTC
The largest source of garbage at N=2000 is now emit_insn_raw at 9.1%, so I'm closing this as fixed.
Comment 10 Jason Merrill 2011-06-30 21:10:07 UTC
Author: jason
Date: Thu Jun 30 21:10:03 2011
New Revision: 175732

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175732
Log:
	PR c++/48481
	* name-lookup.c (struct arg_lookup): Add fn_set.
	(add_function): Check it.
	(lookup_arg_dependent_1): Initialize it.

Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/testsuite/g++.dg/template/crash37.C
    trunk/gcc/testsuite/g++.dg/template/ptrmem4.C
    trunk/gcc/testsuite/g++.old-deja/g++.other/pmf3.C
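
The log for revision 175732 only says that a fn_set was added to struct arg_lookup and that add_function checks it. One plausible reading, and it is an assumption rather than something the log states, is that the set records which functions have already been added so that duplicates can be rejected without walking the whole chain. A minimal sketch of that reading, with integer IDs standing in for function declarations and all types hypothetical:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical lookup state; field names echo the log, types are made up.
struct ArgLookup {
    std::vector<int> functions;      // functions found so far, in order
    std::unordered_set<int> fn_set;  // same functions, for fast membership
};

// Add `fn` unless it was already found. Returns true if it was newly added.
bool add_function(ArgLookup& k, int fn) {
    if (!k.fn_set.insert(fn).second)
        return false;  // already present: nothing to allocate or append
    k.functions.push_back(fn);
    assert(k.functions.size() == k.fn_set.size());
    return true;
}
```

With a hash set the duplicate check takes expected constant time per function, whereas scanning the list of functions found so far would take time proportional to its length.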