GCC memory consumption increased by recent patch!

Jan Hubicka jh@suse.cz
Tue Sep 14 16:11:00 GMT 2004


> Jan Hubicka wrote:
> 
> >Kenneth,
> >this increase in memory usage seems to be yours (at least I've checked
> >that my patch enabling scev induction variable analysis doesn't cause any
> >increase in ggc memory consumed).
> >While the data structures used in your patch look very sane, perhaps
> >there is some room for improvement (at least the 10% in combine.c at -O3
> >looks relatively serious)
> > 
> >
> I am quite surprised at this number.  Combine does have a fair number of 
> static variables so the bitmaps are non trivial but the file does 
> not have a huge number of functions (it does have some very large 
> functions but this is not relevant.)  It is hard to imagine what is 
> being done at the cgraph level at -O3 that would cause this kind of 
> behavior. 
> 
> Is it possible that I am adding bit vectors to unreachable cgraph nodes 
> that are the result of more aggressive inlining?

The cgraph after inlining is modified so that functions being inlined
are turned into "clones", each having a separate cgraph_node entry, and
the out-of-line copies of functions are eliminated along the way if
possible.  So you should not be creating bitmaps for unreachable
functions, but you will be creating a separate copy of the bitmap for
each clone.
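
To make the clone point concrete, here is a toy model in Python (not GCC
code).  CgraphNode, attach_per_node and attach_shared are hypothetical
names, and the shared variant assumes that a clone's sets are identical
to those of the function it was cloned from, which may not hold for the
real analysis.

```python
class CgraphNode:
    """Toy stand-in for a call-graph node; not GCC's struct cgraph_node."""

    def __init__(self, name, clone_of=None):
        self.name = name
        self.clone_of = clone_of      # node this one was cloned from, if any
        self.statics_read = None      # stand-in for the per-node bitmap


def attach_per_node(nodes, compute):
    """Give every node, clones included, its own freshly computed set."""
    for node in nodes:
        node.statics_read = compute(node)


def attach_shared(nodes, compute):
    """Compute the set once per original function and share it with clones."""
    for node in nodes:
        origin = node
        while origin.clone_of is not None:
            origin = origin.clone_of
        if origin.statics_read is None:
            origin.statics_read = compute(origin)
        node.statics_read = origin.statics_read


def distinct_bitmaps(nodes):
    """Count how many separate set objects the nodes hold."""
    return len({id(node.statics_read) for node in nodes})
```

With three clones of one function, attach_per_node leaves three separate
sets alive, while attach_shared leaves one.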

In case this really has such a noticeable memory overhead (though I have
a hard time believing it too - the overall memory consumption is computed
by a script that sums mmap/munmap/sbrk calls from strace and theoretically
might be broken in some interesting way, but overall the results don't
seem to be seriously off), we can consider a way of sharing
data structures across the nodes, but it will add more complexity.  I
am quite surprised that this doesn't show up in GGC memory, nor is it
more visible on Gerald's testcase, where we do an extreme amount of
inlining.  Are you allocating nontrivial amounts of non-GGC memory
somewhere?
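
For reference, the kind of accounting described above can be sketched
roughly as follows.  This is not the actual script: the regular
expressions assume the usual strace line layout, sbrk is assumed to show
up as brk system calls, and every successful mmap is counted, including
file-backed ones.

```python
import re

# Assumed strace line shapes, e.g.
#   mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95556000
#   munmap(0x2a95556000, 4096) = 0
#   brk(0x5c2000) = 0x5c2000
MMAP = re.compile(r"^mmap2?\((?:NULL|0x[0-9a-f]+|0), (\d+),.*\)\s+= 0x[0-9a-f]+")
MUNMAP = re.compile(r"^munmap\(0x[0-9a-f]+, (\d+)\)\s+= 0")
BRK = re.compile(r"^brk\((?:NULL|0x[0-9a-f]+|0)\)\s+= (0x[0-9a-f]+)")


def peak_memory(strace_lines):
    """Return the peak of (mapped bytes + heap growth) seen in the trace."""
    mapped = 0
    heap_start = heap_end = None
    peak = 0
    for raw in strace_lines:
        line = raw.strip()
        mmap_match = MMAP.match(line)
        munmap_match = MUNMAP.match(line)
        brk_match = BRK.match(line)
        if mmap_match:
            mapped += int(mmap_match.group(1))
        elif munmap_match:
            mapped -= int(munmap_match.group(1))
        elif brk_match:
            heap_end = int(brk_match.group(1), 16)
            if heap_start is None:
                heap_start = heap_end   # first brk gives the initial break
        heap = 0 if heap_start is None else heap_end - heap_start
        peak = max(peak, mapped + heap)
    return peak
```

Anything allocated outside GGC (malloc'ed bitmaps, obstacks) would show
up in a sum like this but not in the GGC figures.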

I am also getting a new failure on -O2 compilation of Gerald's testcase
(PR8361) on i686 - an abort on the ADDRESSABLE_P bit being zero but the
address being taken.  Can this be related to your changes?
> 
> >Also would you mind if I moved your code into a separate file out of
> >cgraphunit for the next development period (so on the tree-profiling
> >branch).  If you have some additional changes, I can do this later just
> >when it is convenient.
> >
> > 
> >
> As far as moving it to another file, that is fine.  I put it in 
> cgraphunit because that was the only place where from a temporal point 
> of view I had access to all of the functions before they were compiled 
> and I could attach the bit vectors to the cgraph nodes.  I do have 
> another set of changes that I will get done soon where I will add some 
> more external calls.  Then after that I was planning to make this work 
> in a branch that contained Stuart Hastings' restructuring code and take 
> advantage of that restructuring.   When it is properly restructured, 

Yes, we can probably continue work on this in the tree-profiling branch
during the 3.5 freeze period.
The actual reason why I would like to split as much as possible out of
cgraphunit (i.e. make a separate file for each optimizer) is that I want
to force myself to think about interfaces.  Currently cgraphunit has
accumulated a bit more magic than I would like.
I also plan to add a simple pass manager like the one we have in
tree-optimize.c.

Thanks,
Honza
> this will cause the space to drop by a factor of 2 since we will not 
> need two sets of bit vectors, one indexed by var ann uid and one indexed 
> by the decl uid.
> 
> Kenny
> 
> >Honza
> > 
> >
> >>Hi,
> >>Comparing memory consumption on compilation of combine.i and 
> >>generate-3.4.ii I got:
> >>
> >>
> >>comparing combine.c compilation at -O0 level:
> >>   Overall memory needed: 17820k
> >>   Peak memory use before GGC: 9294k
> >>   Peak memory use after GGC: 8606k
> >>   Maximum of released memory in single GGC run: 2867k
> >>   Garbage: 42475k -> 42487k
> >>   Leak: 6107k -> 6087k
> >>   Overhead: 5590k -> 5586k
> >>   GGC runs: 363
> >>
> >>comparing combine.c compilation at -O1 level:
> >> Overall memory allocated via mmap and sbrk increased from 18540k to 
> >> 18652k, overall 0.60%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 9573k to 9710k, overall 1.43%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 8663k to 8800k, overall 1.58%
> >> Amount of produced GGC garbage increased from 78747k to 79460k, overall 
> >> 0.90%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 6483k to 6669k, overall 2.87%
> >>   Overall memory needed: 18540k -> 18652k
> >>   Peak memory use before GGC: 9573k -> 9710k
> >>   Peak memory use after GGC: 8663k -> 8800k
> >>   Maximum of released memory in single GGC run: 2067k -> 2073k
> >>   Garbage: 78747k -> 79460k
> >>   Leak: 6483k -> 6669k
> >>   Overhead: 13868k -> 14291k
> >>   GGC runs: 589 -> 591
> >>
> >>comparing combine.c compilation at -O2 level:
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 12756k to 12769k, overall 0.10%
> >> Amount of produced GGC garbage increased from 94713k to 95369k, overall 
> >> 0.69%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 6304k to 6424k, overall 1.89%
> >>   Overall memory needed: 22076k -> 21988k
> >>   Peak memory use before GGC: 12756k -> 12769k
> >>   Peak memory use after GGC: 12610k
> >>   Maximum of released memory in single GGC run: 2576k -> 2577k
> >>   Garbage: 94713k -> 95369k
> >>   Leak: 6304k -> 6424k
> >>   Overhead: 18777k -> 19165k
> >>   GGC runs: 580 -> 582
> >>
> >>comparing combine.c compilation at -O3 level:
> >> Overall memory allocated via mmap and sbrk increased from 23972k to 
> >> 26384k, overall 10.06%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 13246k to 13426k, overall 1.36%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 12610k to 12742k, overall 1.05%
> >> Amount of produced GGC garbage increased from 126026k to 127206k, 
> >> overall 0.94%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 6852k to 6991k, overall 2.03%
> >>   Overall memory needed: 23972k -> 26384k
> >>   Peak memory use before GGC: 13246k -> 13426k
> >>   Peak memory use after GGC: 12610k -> 12742k
> >>   Maximum of released memory in single GGC run: 3483k
> >>   Garbage: 126026k -> 127206k
> >>   Leak: 6852k -> 6991k
> >>   Overhead: 24652k -> 25258k
> >>   GGC runs: 646 -> 650
> >>
> >>comparing insn-attrtab.c compilation at -O0 level:
> >>   Overall memory needed: 132860k
> >>   Peak memory use before GGC: 76388k
> >>   Peak memory use after GGC: 45185k
> >>   Maximum of released memory in single GGC run: 41417k
> >>   Garbage: 157790k -> 157803k
> >>   Leak: 10620k -> 10618k
> >>   Overhead: 19800k -> 19798k
> >>   GGC runs: 310
> >>
> >>comparing insn-attrtab.c compilation at -O1 level:
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 94307k to 94612k, overall 0.32%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 71401k to 71706k, overall 0.43%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 10968k to 11052k, overall 0.77%
> >>   Overall memory needed: 151600k -> 150756k
> >>   Peak memory use before GGC: 94307k -> 94612k
> >>   Peak memory use after GGC: 71401k -> 71706k
> >>   Maximum of released memory in single GGC run: 40513k
> >>   Garbage: 474239k -> 474241k
> >>   Leak: 10968k -> 11052k
> >>   Overhead: 84931k -> 85779k
> >>   GGC runs: 461 -> 462
> >>
> >>comparing insn-attrtab.c compilation at -O2 level:
> >> Overall memory allocated via mmap and sbrk increased from 237408k to 
> >> 241428k, overall 1.69%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 109880k to 110181k, overall 0.27%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 86974k to 87274k, overall 0.34%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 11150k to 11207k, overall 0.51%
> >>   Overall memory needed: 237408k -> 241428k
> >>   Peak memory use before GGC: 109880k -> 110181k
> >>   Peak memory use after GGC: 86974k -> 87274k
> >>   Maximum of released memory in single GGC run: 35488k -> 35489k
> >>   Garbage: 525180k -> 525164k
> >>   Leak: 11150k -> 11207k
> >>   Overhead: 95096k -> 95934k
> >>   GGC runs: 383
> >>
> >>comparing insn-attrtab.c compilation at -O3 level:
> >> Overall memory allocated via mmap and sbrk increased from 237384k to 
> >> 241444k, overall 1.71%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 109882k to 110181k, overall 0.27%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 86975k to 87275k, overall 0.34%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 11223k to 11271k, overall 0.43%
> >>   Overall memory needed: 237384k -> 241444k
> >>   Peak memory use before GGC: 109882k -> 110181k
> >>   Peak memory use after GGC: 86975k -> 87275k
> >>   Maximum of released memory in single GGC run: 35488k
> >>   Garbage: 527525k -> 527533k
> >>   Leak: 11223k -> 11271k
> >>   Overhead: 95873k -> 96717k
> >>   GGC runs: 392
> >>
> >>comparing Gerald's testcase PR8361 compilation at -O0 level:
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 58324k to 59855k, overall 2.63%
> >>   Overall memory needed: 114844k
> >>   Peak memory use before GGC: 92008k -> 92009k
> >>   Peak memory use after GGC: 90475k -> 90476k
> >>   Maximum of released memory in single GGC run: 20896k -> 20897k
> >>   Garbage: 270774k -> 271018k
> >>   Leak: 58324k -> 59855k
> >>   Overhead: 34949k -> 35160k
> >>   GGC runs: 552 -> 551
> >>
> >>comparing Gerald's testcase PR8361 compilation at -O1 level:
> >> Overall memory allocated via mmap and sbrk increased from 120248k to 
> >> 126228k, overall 4.97%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 96217k to 96401k, overall 0.19%
> >> Amount of produced GGC garbage increased from 671500k to 689097k, 
> >> overall 2.62%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 60665k to 62390k, overall 2.84%
> >>   Overall memory needed: 120248k -> 126228k
> >>   Peak memory use before GGC: 96217k -> 96401k
> >>   Peak memory use after GGC: 89741k
> >>   Maximum of released memory in single GGC run: 20047k -> 20069k
> >>   Garbage: 671500k -> 689097k
> >>   Leak: 60665k -> 62390k
> >>   Overhead: 145200k -> 151336k
> >>   GGC runs: 835 -> 820
> >>
> >>comparing Gerald's testcase PR8361 compilation at -O2 level:
> >> Overall memory allocated via mmap and sbrk increased from 121648k to 
> >> 127504k, overall 4.81%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 96218k to 96401k, overall 0.19%
> >> Amount of produced GGC garbage increased from 732787k to 750297k, 
> >> overall 2.39%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 61238k to 62963k, overall 2.82%
> >>   Overall memory needed: 121648k -> 127504k
> >>   Peak memory use before GGC: 96218k -> 96401k
> >>   Peak memory use after GGC: 89741k
> >>   Maximum of released memory in single GGC run: 20048k -> 20069k
> >>   Garbage: 732787k -> 750297k
> >>   Leak: 61238k -> 62963k
> >>   Overhead: 171765k -> 177962k
> >>   GGC runs: 866 -> 848
> >>
> >>comparing Gerald's testcase PR8361 compilation at -O3 level:
> >> Overall memory allocated via mmap and sbrk increased from 120456k to 
> >> 126044k, overall 4.64%
> >> Peak amount of GGC memory allocated before garbage collecting increased 
> >> from 92007k to 92781k, overall 0.84%
> >> Peak amount of GGC memory still allocated after garbage collecting 
> >> increased from 90540k to 90698k, overall 0.17%
> >> Amount of produced GGC garbage increased from 770315k to 790042k, 
> >> overall 2.56%
> >> Amount of memory still referenced at the end of compilation increased 
> >> from 61606k to 63353k, overall 2.84%
> >>   Overall memory needed: 120456k -> 126044k
> >>   Peak memory use before GGC: 92007k -> 92781k
> >>   Peak memory use after GGC: 90540k -> 90698k
> >>   Maximum of released memory in single GGC run: 20814k
> >>   Garbage: 770315k -> 790042k
> >>   Leak: 61606k -> 63353k
> >>   Overhead: 183430k -> 190014k
> >>   GGC runs: 853 -> 836
> >>
> >>Head of changelog is:
> >>
> >>--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 
> >>2004-09-13 21:44:05.000000000 +0000
> >>+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 
> >>2004-09-14 03:38:36.000000000 +0000
> >>@@ -1,3 +1,69 @@
> >>+2004-09-14  Jan Hubicka  <jh@suse.cz>
> >>+
> >>+	* Makefile.in (predict.o): Depend on tree-scalar-evolution.h
> >>+	* predict.c: Include tree-scalar-evolution.h and cfgloop.h
> >>+	(predict_loops): Use number_of_iterations_exit to predict
> >>+	number of iterations on trees.
> >>+
> >>+2004-09-13  Dale Johannesen  <dalej@apple.com>
> >>+
> >>+	PR 17408
> >>+	PR 17409
> >>+	* c-decl.c (start_decl): Repair TREE_STATIC for initialized
> >>+	objects declared extern.
> >>+
> >>+2004-09-14  Paul Brook  <paul@codesourcery.com>
> >>+
> >>+	* config/arm/arm.c (arm_expand_prologue): Make args_to_push a
> >>+	HOST_WIDE_INT.
> >>+
> >>+2004-09-13  Daniel Jacobowitz  <dan@debian.org>
> >>+
> >>+	* fold-const.c (fold_checksum_tree): Ignore TYPE_CACHED_VALUES.
> >>+	Only use TYPE_BINFO for aggregates.
> >>+
> >>+2004-09-13  Daniel Jacobowitz  <dan@debian.org>
> >>+
> >>+	* expmed.c (synth_mult): Initialize latency.  Check cost before
> >>+	checking ops count.
> >>+
> >>+2004-09-13  Kenneth Zadeck  <Kenneth.Zadeck@NaturalBridge.com>
> >>+
> >>+
> >>+	* tree-ssa-operands.c (get_call_expr_operands): Added parm to
> >>+	add_call_clobber_ops and add_call_read_ops.
> >>+	(add_call_clobber_ops, add_call_read_ops): Added code to reduce
> >>+	the number of vdefs and vuses inserted based on analysis of global
> >>+	variables across calls.  * tree-dfa.c (find_referenced_vars):
> >>+	Needed to reset static var maps before each function is compiled.
> >>+	* cgraphunit.c:
> >>+	(static_vars_to_consider_by_tree,static_vars_to_consider_by_uid,
> >>+	static_vars_info,functions_to_static_vars_info,module_statics_escape,
> >>+	all_module_statics,searchc_env,dfs_info): New fields to support
> >>+	analysis of static global variables.
> >>+	(print_order, convert_UIDs_in_bitmap, new_static_vars_info,
> >>+	cgraph_reset_static_var_maps, get_global_static_vars_info,
> >>+	get_global_statics_not_read, get_global_statics_not_written,
> >>+	searchc, cgraph_reduced_inorder, has_proper_scope_for_analysis,
> >>+	check_rhs_var, check_lhs_var, get_asm_expr_operands,
> >>+	process_call_for_static_vars, scan_for_static_refs,
> >>+	cgraph_characterize_statics_local, cgraph_get_static_name_by_uid,
> >>+	clear_static_vars_maps, cgraph_propagate_bits,
> >>+	cgraph_characterize_statics): New. Functions to support analysis
> >>+	of static global variables.
> >>+	(cgraph_mark_local_and_external_functions): Renamed from:
> >>+	(cgraph_mark_local_functions)
> >>+	(cgraph_expand_all_functions): Remove call to
> >>+	cgraph_mark_local_and_external_functions.
> >>+	(cgraph_optimize): Added driver to analyze static variables whose
> >>+	scope is within the compilation unit.  * cgraph.h (struct
> >>+	cgraph_local_info, GTY): Added statics_read, statics_written,
> >>+	local, calls_read_all, calls_write_all, for_functions_valid.
> >>+	(struct cgraph_node): Added next_cycle.  * cgraph.c
> >>+	(dump_cgraph_node): Added print routines for new fields.  *
> >>+	makefile.in: macroized cgraph.h, added cgraphunit.c to the ggc
> >>+	list.
> >>+
> >>2004-09-13  Joseph S. Myers  <jsm@polyomino.org.uk>
> >>
> >>	* c-decl.c (grokdeclarator): Correct comments about where storage
> >>--- 
> >>/usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog.cp 
> >>2004-09-12 21:43:28.000000000 +0000
> >>+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/cp/ChangeLog 
> >>2004-09-14 03:38:37.000000000 +0000
> >>@@ -1,3 +1,13 @@
> >>+2004-09-13  Mark Mitchell  <mark@codesourcery.com>
> >>+
> >>+	PR c++/16716
> >>+	* parser.c (cp_parser_parse_and_diagnose_invalid_type_name):
> >>+	Robustify.
> >>+
> >>+	PR c++/17327
> >>+	* pt.c (unify): Add ENUMERAL_TYPE case.  Replace sorry with
> >>+	gcc_unreacable.
> >>+
> >>2004-09-12  Richard Henderson  <rth@redhat.com>
> >>
> >>	PR c++/16254
> >>
> >>I am a friendly script caring about memory consumption in GCC.  Please 
> >>contact
> >>jh@suse.cz if something is going wrong.
> >>
> >>The results can be reproduced by building the compiler with
> >>--enable-gather-detailed-mem-stats targeting x86-64 and compiling 
> >>preprocessed
> >>combine.c or the testcase from PR8361 with:
> >>
> >>-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
> >>
> >>The memory consumption summary appears in the dump after the detailed 
> >>listing of
> >>the places where the memory is allocated.  Peak memory consumption is 
> >>actually computed
> >>by looking for the maximal value in the {GC XXXX -> YYYY} reports.
> >>
> >>Your testing script.
> >>   
> >>
> 
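
The peak computation the script describes can be approximated with a few
lines of Python.  This is only a sketch: the exact dump format is assumed
to be "{GC <before>k -> <after>k}", and deriving the "maximum released"
and "GGC runs" figures from the same lines is a guess, since the text
above only says how the peaks are obtained.

```python
import re

# Assumed shape of the per-collection lines printed without -quiet.
GC_LINE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def summarize_gc(dump_text):
    """Summarize all {GC before -> after} reports found in a compiler dump."""
    runs = [(int(before), int(after))
            for before, after in GC_LINE.findall(dump_text)]
    if not runs:
        return None
    return {
        "peak_before_kb": max(before for before, _ in runs),
        "peak_after_kb": max(after for _, after in runs),
        "max_released_kb": max(before - after for before, after in runs),
        "runs": len(runs),
    }
```

Feeding it the stderr of a compilation with the options listed above
should give numbers comparable to the "Peak memory use before GGC" and
"Peak memory use after GGC" lines in the report.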


