This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



A recent patch increased GCC's memory consumption!


Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something is going wrong.

Comparing memory consumption when compiling combine.i, insn-attrtab.i,
and generate-3.4.ii, I got:


comparing empty function compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 2328k to 2334k, overall 0.26%
  Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 2378k to 2385k, overall 0.31%
    Overall memory needed: 7411k -> 7410k
    Peak memory use before GGC: 2328k -> 2334k
    Peak memory use after GGC: 2001k -> 2007k
    Maximum of released memory in single GGC run: 327k
    Garbage: 480k
    Leak: 2378k -> 2385k
    Overhead: 517k -> 518k
    GGC runs: 3
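
For readers checking the "overall" percentages by hand, here is a minimal
sketch of the arithmetic, assuming each figure is simply the relative growth
of the "after" value over the "before" value. The function name is
hypothetical and not part of the testing script. The script presumably works
from unrounded byte counts, so recomputing from the rounded "k" values shown
here may differ slightly in the last digit.

```python
def percent_increase(before_k, after_k):
    """Relative growth from before_k to after_k, in percent."""
    return (after_k - before_k) / before_k * 100.0


# Peak GGC memory before collection for the empty function at -O0:
# 2328k -> 2334k, reported above as 0.26%.
print(f"{percent_increase(2328, 2334):.2f}%")
```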

comparing empty function compilation at -O0 -g level:
  Peak amount of GGC memory allocated before garbage collecting increased from 2356k to 2362k, overall 0.25%
  Peak amount of GGC memory still allocated after garbage collecting increased from 2028k to 2034k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 2410k to 2418k, overall 0.31%
    Overall memory needed: 7427k -> 7426k
    Peak memory use before GGC: 2356k -> 2362k
    Peak memory use after GGC: 2028k -> 2034k
    Maximum of released memory in single GGC run: 328k
    Garbage: 482k
    Leak: 2410k -> 2418k
    Overhead: 521k -> 523k
    GGC runs: 3

comparing empty function compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 2328k to 2334k, overall 0.26%
  Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 2380k to 2387k, overall 0.31%
    Overall memory needed: 7519k -> 7518k
    Peak memory use before GGC: 2328k -> 2334k
    Peak memory use after GGC: 2001k -> 2007k
    Maximum of released memory in single GGC run: 327k
    Garbage: 485k
    Leak: 2380k -> 2387k
    Overhead: 517k -> 519k
    GGC runs: 3

comparing empty function compilation at -O2 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 2329k to 2335k, overall 0.26%
  Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 2380k to 2388k, overall 0.31%
    Overall memory needed: 7527k -> 7526k
    Peak memory use before GGC: 2329k -> 2335k
    Peak memory use after GGC: 2001k -> 2007k
    Maximum of released memory in single GGC run: 328k
    Garbage: 489k
    Leak: 2380k -> 2388k
    Overhead: 518k -> 519k
    GGC runs: 4

comparing empty function compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 2329k to 2335k, overall 0.26%
  Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 2380k to 2388k, overall 0.31%
    Overall memory needed: 7527k -> 7526k
    Peak memory use before GGC: 2329k -> 2335k
    Peak memory use after GGC: 2001k -> 2007k
    Maximum of released memory in single GGC run: 328k
    Garbage: 489k
    Leak: 2380k -> 2388k
    Overhead: 518k -> 519k
    GGC runs: 4

comparing combine.c compilation at -O0 level:
  Amount of memory still referenced at the end of compilation increased from 7003k to 7011k, overall 0.11%
    Overall memory needed: 17671k -> 17682k
    Peak memory use before GGC: 9019k -> 9025k
    Peak memory use after GGC: 8259k -> 8265k
    Maximum of released memory in single GGC run: 1874k
    Garbage: 37665k -> 37649k
    Leak: 7003k -> 7011k
    Overhead: 4751k -> 4753k
    GGC runs: 278

comparing combine.c compilation at -O0 -g level:
  Amount of memory still referenced at the end of compilation increased from 9888k to 9903k, overall 0.16%
    Overall memory needed: 19571k -> 19578k
    Peak memory use before GGC: 10754k -> 10760k
    Peak memory use after GGC: 9978k -> 9984k
    Maximum of released memory in single GGC run: 1558k
    Garbage: 38031k -> 38018k
    Leak: 9888k -> 9903k
    Overhead: 5457k -> 5459k
    GGC runs: 269

comparing combine.c compilation at -O1 level:
  Amount of produced GGC garbage increased from 52529k to 52599k, overall 0.13%
  Amount of memory still referenced at the end of compilation increased from 7056k to 7063k, overall 0.10%
    Overall memory needed: 30003k -> 30010k
    Peak memory use before GGC: 17855k -> 17863k
    Peak memory use after GGC: 17659k -> 17665k
    Maximum of released memory in single GGC run: 1450k -> 1454k
    Garbage: 52529k -> 52599k
    Leak: 7056k -> 7063k
    Overhead: 5924k -> 5931k
    GGC runs: 357

comparing combine.c compilation at -O2 level:
  Amount of produced GGC garbage increased from 69078k to 69719k, overall 0.93%
    Overall memory needed: 34363k -> 34414k
    Peak memory use before GGC: 17879k -> 17891k
    Peak memory use after GGC: 17671k -> 17679k
    Maximum of released memory in single GGC run: 1392k -> 1368k
    Garbage: 69078k -> 69719k
    Leak: 7175k -> 7178k
    Overhead: 8023k -> 8062k
    GGC runs: 413 -> 415

comparing combine.c compilation at -O3 level:
  Peak amount of GGC memory still allocated after garbage collecting increased from 17826k to 17883k, overall 0.32%
  Amount of produced GGC garbage increased from 94366k to 95405k, overall 1.10%
  Amount of memory still referenced at the end of compilation increased from 7275k to 7285k, overall 0.13%
    Overall memory needed: 40711k -> 40750k
    Peak memory use before GGC: 18150k -> 18109k
    Peak memory use after GGC: 17826k -> 17883k
    Maximum of released memory in single GGC run: 3637k
    Garbage: 94366k -> 95405k
    Leak: 7275k -> 7285k
    Overhead: 11304k -> 11329k
    GGC runs: 444 -> 447

comparing insn-attrtab.c compilation at -O0 level:
  Amount of produced GGC garbage increased from 129129k to 129385k, overall 0.20%
    Overall memory needed: 92859k -> 92882k
    Peak memory use before GGC: 58839k -> 58845k
    Peak memory use after GGC: 33335k -> 33341k
    Maximum of released memory in single GGC run: 33674k
    Garbage: 129129k -> 129385k
    Leak: 9840k -> 9607k
    Overhead: 13888k -> 13889k
    GGC runs: 216

comparing insn-attrtab.c compilation at -O0 -g level:
    Overall memory needed: 94135k -> 94150k
    Peak memory use before GGC: 60001k -> 60007k
    Peak memory use after GGC: 34496k -> 34502k
    Maximum of released memory in single GGC run: 33675k
    Garbage: 129348k -> 129348k
    Leak: 11548k -> 11556k
    Overhead: 14285k -> 14287k
    GGC runs: 212 -> 211

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 110311k -> 107726k
    Peak memory use before GGC: 63396k -> 62277k
    Peak memory use after GGC: 60770k -> 59776k
    Maximum of released memory in single GGC run: 24882k -> 24268k
    Garbage: 233061k -> 227713k
    Leak: 9735k -> 9735k
    Overhead: 26100k -> 25391k
    GGC runs: 245 -> 246

comparing insn-attrtab.c compilation at -O2 level:
    Overall memory needed: 169391k -> 165762k
    Peak memory use before GGC: 63531k -> 62706k
    Peak memory use after GGC: 61068k -> 60102k
    Maximum of released memory in single GGC run: 21237k -> 20519k
    Garbage: 269078k -> 263657k
    Leak: 9728k -> 9726k
    Overhead: 31676k -> 30976k
    GGC runs: 266 -> 267

comparing insn-attrtab.c compilation at -O3 level:
    Overall memory needed: 185135k -> 180774k
    Peak memory use before GGC: 75553k -> 75338k
    Peak memory use after GGC: 71473k -> 70907k
    Maximum of released memory in single GGC run: 21970k -> 22137k
    Garbage: 300284k -> 292829k
    Leak: 9732k -> 9730k
    Overhead: 32925k -> 32697k
    GGC runs: 267

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 145852k -> 145853k
    Peak memory use before GGC: 89184k -> 89190k
    Peak memory use after GGC: 88301k -> 88307k
    Maximum of released memory in single GGC run: 18130k
    Garbage: 206731k -> 206715k
    Leak: 51180k -> 51188k
    Overhead: 23432k -> 23433k
    GGC runs: 408

comparing Gerald's testcase PR8361 compilation at -O0 -g level:
    Overall memory needed: 163568k -> 163569k
    Peak memory use before GGC: 101740k -> 101746k
    Peak memory use after GGC: 100733k -> 100738k
    Maximum of released memory in single GGC run: 18433k -> 18434k
    Garbage: 212416k -> 212400k
    Leak: 74495k -> 74502k
    Overhead: 29328k -> 29330k
    GGC runs: 381

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 100612k to 100956k, overall 0.34%
  Peak amount of GGC memory still allocated after garbage collecting increased from 99600k to 99956k, overall 0.36%
  Amount of produced GGC garbage increased from 342613k to 343553k, overall 0.27%
    Overall memory needed: 141736k -> 141721k
    Peak memory use before GGC: 100612k -> 100956k
    Peak memory use after GGC: 99600k -> 99956k
    Maximum of released memory in single GGC run: 17471k
    Garbage: 342613k -> 343553k
    Leak: 51773k -> 51782k
    Overhead: 30661k -> 30699k
    GGC runs: 527 -> 530

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Amount of produced GGC garbage increased from 390024k to 391480k, overall 0.37%
    Overall memory needed: 147180k -> 147205k
    Peak memory use before GGC: 101401k -> 101412k
    Peak memory use after GGC: 100396k -> 100402k
    Maximum of released memory in single GGC run: 17468k
    Garbage: 390024k -> 391480k
    Leak: 52888k -> 52898k
    Overhead: 36191k -> 36263k
    GGC runs: 580 -> 581

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of produced GGC garbage increased from 423975k to 426078k, overall 0.50%
    Overall memory needed: 149484k -> 149561k
    Peak memory use before GGC: 102996k -> 103006k
    Peak memory use after GGC: 101975k -> 101981k
    Maximum of released memory in single GGC run: 17912k -> 17872k
    Garbage: 423975k -> 426078k
    Leak: 53197k -> 53207k
    Overhead: 38815k -> 38917k
    GGC runs: 605

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 244575k -> 244578k
    Peak memory use before GGC: 81022k -> 81028k
    Peak memory use after GGC: 58761k -> 58767k
    Maximum of released memory in single GGC run: 44134k
    Garbage: 144429k -> 144417k
    Leak: 7727k -> 7735k
    Overhead: 23300k -> 23301k
    GGC runs: 79

comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
    Overall memory needed: 245395k -> 245402k
    Peak memory use before GGC: 81668k -> 81674k
    Peak memory use after GGC: 59407k -> 59413k
    Maximum of released memory in single GGC run: 44123k
    Garbage: 144474k -> 144534k
    Leak: 9496k -> 9503k
    Overhead: 23796k -> 23797k
    GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 243847k -> 243930k
    Peak memory use before GGC: 83517k -> 83514k
    Peak memory use after GGC: 74903k -> 74909k
    Maximum of released memory in single GGC run: 39415k -> 39406k
    Garbage: 222967k -> 222953k
    Leak: 20971k -> 20978k
    Overhead: 29139k -> 29139k
    GGC runs: 81

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 264011k -> 264214k
    Peak memory use before GGC: 79889k -> 79895k
    Peak memory use after GGC: 74903k -> 74909k
    Maximum of released memory in single GGC run: 33022k -> 33018k
    Garbage: 229691k -> 229676k
    Leak: 21061k -> 21068k
    Overhead: 31163k -> 31163k
    GGC runs: 91

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 1297803k -> 1297770k
    Peak memory use before GGC: 190662k -> 190668k
    Peak memory use after GGC: 178178k -> 178184k
    Maximum of released memory in single GGC run: 80664k
    Garbage: 362947k -> 362940k
    Leak: 46428k -> 46435k
    Overhead: 43819k -> 43819k
    GGC runs: 72

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2007-05-16 21:12:47.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2007-05-17 13:23:54.000000000 +0000
@@ -1,3 +1,98 @@
+2007-05-17  Zdenek Dvorak  <dvorakz@suse.cz>
+
+	* tree-vrp.c (finalize_jump_threads): Do not care about dominance info.
+	(execute_vrp): Preserve loops through jump threading.
+	* tree-ssa-threadupdate.c (thread_single_edge,
+	dbds_continue_enumeration_p, determine_bb_domination_status,
+	thread_through_loop_header): New functions.
+	(create_edge_and_update_destination_phis,
+	create_edge_and_update_destination_phis): Set loops for the new blocks.
+	(prune_undesirable_thread_requests): Removed.
+	(redirect_edges): Do not pretend that redirect_edge_and_branch can
+	create new blocks.
+	(thread_block): Do not call prune_undesirable_thread_requests.
+	Update loops.
+	(mark_threaded_blocks): Select edges to thread here.
+	(thread_through_all_blocks): Take may_peel_loop_headers argument.
+	Thread edges through loop headers independently.
+	* cfgloopmanip.c (create_preheader, mfb_keep_just): Export.
+	* tree-pass.h (TODO_mark_first_instance): New.
+	(first_pass_instance): Declare.
+	* cfghooks.c (duplicate_block): Put the block to the original loop
+	if copy is not specified.
+	* tree-ssa-dom.c (tree_ssa_dominator_optimize): Preserve loops through
+	jump threading.  Pass may_peel_loop_headers to
+	thread_through_all_blocks according to first_pass_instance.
+	* cfgloop.h (create_preheader): Declare.
+	* tree-flow.h (thread_through_all_blocks): Declaration changed.
+	* basic-block.h (mfb_keep_just, mfb_kj_edge): Declare.
+	* passes.c (first_pass_instance): New variable.
+	(next_pass_1): Set TODO_mark_first_instance.
+	(execute_todo): Set first_pass_instance.
+
+2007-05-17  Uros Bizjak  <ubizjak@gmail.com>
+
+	PR tree-optimization/24659
+	* optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
+	OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
+	OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
+	OTI_vec_pack_ufix_trunc.
+	(vec_unpacks_float_hi_optab): Define new macro.
+	(vec_unpacks_float_lo_optab): Ditto.
+	(vec_unpacku_float_hi_optab): Ditto.
+	(vec_unpacku_float_lo_optab): Ditto.
+	(vec_pack_sfix_trunc_optab): Ditto.
+	(vec_pack_ufix_trunc_optab): Ditto.
+	* genopinit.c (optabs): Implement vec_unpack[s|u]_[hi|lo]_optab
+	and vec_pack_[s|u]fix_trunc_optab using
+	vec_unpack[s|u]_[hi\lo]_* and vec_pack_[u|s]fix_trunc_* patterns
+	* tree-vectorizer.c (supportable_widening_operation): Handle
+	FLOAT_EXPR and CONVERT_EXPR.  Update comment.
+	(supportable_narrowing_operation): New function.
+	* tree-vectorizer.h (supportable_narrowing_operation): Prototype.
+	* tree-vect-transform.c (vectorizable_conversion): Handle
+	(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
+	(vect_gen_widened_results_half): Move before vectorizable_conversion.
+	(vectorizable_type_demotion): Call supportable_narrowing_operation()
+	to check for target support.
+	* optabs.c (optab_for_tree_code) Return vec_unpack[s|u]_float_hi_optab
+	for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
+	for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
+	for VEC_PACK_FIX_TRUNC_EXPR.
+	(expand_binop): Special case mode of the result for
+	vec_pack_[u|s]fix_trunc_optab.
+	(init_optabs): Initialize vec_unpack[s|u]_[hi|lo]_optab and
+	vec_pack_[u|s]fix_trunc_optab.
+
+	* tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
+	VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
+	* tree-pretty-print.c (dump_generic_node): Handle
+	VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
+	VEC_PACK_FIX_TRUNC_EXPR.
+	(op_prio): Ditto.
+	* expr.c (expand_expr_real_1): Ditto.
+	* tree-inline.c (estimate_num_insns_1): Ditto.
+	* tree-vect-generic.c (expand_vector_operations_1): Ditto.
+
+	* config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
+	(vec_unpacks_float_lo_v8hi): Ditto.
+	(vec_unpacku_float_hi_v8hi): Ditto.
+	(vec_unpacku_float_lo_v8hi): Ditto.
+	(vec_unpacks_float_hi_v4si): Ditto.
+	(vec_unpacks_float_lo_v4si): Ditto.
+	(vec_pack_sfix_trunc_v2df): Ditto.
+
+	* doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
+	Document.
+	[VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
+	[VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
+	* doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
+	[vec_pack_ufix_trunc]: Ditto.
+	[vec_unpacks_float_hi]: Ditto.
+	[vec_unpacks_float_lo]: Ditto.
+	[vec_unpacku_float_hi]: Ditto.
+	[vec_unpacku_float_lo]: Ditto.
+
 2007-05-16  Uros Bizjak  <ubizjak@gmail.com>
 
 	* soft-fp/README: Update for new files.
@@ -46,14 +141,15 @@
 
 2007-05-16  Paolo Bonzini  <bonzini@gnu.org>
 
-        * config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
-        calls as pure.
+	* config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
+	calls as pure.
 
 2007-05-16  Eric Christopher  <echristo@apple.com>
 
 	* config/rs6000/rs6000.c (rs6000_emit_prologue): Move altivec register
-        saving after stack push. Set sp_offset whenever we push.
-        (rs6000_emit_epilogue): Move altivec register restore before stack push.
+	saving after stack push. Set sp_offset whenever we push.
+	(rs6000_emit_epilogue): Move altivec register restore before
+	stack push.
 
 2007-05-16  Richard Sandiford  <richard@codesourcery.com>
 
@@ -496,7 +592,7 @@
 	dumps.
 
 2007-05-08  Sandra Loosemore  <sandra@codesourcery.com>
-            Nigel Stephens  <nigel@mips.com>
+	    Nigel Stephens  <nigel@mips.com>
 
 	* config/mips/mips.h (MAX_FPRS_PER_FMT): Renamed from FP_INC.
 	Update comments and all uses.
@@ -563,7 +659,7 @@
 	* configure: Regenerate.
 	* config.in: Regenerate.
 
-2007-05-07   Naveen.H.S  <naveen.hs@kpitcummins.com>
+2007-05-07  Naveen.H.S  <naveen.hs@kpitcummins.com>
 
 	* config/m32c/muldiv.md (mulhisi3_c): Limit the mode of the 2nd
 	operand to HI mode.
@@ -1062,7 +1158,7 @@
 	PR middle-end/22156
 	Temporarily revert:
 	2007-04-06  Andreas Tobler  <a.tobler@schweiz.org>
-        * tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
+	* tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
 	2007-04-05  Alexandre Oliva  <aoliva@redhat.com>
 	* tree-sra.c (try_instantiate_multiple_fields): Needlessly
 	initialize align to silence bogus warning.
@@ -1274,17 +1370,17 @@
 	PR tree-optimization/30965
 	PR tree-optimization/30978
 	* Makefile.in (tree-ssa-forwprop.o): Depend on $(FLAGS_H).
-        * tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
-        (find_equivalent_equality_comparison): Likewise.
-        (simplify_cond): Likewise.
-        (get_prop_source_stmt): New helper.
-        (get_prop_dest_stmt): Likewise.
+	* tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
+	(find_equivalent_equality_comparison): Likewise.
+	(simplify_cond): Likewise.
+	(get_prop_source_stmt): New helper.
+	(get_prop_dest_stmt): Likewise.
 	(can_propagate_from): Likewise.
 	(remove_prop_source_from_use): Likewise.
-        (combine_cond_expr_cond): Likewise.
-        (forward_propagate_comparison): New function.
-        (forward_propagate_into_cond): Rewrite to use fold for
-        tree combining.
+	(combine_cond_expr_cond): Likewise.
+	(forward_propagate_comparison): New function.
+	(forward_propagate_into_cond): Rewrite to use fold for
+	tree combining.
 	(tree_ssa_forward_propagate_single_use_vars): Call
 	forward_propagate_comparison to propagate comparisons.
 


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated.  Peak memory consumption is
computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
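
The peak computation described above could be sketched roughly as follows.
This is only an illustration, not the testing script itself: it assumes the
compiler output has been captured as text, that each collection is reported
as "{GC XXXXk -> YYYYk}", and that XXXX is the amount allocated before the
collection and YYYY the amount remaining after it. The function name and the
sample string are made up for the example.

```python
import re

# Assumed shape of one garbage-collection report: "{GC 2328k -> 2001k}".
GC_REPORT = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_memory(dump_text):
    """Return (peak before GGC, peak after GGC) in kB, or None if no reports."""
    pairs = [(int(before), int(after))
             for before, after in GC_REPORT.findall(dump_text)]
    if not pairs:
        return None
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after


# Made-up sample output containing two collections.
sample = " {GC 2328k -> 2001k} {GC 2100k -> 1990k}"
print(peak_memory(sample))
```

The two maxima are taken independently, so they need not come from the same
collection run.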

Your testing script.

