This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



A recent patch increased GCC's memory consumption!


Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something is going wrong.

Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
and generate-3.4.ii I got:


comparing combine.c compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 9293k to 9304k, overall 0.12%
  Peak amount of GGC memory still allocated after garbage collecting increased from 8832k to 8843k, overall 0.12%
  Amount of memory still referenced at the end of compilation increased from 6441k to 6454k, overall 0.21%
    Overall memory needed: 28374k -> 28410k
    Peak memory use before GGC: 9293k -> 9304k
    Peak memory use after GGC: 8832k -> 8843k
    Maximum of released memory in single GGC run: 2666k
    Garbage: 36856k -> 36845k
    Leak: 6441k -> 6454k
    Overhead: 4860k -> 4862k
    GGC runs: 280

comparing combine.c compilation at -O1 level:
  Amount of memory still referenced at the end of compilation increased from 6495k to 6517k, overall 0.33%
    Overall memory needed: 40218k -> 40254k
    Peak memory use before GGC: 17281k -> 17292k
    Peak memory use after GGC: 17106k -> 17117k
    Maximum of released memory in single GGC run: 2363k
    Garbage: 57582k -> 57577k
    Leak: 6495k -> 6517k
    Overhead: 6224k -> 6226k
    GGC runs: 355

comparing combine.c compilation at -O2 level:
    Overall memory needed: 29790k -> 29802k
    Peak memory use before GGC: 17277k -> 17288k
    Peak memory use after GGC: 17106k -> 17117k
    Maximum of released memory in single GGC run: 2803k
    Garbage: 74892k -> 74893k
    Leak: 6603k -> 6609k
    Overhead: 8470k -> 8473k
    GGC runs: 413

comparing combine.c compilation at -O3 level:
  Amount of memory still referenced at the end of compilation increased from 6668k to 6681k, overall 0.20%
    Overall memory needed: 28894k -> 28902k
    Peak memory use before GGC: 18218k -> 18229k
    Peak memory use after GGC: 17834k -> 17845k
    Maximum of released memory in single GGC run: 4104k
    Garbage: 104223k -> 104223k
    Leak: 6668k -> 6681k
    Overhead: 11907k -> 11909k
    GGC runs: 462

comparing insn-attrtab.c compilation at -O0 level:
  Amount of memory still referenced at the end of compilation increased from 9501k to 9514k, overall 0.14%
    Overall memory needed: 88230k -> 88242k
    Peak memory use before GGC: 69777k -> 69788k
    Peak memory use after GGC: 44187k -> 44198k
    Maximum of released memory in single GGC run: 36963k
    Garbage: 129065k -> 129062k
    Leak: 9501k -> 9514k
    Overhead: 16993k -> 16996k
    GGC runs: 216

comparing insn-attrtab.c compilation at -O1 level:
  Amount of memory still referenced at the end of compilation increased from 9343k to 9357k, overall 0.14%
    Overall memory needed: 113902k -> 114174k
    Peak memory use before GGC: 90363k -> 90374k
    Peak memory use after GGC: 83725k -> 83736k
    Maximum of released memory in single GGC run: 31852k
    Garbage: 277769k -> 277769k
    Leak: 9343k -> 9357k
    Overhead: 29778k -> 29780k
    GGC runs: 223 -> 222

comparing insn-attrtab.c compilation at -O2 level:
  Amount of memory still referenced at the end of compilation increased from 9345k to 9359k, overall 0.14%
    Overall memory needed: 120390k -> 120406k
    Peak memory use before GGC: 92593k -> 92604k
    Peak memory use after GGC: 84705k -> 84716k
    Maximum of released memory in single GGC run: 30394k
    Garbage: 317192k -> 317192k
    Leak: 9345k -> 9359k
    Overhead: 36353k -> 36356k
    GGC runs: 246 -> 245

comparing insn-attrtab.c compilation at -O3 level:
  Amount of memory still referenced at the end of compilation increased from 9348k to 9362k, overall 0.14%
    Overall memory needed: 134218k -> 134226k
    Peak memory use before GGC: 92618k -> 92629k
    Peak memory use after GGC: 84731k -> 84742k
    Maximum of released memory in single GGC run: 30584k
    Garbage: 317844k -> 317844k
    Leak: 9348k -> 9362k
    Overhead: 36551k -> 36554k
    GGC runs: 250 -> 249

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 119538k -> 119550k
    Peak memory use before GGC: 92680k -> 92691k
    Peak memory use after GGC: 91760k -> 91771k
    Maximum of released memory in single GGC run: 19314k
    Garbage: 205600k -> 205599k
    Leak: 47677k -> 47691k
    Overhead: 20817k -> 20819k
    GGC runs: 402 -> 401

comparing Gerald's testcase PR8361 compilation at -O1 level:
    Overall memory needed: 119278k -> 119302k
    Peak memory use before GGC: 97848k -> 97860k
    Peak memory use after GGC: 95638k -> 95650k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 444206k -> 444236k
    Leak: 50011k -> 50025k
    Overhead: 32784k -> 32786k
    GGC runs: 552 -> 551

comparing Gerald's testcase PR8361 compilation at -O2 level:
    Overall memory needed: 119286k -> 119294k
    Peak memory use before GGC: 97848k -> 97860k
    Peak memory use after GGC: 95638k -> 95650k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 503803k -> 503803k
    Leak: 50716k -> 50729k
    Overhead: 40089k -> 40091k
    GGC runs: 608 -> 607

comparing Gerald's testcase PR8361 compilation at -O3 level:
    Overall memory needed: 118930k -> 118926k
    Peak memory use before GGC: 97894k -> 97906k
    Peak memory use after GGC: 96924k -> 96936k
    Maximum of released memory in single GGC run: 18847k
    Garbage: 523450k -> 523447k
    Leak: 50291k -> 50304k
    Overhead: 40598k -> 40603k
    GGC runs: 621 -> 620

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
  Amount of memory still referenced at the end of compilation increased from 7522k to 7536k, overall 0.18%
    Overall memory needed: 137946k -> 137786k
    Peak memory use before GGC: 81898k -> 81909k
    Peak memory use after GGC: 58777k -> 58788k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 147195k -> 147243k
    Leak: 7522k -> 7536k
    Overhead: 25300k -> 25302k
    GGC runs: 83 -> 82

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 423030k -> 423186k
    Peak memory use before GGC: 205260k -> 205271k
    Peak memory use after GGC: 201036k -> 201047k
    Maximum of released memory in single GGC run: 101714k
    Garbage: 271706k -> 271706k
    Leak: 47588k -> 47601k
    Overhead: 30829k -> 30831k
    GGC runs: 101

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 352334k -> 352070k
    Peak memory use before GGC: 206001k -> 206012k
    Peak memory use after GGC: 201777k -> 201788k
    Maximum of released memory in single GGC run: 108617k
    Garbage: 351905k -> 351905k
    Leak: 48171k -> 48184k
    Overhead: 46573k -> 46575k
    GGC runs: 110

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 781042k -> 781150k
    Peak memory use before GGC: 314916k -> 314927k
    Peak memory use after GGC: 293259k -> 293270k
    Maximum of released memory in single GGC run: 165331k
    Garbage: 494299k -> 494299k
    Leak: 65503k -> 65517k
    Overhead: 59714k -> 59716k
    GGC runs: 98
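
As a rough illustration of how the "overall N%" figures relate to the summary lines above, here is a minimal Python sketch. It is not the script that produced this report; the names `parse_summary_line` and `percent_change` are hypothetical. It assumes summary lines have the shape `Label: OLDk -> NEWk` (or a single value when nothing changed). Because the report prints values rounded to whole kilobytes, a percentage recomputed this way can differ in the last digit from the reported one (for example, 6441k -> 6454k gives about 0.20% here versus the reported 0.21%).

```python
import re

# Matches summary lines such as "    Leak: 6441k -> 6454k" or "    GGC runs: 280".
# The line shape is an assumption based on the report text above.
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:]+):\s*(?P<old>\d+)k?(?:\s*->\s*(?P<new>\d+)k?)?\s*$"
)


def parse_summary_line(line):
    """Return (label, old, new), or None if the line is not a summary line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    old = int(m.group("old"))
    # A line with a single value means the metric did not change.
    new = int(m.group("new")) if m.group("new") is not None else old
    return m.group("label").strip(), old, new


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    label, old, new = parse_summary_line("    Peak memory use before GGC: 9293k -> 9304k")
    print(f"{label}: {percent_change(old, new):.2f}%")
```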

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2006-11-08 01:38:17.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2006-11-08 11:39:57.000000000 +0000
@@ -1,3 +1,185 @@
+2006-11-08  Dorit Nuzman  <dorit@il.ibm.com>
+
+	* tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p): Take 
+	enum argument instead of bool.
+	(vect_analyze_operations): Call vectorizable_type_promotion.
+	* tree-vectorizer.h (type_promotion_vec_info_type): New enum
+	stmt_vec_info_type value.
+	(supportable_widening_operation, vectorizable_type_promotion): New
+	function declarations.
+	* tree-vect-transform.c (vect_gen_widened_results_half): New function.
+	(vectorizable_type_promotion): New function.
+	(vect_transform_stmt): Call vectorizable_type_promotion.
+	* tree-vect-analyze.c (supportable_widening_operation): New function.
+	* tree-vect-patterns.c (vect_recog_dot_prod_pattern): 
+	Add implementation.
+	* tree-vect-generic.c (expand_vector_operations_1): Consider correct
+	mode.
+	
+	* tree.def (VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR):
+	(VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR): New tree-codes.
+	* tree-inline.c (estimate_num_insns_1): Add cases for above new 
+	tree-codes.
+	* tree-pretty-print.c (dump_generic_node, op_prio): Likewise.
+	* expr.c (expand_expr_real_1): Likewise.
+	* optabs.c (optab_for_tree_code): Likewise.
+	(init_optabs): Initialize new optabs.
+	* genopinit.c (vec_widen_umult_hi_optab, vec_widen_smult_hi_optab,
+	vec_widen_smult_hi_optab, vec_widen_smult_lo_optab,
+	vec_unpacks_hi_optab, vec_unpacks_lo_optab, vec_unpacku_hi_optab,
+	vec_unpacku_lo_optab): Initialize new optabs.
+	* optabs.h (OTI_vec_widen_umult_hi, OTI_vec_widen_umult_lo):
+	(OTI_vec_widen_smult_h, OTI_vec_widen_smult_lo, OTI_vec_unpacks_hi,
+	OTI_vec_unpacks_lo, OTI_vec_unpacku_hi, OTI_vec_unpacku_lo): New 
+	optab indices.
+	(vec_widen_umult_hi_optab, vec_widen_umult_lo_optab):
+	(vec_widen_smult_hi_optab, vec_widen_smult_lo_optab):
+	(vec_unpacks_hi_optab, vec_unpacku_hi_optab, vec_unpacks_lo_optab):
+	(vec_unpacku_lo_optab): New optabs.
+	* doc/md.texi (vec_unpacks_hi, vec_unpacks_lo, vec_unpacku_hi): 
+	(vec_unpacku_lo, vec_widen_umult_hi, vec_widen_umult_lo): 
+	(vec_widen_smult_hi, vec_widen_smult_lo): New.
+	* doc/c-tree.texi (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR):
+	(VEC_WIDEN_MULT_HI_EXPR, VEC_WIDEN_MULT_LO_EXPR, VEC_UNPACK_HI_EXPR):
+	(VEC_UNPACK_LO_EXPR, VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New.
+	 
+	* config/rs6000/altivec.md (UNSPEC_VMULWHUB, UNSPEC_VMULWLUB):
+	(UNSPEC_VMULWHSB, UNSPEC_VMULWLSB, UNSPEC_VMULWHUH, UNSPEC_VMULWLUH):
+	(UNSPEC_VMULWHSH, UNSPEC_VMULWLSH): New.
+	(UNSPEC_VPERMSI, UNSPEC_VPERMHI): New.
+	(vec_vperm_v8hiv4si, vec_vperm_v16qiv8hi): New patterns used to
+	implement the unsigned unpacking patterns.
+	(vec_unpacks_hi_v16qi, vec_unpacks_hi_v8hi, vec_unpacks_lo_v16qi):
+	(vec_unpacks_lo_v8hi): New signed unpacking patterns.
+	(vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi):
+	(vec_unpacku_lo_v8hi): New unsigned unpacking patterns.
+	(vec_widen_umult_hi_v16qi, vec_widen_umult_lo_v16qi):
+	(vec_widen_smult_hi_v16qi, vec_widen_smult_lo_v16qi): 
+	(vec_widen_umult_hi_v8hi, vec_widen_umult_lo_v8hi):
+	(vec_widen_smult_hi_v8hi, vec_widen_smult_lo_v8hi): New widening
+	multiplication patterns.
+
+	* target.h (builtin_mul_widen_even, builtin_mul_widen_odd): New.
+	* target-def.h (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN):
+	(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New.
+	* config/rs6000/rs6000.c (rs6000_builtin_mul_widen_even): New.
+	(rs6000_builtin_mul_widen_odd): New.
+	(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Defined.
+	(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Defined.
+	* tree-vectorizer.h (enum vect_relevant): New enum type.
+	(_stmt_vec_info): Field relevant chaned from bool to enum
+	vect_relevant.
+	(STMT_VINFO_RELEVANT_P): Updated.
+	(STMT_VINFO_RELEVANT): New.
+	* tree-vectorizer.c (new_stmt_vec_info): Use STMT_VINFO_RELEVANT
+	instead of STMT_VINFO_RELEVANT_P.
+	* tree-vect-analyze.c (vect_mark_relevant, vect_stmt_relevant_p):
+	Replace calls to STMT_VINFO_RELEVANT_P with STMT_VINFO_RELEVANT,
+	and boolean variable with enum vect_relevant.
+	(vect_mark_stmts_to_be_vectorized): Likewise + update documentation.
+	* doc/tm.texi (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): New.
+	(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): New.
+
+	2006-11-08  Richard Henderson  <rth@redhat.com>
+
+	* config/i386/sse.md (vec_widen_umult_hi_v8hi,
+	vec_widen_umult_lo_v8hi): New.
+	(vec_widen_smult_hi_v4si, vec_widen_smult_lo_v4si,
+	vec_widen_umult_hi_v4si, vec_widen_umult_lo_v4si): New.
+
+	* config/i386/i386.c (ix86_expand_sse_unpack): New. 
+	* config/i386/i386-protos.h (ix86_expand_sse_unpack): New. 
+	* config/i386/sse.md (vec_unpacku_hi_v16qi, vec_unpacks_hi_v16qi,
+	vec_unpacku_lo_v16qi, vec_unpacks_lo_v16qi, vec_unpacku_hi_v8hi,
+	vec_unpacks_hi_v8hi, vec_unpacku_lo_v8hi, vec_unpacks_lo_v8hi,
+	vec_unpacku_hi_v4si, vec_unpacks_hi_v4si, vec_unpacku_lo_v4si,
+	vec_unpacks_lo_v4si): New.
+
+	2006-11-08  Dorit Nuzman  <dorit@il.ibm.com>
+
+	* tree-vect-transform.c (vectorizable_type_demotion): New function.
+	(vect_transform_stmt): Add case for type_demotion_vec_info_type.
+	(vect_analyze_operations): Call vectorizable_type_demotion.
+	* tree-vectorizer.h (type_demotion_vec_info_type): New enum 
+	stmt_vec_info_type value.
+	(vectorizable_type_demotion): New function declaration.
+	* tree-vect-generic.c (expand_vector_operations_1): Consider correct
+	mode.
+
+	* tree.def (VEC_PACK_MOD_EXPR, VEC_PACK_SAT_EXPR): New tree-codes.
+	* expr.c (expand_expr_real_1): Add case for VEC_PACK_MOD_EXPR and
+	VEC_PACK_SAT_EXPR.
+	* tree-iniline.c (estimate_num_insns_1): Likewise.
+	* tree-pretty-print.c (dump_generic_node, op_prio): Likewise.
+	* optabs.c (optab_for_tree_code): Likewise.
+
+	* optabs.c (expand_binop): In case of vec_pack_*_optabs the mode 
+	compared against the predicate of the result is not 'mode' (the input 
+	to the function) but a mode with half the size of 'mode'.
+	(init_optab): Initialize new optabs.
+	* optabs.h (OTI_vec_pack_mod, OTI_vec_pack_ssat, OTI_vec_pack_usat):
+	New optab indices.
+	(vec_pack_mod_optab, vec_pack_ssat_optab,  vec_pack_usat_optab): New
+	optabs.
+	* genopinit.c (vec_pack_mod_optab, vec_pack_ssat_optab):
+	(vec_pack_usat_optab): Initialize new optabs.
+	* doc/md.texi (vec_pack_mod, vec_pack_ssat, vec_pack_usat): New.
+	* config/rs6000/altivec.md (vec_pack_mod_v8hi, vec_pack_mod_v4si): New.
+
+	2006-11-08  Richard Henderson  <rth@redehat.com>
+
+	* config/i386/sse.md (vec_pack_mod_v8hi, vec_pack_mod_v4si):
+	(vec_pack_mod_v2di, vec_interleave_highv16qi, vec_interleave_lowv16qi):
+	(vec_interleave_highv8hi, vec_interleave_lowv8hi):
+	(vec_interleave_highv4si, vec_interleave_lowv4si):
+	(vec_interleave_highv2di, vec_interleave_lowv2di): New.
+
+	2006-11-08  Dorit Nuzman  <dorit@il.ibm.com>
+
+	* tree-vect-transform.c (vectorizable_reduction): Support multiple 
+	datatypes.
+	(vect_transform_stmt): Removed redundant code.
+
+	2006-11-08  Dorit Nuzman  <dorit@il.ibm.com>
+
+	* tree-vect-transform.c (vectorizable_operation): Support multiple 
+	datatypes.
+
+	2006-11-08  Dorit Nuzman  <dorit@il.ibm.com>
+
+	* tree-vect-transform.c (vect_align_data_ref): Removed.
+	(vect_create_data_ref_ptr): Added additional argument - ptr_incr. 
+	Updated function documentation. Return the increment stmt in ptr_incr.
+	(bump_vector_ptr): New function.
+	(vect_get_vec_def_for_stmt_copy): New function.
+	(vect_finish_stmt_generation): Create a stmt_info to newly created
+	vector stmts.
+	(vect_setup_realignment): Call vect_create_data_ref_ptr with additional
+	argument.
+	(vectorizable_reduction, vectorizable_assignment): Not supported yet if
+	VF is greater than the number of elements that can fit in one vector
+	word.
+	(vectorizable_operation, vectorizable_condition): Likewise.
+	(vectorizable_store, vectorizable_load): Support the case that the VF
+	is greater than the number of elements that can fit in one vector word.
+	(vect_transform_loop): Don't fail in case of multiple data-types.
+	* tree-vect-analyze.c (vect_determine_vectorization_factor): Don't fail 
+	in case of multiple data-types; the smallest type determines the VF.
+	(vect_analyze_data_ref_dependence): Don't record datarefs as same_align
+	if they are of different sizes.
+	(vect_update_misalignment_for_peel): Compare misalignments in terms of
+	number of elements rather than number of bytes.
+	(vect_enhance_data_refs_alignment): Fix/Add dump printouts.
+	(vect_can_advance_ivs_p): Fix a dump printout
+
+2006-11-07  Eric Christopher  <echristo@apple.com>
+
+	* libgcc2.c (__bswapdi2): Rename from bswapDI2.
+	(__bswapsi2): Ditto.
+	* libgcc2.h: Remove transformation of bswap routines.
+	* config/i386/i386.md (bswapsi2): New.
+	(bswapdi2): Ditto.
+
 2006-11-07  Jakub Jelinek  <jakub@redhat.com>
 
 	* c-common.c (c_common_attributes): Add gnu_inline attribyte.
@@ -14,7 +196,7 @@
 
 2006-11-06  Anatoly Sokolov <aesok@post.ru>
 
-	* config/avr/avr-protos.h (mask_one_bit_p, const_int_pow2_p): Remove 
+	* config/avr/avr-protos.h (mask_one_bit_p, const_int_pow2_p): Remove
 	prototype.
 	* config/avr/avr.c (mask_one_bit_p, const_int_pow2_p): Remove.
 	(output_movhi, ashlhi3_out, ashlsi3_out, ashrhi3_out, ashrsi3_out,
@@ -50,9 +232,9 @@
 
 	* gcc.c (process_command): Treat -b as normal switch if its argument
 	has no dash.
-	
+
 2006-11-07  David Ung  <davidu@mips.com>
-	
+
 	* config/mips/mips.h (ISA_HAS_PREFETCHX): Add ISA_MIPS32R2 to the
 	list.
 


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is computed by taking the maximal value in the {GC XXXX -> YYYY} reports.
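
The peak computation described above can be sketched in Python as follows. This is only an illustration, not the testing script itself: the function name `peak_ggc` is hypothetical, and it assumes each garbage-collection report in the dump looks like `{GC 9293k -> 8832k}`, with the first number being memory in use before the collection and the second the memory in use after it.

```python
import re

# Assumed shape of one report in the compiler dump: "{GC 9293k -> 8832k}".
GC_RE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_ggc(dump_text):
    """Return (peak_before, peak_after) in kilobytes, or None if no reports are found."""
    pairs = [(int(before), int(after)) for before, after in GC_RE.findall(dump_text)]
    if not pairs:
        return None
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = " {GC 4000k -> 3000k} {GC 9293k -> 8000k} {GC 9000k -> 8832k}"
    print(peak_ggc(sample))
```

Under these assumptions, the two maxima correspond to the "Peak memory use before GGC" and "Peak memory use after GGC" lines of the summary.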

Your testing script.

