This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



Re: A recent patch increased GCC's memory consumption in some cases!


> 
> comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
>   Overall memory allocated via mmap and sbrk decreased from 365202k to 192018k, overall -90.19%
>   Peak amount of GGC memory allocated before garbage collecting increased from 207227k to 302461k, overall 45.96%
>   Peak amount of GGC memory still allocated after garbage collecting decreased from 192563k to 178766k, overall -7.72%
>   Amount of produced GGC garbage increased from 355277k to 586684k, overall 65.13%
>   Amount of memory still referenced at the end of compilation decreased from 30387k to 27902k, overall -8.91%

This result is somewhat bogus: the memory consumption does indeed peak at
over 360MB.  Richard, is it possible for your script to miss the real peak
somehow?  See http://www.suse.de/~aj/SPEC/amd64/memory/pr28071-O2.rep
All of the 240MB of memory is produced by the schedule2 pass and we GGC it
immediately, so I guess the peak lasts only a short time.

Anyway, the savings are in general real - we manage to optimize the
function better before inlining and produce much better resulting
assembly.  Originally the dominator optimization reorganized things in such
a weird way that it wasn't possible to clean up (I think it is mentioned
in the PR trail).

Honza
>     Overall memory needed: 365202k -> 192018k
>     Peak memory use before GGC: 207227k -> 302461k
>     Peak memory use after GGC: 192563k -> 178766k
>     Maximum of released memory in single GGC run: 140362k -> 241049k
>     Garbage: 355277k -> 586684k
>     Leak: 30387k -> 27902k
>     Overhead: 47185k -> 95313k
>     GGC runs: 98 -> 83
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
>   Overall memory allocated via mmap and sbrk increased from 572546k to 586858k, overall 2.50%
>   Peak amount of GGC memory allocated before garbage collecting increased from 282245k to 283571k, overall 0.47%
>   Peak amount of GGC memory still allocated after garbage collecting increased from 273215k to 276589k, overall 1.23%
>   Amount of produced GGC garbage increased from 448504k to 451307k, overall 0.62%
>   Amount of memory still referenced at the end of compilation increased from 45440k to 48594k, overall 6.94%
>     Overall memory needed: 572546k -> 586858k
>     Peak memory use before GGC: 282245k -> 283571k
>     Peak memory use after GGC: 273215k -> 276589k
>     Maximum of released memory in single GGC run: 138326k -> 138367k
>     Garbage: 448504k -> 451307k
>     Leak: 45440k -> 48594k
>     Overhead: 56089k -> 56724k
>     GGC runs: 97 -> 72
> 
> Head of the ChangeLog is:
> 
> --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2007-01-16 20:23:49.000000000 +0000
> +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2007-01-18 04:56:20.000000000 +0000
> @@ -1,3 +1,135 @@
> +2007-01-18  Ben Elliston  <bje@au.ibm.com>
> +
> +	* genautomata.c (write_automata): Include xstrerror output in the
> +	error message if writing the DFA description file fails.
> +
> +2007-01-17  H.J. Lu  <hongjiu.lu@intel.com>
> +
> +	* config/mips/mips-protos.h (mips_output_external): Make it
> +	return void.
> +	* config/mips/iris.h (TARGET_ASM_EXTERNAL_LIBCALL): Removed.
> +	* config/mips/mips.c (irix_output_external_libcall): Likewise.
> +	(extern_list): Likewise.
> +	(extern_head): Likewise.
> +	(TARGET_ASM_FILE_END): Likewise.
> +	(mips_file_end): Likewise.
> +	(mips_output_external): Rewritten.
> +
> +2007-01-18  Ben Elliston  <bje@au.ibm.com>
> +
> +	* genpreds.c (write_insn_preds_c): Only write out the function
> +	body for regclass_for_constraint if we have register constraints.
> +
> +2007-01-17  Tom Tromey  <tromey@redhat.com>
> +
> +	* doc/sourcebuild.texi (libgcj Tests): Use sourceware.org.
> +	* doc/install.texi (Testing): Use sourceware.org.
> +	(Binaries): Likewise.
> +	(Specific): Likewise.
> +	* doc/contrib.texi (Contributors): Use sourceware.org.
> +
> +2007-01-17  Anatoly Sokolov <aesok@post.ru>
> +
> +	* config/avr/avr.h (AVR_HAVE_LPMX): New macro.
> +	(AVR_ENHANCED): Rename to ...
> +	(AVR_HAVE_MUL): ... new.
> +	(avr_enhanced_p): Rename to ...
> +	(avr_have_mul_p): ... new.
> +	(TARGET_CPU_CPP_BUILTINS): Use 'avr_have_mul_p' instead of 
> +	'avr_enhanced_p' for "__AVR_ENHANCED__". Define "__AVR_HAVE_MUL__".
> +	* config/avr/avr.c (avr_enhanced_p): Rename to ...
> +	(avr_have_mul_p): ... new.
> +	(base_arch_s): Rename 'enhanced' to 'have_mul'.
> +	(avr_override_options): Use 'avr_have_mul_p' and 'have_mul' instead of
> +	'avr_enhanced_p' and 'enhanced'.
> +	(ashlhi3_out, ashrhi3_out, lshrhi3_out, avr_rtx_costs): Use 
> +	AVR_HAVE_MUL instead of AVR_ENHANCED.
> +	* avr.md (*tablejump_enh): Use AVR_HAVE_LPMX instead of AVR_ENHANCED.
> +	(mulqi3, *mulqi3_enh, *mulqi3_call, mulqihi3, umulqihi3, mulhi3, 
> +	*mulhi3_enh, *mulhi3_call, mulsi3, *mulsi3_call): Use AVR_HAVE_MUL 
> +	instead of AVR_ENHANCED.
> +	(*tablejump_enh): Use AVR_HAVE_LPMX instead of AVR_ENHANCED.
> +	* libgcc.S: Use __AVR_HAVE_MUL__ instead of __AVR_ENHANCED__.
> +	(__tablejump__): Use __AVR_HAVE_LPMX__ instead of __AVR_ENHANCED__.
> +
> +2007-01-17  Ian Lance Taylor  <iant@google.com>
> +
> +	* vec.h (VEC_reserve_exact): Define.
> +	(vec_gc_p_reserve_exact): Declare.
> +	(vec_gc_o_reserve_exact): Declare.
> +	(vec_heap_p_reserve_exact): Declare.
> +	(vec_heap_o_reserve_exact): Declare.
> +	(VEC_OP (T,A,reserve_exact)): New static inline function, three
> +	versions.
> +	(VEC_OP (T,A,reserve)) [all versions]: Remove handling of
> +	negative parameter.
> +	(VEC_OP (T,A,alloc)) [all versions]: Call ...reserve_exact.
> +	(VEC_OP (T,A,copy)) [all versions]: Likewise.
> +	(VEC_OP (T,a,safe_grow)) [all versions]: Likewise.
> +	* vec.c (calculate_allocation): Add exact parameter.  Change all
> +	callers.
> +	(vec_gc_o_reserve_1): New static function, from vec_gc_o_reserve.
> +	(vec_gc_p_reserve, vec_gc_o_reserve): Call vec_gc_o_reserve_1.
> +	(vec_gc_p_reserve_exact, vec_gc_o_reserve_exact): New functions.
> +	(vec_heap_o_reserve_1): New static function, from vec_heap_o_reserve.
> +	(vec_heap_p_reserve, vec_heap_o_reserve): Call vec_heap_o_reserve_1.
> +	(vec_heap_p_reserve_exact): New function.
> +	(vec_heap_o_reserve_exact): New function.
> +
> +2007-01-17  Jan Hubicka  <jh@suse.cz>
> +
> +	* ipa-type-escape.c (look_for_casts): Revamp using handled_component_p.
> +
> +2007-01-17  Eric Christopher  <echristo@apple.com>
> +
> +	* config.gcc: Support core2 processor.
> +
> +2007-01-16  Jan Hubicka  <jh@suse.cz>
> +
> +	* tree-ssanames.c (release_dead_ssa_names): Instead of ggc_freeing
> +	the names, just unlink the chain so we don't crash on dangling pointers
> +	to dead SSA names.
> +
> +2007-01-16  Jan Hubicka  <jh@suse.cz>
> +
> +	* cgraph.h (cgraph_decide_inlining_incrementally): Kill.
> +	* tree-pass.h: Reorder to make IPA passes appear toegher.
> +	(pass_early_inline, pass_inline_parameters, pass_apply_inline): Declare.
> +	* cgraphunit.c (cgraph_finalize_function): Do not compute inling
> +	parameters, do not call early inliner.
> +	* ipa-inline.c: Update comments.  Include tree-flow.h
> +	(cgraph_decide_inlining): Do not compute inlining parameters.
> +	(cgraph_decide_inlining_incrementally): Return TODOs; assume to
> +	be called with function context set up.
> +	(pass_ipa_inline): Remove unreachable functions before pass.
> +	(cgraph_early_inlining): Simplify assuming to be called from the
> +	PM as local pass.
> +	(pass_early_inline): New pass.
> +	(cgraph_gate_ipa_early_inlining): New gate.
> +	(pass_ipa_early_inline): Turn into simple wrapper.
> +	(compute_inline_parameters): New function.
> +	(gate_inline_passes): New gate.
> +	(pass_inline_parameters): New pass.
> +	(apply_inline): Move here from tree-optimize.c
> +	(pass_apply_inline): New pass.
> +	* ipa.c (cgraph_remove_unreachable_nodes): Verify cgraph after
> +	transforming.
> +	* tree-inline.c (optimize_inline_calls): Return TODOs rather than
> +	doing them by hand.
> +	(tree_function_versioning): Do not allocate dummy struct function.
> +	* tree-inline.h (optimize_inline_calls): Update prototype.
> +	* tree-optimize.c (execute_fixup_cfg): Export.
> +	(pass_fixup_cfg): Remove
> +	(tree_rest_of_compilation): Do not apply inlines.
> +	* tree-flow.h (execute_fixup_cfg): Declare.
> +	* Makefile.in (gt-passes.c): New.
> +	* passes.c: Include gt-passes.h
> +	(init_optimization_passes): New passes.
> +	(nnodes, order): New static vars.
> +	(do_per_function_toporder): New function.
> +	(execute_one_pass): Dump current pass here.
> +	(execute_ipa_pass_list): Don't dump current pass here.
> +
>  2007-01-16  Janis Johnson  <janis187@us.ibm.com>
>  
>  	* config/dfp-bit.c (dfp_compare_op): Return separate value for NaN.
> --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog.cp	2007-01-12 08:03:04.000000000 +0000
> +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/cp/ChangeLog	2007-01-18 04:56:19.000000000 +0000
> @@ -1,3 +1,8 @@
> +2007-01-17  Ian Lance Taylor  <iant@google.com>
> +
> +	* class.c (add_method): Call VEC_reserve_exact rather than passing
> +	a negative size to VEC_reserve.
> +
>  2007-01-11  Simon Martin  <simartin@users.sourceforge.net>
>  
>  	PR c++/29573
> 
> 
> The results can be reproduced by building a compiler with
> 
> --enable-gather-detailed-mem-stats targeting x86-64
> 
> and compiling preprocessed combine.c or testcase from PR8632 with:
> 
> -fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
> 
> The memory consumption summary appears in the dump after the detailed
> listing of the places where memory is allocated.  Peak memory consumption
> is actually computed by looking for the maximal value in the
> {GC XXXX -> YYYY} report.
> 
> Your testing script.
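
For reference, the peak computation described in the quoted report (taking the maximal value from the {GC XXXX -> YYYY} records) could look roughly like the sketch below.  This is not the actual testing script.  The record format with a "k" suffix, the function name peak_ggc_memory, and the sample values are all assumptions made for illustration.

```python
import re

# Assumed record shape: "{GC 1234k -> 567k}" (before -> after one collection).
GC_RECORD = re.compile(r"\{GC\s+(\d+)k\s*->\s*(\d+)k\}")


def peak_ggc_memory(log_text):
    """Return (peak_before_k, peak_after_k) over all GC records, or None."""
    records = [(int(before), int(after))
               for before, after in GC_RECORD.findall(log_text)]
    if not records:
        return None
    peak_before = max(before for before, _ in records)
    peak_after = max(after for _, after in records)
    return peak_before, peak_after


# Made-up sample log, not taken from a real compilation.
sample = "foo {GC 1200k -> 800k} bar {GC 3100k -> 900k} baz {GC 2000k -> 1500k}"
print(peak_ggc_memory(sample))
```

A scan like this only sees the values reported at the moments a collection actually runs, so anything that happens between two records is not visible to it.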

