This is the mail archive of the
gcc-regression@gcc.gnu.org
mailing list for the GCC project.
- From: Jan Hubicka <jh at suse dot cz>
- To: gcctest at suse dot de, rguenther at suse dot de
- Cc: jh at suse dot cz, gcc-regression at gcc dot gnu dot org
- Date: Thu, 18 Jan 2007 10:02:52 +0100
- Subject: Re: A recent patch increased GCC's memory consumption in some cases!
- References: <45AF1ABE.mail33C1LFAOG@suse.de>
>
> comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
> Overall memory allocated via mmap and sbrk decreased from 365202k to 192018k, overall -90.19%
> Peak amount of GGC memory allocated before garbage collecting increased from 207227k to 302461k, overall 45.96%
> Peak amount of GGC memory still allocated after garbage collecting decreased from 192563k to 178766k, overall -7.72%
> Amount of produced GGC garbage increased from 355277k to 586684k, overall 65.13%
> Amount of memory still referenced at the end of compilation decreased from 30387k to 27902k, overall -8.91%
This result is somewhat bogus: memory consumption actually peaks at over
360MB. Richard, is it possible that your script somehow misses the real
peak? See http://www.suse.de/~aj/SPEC/amd64/memory/pr28071-O2.rep
All 240MB of that memory is produced by the schedule2 pass, and we GGC it
immediately, so I guess the peak lasts only a short time.
Anyway, the savings are generally real: we now optimize the function
better before inlining and produce much better assembly. Originally,
dominator optimization reorganized the code in such a weird way that it
could not be cleaned up (I think this is mentioned in the PR trail).
Honza
> Overall memory needed: 365202k -> 192018k
> Peak memory use before GGC: 207227k -> 302461k
> Peak memory use after GGC: 192563k -> 178766k
> Maximum of released memory in single GGC run: 140362k -> 241049k
> Garbage: 355277k -> 586684k
> Leak: 30387k -> 27902k
> Overhead: 47185k -> 95313k
> GGC runs: 98 -> 83
>
> comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
> Overall memory allocated via mmap and sbrk increased from 572546k to 586858k, overall 2.50%
> Peak amount of GGC memory allocated before garbage collecting increased from 282245k to 283571k, overall 0.47%
> Peak amount of GGC memory still allocated after garbage collecting increased from 273215k to 276589k, overall 1.23%
> Amount of produced GGC garbage increased from 448504k to 451307k, overall 0.62%
> Amount of memory still referenced at the end of compilation increased from 45440k to 48594k, overall 6.94%
> Overall memory needed: 572546k -> 586858k
> Peak memory use before GGC: 282245k -> 283571k
> Peak memory use after GGC: 273215k -> 276589k
> Maximum of released memory in single GGC run: 138326k -> 138367k
> Garbage: 448504k -> 451307k
> Leak: 45440k -> 48594k
> Overhead: 56089k -> 56724k
> GGC runs: 97 -> 72
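The percentages in both comparisons above are not plain relative changes. Every figure is reproduced if the change is divided by the smaller of the two values, which means decreases are measured against the new value: 365202k -> 192018k is reported as -90.19% even though it is a reduction of about 47.42% from the old value. A minimal Python sketch of the two formulas, assuming this is what the testing script does (the function names are hypothetical):

```python
def report_pct(old_kb: int, new_kb: int) -> float:
    """Change as the report appears to compute it: relative to the
    smaller of the two values (old for increases, new for decreases)."""
    return (new_kb - old_kb) / min(old_kb, new_kb) * 100


def relative_pct(old_kb: int, new_kb: int) -> float:
    """Conventional relative change, always measured against the old value."""
    return (new_kb - old_kb) / old_kb * 100


if __name__ == "__main__":
    # "Overall memory allocated via mmap and sbrk" at -O2, from the report.
    print(round(report_pct(365202, 192018), 2))    # -90.19
    print(round(relative_pct(365202, 192018), 2))  # -47.42
    # For increases both formulas agree, since min(old, new) == old.
    print(round(report_pct(207227, 302461), 2))    # 45.96
```

With this convention a decrease can exceed -100% in magnitude, so decrease and increase figures are not directly comparable.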
>
> Head of the ChangeLog is:
>
> --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 2007-01-16 20:23:49.000000000 +0000
> +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 2007-01-18 04:56:20.000000000 +0000
> @@ -1,3 +1,135 @@
> +2007-01-18 Ben Elliston <bje@au.ibm.com>
> +
> + * genautomata.c (write_automata): Include xstrerror output in the
> + error message if writing the DFA description file fails.
> +
> +2007-01-17 H.J. Lu <hongjiu.lu@intel.com>
> +
> + * config/mips/mips-protos.h (mips_output_external): Make it
> + return void.
> + * config/mips/iris.h (TARGET_ASM_EXTERNAL_LIBCALL): Removed.
> + * config/mips/mips.c (irix_output_external_libcall): Likewise.
> + (extern_list): Likewise.
> + (extern_head): Likewise.
> + (TARGET_ASM_FILE_END): Likewise.
> + (mips_file_end): Likewise.
> + (mips_output_external): Rewritten.
> +
> +2007-01-18 Ben Elliston <bje@au.ibm.com>
> +
> + * genpreds.c (write_insn_preds_c): Only write out the function
> + body for regclass_for_constraint if we have register constraints.
> +
> +2007-01-17 Tom Tromey <tromey@redhat.com>
> +
> + * doc/sourcebuild.texi (libgcj Tests): Use sourceware.org.
> + * doc/install.texi (Testing): Use sourceware.org.
> + (Binaries): Likewise.
> + (Specific): Likewise.
> + * doc/contrib.texi (Contributors): Use sourceware.org.
> +
> +2007-01-17 Anatoly Sokolov <aesok@post.ru>
> +
> + * config/avr/avr.h (AVR_HAVE_LPMX): New macro.
> + (AVR_ENHANCED): Rename to ...
> + (AVR_HAVE_MUL): ... new.
> + (avr_enhanced_p): Rename to ...
> + (avr_have_mul_p): ... new.
> + (TARGET_CPU_CPP_BUILTINS): Use 'avr_have_mul_p' instead of
> + 'avr_enhanced_p' for "__AVR_ENHANCED__". Define "__AVR_HAVE_MUL__".
> + * config/avr/avr.c (avr_enhanced_p): Rename to ...
> + (avr_have_mul_p): ... new.
> + (base_arch_s): Rename 'enhanced' to 'have_mul'.
> + (avr_override_options): Use 'avr_have_mul_p' and 'have_mul' instead of
> + 'avr_enhanced_p' and 'enhanced'.
> + (ashlhi3_out, ashrhi3_out, lshrhi3_out, avr_rtx_costs): Use
> + AVR_HAVE_MUL instead of AVR_ENHANCED.
> + * avr.md (*tablejump_enh): Use AVR_HAVE_LPMX instead of AVR_ENHANCED.
> + (mulqi3, *mulqi3_enh, *mulqi3_call, mulqihi3, umulqihi3, mulhi3,
> + *mulhi3_enh, *mulhi3_call, mulsi3, *mulsi3_call): Use AVR_HAVE_MUL
> + instead of AVR_ENHANCED.
> + (*tablejump_enh): Use AVR_HAVE_LPMX instead of AVR_ENHANCED.
> + * libgcc.S: Use __AVR_HAVE_MUL__ instead of __AVR_ENHANCED__.
> + (__tablejump__): Use __AVR_HAVE_LPMX__ instead of __AVR_ENHANCED__.
> +
> +2007-01-17 Ian Lance Taylor <iant@google.com>
> +
> + * vec.h (VEC_reserve_exact): Define.
> + (vec_gc_p_reserve_exact): Declare.
> + (vec_gc_o_reserve_exact): Declare.
> + (vec_heap_p_reserve_exact): Declare.
> + (vec_heap_o_reserve_exact): Declare.
> + (VEC_OP (T,A,reserve_exact)): New static inline function, three
> + versions.
> + (VEC_OP (T,A,reserve)) [all versions]: Remove handling of
> + negative parameter.
> + (VEC_OP (T,A,alloc)) [all versions]: Call ...reserve_exact.
> + (VEC_OP (T,A,copy)) [all versions]: Likewise.
> + (VEC_OP (T,a,safe_grow)) [all versions]: Likewise.
> + * vec.c (calculate_allocation): Add exact parameter. Change all
> + callers.
> + (vec_gc_o_reserve_1): New static function, from vec_gc_o_reserve.
> + (vec_gc_p_reserve, vec_gc_o_reserve): Call vec_gc_o_reserve_1.
> + (vec_gc_p_reserve_exact, vec_gc_o_reserve_exact): New functions.
> + (vec_heap_o_reserve_1): New static function, from vec_heap_o_reserve.
> + (vec_heap_p_reserve, vec_heap_o_reserve): Call vec_heap_o_reserve_1.
> + (vec_heap_p_reserve_exact): New function.
> + (vec_heap_o_reserve_exact): New function.
> +
> +2007-01-17 Jan Hubicka <jh@suse.cz>
> +
> + * ipa-type-escape.c (look_for_casts): Revamp using handled_component_p.
> +
> +2007-01-17 Eric Christopher <echristo@apple.com>
> +
> + * config.gcc: Support core2 processor.
> +
> +2007-01-16 Jan Hubicka <jh@suse.cz>
> +
> + * tree-ssanames.c (release_dead_ssa_names): Instead of ggc_freeing
> + the names, just unlink the chain so we don't crash on dangling pointers
> + to dead SSA names.
> +
> +2007-01-16 Jan Hubicka <jh@suse.cz>
> +
> + * cgraph.h (cgraph_decide_inlining_incrementally): Kill.
> + * tree-pass.h: Reorder to make IPA passes appear together.
> + (pass_early_inline, pass_inline_parameters, pass_apply_inline): Declare.
> + * cgraphunit.c (cgraph_finalize_function): Do not compute inlining
> + parameters, do not call early inliner.
> + * ipa-inline.c: Update comments. Include tree-flow.h
> + (cgraph_decide_inlining): Do not compute inlining parameters.
> + (cgraph_decide_inlining_incrementally): Return TODOs; assume to
> + be called with function context set up.
> + (pass_ipa_inline): Remove unreachable functions before pass.
> + (cgraph_early_inlining): Simplify assuming to be called from the
> + PM as local pass.
> + (pass_early_inline): New pass.
> + (cgraph_gate_ipa_early_inlining): New gate.
> + (pass_ipa_early_inline): Turn into simple wrapper.
> + (compute_inline_parameters): New function.
> + (gate_inline_passes): New gate.
> + (pass_inline_parameters): New pass.
> + (apply_inline): Move here from tree-optimize.c
> + (pass_apply_inline): New pass.
> + * ipa.c (cgraph_remove_unreachable_nodes): Verify cgraph after
> + transforming.
> + * tree-inline.c (optimize_inline_calls): Return TODOs rather than
> + doing them by hand.
> + (tree_function_versioning): Do not allocate dummy struct function.
> + * tree-inline.h (optimize_inline_calls): Update prototype.
> + * tree-optimize.c (execute_fixup_cfg): Export.
> + (pass_fixup_cfg): Remove
> + (tree_rest_of_compilation): Do not apply inlines.
> + * tree-flow.h (execute_fixup_cfg): Declare.
> + * Makefile.in (gt-passes.c): New.
> + * passes.c: Include gt-passes.h
> + (init_optimization_passes): New passes.
> + (nnodes, order): New static vars.
> + (do_per_function_toporder): New function.
> + (execute_one_pass): Dump current pass here.
> + (execute_ipa_pass_list): Don't dump current pass here.
> +
> 2007-01-16 Janis Johnson <janis187@us.ibm.com>
>
> * config/dfp-bit.c (dfp_compare_op): Return separate value for NaN.
> --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog.cp 2007-01-12 08:03:04.000000000 +0000
> +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/cp/ChangeLog 2007-01-18 04:56:19.000000000 +0000
> @@ -1,3 +1,8 @@
> +2007-01-17 Ian Lance Taylor <iant@google.com>
> +
> + * class.c (add_method): Call VEC_reserve_exact rather than passing
> + a negative size to VEC_reserve.
> +
> 2007-01-11 Simon Martin <simartin@users.sourceforge.net>
>
> PR c++/29573
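The vec.h/vec.c and cp/class.c entries above split vector reservation into two calls: VEC_reserve, which presumably may over-allocate, and VEC_reserve_exact, which replaces the old convention of passing a negative size to VEC_reserve. The distinction matters for memory consumption because slack capacity is allocated but never used. A minimal Python sketch of the two policies; the class is hypothetical, and the 3/2 growth factor and minimum of 4 slots are assumptions, not GCC's calculate_allocation:

```python
class SketchVec:
    """Hypothetical vector tracking used length and allocated slots."""

    def __init__(self) -> None:
        self.length = 0
        self.capacity = 0

    def reserve(self, n: int) -> None:
        """Make room for n more elements, over-allocating geometrically
        (assumed factor 3/2, minimum 4 slots) so repeated growth is cheap."""
        needed = self.length + n
        if needed > self.capacity:
            self.capacity = max(needed, self.capacity * 3 // 2, 4)

    def reserve_exact(self, n: int) -> None:
        """Make room for exactly n more elements, with no slack."""
        needed = self.length + n
        if needed > self.capacity:
            self.capacity = needed

    def push(self, n: int = 1) -> None:
        """Record n new elements, growing with reserve() if needed."""
        self.reserve(n)
        self.length += n


if __name__ == "__main__":
    exact = SketchVec()
    exact.reserve_exact(11)
    print(exact.capacity)               # 11
    grown = SketchVec()
    grown.push(10)                      # capacity 10
    grown.push()                        # grows to 15 for 11 elements
    print(grown.capacity, grown.length)  # 15 11
```

When the final size is known up front, exact reservation avoids paying for unused slots; geometric growth trades that slack for fewer reallocations.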
>
>
> The results can be reproduced by building a compiler with
>
> --enable-gather-detailed-mem-stats targeting x86-64
>
> and compiling preprocessed combine.c or the testcase from PR8632 with:
>
> -fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
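The reproduction recipe can be scripted. A minimal Python sketch, assuming a compiler already configured with --enable-gather-detailed-mem-stats, a preprocessed input file, and that the reports are written to stderr; the compiler path, file name, and the -c/-o /dev/null options are hypothetical additions, and "-Ox" from the recipe becomes the opt_level argument. The two --param settings keep the GC heap thresholds very low, so collections (and their reports) happen often:

```python
import subprocess
from typing import List


def build_cmd(compiler: str, source: str, opt_level: str) -> List[str]:
    """Argv for one measurement compile; flags follow the recipe above,
    with "-Ox" filled in from opt_level."""
    return [
        compiler,
        "-fmem-report",
        "--param=ggc-min-heapsize=1024",
        "--param=ggc-min-expand=1",
        f"-O{opt_level}",
        "-Q",
        "-c", source,
        "-o", "/dev/null",  # hypothetical: the object file is not needed
    ]


def run_measurement(compiler: str, source: str, opt_level: str) -> str:
    """Run the compile and return its stderr (assumed to hold the reports)."""
    proc = subprocess.run(build_cmd(compiler, source, opt_level),
                          capture_output=True, text=True, check=False)
    return proc.stderr


# Hypothetical usage: stderr = run_measurement("gcc", "combine.i", "2")
```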
>
> The memory consumption summary appears in the dump after the detailed listing
> of the places where memory is allocated. Peak memory consumption is actually
> computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
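Extracting the peaks from the {GC XXXX -> YYYY} reports can be sketched as follows, assuming each report has the form {GC 207227k -> 192563k} with sizes in kilobytes (the regular expression and function name are hypothetical). Because these reports are printed only when a collection runs and presumably cover only GGC memory, memory allocated outside the GC heap is not visible to a scan like this, which is one possible way for a script to miss the real peak:

```python
import re
from typing import Optional, Tuple

# Assumed report format: "{GC <before>k -> <after>k}".
GC_MARKER = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def gc_peaks(stderr_text: str) -> Optional[Tuple[int, int]]:
    """Return (peak before GC, peak after GC) in kB, or None if no reports."""
    pairs = [(int(before), int(after))
             for before, after in GC_MARKER.findall(stderr_text)]
    if not pairs:
        return None
    return max(b for b, _ in pairs), max(a for _, a in pairs)


if __name__ == "__main__":
    sample = "f1 {GC 1200k -> 900k} f2 {GC 2100k -> 800k}"
    print(gc_peaks(sample))  # (2100, 900)
```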
>
> Your testing script.