This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



Re: A recent patch increased GCC's memory consumption in some cases!


On Thu, 8 May 2008, Jan Hubicka wrote:

> Hi,
> this seems really nice ;)

Indeed.  And I had thought we already got nearly all of the benefit
just from disabling SFTs... ;)

> > 
> > comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
> >   Overall memory allocated via mmap and sbrk decreased from 380351k to 285391k, overall -33.27%
> >   Peak amount of GGC memory allocated before garbage collecting run decreased from 100958k to 62096k, overall -62.58%
> >   Peak amount of GGC memory still allocated after garbage collecting decreased from 56611k to 40417k, overall -40.07%
> >   Amount of produced GGC garbage decreased from 178452k to 118819k, overall -50.19%
> >   Amount of memory still referenced at the end of compilation decreased from 6103k to 5336k, overall -14.37%
> >     Overall memory needed: 380351k -> 285391k
> >     Peak memory use before GGC: 100958k -> 62096k
> >     Peak memory use after GGC: 56611k -> 40417k
> >     Maximum of released memory in single GGC run: 50583k -> 31619k
> >     Garbage: 178452k -> 118819k
> >     Leak: 6103k -> 5336k
> >     Overhead: 30540k -> 18233k
> >     GGC runs: 107 -> 106
> > Testing has produced no results
> > 
> > comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
> >   Overall memory allocated via mmap and sbrk decreased from 381191k to 286243k, overall -33.17%
> >   Peak amount of GGC memory allocated before garbage collecting run decreased from 101651k to 62789k, overall -61.89%
> >   Peak amount of GGC memory still allocated after garbage collecting decreased from 57304k to 41110k, overall -39.39%
> >   Amount of produced GGC garbage decreased from 178616k to 118983k, overall -50.12%
> >   Amount of memory still referenced at the end of compilation decreased from 8132k to 7365k, overall -10.41%
> >     Overall memory needed: 381191k -> 286243k
> >     Peak memory use before GGC: 101651k -> 62789k
> >     Peak memory use after GGC: 57304k -> 41110k
> >     Maximum of released memory in single GGC run: 50583k -> 31695k
> >     Garbage: 178616k -> 118983k
> >     Leak: 8132k -> 7365k
> >     Overhead: 31123k -> 18816k
> >     GGC runs: 110 -> 108
> > Testing has produced no results
> > 
> > comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
> >   Peak amount of GGC memory allocated before garbage collecting run decreased from 76380k to 70580k, overall -8.22%
> >   Peak amount of GGC memory still allocated after garbage collecting decreased from 70370k to 61354k, overall -14.70%
> >   Amount of produced GGC garbage decreased from 238003k to 192311k, overall -23.76%
> >   Amount of memory still referenced at the end of compilation decreased from 13677k to 12432k, overall -10.01%
> >     Overall memory needed: 393123k -> 382403k
> >     Peak memory use before GGC: 76380k -> 70580k
> >     Peak memory use after GGC: 70370k -> 61354k
> >     Maximum of released memory in single GGC run: 35019k -> 29401k
> >     Garbage: 238003k -> 192311k
> >     Leak: 13677k -> 12432k
> >     Overhead: 32125k -> 24677k
> >     GGC runs: 105 -> 107
> >   Amount of produced pre-ipa-GGC garbage decreased from 47276k to 39611k, overall -19.35%
> >   Amount of memory referenced pre-ipa decreased from 67562k to 59927k, overall -12.74%
> >     Pre-IPA-Garbage: 47276k -> 39611k
> >     Pre-IPA-Leak: 67562k -> 59927k
> >     Pre-IPA-Overhead: 7504k -> 5628k
> >   Amount of produced post-ipa-GGC garbage decreased from 47276k to 39611k, overall -19.35%
> >   Amount of memory referenced post-ipa decreased from 67562k to 59927k, overall -12.74%
> >     Post-IPA-Garbage: 47276k -> 39611k
> >     Post-IPA-Leak: 67562k -> 59927k
> >     Post-IPA-Overhead: 7504k -> 5628k
> > 
> > comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
> >   Overall memory allocated via mmap and sbrk decreased from 309479k to 238731k, overall -29.64%
> >   Peak amount of GGC memory allocated before garbage collecting run decreased from 76380k to 70331k, overall -8.60%
> >   Peak amount of GGC memory still allocated after garbage collecting decreased from 70370k to 61355k, overall -14.69%
> >   Amount of produced GGC garbage decreased from 252446k to 208026k, overall -21.35%
> >   Amount of memory still referenced at the end of compilation decreased from 13851k to 12605k, overall -9.88%
> >     Overall memory needed: 309479k -> 238731k
> >     Peak memory use before GGC: 76380k -> 70331k
> >     Peak memory use after GGC: 70370k -> 61355k
> >     Maximum of released memory in single GGC run: 31602k -> 25655k
> >     Garbage: 252446k -> 208026k
> >     Leak: 13851k -> 12605k
> >     Overhead: 35239k -> 28695k
> >     GGC runs: 118 -> 117
> >   Amount of produced pre-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> >   Amount of memory referenced pre-ipa decreased from 77323k to 72346k, overall -6.88%
> >     Pre-IPA-Garbage: 99865k -> 80833k
> >     Pre-IPA-Leak: 77323k -> 72346k
> >     Pre-IPA-Overhead: 12142k -> 8403k
> >   Amount of produced post-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> >   Amount of memory referenced post-ipa decreased from 77323k to 72346k, overall -6.88%
> >     Post-IPA-Garbage: 99865k -> 80833k
> >     Post-IPA-Leak: 77323k -> 72346k
> >     Post-IPA-Overhead: 12142k -> 8403k
> > 
> > comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
> >   Peak amount of GGC memory allocated before garbage collecting run decreased from 138642k to 116511k, overall -18.99%
> >   Peak amount of GGC memory still allocated after garbage collecting decreased from 127952k to 109476k, overall -16.88%
> >   Amount of produced GGC garbage decreased from 374831k to 353887k, overall -5.92%
> >   Amount of memory still referenced at the end of compilation decreased from 24124k to 21397k, overall -12.75%
> >     Overall memory needed: 1200099k -> 1192295k
> >     Peak memory use before GGC: 138642k -> 116511k
> >     Peak memory use after GGC: 127952k -> 109476k
> >     Maximum of released memory in single GGC run: 59910k -> 43506k
> >     Garbage: 374831k -> 353887k
> >     Leak: 24124k -> 21397k
> >     Overhead: 49858k -> 46186k
> >     GGC runs: 104 -> 110
> >   Amount of produced pre-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> >   Amount of memory referenced pre-ipa decreased from 77323k to 72346k, overall -6.88%
> >     Pre-IPA-Garbage: 99865k -> 80833k
> >     Pre-IPA-Leak: 77323k -> 72346k
> >     Pre-IPA-Overhead: 12142k -> 8403k
> >   Amount of produced post-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> >   Amount of memory referenced post-ipa decreased from 77323k to 72346k, overall -6.88%
> >     Post-IPA-Garbage: 99865k -> 80833k
> >     Post-IPA-Leak: 77323k -> 72346k
> >     Post-IPA-Overhead: 12142k -> 8403k
> > 
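A note on reading the "overall" percentages quoted above: the figures appear to be computed relative to the new value rather than the old one (380351k -> 285391k is reported as -33.27%, whereas measured against the old value it would be about -24.97%). This is inferred from the numbers in the report, not from the tester's source. A minimal sketch of both conventions, with hypothetical function names:

```python
def change_vs_new(old_kb, new_kb):
    """Percent change measured against the NEW value.

    Assumption: this is the convention the report uses; it is inferred
    from the quoted figures, not taken from the tester script itself.
    """
    return round((new_kb - old_kb) / new_kb * 100, 2)


def change_vs_old(old_kb, new_kb):
    """Percent change measured against the OLD value (the more usual convention)."""
    return round((new_kb - old_kb) / old_kb * 100, 2)


# -O0 "Overall memory needed" figures from the report above.
print(change_vs_new(380351, 285391))
print(change_vs_old(380351, 285391))
```

Measured against the old value, the reductions are therefore smaller than the headline percentages suggest, though still substantial.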
> > Head of the ChangeLog is:
> > 
> > --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2008-05-08 02:12:21.000000000 +0000
> > +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2008-05-08 09:42:11.000000000 +0000
> > @@ -1,3 +1,151 @@
> > +2008-05-08  Richard Guenther  <rguenther@suse.de>
> > +
> > +	* tree-data-ref.c (dr_analyze_alias): Do not set DR_SUBVARS.
> > +	* tree-data-ref.h (struct dr_alias): Remove subvars field.
> > +	(DR_SUBVARS): Remove.
> > +	* tree-dfa.c (dump_subvars_for): Remove.
> > +	(debug_subvars_for): Likewise.
> > +	(dump_variable): Do not dump subvars.
> > +	(remove_referenced_var): Do not remove subvars.
> > +	* tree-flow-inline.h (clear_call_clobbered): SFTs no longer exist.
> > +	(lookup_subvars_for_var): Remove.
> > +	(get_subvars_for_var): Likewise.
> > +	(get_subvars_at): Likewise.
> > +	(get_first_overlapping_subvar): Likewise.
> > +	(overlap_subvar): Likewise.
> > +	* tree-flow.h (subvar_t): Remove.
> > +	(struct var_ann_d): Remove subvars field.
> > +	* tree-ssa-alias.c (mark_aliases_call_clobbered): Remove queued
> > +	argument.  Remove special handling of SFTs.
> > +	(compute_tag_properties): Likewise.
> > +	(set_initial_properties): Likewise.
> > +	(compute_call_clobbered): Likewise.
> > +	(count_mem_refs): Likewise.
> > +	(compute_memory_partitions): Likewise.
> > +	(compute_flow_insensitive_aliasing): Likewise.
> > +	(setup_pointers_and_addressables): Likewise.
> > +	(new_type_alias): Likewise.
> > +	(struct used_part): Remove.
> > +	(used_portions): Likewise.
> > +	(struct used_part_map): Likewise.
> > +	(used_part_map_eq): Likewise.
> > +	(used_part_map_hash): Likewise.
> > +	(free_used_part_map): Likewise.
> > +	(up_lookup): Likewise.
> > +	(up_insert): Likewise.
> > +	(get_or_create_used_part_for): Likewise.
> > +	(create_sft): Likewise.
> > +	(create_overlap_variables_for): Likewise.
> > +	(find_used_portions): Likewise.
> > +	(create_structure_vars): Likewise.
> > +	* tree.def (STRUCT_FIELD_TAG): Remove.
> > +	* tree.h (MTAG_P): Adjust.
> > +	(struct tree_memory_tag): Remove base_for_components and
> > +	unpartitionable flags.
> > +	(struct tree_struct_field_tag): Remove.
> > +	(SFT_PARENT_VAR): Likewise.
> > +	(SFT_OFFSET): Likewise.
> > +	(SFT_SIZE): Likewise.
> > +	(SFT_NONADDRESSABLE_P): Likewise.
> > +	(SFT_ALIAS_SET): Likewise.
> > +	(SFT_UNPARTITIONABLE_P): Likewise.
> > +	(SFT_BASE_FOR_COMPONENTS_P): Likewise.
> > +	(union tree_node): Remove sft field.
> > +	* alias.c (get_alias_set): Remove special handling of SFTs.
> > +	* print-tree.c (print_node): Remove handling of SFTs.
> > +	* tree-dump.c (dequeue_and_dump): Likewise.
> > +	* tree-into-ssa.c (mark_sym_for_renaming): Likewise.
> > +	* tree-nrv.c (dest_safe_for_nrv_p): Remove special handling of SFTs.
> > +	* tree-predcom.c (set_alias_info): Do not set subvars.
> > +	* tree-pretty-print.c (dump_generic_node): Do not handle SFTs.
> > +	* tree-ssa-loop-ivopts.c (get_ref_tag): Likewise.
> > +	* tree-ssa-operands.c (access_can_touch_variable): Likewise.
> > +	(add_vars_for_offset): Remove.
> > +	(add_virtual_operand): Remove special handling of SFTs.
> > +	(add_call_clobber_ops): Likewise.
> > +	(add_call_read_ops): Likewise.
> > +	(get_asm_expr_operands): Likewise.
> > +	(get_modify_stmt_operands): Likewise.
> > +	(get_expr_operands): Likewise.
> > +	(add_to_addressable_set): Likewise.
> > +	* tree-ssa.c (verify_ssa_name): Do not handle SFTs.
> > +	* tree-tailcall.c (suitable_for_tail_opt_p): Likewise.
> > +	* tree-vect-transform.c (vect_create_data_ref_ptr): Do not
> > +	set subvars.
> > +	* tree.c (init_ttree): Remove STRUCT_FIELD_TAG initialization.
> > +	(tree_code_size): Remove STRUCT_FIELD_TAG handling.
> > +	(tree_node_structure): Likewise.
> > +	* tree-ssa-structalias.c (set_uids_in_ptset): Remove special
> > +	handling of SFTs.
> > +	(find_what_p_points_to): Likewise.
> > +
> > +2008-05-08  Sa Liu  <saliu@de.ibm.com>
> > +
> > +	* config/spu/spu.md: Fixed subti3 pattern.
> > +	* testsuite/gcc.target/spu/subti3.c: New.
> > +
> > +2008-05-08  Richard Guenther  <rguenther@suse.de>
> > +
> > +	PR middle-end/36154
> > +	* tree-ssa-structalias.c (push_fields_onto_fieldstack): Make
> > +	sure to create a representative for trailing arrays for PTA.
> > +
> > +2008-05-08  Richard Guenther  <rguenther@suse.de>
> > +
> > +	PR middle-end/36172
> > +	* fold-const.c (operand_equal_p): Two objects which types
> > +	differ in pointerness are not equal.
> > +
> > +2008-05-08  Kai Tietz  <kai.tietz@onevision.com>
> > +
> > +	* calls.c (compute_argument_block_size): Add argument tree fndecl.
> > +	(OUTGOING_REG_PARM_STACK_SPACE): Add function type argument.
> > +	(emit_library_call_value_1): Add new variable fndecl initialized by
> > +	NULL_TREE. It should be the decl type of orgfun, but this information
> > +	seems not to be available here, so it uses the default calling abi.
> > +	* config/arm/arm.c (arm_return_in_memory): Add fntype argument.
> > +	* config/arm/arm.h (RETURN_IN_MEMORY): Replace RETURN_IN_MEMORY
> > +	by TARGET_RETURN_IN_MEMORY.
> > +	* config/i386/i386-interix.h: Likewise.
> > +	* config/i386/i386.h: Likewise.
> > +	* config/i386/i386elf.h: Likewise.
> > +	* config/i386/ptx4-i.h: Likewise.
> > +	* config/i386/sol2-10.h: Likewise.
> > +	* config/i386/sysv4.h: Likewise.
> > +	* config/i386/vx-common.h: Likewise.
> > +	* config/cris/cris.h: Removed #if 0 clause.
> > +	* config/arm/arm-protos.h (arm_return_in_memory): Add fntype
> > +	argument.
> > +	* config/i386/i386-protos.h (ix86_return_in_memory): Add fntype
> > +	argument.
> > +	(ix86_sol10_return_in_memory): Likewise.
> > +	(ix86_i386elf_return_in_memory): New.
> > +	(ix86_i386interix_return_in_memory): New.
> > +	* config/mt/mt-protos.h (mt_return_in_memory): New.
> > +	* config/mt/mt.c: Likewise.
> > +	* config/mt/mt.h (OUTGOING_REG_PARM_STACK_SPACE): Add FNTYPE argument.
> > +	(RETURN_IN_MEMORY):  Replace by TARGET_RETURN_IN_MEMORY.
> > +	* config/bfin/bfin.h: Likewise.
> > +	* config/bfin/bfin-protos.h (bfin_return_in_memory): Add fntype
> > +	argument.
> > +	* config/bfin/bfin.c: Likewise.
> > +	* config/pa/pa.h (OUTGOING_REG_PARM_STACK_SPACE): Add FNTYPE argument.
> > +	* config/alpha/unicosmk.h: Likewise.
> > +	* config/i386/cygming.h: Likewise.
> > +	* config/iq2000/iq2000.h: Likewise.
> > +	* config/mips/mips.h: Likewise.
> > +	* config/mn10300/mn10300.h: Likewise.
> > +	* config/rs6000/rs6000.h: Likewise.
> > +	* config/score/score.h: Likewise.
> > +	* config/spu/spu.h: Likewise.
> > +	* config/v850/v850.h: Likewise.
> > +	* defaults.h: Likewise.
> > +	* doc/tm.texi (OUTGOING_REG_PARM_STACK_SPACE): Adjust documentation.
> > +	* expr.c (emit_block_move): Adjust use of OUTGOING_REG_PARM_STACK_SPACE.
> > +	* function.c (STACK_DYNAMIC_OFFSET): Adjust use of
> > +	OUTGOING_REG_PARM_STACK_SPACE.
> > +	* targhooks.c (default_return_in_memory): Remove RETURN_IN_MEMORY.
> > +
> >  2008-05-08  Jakub Jelinek  <jakub@redhat.com>
> >  
> >  	* tree-parloops.c (create_parallel_loop): Set OMP_RETURN_NOWAIT
> > 
> > 
> > The results can be reproduced by building a compiler with
> > 
> > --enable-gather-detailed-mem-stats targeting x86-64
> > 
> > and compiling preprocessed combine.c or testcase from PR8632 with:
> > 
> > -fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
> > 
> > The memory consumption summary appears in the dump after the detailed
> > listing of the places where memory is allocated.  Peak memory consumption
> > is computed by taking the maximal value in the {GC XXXX -> YYYY} reports.
> > 
> > Your testing script.
> 
> 

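The quoted recipe says peak memory is found by looking for the maximal value in the {GC XXXX -> YYYY} reports. A minimal sketch of that step follows; the exact line shape (numbers with an optional "k" suffix) and the function name are assumptions, not taken from the tester script:

```python
import re

# Assumed shape of one per-collection report, e.g. "{GC 62096k -> 40417k}".
GC_REPORT = re.compile(r"\{GC\s+(\d+)k?\s*->\s*(\d+)k?\}")


def peak_gc_memory(dump_text):
    """Return (peak_before, peak_after) over all GC reports, or None if there are none.

    peak_before is the largest XXXX (memory before a collection) and
    peak_after the largest YYYY (memory still allocated afterwards).
    """
    pairs = [(int(before), int(after))
             for before, after in GC_REPORT.findall(dump_text)]
    if not pairs:
        return None
    return (max(before for before, _ in pairs),
            max(after for _, after in pairs))


# Illustrative input only; these are not lines from a real dump.
sample = "foo {GC 62096k -> 40417k} bar {GC 51000k -> 39000k}"
print(peak_gc_memory(sample))
```

Run against the -Q output of a compiler built as described, the two maxima would presumably correspond to the "Peak memory use before GGC" and "Peak memory use after GGC" lines of the report.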
-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

