This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.
Re: A recent patch increased GCC's memory consumption in some cases!
- From: Richard Guenther <rguenther at suse dot de>
- To: Jan Hubicka <jh at suse dot cz>
- Cc: gcctest at suse dot de, gcc-regression at gcc dot gnu dot org
- Date: Fri, 9 May 2008 10:12:53 +0200 (CEST)
- Subject: Re: A recent patch increased GCC's memory consumption in some cases!
- References: <4822F2A2.mailAUK15TCEB@suse.de> <20080508165951.GA972@kam.mff.cuni.cz>
On Thu, 8 May 2008, Jan Hubicka wrote:
> Hi,
> this seems really nice ;)
Indeed. And I had thought we already got nearly all of the benefit just
from disabling SFTs... ;)
> >
> > comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
> > Overall memory allocated via mmap and sbrk decreased from 380351k to 285391k, overall -33.27%
> > Peak amount of GGC memory allocated before garbage collecting run decreased from 100958k to 62096k, overall -62.58%
> > Peak amount of GGC memory still allocated after garbage collecting decreased from 56611k to 40417k, overall -40.07%
> > Amount of produced GGC garbage decreased from 178452k to 118819k, overall -50.19%
> > Amount of memory still referenced at the end of compilation decreased from 6103k to 5336k, overall -14.37%
> > Overall memory needed: 380351k -> 285391k
> > Peak memory use before GGC: 100958k -> 62096k
> > Peak memory use after GGC: 56611k -> 40417k
> > Maximum of released memory in single GGC run: 50583k -> 31619k
> > Garbage: 178452k -> 118819k
> > Leak: 6103k -> 5336k
> > Overhead: 30540k -> 18233k
> > GGC runs: 107 -> 106
> > Testing has produced no results
> >
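Judging from the figures above, the "overall" percentages are taken relative to the new value rather than the old one. A minimal sketch of that arithmetic, assuming this reading is right (the helper name percent_change is made up for illustration and is not taken from the testing script):

```python
def percent_change(old_k: int, new_k: int) -> float:
    """Relative change in percent, measured against the NEW value.

    Assumption: this mirrors how the report derives its "overall" figure;
    it was inferred from the numbers, not from the script itself.
    """
    return (new_k - old_k) / new_k * 100.0


if __name__ == "__main__":
    # Figures copied from the -O0 section of the report.
    print(f"{percent_change(380351, 285391):.2f}%")  # overall memory
    print(f"{percent_change(100958, 62096):.2f}%")   # peak before GGC
```

Measured against the old value instead, the first figure would come out nearer -25%, so the reported percentages overstate the saving somewhat if read the usual way.
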
> > comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
> > Overall memory allocated via mmap and sbrk decreased from 381191k to 286243k, overall -33.17%
> > Peak amount of GGC memory allocated before garbage collecting run decreased from 101651k to 62789k, overall -61.89%
> > Peak amount of GGC memory still allocated after garbage collecting decreased from 57304k to 41110k, overall -39.39%
> > Amount of produced GGC garbage decreased from 178616k to 118983k, overall -50.12%
> > Amount of memory still referenced at the end of compilation decreased from 8132k to 7365k, overall -10.41%
> > Overall memory needed: 381191k -> 286243k
> > Peak memory use before GGC: 101651k -> 62789k
> > Peak memory use after GGC: 57304k -> 41110k
> > Maximum of released memory in single GGC run: 50583k -> 31695k
> > Garbage: 178616k -> 118983k
> > Leak: 8132k -> 7365k
> > Overhead: 31123k -> 18816k
> > GGC runs: 110 -> 108
> > Testing has produced no results
> >
> > comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
> > Peak amount of GGC memory allocated before garbage collecting run decreased from 76380k to 70580k, overall -8.22%
> > Peak amount of GGC memory still allocated after garbage collecting decreased from 70370k to 61354k, overall -14.70%
> > Amount of produced GGC garbage decreased from 238003k to 192311k, overall -23.76%
> > Amount of memory still referenced at the end of compilation decreased from 13677k to 12432k, overall -10.01%
> > Overall memory needed: 393123k -> 382403k
> > Peak memory use before GGC: 76380k -> 70580k
> > Peak memory use after GGC: 70370k -> 61354k
> > Maximum of released memory in single GGC run: 35019k -> 29401k
> > Garbage: 238003k -> 192311k
> > Leak: 13677k -> 12432k
> > Overhead: 32125k -> 24677k
> > GGC runs: 105 -> 107
> > Amount of produced pre-ipa-GGC garbage decreased from 47276k to 39611k, overall -19.35%
> > Amount of memory referenced pre-ipa decreased from 67562k to 59927k, overall -12.74%
> > Pre-IPA-Garbage: 47276k -> 39611k
> > Pre-IPA-Leak: 67562k -> 59927k
> > Pre-IPA-Overhead: 7504k -> 5628k
> > Amount of produced post-ipa-GGC garbage decreased from 47276k to 39611k, overall -19.35%
> > Amount of memory referenced post-ipa decreased from 67562k to 59927k, overall -12.74%
> > Post-IPA-Garbage: 47276k -> 39611k
> > Post-IPA-Leak: 67562k -> 59927k
> > Post-IPA-Overhead: 7504k -> 5628k
> >
> > comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
> > Ovarall memory allocated via mmap and sbrk decreased from 309479k to 238731k, overall -29.64%
> > Peak amount of GGC memory allocated before garbage collecting run decreased from 76380k to 70331k, overall -8.60%
> > Peak amount of GGC memory still allocated after garbage collecting decreased from 70370k to 61355k, overall -14.69%
> > Amount of produced GGC garbage decreased from 252446k to 208026k, overall -21.35%
> > Amount of memory still referenced at the end of compilation decreased from 13851k to 12605k, overall -9.88%
> > Overall memory needed: 309479k -> 238731k
> > Peak memory use before GGC: 76380k -> 70331k
> > Peak memory use after GGC: 70370k -> 61355k
> > Maximum of released memory in single GGC run: 31602k -> 25655k
> > Garbage: 252446k -> 208026k
> > Leak: 13851k -> 12605k
> > Overhead: 35239k -> 28695k
> > GGC runs: 118 -> 117
> > Amount of produced pre-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> > Amount of memory referenced pre-ipa decreased from 77323k to 72346k, overall -6.88%
> > Pre-IPA-Garbage: 99865k -> 80833k
> > Pre-IPA-Leak: 77323k -> 72346k
> > Pre-IPA-Overhead: 12142k -> 8403k
> > Amount of produced post-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> > Amount of memory referenced post-ipa decreased from 77323k to 72346k, overall -6.88%
> > Post-IPA-Garbage: 99865k -> 80833k
> > Post-IPA-Leak: 77323k -> 72346k
> > Post-IPA-Overhead: 12142k -> 8403k
> >
> > comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
> > Peak amount of GGC memory allocated before garbage collecting run decreased from 138642k to 116511k, overall -18.99%
> > Peak amount of GGC memory still allocated after garbage collecting decreased from 127952k to 109476k, overall -16.88%
> > Amount of produced GGC garbage decreased from 374831k to 353887k, overall -5.92%
> > Amount of memory still referenced at the end of compilation decreased from 24124k to 21397k, overall -12.75%
> > Overall memory needed: 1200099k -> 1192295k
> > Peak memory use before GGC: 138642k -> 116511k
> > Peak memory use after GGC: 127952k -> 109476k
> > Maximum of released memory in single GGC run: 59910k -> 43506k
> > Garbage: 374831k -> 353887k
> > Leak: 24124k -> 21397k
> > Overhead: 49858k -> 46186k
> > GGC runs: 104 -> 110
> > Amount of produced pre-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> > Amount of memory referenced pre-ipa decreased from 77323k to 72346k, overall -6.88%
> > Pre-IPA-Garbage: 99865k -> 80833k
> > Pre-IPA-Leak: 77323k -> 72346k
> > Pre-IPA-Overhead: 12142k -> 8403k
> > Amount of produced post-ipa-GGC garbage decreased from 99865k to 80833k, overall -23.55%
> > Amount of memory referenced post-ipa decreased from 77323k to 72346k, overall -6.88%
> > Post-IPA-Garbage: 99865k -> 80833k
> > Post-IPA-Leak: 77323k -> 72346k
> > Post-IPA-Overhead: 12142k -> 8403k
> >
> > Head of the ChangeLog is:
> >
> > --- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 2008-05-08 02:12:21.000000000 +0000
> > +++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 2008-05-08 09:42:11.000000000 +0000
> > @@ -1,3 +1,151 @@
> > +2008-05-08 Richard Guenther <rguenther@suse.de>
> > +
> > + * tree-data-ref.c (dr_analyze_alias): Do not set DR_SUBVARS.
> > + * tree-data-ref.h (struct dr_alias): Remove subvars field.
> > + (DR_SUBVARS): Remove.
> > + * tree-dfa.c (dump_subvars_for): Remove.
> > + (debug_subvars_for): Likewise.
> > + (dump_variable): Do not dump subvars.
> > + (remove_referenced_var): Do not remove subvars.
> > + * tree-flow-inline.h (clear_call_clobbered): SFTs no longer exist.
> > + (lookup_subvars_for_var): Remove.
> > + (get_subvars_for_var): Likewise.
> > + (get_subvars_at): Likewise.
> > + (get_first_overlapping_subvar): Likewise.
> > + (overlap_subvar): Likewise.
> > + * tree-flow.h (subvar_t): Remove.
> > + (struct var_ann_d): Remove subvars field.
> > + * tree-ssa-alias.c (mark_aliases_call_clobbered): Remove queued
> > + argument. Remove special handling of SFTs.
> > + (compute_tag_properties): Likewise.
> > + (set_initial_properties): Likewise.
> > + (compute_call_clobbered): Likewise.
> > + (count_mem_refs): Likewise.
> > + (compute_memory_partitions): Likewise.
> > + (compute_flow_insensitive_aliasing): Likewise.
> > + (setup_pointers_and_addressables): Likewise.
> > + (new_type_alias): Likewise.
> > + (struct used_part): Remove.
> > + (used_portions): Likewise.
> > + (struct used_part_map): Likewise.
> > + (used_part_map_eq): Likewise.
> > + (used_part_map_hash): Likewise.
> > + (free_used_part_map): Likewise.
> > + (up_lookup): Likewise.
> > + (up_insert): Likewise.
> > + (get_or_create_used_part_for): Likewise.
> > + (create_sft): Likewise.
> > + (create_overlap_variables_for): Likewise.
> > + (find_used_portions): Likewise.
> > + (create_structure_vars): Likewise.
> > + * tree.def (STRUCT_FIELD_TAG): Remove.
> > + * tree.h (MTAG_P): Adjust.
> > + (struct tree_memory_tag): Remove base_for_components and
> > + unpartitionable flags.
> > + (struct tree_struct_field_tag): Remove.
> > + (SFT_PARENT_VAR): Likewise.
> > + (SFT_OFFSET): Likewise.
> > + (SFT_SIZE): Likewise.
> > + (SFT_NONADDRESSABLE_P): Likewise.
> > + (SFT_ALIAS_SET): Likewise.
> > + (SFT_UNPARTITIONABLE_P): Likewise.
> > + (SFT_BASE_FOR_COMPONENTS_P): Likewise.
> > + (union tree_node): Remove sft field.
> > + * alias.c (get_alias_set): Remove special handling of SFTs.
> > + * print-tree.c (print_node): Remove handling of SFTs.
> > + * tree-dump.c (dequeue_and_dump): Likewise.
> > + * tree-into-ssa.c (mark_sym_for_renaming): Likewise.
> > + * tree-nrv.c (dest_safe_for_nrv_p): Remove special handling of SFTs.
> > + * tree-predcom.c (set_alias_info): Do not set subvars.
> > + * tree-pretty-print.c (dump_generic_node): Do not handle SFTs.
> > + * tree-ssa-loop-ivopts.c (get_ref_tag): Likewise.
> > + * tree-ssa-operands.c (access_can_touch_variable): Likewise.
> > + (add_vars_for_offset): Remove.
> > + (add_virtual_operand): Remove special handling of SFTs.
> > + (add_call_clobber_ops): Likewise.
> > + (add_call_read_ops): Likewise.
> > + (get_asm_expr_operands): Likewise.
> > + (get_modify_stmt_operands): Likewise.
> > + (get_expr_operands): Likewise.
> > + (add_to_addressable_set): Likewise.
> > + * tree-ssa.c (verify_ssa_name): Do not handle SFTs.
> > + * tree-tailcall.c (suitable_for_tail_opt_p): Likewise.
> > + * tree-vect-transform.c (vect_create_data_ref_ptr): Do not
> > + set subvars.
> > + * tree.c (init_ttree): Remove STRUCT_FIELD_TAG initialization.
> > + (tree_code_size): Remove STRUCT_FIELD_TAG handling.
> > + (tree_node_structure): Likewise.
> > + * tree-ssa-structalias.c (set_uids_in_ptset): Remove special
> > + handling of SFTs.
> > + (find_what_p_points_to): Likewise.
> > +
> > +2008-05-08 Sa Liu <saliu@de.ibm.com>
> > +
> > + * config/spu/spu.md: Fixed subti3 pattern.
> > + * testsuite/gcc.target/spu/subti3.c: New.
> > +
> > +2008-05-08 Richard Guenther <rguenther@suse.de>
> > +
> > + PR middle-end/36154
> > + * tree-ssa-structalias.c (push_fields_onto_fieldstack): Make
> > + sure to create a representative for trailing arrays for PTA.
> > +
> > +2008-05-08 Richard Guenther <rguenther@suse.de>
> > +
> > + PR middle-end/36172
> > + * fold-const.c (operand_equal_p): Two objects which types
> > + differ in pointerness are not equal.
> > +
> > +2008-05-08 Kai Tietz <kai.tietz@onevision.com>
> > +
> > + * calls.c (compute_argument_block_size): Add argument tree fndecl.
> > + (OUTGOING_REG_PARM_STACK_SPACE): Add function type argument.
> > + (emit_library_call_value_1): Add new variable fndecl initialized by
> > + NULL_TREE. It should be the decl type of orgfun, but this information
> > + seems not to be available here, so it uses the default calling abi.
> > + * config/arm/arm.c (arm_return_in_memory): Add fntype argument.
> > + * config/arm/arm.h (RETURN_IN_MEMORY): Replace RETURN_IN_MEMORY
> > + by TARGET_RETURN_IN_MEMORY.
> > + * config/i386/i386-interix.h: Likewise.
> > + * config/i386/i386.h: Likewise.
> > + * config/i386/i386elf.h: Likewise.
> > + * config/i386/ptx4-i.h: Likewise.
> > + * config/i386/sol2-10.h: Likewise.
> > + * config/i386/sysv4.h: Likewise.
> > + * config/i386/vx-common.h: Likewise.
> > + * config/cris/cris.h: Removed #if 0 clause.
> > + * config/arm/arm-protos.h (arm_return_in_memory): Add fntype
> > + argument.
> > + * config/i386/i386-protos.h (ix86_return_in_memory): Add fntype
> > + argument.
> > + (ix86_sol10_return_in_memory): Likewise.
> > + (ix86_i386elf_return_in_memory): New.
> > + (ix86_i386interix_return_in_memory): New.
> > + * config/mt/mt-protos.h (mt_return_in_memory): New.
> > + * config/mt/mt.c: Likewise.
> > + * config/mt/mt.h (OUTGOING_REG_PARM_STACK_SPACE): Add FNTYPE argument.
> > + (RETURN_IN_MEMORY): Replace by TARGET_RETURN_IN_MEMORY.
> > + * config/bfin/bfin.h: Likewise.
> > + * config/bfin/bfin-protos.h (bfin_return_in_memory): Add fntype
> > + argument.
> > + * config/bfin/bfin.c: Likewise.
> > + * config/pa/pa.h (OUTGOING_REG_PARM_STACK_SPACE): Add FNTYPE argument.
> > + * config/alpha/unicosmk.h: Likewise.
> > + * config/i386/cygming.h: Likewise.
> > + * config/iq2000/iq2000.h: Likewise.
> > + * config/mips/mips.h: Likewise.
> > + * config/mn10300/mn10300.h: Likewise.
> > + * config/rs6000/rs6000.h: Likewise.
> > + * config/score/score.h: Likewise.
> > + * config/spu/spu.h: Likewise.
> > + * config/v850/v850.h: Likewise.
> > + * defaults.h: Likewise.
> > + * doc/tm.texi (OUTGOING_REG_PARM_STACK_SPACE): Adjust documentation.
> > + * expr.c (emit_block_move): Adjust use of OUTGOING_REG_PARM_STACK_SPACE.
> > + * function.c (STACK_DYNAMIC_OFFSET): Adjust use of
> > + OUTGOING_REG_PARM_STACK_SPACE.
> > + * targhooks.c (default_return_in_memory): Remove RETURN_IN_MEMORY.
> > +
> > 2008-05-08 Jakub Jelinek <jakub@redhat.com>
> >
> > * tree-parloops.c (create_parallel_loop): Set OMP_RETURN_NOWAIT
> >
> >
> > The results can be reproduced by building a compiler with
> >
> > --enable-gather-detailed-mem-stats targeting x86-64
> >
> > and compiling preprocessed combine.c or the testcase from PR8632 with:
> >
> > -fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
> >
> > The memory consumption summary appears in the dump after the detailed listing
> > of the places where memory is allocated. Peak memory consumption is actually
> > computed by looking for the maximal value in the {GC XXXX -> YYYY} report.
> >
> > Your testing script.
>
>
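For anyone wanting to reproduce the peak figures by hand, the step described above (taking the maximum over the {GC XXXX -> YYYY} entries) could be sketched roughly as follows. This is only an illustration: the exact log format, including the optional "k" suffix, is an assumption, and peak_ggc is a hypothetical helper, not part of the testing script.

```python
import re

# Assumed shape of the entries printed with -Q: "{GC 12345k -> 6789k}".
# The "k" suffix is treated as optional because the report only shows
# the placeholder form "{GC XXXX -> YYYY}".
GC_ENTRY = re.compile(r"\{GC\s+(\d+)k?\s*->\s*(\d+)k?\}")


def peak_ggc(log_text: str) -> tuple[int, int]:
    """Return (peak before collection, peak after collection).

    Both values are maxima over all GC entries found in log_text;
    (0, 0) is returned when no entry is present.
    """
    before = after = 0
    for match in GC_ENTRY.finditer(log_text):
        before = max(before, int(match.group(1)))
        after = max(after, int(match.group(2)))
    return before, after


if __name__ == "__main__":
    sample = "a.c {GC 100k -> 40k} b.c {GC 250k -> 90k}\n {GC 200k -> 120k}"
    print(peak_ggc(sample))
```

The two maxima would correspond to the "Peak memory use before GGC" and "Peak memory use after GGC" lines, if the reading of the report is correct.
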
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex