This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



A recent patch increased GCC's memory consumption in some cases!


Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something is going wrong.

Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
and generate-3.4.ii I got:


comparing empty function compilation at -O0 level:
    Overall memory needed: 7383k -> 7384k
    Peak memory use before GGC: 2265k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 310k
    Garbage: 445k
    Leak: 2289k
    Overhead: 456k
    GGC runs: 3

comparing empty function compilation at -O0 -g level:
    Overall memory needed: 7399k -> 7400k
    Peak memory use before GGC: 2293k
    Peak memory use after GGC: 1982k
    Maximum of released memory in single GGC run: 311k
    Garbage: 448k
    Leak: 2321k
    Overhead: 461k
    GGC runs: 3

comparing empty function compilation at -O1 level:
    Overall memory needed: 7495k -> 7512k
    Peak memory use before GGC: 2265k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 310k
    Garbage: 451k
    Leak: 2291k
    Overhead: 457k
    GGC runs: 4

comparing empty function compilation at -O2 level:
    Overall memory needed: 7507k -> 7512k
    Peak memory use before GGC: 2266k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 311k
    Garbage: 454k
    Leak: 2291k
    Overhead: 457k
    GGC runs: 4

comparing empty function compilation at -O3 level:
    Overall memory needed: 7507k -> 7512k
    Peak memory use before GGC: 2266k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 311k
    Garbage: 454k
    Leak: 2291k
    Overhead: 457k
    GGC runs: 4

comparing combine.c compilation at -O0 level:
    Overall memory needed: 17731k -> 17728k
    Peak memory use before GGC: 9291k
    Peak memory use after GGC: 8871k
    Maximum of released memory in single GGC run: 2603k
    Garbage: 37162k
    Leak: 6539k
    Overhead: 5024k
    GGC runs: 280

comparing combine.c compilation at -O0 -g level:
    Overall memory needed: 19823k -> 19824k
    Peak memory use before GGC: 10897k
    Peak memory use after GGC: 10531k
    Maximum of released memory in single GGC run: 2375k
    Garbage: 37738k
    Leak: 9415k
    Overhead: 5727k
    GGC runs: 270

comparing combine.c compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 19412k to 19436k, overall 0.12%
  Amount of produced GGC garbage increased from 57374k to 57494k, overall 0.21%
    Overall memory needed: 35163k -> 35176k
    Peak memory use before GGC: 19412k -> 19436k
    Peak memory use after GGC: 19206k -> 19221k
    Maximum of released memory in single GGC run: 2197k
    Garbage: 57374k -> 57494k
    Leak: 6562k -> 6564k
    Overhead: 6354k -> 6358k
    GGC runs: 349 -> 352

comparing combine.c compilation at -O2 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 19446k to 19471k, overall 0.13%
  Peak amount of GGC memory still allocated after garbage collecting increased from 19245k to 19269k, overall 0.12%
    Overall memory needed: 37563k -> 37500k
    Peak memory use before GGC: 19446k -> 19471k
    Peak memory use after GGC: 19245k -> 19269k
    Maximum of released memory in single GGC run: 2187k -> 2185k
    Garbage: 68818k -> 68794k
    Leak: 6681k -> 6673k
    Overhead: 7955k -> 7992k
    GGC runs: 406 -> 405

comparing combine.c compilation at -O3 level:
    Overall memory needed: 45623k -> 45720k
    Peak memory use before GGC: 20560k -> 20504k
    Peak memory use after GGC: 19744k -> 19622k
    Maximum of released memory in single GGC run: 3125k -> 3158k
    Garbage: 102695k -> 101168k
    Leak: 6817k -> 6814k
    Overhead: 12395k -> 12209k
    GGC runs: 456 -> 453

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 103547k -> 103544k
    Peak memory use before GGC: 69329k
    Peak memory use after GGC: 44976k
    Maximum of released memory in single GGC run: 36886k
    Garbage: 130571k
    Leak: 9588k
    Overhead: 16932k
    GGC runs: 206

comparing insn-attrtab.c compilation at -O0 -g level:
    Overall memory needed: 105067k -> 105068k
    Peak memory use before GGC: 70490k
    Peak memory use after GGC: 46244k
    Maximum of released memory in single GGC run: 36886k
    Garbage: 131730k
    Leak: 11278k
    Overhead: 17326k
    GGC runs: 206

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 148183k -> 148000k
    Peak memory use before GGC: 86337k
    Peak memory use after GGC: 80543k -> 80544k
    Maximum of released memory in single GGC run: 33045k -> 33048k
    Garbage: 264605k -> 264635k
    Leak: 9404k -> 9405k
    Overhead: 27590k -> 27596k
    GGC runs: 225

comparing insn-attrtab.c compilation at -O2 level:
    Overall memory needed: 191643k -> 191528k
    Peak memory use before GGC: 87648k -> 87649k
    Peak memory use after GGC: 80609k
    Maximum of released memory in single GGC run: 31388k -> 31387k
    Garbage: 299515k -> 299498k
    Leak: 9401k -> 9402k
    Overhead: 33192k -> 33191k
    GGC runs: 245

comparing insn-attrtab.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 191659k to 196316k, overall 2.43%
    Overall memory needed: 191659k -> 196316k
    Peak memory use before GGC: 87665k -> 87666k
    Peak memory use after GGC: 80626k -> 80627k
    Maximum of released memory in single GGC run: 31450k
    Garbage: 300153k -> 300155k
    Leak: 9407k -> 9407k
    Overhead: 33388k -> 33391k
    GGC runs: 245

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 151192k -> 151197k
    Peak memory use before GGC: 92317k
    Peak memory use after GGC: 91394k
    Maximum of released memory in single GGC run: 18793k
    Garbage: 210317k
    Leak: 49389k
    Overhead: 23720k
    GGC runs: 411

comparing Gerald's testcase PR8361 compilation at -O0 -g level:
    Overall memory needed: 169352k -> 169353k
    Peak memory use before GGC: 104950k
    Peak memory use after GGC: 103910k
    Maximum of released memory in single GGC run: 18979k
    Garbage: 216936k
    Leak: 72814k
    Overhead: 29643k
    GGC runs: 382

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Amount of produced GGC garbage decreased from 392073k to 346698k, overall -13.09%
    Overall memory needed: 145031k -> 142270k
    Peak memory use before GGC: 103088k -> 102378k
    Peak memory use after GGC: 102050k -> 101350k
    Maximum of released memory in single GGC run: 17981k -> 17982k
    Garbage: 392073k -> 346698k
    Leak: 50219k -> 50049k
    Overhead: 34383k -> 30234k
    GGC runs: 541 -> 523

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Amount of produced GGC garbage decreased from 431219k to 375730k, overall -14.77%
    Overall memory needed: 146127k -> 142654k
    Peak memory use before GGC: 103514k -> 103057k
    Peak memory use after GGC: 102485k -> 101997k
    Maximum of released memory in single GGC run: 17979k
    Garbage: 431219k -> 375730k
    Leak: 50909k -> 50648k
    Overhead: 39623k -> 34176k
    GGC runs: 592 -> 560

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of produced GGC garbage decreased from 450979k to 392061k, overall -15.03%
  Amount of memory still referenced at the end of compilation increased from 51004k to 51296k, overall 0.57%
    Overall memory needed: 148995k -> 145910k
    Peak memory use before GGC: 104781k -> 104814k
    Peak memory use after GGC: 103711k -> 103748k
    Maximum of released memory in single GGC run: 18300k
    Garbage: 450979k -> 392061k
    Leak: 51004k -> 51296k
    Overhead: 41437k -> 35426k
    GGC runs: 607 -> 570

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 245794k -> 245795k
    Peak memory use before GGC: 81775k
    Peak memory use after GGC: 59514k
    Maximum of released memory in single GGC run: 44985k
    Garbage: 145986k
    Leak: 7570k
    Overhead: 24807k
    GGC runs: 80

comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
    Overall memory needed: 246646k -> 246647k
    Peak memory use before GGC: 82421k
    Peak memory use after GGC: 60160k
    Maximum of released memory in single GGC run: 44974k
    Garbage: 146205k
    Leak: 9338k
    Overhead: 25303k
    GGC runs: 89

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
  Overall memory allocated via mmap and sbrk decreased from 258635k to 240616k, overall -7.49%
  Peak amount of GGC memory allocated before garbage collecting run decreased from 104509k to 84340k, overall -23.91%
  Peak amount of GGC memory still allocated after garbage collecting decreased from 101295k to 73894k, overall -37.08%
  Amount of produced GGC garbage decreased from 240015k to 224276k, overall -7.02%
  Amount of memory still referenced at the end of compilation decreased from 25176k to 19608k, overall -28.40%
    Overall memory needed: 258635k -> 240616k
    Peak memory use before GGC: 104509k -> 84340k
    Peak memory use after GGC: 101295k -> 73894k
    Maximum of released memory in single GGC run: 51835k -> 36499k
    Garbage: 240015k -> 224276k
    Leak: 25176k -> 19608k
    Overhead: 29476k -> 30436k
    GGC runs: 79 -> 81

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
  Overall memory allocated via mmap and sbrk decreased from 531975k to 485972k, overall -9.47%
  Peak amount of GGC memory allocated before garbage collecting run decreased from 104503k to 78744k, overall -32.71%
  Peak amount of GGC memory still allocated after garbage collecting decreased from 101289k to 73894k, overall -37.07%
  Amount of produced GGC garbage decreased from 270332k to 231677k, overall -16.69%
  Amount of memory still referenced at the end of compilation decreased from 25093k to 19698k, overall -27.39%
    Overall memory needed: 531975k -> 485972k
    Peak memory use before GGC: 104503k -> 78744k
    Peak memory use after GGC: 101289k -> 73894k
    Maximum of released memory in single GGC run: 37167k -> 33796k
    Garbage: 270332k -> 231677k
    Leak: 25093k -> 19698k
    Overhead: 35313k -> 32578k
    GGC runs: 91 -> 92

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
  Amount of produced GGC garbage increased from 371298k to 376322k, overall 1.35%
    Overall memory needed: 1180679k -> 1180644k
    Peak memory use before GGC: 200610k
    Peak memory use after GGC: 188951k
    Maximum of released memory in single GGC run: 80732k
    Garbage: 371298k -> 376322k
    Leak: 45260k -> 45252k
    Overhead: 48285k -> 49159k
    GGC runs: 70
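
The "overall N%" figures in the summary lines above appear to be the
signed difference divided by the smaller of the two values: increases are
relative to the old value, decreases relative to the new one.  This is
inferred from the numbers in this report, not taken from the script's
source, and the function name below is hypothetical.  A minimal sketch:

```python
def percent_change(old_k, new_k):
    """Signed change in percent, relative to the smaller of the two values.

    Assumption: this mirrors how the report's percentages seem to be
    derived; the real script may compute them differently.
    """
    return 100.0 * (new_k - old_k) / min(old_k, new_k)


# Garbage for PR8361 at -O1: 392073k -> 346698k
print(round(percent_change(392073, 346698), 2))
# Overall memory for insn-attrtab.c at -O3: 191659k -> 196316k
print(round(percent_change(191659, 196316), 2))
```

Under this reading a decrease looks slightly larger than it would if it
were measured against the old value (here 13.09% rather than about 11.57%).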

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2007-02-09 20:27:14.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2007-02-10 12:12:58.000000000 +0000
@@ -1,3 +1,80 @@
+2007-02-10  Kaz Kojima  <kkojima@gcc.gnu.org>
+
+	PR rtl-optimization/29599
+	* reload1.c (eliminate_regs_in_insn): Take the destination
+	mode into account when computing the offset.
+
+2007-02-09  Stuart Hastings  <stuart@apple.com>
+	Richard Henderson  <rth@redhat.com>
+
+	* gcc/config/i386/i386.h (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New.
+	* gcc/config/i386/darwin.h: (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New.
+	* gcc/config/i386/i386.md (fixuns_trunc<mode>si2, fixuns_truncsfhi2,
+	fixuns_truncdfhi2): New.
+	(fix_truncsfdi_sse): Call ix86_expand_convert_sign_didf_sse.
+	(floatunsdidf2): Call ix86_expand_convert_uns_didf_sse.
+	(floatunssisf2): Add call to ix86_expand_convert_uns_sisf_sse.
+	(floatunssidf2): Allow nonimmediate source.
+	* gcc/config/i386/sse.md (movdi_to_sse): New.  (vec_concatv2di): Drop '*'.
+	* gcc/config/i386/i386-protos.h (ix86_expand_convert_uns_si_sse,
+	ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse,
+	ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse): New.
+	* gcc/config/i386/i386.c (ix86_expand_convert_uns_si_sse,
+	ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse,
+	ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse,
+	ix86_build_const_vector, ix86_expand_vector_init_one_nonzero): New.
+	(ix86_build_signbit_mask): Fix decl of v, refactor to call ix86_build_const_vector.
+	(x86_emit_floatuns): Rewrite.
+
+2007-02-10  Manuel Lopez-Ibanez  <manu@gcc.gnu.org>
+
+	* genautomata.c (longest_path_length): Delete unused function.
+	(struct state): Delete unused longest_path_length.
+	(UNDEFINED_LONGEST_PATH_LENGTH): Delete unused macro.
+	(get_free_state): Delete unused.
+	
+2007-02-09  Jan Hubicka  <jh@suse.cz>
+
+	* params.def (PARAM_INLINE_UNIT_GROWTH): Set to 30.
+	* doc/invoke.texi (inline-unit-growth): Update default value.
+
+	* Makefile.in (passes.o, ipa-inline.o): Add dependencies.
+	* cgraphbuild.c (build_cgraph_edges): Compute frequencies.
+	(rebuild_cgraph_edges): Likewise.
+	* cgraph.c (cgraph_set_call_stmt): Add new argument frequency.
+	(dump_cgraph_node): Dump frequencies.
+	(cgraph_clone_edge): Add frequency scales.
+	(cgraph_clone_node): Add freuqnecy.
+	* cgraph.h (cgraph_edge): Add freuqnecy argument.
+	(CGRAPH_FREQ_BASE, CGRAPH_FREQ_MAX): New constants.
+	(cgraph_create_edge, cgraph_clone_edge, cgraph_clone_node): Update.
+	* tree-pass.h (TODO_rebuild_frequencies): New constant.
+	* cgraphunit.c (verify_cgraph_node): Verify frequencies.
+	(cgraph_copy_node_for_versioning): Update call of cgraph_clone_edge.
+	(save_inline_function_body): Likewise.
+	* ipa-inline.c: inluce rtl.h
+	(cgraph_clone_inlined_nods): Update call of cgraph_clone_node.
+	(cgraph_edge_badness): Use frequencies.
+	(cgraph_decide_recursive_inlining): Update clonning.
+	(cgraph_decide_inlining_of_small_function): Dump frequency.
+	* predict.c (estimate_bb_frequencies): Export.
+	* predict.h (estimate_bb_frequencies): Declare.
+	* tree-inline.c (copy_bb): Watch overflows.
+	(expand_call_inline): Update call of cgraph_create_edge.
+	(optimize_inline_calls): Use TODO flags to update frequnecies.
+	* passes.h: Include predict.h
+	(init_optimization_passes): Move profile ahead.
+	(execute_function_todo): Handle TODO_rebuild_frequencies.
+
+2007-02-09  Roger Sayle  <roger@eyesopen.com>
+
+	* config/alpha/alpha.c (emit_insxl): Force the first operand of
+	the insbl or inswl pattern into a register.
+
+2007-02-09  Roger Sayle  <roger@eyesopen.com>
+
+	* config/ia64/ia64.md (bswapdi2): New define_insn.
+
 2007-02-09  Richard Henderson  <rth@redhat.com>
 
 	* config/i386/constraints.md (Ym): New constraint.


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is computed by looking for the maximal values in the {GC XXXX -> YYYY}
reports.
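
The peak computation described above can be sketched as follows.  This is
a minimal illustration, not the testing script itself: it assumes each
collection is reported as "{GC XXXXk -> YYYYk}" with sizes in kilobytes,
and the function and variable names are hypothetical.

```python
import re

# Assumed report shape: "{GC 2265k -> 1955k}" (size before -> size after).
GC_REPORT = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_memory(dump_text):
    """Return (peak before GGC, peak after GGC) in kilobytes.

    Scans every GC report in the dump and keeps the maximal value seen
    on each side of the arrow.  Returns (0, 0) if no report is found.
    """
    peak_before = peak_after = 0
    for match in GC_REPORT.finditer(dump_text):
        peak_before = max(peak_before, int(match.group(1)))
        peak_after = max(peak_after, int(match.group(2)))
    return peak_before, peak_after


sample = " {GC 2100k -> 1900k} {GC 2265k -> 1955k} {GC 2000k -> 1800k}"
print(peak_memory(sample))
```

The two peaks are tracked independently, so they need not come from the
same collection run.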

Your testing script.

