GCC memory consumption increased by recent patch!

gcctest@suse.de gcctest@suse.de
Sun Jan 9 05:28:00 GMT 2005


Hi,
Comparing memory consumption on compilation of combine.i and generate-3.4.ii, I got:


comparing combine.c compilation at -O0 level:
  Overall memory allocated via mmap and sbrk increased from 24397k to 24789k, overall 1.61%
  Peak amount of GGC memory allocated before garbage collecting increased from 9063k to 9353k, overall 3.20%
  Peak amount of GGC memory still allocated after garbage collecting increased from 8377k to 8667k, overall 3.46%
  Amount of memory still referenced at the end of compilation increased from 5862k to 6386k, overall 8.94%
    Overall memory needed: 24397k -> 24789k
    Peak memory use before GGC: 9063k -> 9353k
    Peak memory use after GGC: 8377k -> 8667k
    Maximum of released memory in single GGC run: 2864k
    Garbage: 41736k -> 41717k
    Leak: 5862k -> 6386k
    Overhead: 5544k -> 5778k
    GGC runs: 352 -> 328
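
The "overall" percentages in these summaries are consistent with a plain relative increase over the old value. A minimal sketch of that arithmetic follows, assuming this is how the script derives them (the report does not state the formula); the helper name relative_increase is hypothetical.

```python
def relative_increase(old_k, new_k):
    """Percentage growth from old_k to new_k, relative to old_k."""
    return (new_k - old_k) / old_k * 100.0


# Figures taken from the combine.c -O0 block above.
print(f"{relative_increase(24397, 24789):.2f}%")  # overall memory: 1.61%
print(f"{relative_increase(5862, 6386):.2f}%")    # leak: 8.94%
```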

comparing combine.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 25365k to 25733k, overall 1.45%
  Peak amount of GGC memory allocated before garbage collecting increased from 8951k to 9242k, overall 3.25%
  Peak amount of GGC memory still allocated after garbage collecting increased from 8450k to 8741k, overall 3.44%
  Amount of memory still referenced at the end of compilation increased from 6257k to 6782k, overall 8.38%
    Overall memory needed: 25365k -> 25733k
    Peak memory use before GGC: 8951k -> 9242k
    Peak memory use after GGC: 8450k -> 8741k
    Maximum of released memory in single GGC run: 2023k -> 2026k
    Garbage: 66632k -> 66618k
    Leak: 6257k -> 6782k
    Overhead: 10292k -> 10526k
    GGC runs: 544 -> 516

comparing combine.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk increased from 28857k to 29217k, overall 1.25%
  Peak amount of GGC memory allocated before garbage collecting increased from 12383k to 12674k, overall 2.35%
  Peak amount of GGC memory still allocated after garbage collecting increased from 12257k to 12548k, overall 2.37%
  Amount of memory still referenced at the end of compilation increased from 6081k to 6607k, overall 8.65%
    Overall memory needed: 28857k -> 29217k
    Peak memory use before GGC: 12383k -> 12674k
    Peak memory use after GGC: 12257k -> 12548k
    Maximum of released memory in single GGC run: 2533k
    Garbage: 80597k -> 80568k
    Leak: 6081k -> 6607k
    Overhead: 14111k -> 14340k
    GGC runs: 546 -> 518

comparing combine.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 31249k to 31601k, overall 1.13%
  Peak amount of GGC memory allocated before garbage collecting increased from 12635k to 12925k, overall 2.30%
  Peak amount of GGC memory still allocated after garbage collecting increased from 12257k to 12548k, overall 2.37%
  Amount of memory still referenced at the end of compilation increased from 6602k to 7126k, overall 7.94%
    Overall memory needed: 31249k -> 31601k
    Peak memory use before GGC: 12635k -> 12925k
    Peak memory use after GGC: 12257k -> 12548k
    Maximum of released memory in single GGC run: 3347k -> 3346k
    Garbage: 109084k -> 109059k
    Leak: 6602k -> 7126k
    Overhead: 18976k -> 19205k
    GGC runs: 613 -> 583

comparing insn-attrtab.c compilation at -O0 level:
  Overall memory allocated via mmap and sbrk increased from 117268k to 117556k, overall 0.25%
  Peak amount of GGC memory allocated before garbage collecting increased from 77779k to 78070k, overall 0.37%
  Peak amount of GGC memory still allocated after garbage collecting increased from 45259k to 45550k, overall 0.64%
  Amount of memory still referenced at the end of compilation increased from 10432k to 10957k, overall 5.02%
    Overall memory needed: 117268k -> 117556k
    Peak memory use before GGC: 77779k -> 78070k
    Peak memory use after GGC: 45259k -> 45550k
    Maximum of released memory in single GGC run: 42606k
    Garbage: 159302k -> 159286k
    Leak: 10432k -> 10957k
    Overhead: 20571k -> 20804k
    GGC runs: 294 -> 274

comparing insn-attrtab.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 128524k to 128772k, overall 0.19%
  Peak amount of GGC memory allocated before garbage collecting increased from 83296k to 83587k, overall 0.35%
  Peak amount of GGC memory still allocated after garbage collecting increased from 69015k to 69306k, overall 0.42%
  Amount of memory still referenced at the end of compilation increased from 10778k to 11302k, overall 4.86%
    Overall memory needed: 128524k -> 128772k
    Peak memory use before GGC: 83296k -> 83587k
    Peak memory use after GGC: 69015k -> 69306k
    Maximum of released memory in single GGC run: 40616k
    Garbage: 441380k -> 441383k
    Leak: 10778k -> 11302k
    Overhead: 77157k -> 77391k
    GGC runs: 429 -> 406

comparing insn-attrtab.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk increased from 153256k to 153428k, overall 0.11%
  Peak amount of GGC memory allocated before garbage collecting increased from 99056k to 99346k, overall 0.29%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84176k to 84467k, overall 0.35%
  Amount of memory still referenced at the end of compilation increased from 10702k to 11226k, overall 4.90%
    Overall memory needed: 153256k -> 153428k
    Peak memory use before GGC: 99056k -> 99346k
    Peak memory use after GGC: 84176k -> 84467k
    Maximum of released memory in single GGC run: 41525k -> 41524k
    Garbage: 487928k -> 487926k
    Leak: 10702k -> 11226k
    Overhead: 85258k -> 85492k
    GGC runs: 363 -> 342

comparing insn-attrtab.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 153120k to 153412k, overall 0.19%
  Peak amount of GGC memory allocated before garbage collecting increased from 99057k to 99348k, overall 0.29%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84178k to 84468k, overall 0.34%
  Amount of memory still referenced at the end of compilation increased from 10744k to 11268k, overall 4.88%
    Overall memory needed: 153120k -> 153412k
    Peak memory use before GGC: 99057k -> 99348k
    Peak memory use after GGC: 84178k -> 84468k
    Maximum of released memory in single GGC run: 41525k
    Garbage: 489130k -> 489108k
    Leak: 10744k -> 11268k
    Overhead: 85417k -> 85650k
    GGC runs: 373 -> 349

comparing Gerald's testcase PR8361 compilation at -O0 level:
  Overall memory allocated via mmap and sbrk increased from 110800k to 111100k, overall 0.27%
  Peak amount of GGC memory allocated before garbage collecting increased from 86610k to 86891k, overall 0.32%
  Peak amount of GGC memory still allocated after garbage collecting increased from 85642k to 85940k, overall 0.35%
  Amount of memory still referenced at the end of compilation increased from 54976k to 55500k, overall 0.95%
    Overall memory needed: 110800k -> 111100k
    Peak memory use before GGC: 86610k -> 86891k
    Peak memory use after GGC: 85642k -> 85940k
    Maximum of released memory in single GGC run: 19292k -> 19283k
    Garbage: 246356k -> 246357k
    Leak: 54976k -> 55500k
    Overhead: 43121k -> 43355k
    GGC runs: 368

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 103709k to 104029k, overall 0.31%
  Peak amount of GGC memory allocated before garbage collecting increased from 85639k to 85973k, overall 0.39%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84690k to 84937k, overall 0.29%
  Amount of memory still referenced at the end of compilation increased from 56722k to 57247k, overall 0.92%
    Overall memory needed: 103709k -> 104029k
    Peak memory use before GGC: 85639k -> 85973k
    Peak memory use after GGC: 84690k -> 84937k
    Maximum of released memory in single GGC run: 18904k -> 18947k
    Garbage: 466107k -> 466130k
    Leak: 56722k -> 57247k
    Overhead: 66911k -> 67141k
    GGC runs: 554 -> 550

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Overall memory allocated via mmap and sbrk increased from 103765k to 104089k, overall 0.31%
  Peak amount of GGC memory allocated before garbage collecting increased from 85639k to 85973k, overall 0.39%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84691k to 84937k, overall 0.29%
  Amount of memory still referenced at the end of compilation increased from 57302k to 57825k, overall 0.91%
    Overall memory needed: 103765k -> 104089k
    Peak memory use before GGC: 85639k -> 85973k
    Peak memory use after GGC: 84691k -> 84937k
    Maximum of released memory in single GGC run: 18904k -> 18947k
    Garbage: 500471k -> 500440k
    Leak: 57302k -> 57825k
    Overhead: 76378k -> 76603k
    GGC runs: 597 -> 592

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 111453k to 111853k, overall 0.36%
  Peak amount of GGC memory allocated before garbage collecting increased from 92444k to 92711k, overall 0.29%
  Peak amount of GGC memory still allocated after garbage collecting increased from 85893k to 86231k, overall 0.39%
  Amount of memory still referenced at the end of compilation increased from 57627k to 58143k, overall 0.90%
    Overall memory needed: 111453k -> 111853k
    Peak memory use before GGC: 92444k -> 92711k
    Peak memory use after GGC: 85893k -> 86231k
    Maximum of released memory in single GGC run: 19736k -> 19713k
    Garbage: 520419k -> 520391k
    Leak: 57627k -> 58143k
    Overhead: 78218k -> 78451k
    GGC runs: 580 -> 578

Head of changelog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2005-01-08 22:58:26.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2005-01-09 04:26:40.000000000 +0000
@@ -1,3 +1,152 @@
+2005-01-08  David Edelsohn  <edelsohn@gnu.org>
+
+	* config/i386/i386.md (addhi_4): Correct reference in comment.
+	(addqi_4): Same.
+
+2005-01-08  Richard Henderson  <rth@redhat.com>
+
+	* config/i386/emmintrin.h (_mm_cvtsi128_si32): Move earlier.
+	(_mm_cvtsi128_si64x): Likewise.
+	(_mm_srl_epi64, _mm_srl_epi32, _mm_srl_epi16, _mm_sra_epi32,
+	_mm_sra_epi16, _mm_sll_epi64, _mm_sll_epi32, _mm_sll_epi16): Use
+	the _mm_{srl,sll}i_foo counterpart, and _mm_cvtsi128_si32.
+	* config/i386/i386-modes.def: Add V16HI, V32QI, V4DF, V8SF.
+	* config/i386/i386-protos.h: Update.
+	* config/i386/i386.c (print_operand): Add 'H'.
+	(ix86_fixup_binary_operands): Split out from ...
+	(ix86_expand_binary_operator): ... here.
+	(ix86_fixup_binary_operands_no_copy): New.
+	(ix86_expand_fp_absneg_operator): Handle vector mode results.
+	(bdesc_2arg): Update names for sse{,2,3}_ prefixes.
+	(ix86_init_mmx_sse_builtins): Remove *maskncmp* special cases.
+	(safe_vector_operand): Use CONST0_RTX.
+	(ix86_expand_binop_builtin): Use ix86_fixup_binary_operands.
+	(ix86_expand_builtin): Merge CODE_FOR_sse2_maskmovdqu_rex64 and
+	CODE_FOR_sse2_maskmovdqu.  Special case SSE version of MASKMOVDQU
+	expansion.  Update names for sse{,2,3}_ prefixes.  Remove *maskncmp*
+	special cases.
+	* config/i386/i386.h (IX86_BUILTIN_CMPNGTSS): New.
+	(IX86_BUILTIN_CMPNGESS): New.
+	* config/i386/i386.md (UNSPEC_FIX_NOTRUNC): New.
+	(attr type): Add sselog1.
+	(attr unit, attr memory): Handle it.
+	(movti, movti_internal, movti_rex64): Move near other integer moves.
+	(movtf, movtf_internal): Move near other fp moves.
+	(SSEMODE, SSEMODEI, vec_setv2df, vec_extractv2df, vec_initv2df,
+	vec_setv4sf, vec_extractv4sf, vec_initv4sf, movv4sf, movv4sf_internal,
+	movv2df, movv2df_internal, mov<SSEMODEI>, mov<SSEMODEI>_internal, 
+	movmisalign<SSEMODE>, sse_movups_1, sse_movmskps, sse_movntv4sf,
+	sse_movhlps, sse_movlhps, sse_storehps, sse_loadhps, sse_storelps,
+	sse_loadlps, sse_loadss, sse_loadss_1, sse_movss, sse_storess,
+	sse_shufps, addv4sf3, vmaddv4sf3, subv4sf3, vmsubv4sf3, negv4sf2,
+	mulv4sf3, vmmulv4sf3, divv4sf3, vmdivv4sf3, rcpv4sf2, vmrcpv4sf2,
+	rsqrtv4sf2, vmrsqrtv4sf2, sqrtv4sf2, vmsqrtv4sf2, sse_andv4sf3,
+	sse_nandv4sf3, sse_iorv4sf3, sse_xorv4sf3, sse2_andv2df3, 
+	sse2_nandv2df3, sse2_iorv2df3, sse2_xorv2df3, sse2_andv2di3, 
+	sse2_nandv2di3, sse2_iorv2di3, sse2_xorv2di3, maskcmpv4sf3, 
+	vmmaskcmpv4sf3, sse_comi, sse_ucomi, sse_unpckhps, sse_unpcklps,
+	smaxv4sf3, vmsmaxv4sf3, sminv4sf3, vmsminv4sf3, cvtpi2ps, cvtps2pi,
+	cvttps2pi, cvtsi2ss, cvtsi2ssq, cvtss2si, cvtss2siq, cvttss2si,
+	cvttss2siq, addv2df3, vmaddv2df3, subv2df3, vmsubv2df3, mulv2df3,
+	vmmulv2df3, divv2df3, vmdivv2df3, smaxv2df3, vmsmaxv2df3, sminv2df3,
+	vmsminv2df3, sqrtv2df2, vmsqrtv2df2, maskcmpv2df3, vmmaskcmpv2df3,
+	sse2_comi, sse2_ucomi, sse2_movmskpd, sse2_pmovmskb, sse2_maskmovdqu,
+	sse2_maskmovdqu_rex64, sse2_movntv2df, sse2_movntv2di, sse2_movntsi,
+	cvtdq2ps, cvtps2dq, cvttps2dq, cvtdq2pd, cvtpd2dq, cvttpd2dq,
+	cvtpd2pi, cvttpd2pi, cvtpi2pd, cvtsd2si, cvtsd2siq, cvttsd2si,
+	cvttsd2siq, cvtsi2sd, cvtsi2sdq, cvtsd2ss, cvtss2sd, cvtpd2ps,
+	cvtps2pd, addv16qi3, addv8hi3, addv4si3, addv2di3, ssaddv16qi3,
+	ssaddv8hi3, usaddv16qi3, usaddv8hi3, subv16qi3, subv8hi3, subv4si3,
+	subv2di3, sssubv16qi3, sssubv8hi3, ussubv16qi3, ussubv8hi3, mulv8hi3,
+	smulv8hi3_highpart, umulv8hi3_highpart, sse2_umulsidi3,
+	sse2_umulv2siv2di3, sse2_pmaddwd, sse2_uavgv16qi3, sse2_uavgv8hi3,
+	sse2_psadbw, sse2_pinsrw, sse2_pextrw, sse2_pshufd, sse2_pshuflw,
+	sse2_pshufhw, eqv16qi3, eqv8hi3, eqv4si3, gtv16qi3, gtv8hi3, 
+	gtv4si3, umaxv16qi3, smaxv8hi3, uminv16qi3, sminv8hi3, ashrv8hi3,
+	ashrv4si3, lshrv8hi3, lshrv4si3, lshrv2di3, ashlv8hi3, ashlv4si3,
+	ashlv2di3, sse2_ashlti3, sse2_lshrti3, sse2_unpckhpd, sse2_unpcklpd,
+	sse2_packsswb, sse2_packssdw, sse2_packuswb, sse2_punpckhbw, 
+	sse2_punpckhwd, sse2_punpckhdq, sse2_punpcklbw, sse2_punpcklwd,
+	sse2_punpckldq, sse2_punpcklqdq, sse2_punpckhqdq, sse2_movupd,
+	sse2_movdqu, sse2_movdq2q, sse2_movdq2q_rex64, sse2_movq2dq, 
+	sse2_movq2dq_rex64, sse2_loadd, sse2_stored, sse2_storehpd,
+	sse2_loadhpd, sse2_storelpd, sse2_loadlpd, sse2_movsd, sse2_loadsd,
+	sse2_loadsd_1, sse2_storesd, sse2_shufpd, sse2_clflush, sse2_mfence,
+	mfence_insn, sse2_lfence, lfence_insn, mwait, monitor, addsubv4sf3,
+	addsubv2df3, haddv4sf3, haddv2df3, hsubv4sf3, hsubv2df3, movshdup,
+	movsldup, lddqu, loadddup, movddup): Move to sse.md.  Any with
+	non-optabs meanings renamed with an "sse{,2,3}_" prefix at the
+	same time.
+	(SSEPUSH, push<SSEPUSH>): Remove.
+	(MMXPUSH, push<MMXPUSH>): Remove.
+	(sse_movaps, sse_movaps_1, sse_movups): Remove.
+	(sse2_movapd, sse2_movdqa, sse2_movq): Remove.
+	(sse2_andti3, sse2_nandti3, sse2_iorti3, sse2_xorti3): Remove.
+	(sse_clrv4sf, sse_clrv2df, sse2_clrti): Remove.
+	(maskncmpv4sf3, vmmaskncmpv4sf3): Remove.
+	(maskncmpv2df3, vmmaskncmpv2df3): Remove.
+	(ashrv8hi3_ti, ashrv4si3_ti, lshrv8hi3_ti, lshrv4si3_ti): Remove.
+	(lshrv2di3_ti, ashlv8hi3_ti, ashlv4si3_ti, ashlv2di3_ti): Remove.
+	* config/i386/athlon.md (athlon_sselog_load): Handle sselog1.
+	(athlon_sselog_load_k8, athlon_sselog, athlon_sselog_k8): Likewise.
+	* config/i386/ppro.md (ppro_sse_div_V4SF_load): Fix memory attr.
+	(ppro_sse_log_V4SF_load): Similarly.  Handle sselog1.
+	(ppro_sse_log_V4SF): Handle sselog1.
+	* config/i386/predicates.md (const_0_to_1_operand): New.
+	(const_0_to_255_mul_8_operand): New.
+	(const_1_to_31_operand): Rename from const_int_1_31_operand.
+	(const_2_to_3_operand, const_4_to_7_operand): New.
+	* config/i386/sse.md: New file.
+	(SSEMODE12, SSEMODE24, SSEMODE124, SSEMODE248, ssevecsize): New.
+	(sse_movups): Rename from sse_movups_1.
+	(sse_loadlss): Rename from sse_loadss_1.
+	(andv4sf3, iorv4sf3, xorv4sf3, andv2df3): Remove the sse prefix
+	from the name.
+	(negv4sf2): Use ix86_expand_fp_absneg_operator.
+	(absv4sf2, negv2df, absv2df): New.
+	(addv4sf3): Add expander to call ix86_fixup_binary_operands_no_copy.
+	(subv4sf3, mulv4sf3, divv4sf3, smaxv4sf3, sminv4sf3, andv4sf3,
+	iorv4sf3, xorv4sf3, addv2df3, subv2df3, mulv2df3, divv2df3,
+	smaxv2df3, sminv2df3, andv2df3, iorv2df3, xorv2df3, mulv8hi3,
+	umaxv16qi3, smaxv8hi3, uminv16qi3, sminv8hi3): Likewise.
+	(sse3_addsubv4sf3): Model correctly.
+	sse3_haddv4sf3, sse3_hsubv4sf3, sse3_addsubv2df3, sse3_haddv2df3,
+	sse3_hsubv2df3, sse2_ashlti3, sse2_lshrti3): Likewise.
+	(sse_movhlps): Model with vec_select+vec_concat.
+	(sse_movlhps, sse_unpckhps, sse_unpcklps, sse3_movshdup, 
+	sse3_movsldup, sse_shufps, sse_shufps_1, sse2_unpckhpd, sse3_movddup,
+	sse2_unpcklpd, sse2_shufpd, sse2_shufpd_1, sse2_punpckhbw,
+	sse2_punpcklbw, sse2_punpckhwd, sse2_punpcklwd, sse2_punpckhdq,
+	sse2_punpckldq, sse2_punpckhqdq, sse2_punpcklqdq, sse2_pshufd,
+	sse2_pshufd_1, sse2_pshuflw, sse2_pshuflw_1, sse2_pshufhw, 
+	sse2_pshufhw_1): Likewise.
+	(neg<SSEMODEI>2, one_cmpl<SSEMODEI>2): New.
+	(add<SSEMODEI>3, sse2_ssadd<SSEMODE12>3, sse2_usadd<SSEMODE12>3,
+	sub<SSEMODEI>3, sse2_sssub<SSEMODE12>3, sse2_ussub<SSEMODE12>3,
+	ashr<SSEMODE24>3, lshr<SSEMODE248>3, sse2_eq<SSEMODE124>3,
+	sse2_gt<SSEMODDE124>3, and<SSEMODEI>3, sse_nand<SSEMODEI>3,
+	ior<SSEMODEI>3, xor<SSEMODEI>3): Macroize from existing patterns.	
+	(addv4sf3, sse_vmaddv4sf3, mulv4sf3, sse_vmmulv4sf3, smaxv4sf3,
+	sse_vmsmaxv4sf3, sminv4sf3, sse_vmsminv4sf3, addv2df3, sse2_vmaddv2df3,
+	mulv2df3, sse2_vmmulv2df3, smaxv2df3, sse2_vmsmaxv2df3, sminv2df3,
+	sse2_vmsminv2df3, umaxv16qi3, smaxv8hi3, uminv16qi3
+	sminv8hi3): Mark commutative
+	operands.  Use ix86_binary_operator_ok.
+	(sse_unpckhps, sse_unpcklps, sse2_packsswb, sse2_packssdw,
+	sse2_packuswb, sse2_punpckhbw, sse2_punpcklbw, sse2_punpckhwd,
+	sse2_punpcklwd, sse2_punpckhdq, sse2_punpckldq, sse2_punpckhqdq,
+	sse2_punpcklqdq): Allow operand2 in memory.
+	(sse_movhlps, sse_movlhps, sse2_unpckhpd, sse2_unpcklpd
+	sse2_movsd): Add memory alternatives.
+	(sse_storelps): Turn expander into an insn; split after reload.
+	(sse_storess, sse2_loadhpd, sse2_loadlpd): Add non-xmm inputs.
+	(sse2_storehpd, sse2_storelpd): Add non-xmm outputs.
+
+2005-01-08  Eric Botcazou  <ebotcazou@libertysurf.fr>
+
+	* configure.ac (DWARF-2 debug_line): Use objdump.
+	* configure: Regenerate.
+
 2005-01-08  Jeff Law  <law@redhat.com>
 	    Diego Novillo  <dnovillo@redhat.com>
 

I am a friendly script that cares about memory consumption in GCC.  Please
contact jh@suse.cz if something goes wrong.

The results can be reproduced by building the compiler with
--enable-gather-detailed-mem-stats, targeting x86-64, and compiling preprocessed
combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated.  Peak memory consumption is actually
computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
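
The peak computation described above can be sketched roughly as follows. This is an illustration only, not the script that produced this report: it assumes the -Q output contains entries of the form {GC 9063k -> 8377k} (size before and after each collection), and the name peak_ggc_memory is hypothetical.

```python
import re

# Assumed shape of one garbage-collection report in the compiler's stderr.
GC_REPORT = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_ggc_memory(log_text):
    """Return (peak_before_k, peak_after_k), or None if no GC report is found."""
    pairs = [(int(before), int(after))
             for before, after in GC_REPORT.findall(log_text)]
    if not pairs:
        return None
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after


# Made-up log fragment; only the {GC ...} entries matter.
sample = "main {GC 7000k -> 6500k} combine {GC 9063k -> 8377k}"
print(peak_ggc_memory(sample))  # (9063, 8377)
```

The two maxima are taken independently, so the largest "before" and the largest "after" value need not come from the same collection.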

Your testing script.


