This is the mail archive of the
gcc-regression@gcc.gnu.org
mailing list for the GCC project.
A recent patch increased GCC's memory consumption!
- From: gcctest at suse dot de
- To: jh at suse dot cz, gcc-regression at gcc dot gnu dot org
- Date: Thu, 17 May 2007 15:21:17 +0000
- Subject: A recent patch increased GCC's memory consumption!
Hi,
I am a friendly script caring about memory consumption in GCC. Please
contact jh@suse.cz if something is going wrong.
Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
and generate-3.4.ii I got:
comparing empty function compilation at -O0 level:
Peak amount of GGC memory allocated before garbage collecting increased from 2328k to 2334k, overall 0.26%
Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
Amount of memory still referenced at the end of compilation increased from 2378k to 2385k, overall 0.31%
Overall memory needed: 7411k -> 7410k
Peak memory use before GGC: 2328k -> 2334k
Peak memory use after GGC: 2001k -> 2007k
Maximum of released memory in single GGC run: 327k
Garbage: 480k
Leak: 2378k -> 2385k
Overhead: 517k -> 518k
GGC runs: 3
comparing empty function compilation at -O0 -g level:
Peak amount of GGC memory allocated before garbage collecting increased from 2356k to 2362k, overall 0.25%
Peak amount of GGC memory still allocated after garbage collecting increased from 2028k to 2034k, overall 0.30%
Amount of memory still referenced at the end of compilation increased from 2410k to 2418k, overall 0.31%
Overall memory needed: 7427k -> 7426k
Peak memory use before GGC: 2356k -> 2362k
Peak memory use after GGC: 2028k -> 2034k
Maximum of released memory in single GGC run: 328k
Garbage: 482k
Leak: 2410k -> 2418k
Overhead: 521k -> 523k
GGC runs: 3
comparing empty function compilation at -O1 level:
Peak amount of GGC memory allocated before garbage collecting increased from 2328k to 2334k, overall 0.26%
Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
Amount of memory still referenced at the end of compilation increased from 2380k to 2387k, overall 0.31%
Overall memory needed: 7519k -> 7518k
Peak memory use before GGC: 2328k -> 2334k
Peak memory use after GGC: 2001k -> 2007k
Maximum of released memory in single GGC run: 327k
Garbage: 485k
Leak: 2380k -> 2387k
Overhead: 517k -> 519k
GGC runs: 3
comparing empty function compilation at -O2 level:
Peak amount of GGC memory allocated before garbage collecting increased from 2329k to 2335k, overall 0.26%
Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
Amount of memory still referenced at the end of compilation increased from 2380k to 2388k, overall 0.31%
Overall memory needed: 7527k -> 7526k
Peak memory use before GGC: 2329k -> 2335k
Peak memory use after GGC: 2001k -> 2007k
Maximum of released memory in single GGC run: 328k
Garbage: 489k
Leak: 2380k -> 2388k
Overhead: 518k -> 519k
GGC runs: 4
comparing empty function compilation at -O3 level:
Peak amount of GGC memory allocated before garbage collecting increased from 2329k to 2335k, overall 0.26%
Peak amount of GGC memory still allocated after garbage collecting increased from 2001k to 2007k, overall 0.30%
Amount of memory still referenced at the end of compilation increased from 2380k to 2388k, overall 0.31%
Overall memory needed: 7527k -> 7526k
Peak memory use before GGC: 2329k -> 2335k
Peak memory use after GGC: 2001k -> 2007k
Maximum of released memory in single GGC run: 328k
Garbage: 489k
Leak: 2380k -> 2388k
Overhead: 518k -> 519k
GGC runs: 4
comparing combine.c compilation at -O0 level:
Amount of memory still referenced at the end of compilation increased from 7003k to 7011k, overall 0.11%
Overall memory needed: 17671k -> 17682k
Peak memory use before GGC: 9019k -> 9025k
Peak memory use after GGC: 8259k -> 8265k
Maximum of released memory in single GGC run: 1874k
Garbage: 37665k -> 37649k
Leak: 7003k -> 7011k
Overhead: 4751k -> 4753k
GGC runs: 278
comparing combine.c compilation at -O0 -g level:
Amount of memory still referenced at the end of compilation increased from 9888k to 9903k, overall 0.16%
Overall memory needed: 19571k -> 19578k
Peak memory use before GGC: 10754k -> 10760k
Peak memory use after GGC: 9978k -> 9984k
Maximum of released memory in single GGC run: 1558k
Garbage: 38031k -> 38018k
Leak: 9888k -> 9903k
Overhead: 5457k -> 5459k
GGC runs: 269
comparing combine.c compilation at -O1 level:
Amount of produced GGC garbage increased from 52529k to 52599k, overall 0.13%
Amount of memory still referenced at the end of compilation increased from 7056k to 7063k, overall 0.10%
Overall memory needed: 30003k -> 30010k
Peak memory use before GGC: 17855k -> 17863k
Peak memory use after GGC: 17659k -> 17665k
Maximum of released memory in single GGC run: 1450k -> 1454k
Garbage: 52529k -> 52599k
Leak: 7056k -> 7063k
Overhead: 5924k -> 5931k
GGC runs: 357
comparing combine.c compilation at -O2 level:
Amount of produced GGC garbage increased from 69078k to 69719k, overall 0.93%
Overall memory needed: 34363k -> 34414k
Peak memory use before GGC: 17879k -> 17891k
Peak memory use after GGC: 17671k -> 17679k
Maximum of released memory in single GGC run: 1392k -> 1368k
Garbage: 69078k -> 69719k
Leak: 7175k -> 7178k
Overhead: 8023k -> 8062k
GGC runs: 413 -> 415
comparing combine.c compilation at -O3 level:
Peak amount of GGC memory still allocated after garbage collecting increased from 17826k to 17883k, overall 0.32%
Amount of produced GGC garbage increased from 94366k to 95405k, overall 1.10%
Amount of memory still referenced at the end of compilation increased from 7275k to 7285k, overall 0.13%
Overall memory needed: 40711k -> 40750k
Peak memory use before GGC: 18150k -> 18109k
Peak memory use after GGC: 17826k -> 17883k
Maximum of released memory in single GGC run: 3637k
Garbage: 94366k -> 95405k
Leak: 7275k -> 7285k
Overhead: 11304k -> 11329k
GGC runs: 444 -> 447
comparing insn-attrtab.c compilation at -O0 level:
Amount of produced GGC garbage increased from 129129k to 129385k, overall 0.20%
Overall memory needed: 92859k -> 92882k
Peak memory use before GGC: 58839k -> 58845k
Peak memory use after GGC: 33335k -> 33341k
Maximum of released memory in single GGC run: 33674k
Garbage: 129129k -> 129385k
Leak: 9840k -> 9607k
Overhead: 13888k -> 13889k
GGC runs: 216
comparing insn-attrtab.c compilation at -O0 -g level:
Overall memory needed: 94135k -> 94150k
Peak memory use before GGC: 60001k -> 60007k
Peak memory use after GGC: 34496k -> 34502k
Maximum of released memory in single GGC run: 33675k
Garbage: 129348k -> 129348k
Leak: 11548k -> 11556k
Overhead: 14285k -> 14287k
GGC runs: 212 -> 211
comparing insn-attrtab.c compilation at -O1 level:
Overall memory needed: 110311k -> 107726k
Peak memory use before GGC: 63396k -> 62277k
Peak memory use after GGC: 60770k -> 59776k
Maximum of released memory in single GGC run: 24882k -> 24268k
Garbage: 233061k -> 227713k
Leak: 9735k -> 9735k
Overhead: 26100k -> 25391k
GGC runs: 245 -> 246
comparing insn-attrtab.c compilation at -O2 level:
Overall memory needed: 169391k -> 165762k
Peak memory use before GGC: 63531k -> 62706k
Peak memory use after GGC: 61068k -> 60102k
Maximum of released memory in single GGC run: 21237k -> 20519k
Garbage: 269078k -> 263657k
Leak: 9728k -> 9726k
Overhead: 31676k -> 30976k
GGC runs: 266 -> 267
comparing insn-attrtab.c compilation at -O3 level:
Overall memory needed: 185135k -> 180774k
Peak memory use before GGC: 75553k -> 75338k
Peak memory use after GGC: 71473k -> 70907k
Maximum of released memory in single GGC run: 21970k -> 22137k
Garbage: 300284k -> 292829k
Leak: 9732k -> 9730k
Overhead: 32925k -> 32697k
GGC runs: 267
comparing Gerald's testcase PR8361 compilation at -O0 level:
Overall memory needed: 145852k -> 145853k
Peak memory use before GGC: 89184k -> 89190k
Peak memory use after GGC: 88301k -> 88307k
Maximum of released memory in single GGC run: 18130k
Garbage: 206731k -> 206715k
Leak: 51180k -> 51188k
Overhead: 23432k -> 23433k
GGC runs: 408
comparing Gerald's testcase PR8361 compilation at -O0 -g level:
Overall memory needed: 163568k -> 163569k
Peak memory use before GGC: 101740k -> 101746k
Peak memory use after GGC: 100733k -> 100738k
Maximum of released memory in single GGC run: 18433k -> 18434k
Garbage: 212416k -> 212400k
Leak: 74495k -> 74502k
Overhead: 29328k -> 29330k
GGC runs: 381
comparing Gerald's testcase PR8361 compilation at -O1 level:
Peak amount of GGC memory allocated before garbage collecting increased from 100612k to 100956k, overall 0.34%
Peak amount of GGC memory still allocated after garbage collecting increased from 99600k to 99956k, overall 0.36%
Amount of produced GGC garbage increased from 342613k to 343553k, overall 0.27%
Overall memory needed: 141736k -> 141721k
Peak memory use before GGC: 100612k -> 100956k
Peak memory use after GGC: 99600k -> 99956k
Maximum of released memory in single GGC run: 17471k
Garbage: 342613k -> 343553k
Leak: 51773k -> 51782k
Overhead: 30661k -> 30699k
GGC runs: 527 -> 530
comparing Gerald's testcase PR8361 compilation at -O2 level:
Amount of produced GGC garbage increased from 390024k to 391480k, overall 0.37%
Overall memory needed: 147180k -> 147205k
Peak memory use before GGC: 101401k -> 101412k
Peak memory use after GGC: 100396k -> 100402k
Maximum of released memory in single GGC run: 17468k
Garbage: 390024k -> 391480k
Leak: 52888k -> 52898k
Overhead: 36191k -> 36263k
GGC runs: 580 -> 581
comparing Gerald's testcase PR8361 compilation at -O3 level:
Amount of produced GGC garbage increased from 423975k to 426078k, overall 0.50%
Overall memory needed: 149484k -> 149561k
Peak memory use before GGC: 102996k -> 103006k
Peak memory use after GGC: 101975k -> 101981k
Maximum of released memory in single GGC run: 17912k -> 17872k
Garbage: 423975k -> 426078k
Leak: 53197k -> 53207k
Overhead: 38815k -> 38917k
GGC runs: 605
comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
Overall memory needed: 244575k -> 244578k
Peak memory use before GGC: 81022k -> 81028k
Peak memory use after GGC: 58761k -> 58767k
Maximum of released memory in single GGC run: 44134k
Garbage: 144429k -> 144417k
Leak: 7727k -> 7735k
Overhead: 23300k -> 23301k
GGC runs: 79
comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
Overall memory needed: 245395k -> 245402k
Peak memory use before GGC: 81668k -> 81674k
Peak memory use after GGC: 59407k -> 59413k
Maximum of released memory in single GGC run: 44123k
Garbage: 144474k -> 144534k
Leak: 9496k -> 9503k
Overhead: 23796k -> 23797k
GGC runs: 87
comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory needed: 243847k -> 243930k
Peak memory use before GGC: 83517k -> 83514k
Peak memory use after GGC: 74903k -> 74909k
Maximum of released memory in single GGC run: 39415k -> 39406k
Garbage: 222967k -> 222953k
Leak: 20971k -> 20978k
Overhead: 29139k -> 29139k
GGC runs: 81
comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory needed: 264011k -> 264214k
Peak memory use before GGC: 79889k -> 79895k
Peak memory use after GGC: 74903k -> 74909k
Maximum of released memory in single GGC run: 33022k -> 33018k
Garbage: 229691k -> 229676k
Leak: 21061k -> 21068k
Overhead: 31163k -> 31163k
GGC runs: 91
comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
Overall memory needed: 1297803k -> 1297770k
Peak memory use before GGC: 190662k -> 190668k
Peak memory use after GGC: 178178k -> 178184k
Maximum of released memory in single GGC run: 80664k
Garbage: 362947k -> 362940k
Leak: 46428k -> 46435k
Overhead: 43819k -> 43819k
GGC runs: 72
Head of the ChangeLog is:
--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 2007-05-16 21:12:47.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 2007-05-17 13:23:54.000000000 +0000
@@ -1,3 +1,98 @@
+2007-05-17 Zdenek Dvorak <dvorakz@suse.cz>
+
+ * tree-vrp.c (finalize_jump_threads): Do not care about dominance info.
+ (execute_vrp): Preserve loops through jump threading.
+ * tree-ssa-threadupdate.c (thread_single_edge,
+ dbds_continue_enumeration_p, determine_bb_domination_status,
+ thread_through_loop_header): New functions.
+ (create_edge_and_update_destination_phis,
+ create_edge_and_update_destination_phis): Set loops for the new blocks.
+ (prune_undesirable_thread_requests): Removed.
+ (redirect_edges): Do not pretend that redirect_edge_and_branch can
+ create new blocks.
+ (thread_block): Do not call prune_undesirable_thread_requests.
+ Update loops.
+ (mark_threaded_blocks): Select edges to thread here.
+ (thread_through_all_blocks): Take may_peel_loop_headers argument.
+ Thread edges through loop headers independently.
+ * cfgloopmanip.c (create_preheader, mfb_keep_just): Export.
+ * tree-pass.h (TODO_mark_first_instance): New.
+ (first_pass_instance): Declare.
+ * cfghooks.c (duplicate_block): Put the block to the original loop
+ if copy is not specified.
+ * tree-ssa-dom.c (tree_ssa_dominator_optimize): Preserve loops through
+ jump threading. Pass may_peel_loop_headers to
+ thread_through_all_blocks according to first_pass_instance.
+ * cfgloop.h (create_preheader): Declare.
+ * tree-flow.h (thread_through_all_blocks): Declaration changed.
+ * basic-block.h (mfb_keep_just, mfb_kj_edge): Declare.
+ * passes.c (first_pass_instance): New variable.
+ (next_pass_1): Set TODO_mark_first_instance.
+ (execute_todo): Set first_pass_instance.
+
+2007-05-17 Uros Bizjak <ubizjak@gmail.com>
+
+ PR tree-optimization/24659
+ * optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
+ OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
+ OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
+ OTI_vec_pack_ufix_trunc.
+ (vec_unpacks_float_hi_optab): Define new macro.
+ (vec_unpacks_float_lo_optab): Ditto.
+ (vec_unpacku_float_hi_optab): Ditto.
+ (vec_unpacku_float_lo_optab): Ditto.
+ (vec_pack_sfix_trunc_optab): Ditto.
+ (vec_pack_ufix_trunc_optab): Ditto.
+ * genopinit.c (optabs): Implement vec_unpack[s|u]_[hi|lo]_optab
+ and vec_pack_[s|u]fix_trunc_optab using
+ vec_unpack[s|u]_[hi\lo]_* and vec_pack_[u|s]fix_trunc_* patterns
+ * tree-vectorizer.c (supportable_widening_operation): Handle
+ FLOAT_EXPR and CONVERT_EXPR. Update comment.
+ (supportable_narrowing_operation): New function.
+ * tree-vectorizer.h (supportable_narrowing_operation): Prototype.
+ * tree-vect-transform.c (vectorizable_conversion): Handle
+ (nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
+ (vect_gen_widened_results_half): Move before vectorizable_conversion.
+ (vectorizable_type_demotion): Call supportable_narrowing_operation()
+ to check for target support.
+ * optabs.c (optab_for_tree_code) Return vec_unpack[s|u]_float_hi_optab
+ for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
+ for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
+ for VEC_PACK_FIX_TRUNC_EXPR.
+ (expand_binop): Special case mode of the result for
+ vec_pack_[u|s]fix_trunc_optab.
+ (init_optabs): Initialize vec_unpack[s|u]_[hi|lo]_optab and
+ vec_pack_[u|s]fix_trunc_optab.
+
+ * tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
+ VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
+ * tree-pretty-print.c (dump_generic_node): Handle
+ VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
+ VEC_PACK_FIX_TRUNC_EXPR.
+ (op_prio): Ditto.
+ * expr.c (expand_expr_real_1): Ditto.
+ * tree-inline.c (estimate_num_insns_1): Ditto.
+ * tree-vect-generic.c (expand_vector_operations_1): Ditto.
+
+ * config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
+ (vec_unpacks_float_lo_v8hi): Ditto.
+ (vec_unpacku_float_hi_v8hi): Ditto.
+ (vec_unpacku_float_lo_v8hi): Ditto.
+ (vec_unpacks_float_hi_v4si): Ditto.
+ (vec_unpacks_float_lo_v4si): Ditto.
+ (vec_pack_sfix_trunc_v2df): Ditto.
+
+ * doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
+ Document.
+ [VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
+ [VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
+ * doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
+ [vec_pack_ufix_trunc]: Ditto.
+ [vec_unpacks_float_hi]: Ditto.
+ [vec_unpacks_float_lo]: Ditto.
+ [vec_unpacku_float_hi]: Ditto.
+ [vec_unpacku_float_lo]: Ditto.
+
2007-05-16 Uros Bizjak <ubizjak@gmail.com>
* soft-fp/README: Update for new files.
@@ -46,14 +141,15 @@
2007-05-16 Paolo Bonzini <bonzini@gnu.org>
- * config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
- calls as pure.
+ * config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
+ calls as pure.
2007-05-16 Eric Christopher <echristo@apple.com>
* config/rs6000/rs6000.c (rs6000_emit_prologue): Move altivec register
- saving after stack push. Set sp_offset whenever we push.
- (rs6000_emit_epilogue): Move altivec register restore before stack push.
+ saving after stack push. Set sp_offset whenever we push.
+ (rs6000_emit_epilogue): Move altivec register restore before
+ stack push.
2007-05-16 Richard Sandiford <richard@codesourcery.com>
@@ -496,7 +592,7 @@
dumps.
2007-05-08 Sandra Loosemore <sandra@codesourcery.com>
- Nigel Stephens <nigel@mips.com>
+ Nigel Stephens <nigel@mips.com>
* config/mips/mips.h (MAX_FPRS_PER_FMT): Renamed from FP_INC.
Update comments and all uses.
@@ -563,7 +659,7 @@
* configure: Regenerate.
* config.in: Regenerate.
-2007-05-07 Naveen.H.S <naveen.hs@kpitcummins.com>
+2007-05-07 Naveen.H.S <naveen.hs@kpitcummins.com>
* config/m32c/muldiv.md (mulhisi3_c): Limit the mode of the 2nd
operand to HI mode.
@@ -1062,7 +1158,7 @@
PR middle-end/22156
Temporarily revert:
2007-04-06 Andreas Tobler <a.tobler@schweiz.org>
- * tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
+ * tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
2007-04-05 Alexandre Oliva <aoliva@redhat.com>
* tree-sra.c (try_instantiate_multiple_fields): Needlessly
initialize align to silence bogus warning.
@@ -1274,17 +1370,17 @@
PR tree-optimization/30965
PR tree-optimization/30978
* Makefile.in (tree-ssa-forwprop.o): Depend on $(FLAGS_H).
- * tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
- (find_equivalent_equality_comparison): Likewise.
- (simplify_cond): Likewise.
- (get_prop_source_stmt): New helper.
- (get_prop_dest_stmt): Likewise.
+ * tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
+ (find_equivalent_equality_comparison): Likewise.
+ (simplify_cond): Likewise.
+ (get_prop_source_stmt): New helper.
+ (get_prop_dest_stmt): Likewise.
(can_propagate_from): Likewise.
(remove_prop_source_from_use): Likewise.
- (combine_cond_expr_cond): Likewise.
- (forward_propagate_comparison): New function.
- (forward_propagate_into_cond): Rewrite to use fold for
- tree combining.
+ (combine_cond_expr_cond): Likewise.
+ (forward_propagate_comparison): New function.
+ (forward_propagate_into_cond): Rewrite to use fold for
+ tree combining.
(tree_ssa_forward_propagate_single_use_vars): Call
forward_propagate_comparison to propagate comparisons.
The results can be reproduced by building a compiler with
--enable-gather-detailed-mem-stats targeting x86-64
and compiling preprocessed combine.c or the testcase from PR8361 with:
-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated. Peak memory consumption is actually
computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
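The peak computation described above can be sketched roughly as follows. This is an illustration only, not the script that produced this report: it assumes the -Q output contains lines shaped like "{GC 2328k -> 2001k}", the function name summarize_gc is hypothetical, and the sample log is made up. Taking the largest difference as the "maximum released in a single GGC run" is likewise an assumption.

```python
import re

# Assumed shape of the lines printed with -Q: "{GC 2328k -> 2001k}".
GC_LINE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def summarize_gc(log_text):
    """Return peak-before, peak-after and largest single release, in kB.

    Hypothetical helper: scans the log for GC reports and takes maxima.
    Returns None when the log contains no GC report at all.
    """
    runs = [(int(before), int(after))
            for before, after in GC_LINE.findall(log_text)]
    if not runs:
        return None
    return {
        "peak_before": max(before for before, _ in runs),
        "peak_after": max(after for _, after in runs),
        "max_released": max(before - after for before, after in runs),
        "runs": len(runs),
    }


# Made-up sample log, not real compiler output.
sample = """
{GC 2100k -> 1900k}
{GC 2328k -> 2001k}
{GC 2200k -> 1950k}
"""
print(summarize_gc(sample))
```

Feeding the function the stderr of a compilation run with the options listed above should give figures comparable to the "Peak memory use before GGC" and "Peak memory use after GGC" lines, provided the assumed line format matches.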
Your testing script.