This is the mail archive of the
gcc-regression@gcc.gnu.org
mailing list for the GCC project.
A recent patch increased GCC's memory consumption!
- From: gcctest at suse dot de
- To: jh at suse dot cz, gcc-regression at gcc dot gnu dot org
- Date: Tue, 06 Feb 2007 16:19:38 +0000
- Subject: A recent patch increased GCC's memory consumption!
Hi,
I am a friendly script that monitors memory consumption in GCC. Please
contact jh@suse.cz if something goes wrong.
Comparing memory consumption when compiling combine.i, insn-attrtab.i,
and generate-3.4.ii, I got:
comparing empty function compilation at -O0 level:
Amount of produced GGC garbage increased from 444k to 445k, overall 0.24%
Overall memory needed: 7383k
Peak memory use before GGC: 2264k -> 2265k
Peak memory use after GGC: 1955k
Maximum of released memory in single GGC run: 309k -> 310k
Garbage: 444k -> 445k
Leak: 2289k
Overhead: 456k -> 456k
GGC runs: 3
comparing empty function compilation at -O0 -g level:
Amount of produced GGC garbage increased from 447k to 448k, overall 0.24%
Overall memory needed: 7399k
Peak memory use before GGC: 2292k -> 2293k
Peak memory use after GGC: 1982k
Maximum of released memory in single GGC run: 310k -> 311k
Garbage: 447k -> 448k
Leak: 2321k
Overhead: 460k -> 461k
GGC runs: 3
comparing empty function compilation at -O1 level:
Amount of produced GGC garbage increased from 450k to 451k, overall 0.24%
Overall memory needed: 7495k
Peak memory use before GGC: 2264k -> 2265k
Peak memory use after GGC: 1955k
Maximum of released memory in single GGC run: 309k -> 310k
Garbage: 450k -> 451k
Leak: 2291k
Overhead: 456k -> 457k
GGC runs: 4
comparing empty function compilation at -O2 level:
Amount of produced GGC garbage increased from 453k to 454k, overall 0.23%
Overall memory needed: 7507k
Peak memory use before GGC: 2265k -> 2266k
Peak memory use after GGC: 1955k
Maximum of released memory in single GGC run: 310k -> 311k
Garbage: 453k -> 454k
Leak: 2291k
Overhead: 457k -> 457k
GGC runs: 4
comparing empty function compilation at -O3 level:
Amount of produced GGC garbage increased from 453k to 454k, overall 0.23%
Overall memory needed: 7507k
Peak memory use before GGC: 2265k -> 2266k
Peak memory use after GGC: 1955k
Maximum of released memory in single GGC run: 310k -> 311k
Garbage: 453k -> 454k
Leak: 2291k
Overhead: 457k -> 457k
GGC runs: 4
comparing combine.c compilation at -O0 level:
Overall memory needed: 17819k -> 17731k
Peak memory use before GGC: 9327k -> 9291k
Peak memory use after GGC: 8890k -> 8871k
Maximum of released memory in single GGC run: 2633k -> 2603k
Garbage: 37259k -> 37162k
Leak: 6538k -> 6539k
Overhead: 4655k -> 5024k
GGC runs: 280
comparing combine.c compilation at -O0 -g level:
Overall memory needed: 19723k -> 19823k
Peak memory use before GGC: 10916k -> 10897k
Peak memory use after GGC: 10550k -> 10531k
Maximum of released memory in single GGC run: 2393k -> 2375k
Garbage: 37829k -> 37738k
Leak: 9414k -> 9415k
Overhead: 5358k -> 5727k
GGC runs: 271 -> 270
comparing combine.c compilation at -O1 level:
Overall memory needed: 35295k -> 35163k
Peak memory use before GGC: 19563k -> 19412k
Peak memory use after GGC: 19361k -> 19206k
Maximum of released memory in single GGC run: 2216k -> 2196k
Garbage: 58407k -> 57400k
Leak: 6562k -> 6562k
Overhead: 6164k -> 6359k
GGC runs: 356 -> 349
comparing combine.c compilation at -O2 level:
Overall memory needed: 37719k -> 37547k
Peak memory use before GGC: 19568k -> 19448k
Peak memory use after GGC: 19374k -> 19245k
Maximum of released memory in single GGC run: 2208k -> 2188k
Garbage: 70384k -> 69101k
Leak: 6681k -> 6673k
Overhead: 7821k -> 7972k
GGC runs: 409 -> 406
comparing combine.c compilation at -O3 level:
Overall memory needed: 46039k -> 45631k
Peak memory use before GGC: 20766k -> 20534k
Peak memory use after GGC: 19876k -> 19743k
Maximum of released memory in single GGC run: 3154k -> 3126k
Garbage: 105050k -> 102995k
Leak: 6817k -> 6817k
Overhead: 12211k -> 12402k
GGC runs: 461 -> 458
comparing insn-attrtab.c compilation at -O0 level:
Overall memory needed: 104599k -> 103547k
Peak memory use before GGC: 70356k -> 69329k
Peak memory use after GGC: 45188k -> 44976k
Maximum of released memory in single GGC run: 37701k -> 36886k
Garbage: 131161k -> 130571k
Leak: 9581k -> 9588k
Overhead: 15666k -> 16932k
GGC runs: 206
comparing insn-attrtab.c compilation at -O0 -g level:
Overall memory needed: 106131k -> 105067k
Peak memory use before GGC: 71518k -> 70490k
Peak memory use after GGC: 46456k -> 46244k
Maximum of released memory in single GGC run: 37702k -> 36886k
Garbage: 132317k -> 131730k
Leak: 11271k -> 11278k
Overhead: 16060k -> 17326k
GGC runs: 206
comparing insn-attrtab.c compilation at -O1 level:
Overall memory needed: 149499k -> 148183k
Peak memory use before GGC: 86800k -> 86337k
Peak memory use after GGC: 80983k -> 80543k
Maximum of released memory in single GGC run: 33275k -> 33045k
Garbage: 269604k -> 264606k
Leak: 9398k -> 9404k
Overhead: 27282k -> 27590k
GGC runs: 226 -> 225
comparing insn-attrtab.c compilation at -O2 level:
Overall memory needed: 196727k -> 192999k
Peak memory use before GGC: 88116k -> 87648k
Peak memory use after GGC: 81051k -> 80609k
Maximum of released memory in single GGC run: 31604k -> 31388k
Garbage: 304746k -> 299523k
Leak: 9395k -> 9401k
Overhead: 32895k -> 33192k
GGC runs: 246 -> 245
comparing insn-attrtab.c compilation at -O3 level:
Overall memory needed: 196739k -> 191667k
Peak memory use before GGC: 88130k -> 87665k
Peak memory use after GGC: 81064k -> 80626k
Maximum of released memory in single GGC run: 31674k -> 31450k
Garbage: 305458k -> 300159k
Leak: 9400k -> 9407k
Overhead: 33096k -> 33388k
GGC runs: 246 -> 245
comparing Gerald's testcase PR8361 compilation at -O0 level:
Amount of produced GGC garbage increased from 209437k to 210317k, overall 0.42%
Amount of memory still referenced at the end of compilation increased from 49262k to 49389k, overall 0.26%
Overall memory needed: 151647k -> 151193k
Peak memory use before GGC: 92636k -> 92317k
Peak memory use after GGC: 91719k -> 91394k
Maximum of released memory in single GGC run: 18923k -> 18793k
Garbage: 209437k -> 210317k
Leak: 49262k -> 49389k
Overhead: 21554k -> 23720k
GGC runs: 409 -> 411
comparing Gerald's testcase PR8361 compilation at -O0 -g level:
Amount of produced GGC garbage increased from 216065k to 216936k, overall 0.40%
Amount of memory still referenced at the end of compilation increased from 72687k to 72814k, overall 0.17%
Overall memory needed: 169807k -> 169357k
Peak memory use before GGC: 105259k -> 104950k
Peak memory use after GGC: 104216k -> 103910k
Maximum of released memory in single GGC run: 19099k -> 18979k
Garbage: 216065k -> 216936k
Leak: 72687k -> 72814k
Overhead: 27476k -> 29643k
GGC runs: 383 -> 382
comparing Gerald's testcase PR8361 compilation at -O1 level:
Amount of memory still referenced at the end of compilation increased from 50086k to 50212k, overall 0.25%
Overall memory needed: 145159k -> 145039k
Peak memory use before GGC: 103343k -> 103059k
Peak memory use after GGC: 102259k -> 102031k
Maximum of released memory in single GGC run: 18066k -> 17981k
Garbage: 393904k -> 391802k
Leak: 50086k -> 50212k
Overhead: 30256k -> 34379k
GGC runs: 549 -> 541
comparing Gerald's testcase PR8361 compilation at -O2 level:
Amount of memory still referenced at the end of compilation increased from 50775k to 50901k, overall 0.25%
Overall memory needed: 146215k -> 146087k
Peak memory use before GGC: 103721k -> 103490k
Peak memory use after GGC: 102694k -> 102465k
Maximum of released memory in single GGC run: 18061k -> 17979k
Garbage: 434398k -> 431076k
Leak: 50775k -> 50901k
Overhead: 35395k -> 39569k
GGC runs: 601 -> 593
comparing Gerald's testcase PR8361 compilation at -O3 level:
Amount of memory still referenced at the end of compilation increased from 50863k to 50990k, overall 0.25%
Overall memory needed: 149403k -> 149003k
Peak memory use before GGC: 104990k -> 104751k
Peak memory use after GGC: 103914k -> 103682k
Maximum of released memory in single GGC run: 18492k -> 18300k
Garbage: 456007k -> 451163k
Leak: 50863k -> 50990k
Overhead: 37019k -> 41445k
GGC runs: 611 -> 606
comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
Overall memory needed: 246463k -> 246995k
Peak memory use before GGC: 82632k
Peak memory use after GGC: 59515k -> 59514k
Maximum of released memory in single GGC run: 45585k
Garbage: 148104k -> 147214k
Leak: 8082k -> 8082k
Overhead: 24864k -> 24807k
GGC runs: 80
comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
Overall memory needed: 247351k -> 247847k
Peak memory use before GGC: 83278k
Peak memory use after GGC: 60161k -> 60160k
Maximum of released memory in single GGC run: 45230k
Garbage: 148323k -> 147433k
Leak: 9338k -> 9338k
Overhead: 25359k -> 25303k
GGC runs: 88
comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory needed: 260379k -> 258639k
Peak memory use before GGC: 104834k -> 104566k
Peak memory use after GGC: 101620k -> 101352k
Maximum of released memory in single GGC run: 51846k -> 51848k
Garbage: 241495k -> 240888k
Leak: 25176k
Overhead: 28735k -> 29476k
GGC runs: 79
comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory needed: 532207k -> 531915k
Peak memory use before GGC: 104828k -> 104556k
Peak memory use after GGC: 101615k -> 101342k
Maximum of released memory in single GGC run: 37189k -> 37192k
Garbage: 273333k -> 272729k
Leak: 25605k
Overhead: 34774k -> 35516k
GGC runs: 91
comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
Overall memory needed: 1183239k -> 1182403k
Peak memory use before GGC: 201488k -> 200610k
Peak memory use after GGC: 189830k -> 188951k
Maximum of released memory in single GGC run: 80890k -> 80735k
Garbage: 373285k -> 371841k
Leak: 45260k
Overhead: 46802k -> 48290k
GGC runs: 70
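The "increased from X to Y, overall Z%" lines in the report above appear to be plain relative changes between the old and new values. The following is a minimal Python sketch of that arithmetic, not the testing script itself: the function names `parse_metric` and `percent_change` are hypothetical, and the line format is inferred only from the report text. For the small empty-function figures, the rounded kilobyte values shown give a slightly different percentage than the report prints, presumably because the script works from unrounded numbers.

```python
import re

# Matches report lines such as "Garbage: 444k -> 445k", "Leak: 2289k"
# or "GGC runs: 409 -> 411". The format is inferred from the report above.
METRIC_RE = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<old>\d+)k?(?:\s*->\s*(?P<new>\d+)k?)?$"
)


def parse_metric(line):
    """Return (name, old, new); new equals old when the value did not change."""
    match = METRIC_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric line: {line!r}")
    old = int(match.group("old"))
    new = int(match.group("new")) if match.group("new") else old
    return match.group("name"), old, new


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Values taken from the PR8361 -O0 section of the report.
    name, old, new = parse_metric("Garbage: 209437k -> 210317k")
    print(f"{name}: {percent_change(old, new):.2f}%")
```

With the PR8361 -O0 values, this reproduces the 0.42% (garbage) and 0.26% (leak) figures printed in the report.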
Head of the ChangeLog is:
--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 2007-02-05 23:01:46.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 2007-02-06 14:35:59.000000000 +0000
@@ -1,3 +1,289 @@
+2006-02-06 Paolo Bonzini <bonzini@gnu.org>
+
+ * Makefile.in (tree-ssa-loop-ivopts.o): Add pointer-set.h dependency.
+ (tree-ssa-reassoc.o): Add pointer-set.h dependency.
+ (tree-cfg.o): Remove hashtab.h dependency.
+
+ * tree-ssa-loop-ivopts.c: Include pointer-set.h.
+ (struct ivopts_data): Change niters to pointer_map_t.
+ (struct nfe_cache_elt, nfe_hash, nfe_eq): Delete.
+ (niter_for_exit): Create pointer_map on demand. Change for
+ pointer_map API.
+ (tree_ssa_iv_optimize_init): Initialize data->niters to NULL.
+ (free_loop_data): Destroy data->niters if created and reset field.
+ (tree_ssa_iv_optimize_finalize): Don't delete data->niters here.
+ (tree_ssa_iv_optimize_loop): Check for presence of stale data.
+
+ * tree-ssa-reassoc.c: Include pointer-set.h.
+ (bb_rank): Change to long *.
+ (operand_rank): Change to pointer_map_t.
+ (find_operand_rank): Return long, -1 if not found. Declare as inline.
+ (insert_operand_rank): Accept long.
+ (operand_entry_hash, operand_entry_eq): Remove.
+ (get_rank): Return long. Adjust for changes above.
+ (init_reassoc): Change rank type to long. Adjust creation of bb_rank
+ and operand_rank.
+ (fini_reassoc): Delete operand_rank with pointer_map_destroy.
+
+ * tree-ssa-structalias.c (vi_for_tree): Change to pointer_map.
+ (struct tree_vi, tree_vi_t, tree_vi_hash, tree_vi_eq): Delete.
+ (insert_vi_for_tree): Rewrite for pointer_map API. Assert argument
+ is not NULL.
+ (lookup_vi_for_tree): Rewrite for pointer_map API. Return varinfo_t
+ directly since it cannot be NULL.
+ (get_vi_for_tree): Rewrite for pointer_map API.
+ (find_what_p_points_to): Adjust for change to lookup_vi_for_tree.
+ (init_alias_vars): Create vi_for_tree as pointer_map.
+ (delete_points_to_sets): Delete vi_for_tree using pointer_map_destroy.
+
+ * tree-cfg.c: Don't include hashtab.h.
+ (edge_to_cases): Declare as pointer_map.
+ (struct edge_to_cases_elt, edge_to_cases_hash, edge_to_cases_eq):
+ Delete.
+ (edge_to_cases_cleanup): Rewrite as pointer_map_traverse callback.
+ (start_recording_case_labels): Create edge_to_cases as pointer_map.
+ (end_recoding_case_labels): Cleanup edge_to_cases manually before
+ destroying it.
+ (record_switch_edge): Delete.
+ (get_cases_for_edge): Adjust for pointer_map API, inline
+ record_switch_edge (rewritten for new API), remove goto.
+
+2006-02-06 Paolo Bonzini <bonzini@gnu.org>
+
+ * Makefile.in (tree-nested.o): Add pointer-set.h dependency.
+ * tree-nested.c: Include pointer-set.h.
+ (var_map_elt, var_map_eq, var_map_hash): Delete.
+ (struct nesting_info): Remove GTY marker. Change the two htab_t's
+ to pointer_map_t's.
+ (nesting_info_bitmap_obstack): New.
+ (lookup_field_for_decl): Adjust for pointer_map API.
+ (lookup_tramp_for_decl): Adjust for pointer_map API.
+ (get_nonlocal_debug_decl): Adjust for pointer_map API.
+ (get_local_debug_decl): Adjust for pointer_map API.
+ (convert_nl_goto_reference): Adjust for pointer_map API.
+ (convert_nl_goto_receiver): Adjust for pointer_map API.
+ (create_nesting_tree): Create outside GGC space. Create bitmap on
+ the new obstack. Create field_map and var_map as pointer_maps.
+ (free_nesting_tree): Adjust for changes to create_nesting_tree.
+ (root): Delete.
+ (lower_nested_functions): Move root here, no need to NULL it.
+ Initialize and release the obstack.
+
+2007-02-06 Paolo Bonzini <bonzini@gnu.org>
+
+ * tree.c (tree_int_map_hash, tree_int_map_eq, tree_int_map_marked_p):
+ Remove prototypes and make them non-static.
+ (struct tree_int_map): Remove.
+ * tree.h (struct tree_int_map): Move here, turning TO into an
+ unsigned int.
+ (tree_int_map_hash, tree_int_map_eq, tree_int_map_marked_p): Declare.
+
+ * tree.h (TREE_COMPLEXITY): Remove.
+ (struct tree_exp): Remove complexity field.
+ * tree.c (build1_stat): Don't set it.
+
+2007-02-06 Dorit Nuzman <dorit@il.ibm.com>
+ Victor Kaplansky <victork@il.ibm.com>
+
+ * tree-vectorizer.c (vect_is_simple_use): Support induction.
+ (vect_is_simple_reduction): Support reduction with induction as
+ one of the operands.
+ (vect_is_simple_iv_evolution): Fix formatting.
+ * tree-vect-analyze.c (vect_mark_stmts_to_be_vectorized): Fix
+ formatting. Don't mark induction phis for vectorization.
+ (vect_analyze_scalar_cycles): Analyze all inductions, then reductions.
+ * tree-vect-transform.c (get_initial_def_for_induction): New function.
+ (vect_get_vec_def_for_operand): Support induction.
+ (vect_get_vec_def_for_stmt_copy): Fix formatting and add check for
+ induction case.
+ (vectorizable_reduction): Support reduction with induction as one of
+ the operands.
+ (vectorizable_type_demotion): Use def-type of stmt argument rather
+ than dummy def-type.
+
+ * tree-ssa-loop.c (gate_scev_const_prop): Return the value of
+ flag_tree_scev_cprop.
+ * common.opt (tree-scev-cprop): New flag.
+
+ * tree-vect-transform.c (vect_create_destination_var): Use 'kind' in
+ call to vect_get_new_vect_var.
+
+2007-02-06 Ira Rosen <irar@il.ibm.com>
+
+ * tree-vect-patterns.c (vect_recog_widen_mult_pattern): Check that
+ vectype is not NULL.
+ (vect_pattern_recog_1): Likewise.
+
+2007-02-05 Kaveh R. Ghazi <ghazi@caip.rutgers.edu>
+
+ * fold-const.c (negate_expr_p): Handle CONJ_EXPR.
+ (fold_negate_expr): Likewise.
+
+2007-02-05 Alexandre Oliva <aoliva@redhat.com>
+
+ PR debug/30189
+ * dwarf2out.c (modified_type_die): Follow DECL_ORIGINAL_TYPE
+ even if cv-qualification is the same.
+
+2007-02-05 Geoffrey Keating <geoffk@apple.com>
+
+ * config/rs6000/darwin-tramp.asm (__trampoline_setup): Call
+ __enable_execute_stack on completion.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_fldxf_k8, athlon_fld_k8,
+ athlon_fstxf_k8, athlon_fst_k8, athlon_fist, athlon_fmov,
+ athlon_fadd_load, athlon_fadd_load_k8, athlon_fadd, athlon_fmul,
+ athlon_fmul_load, athlon_fmul_load_k8, athlon_fsgn,
+ athlon_fdiv_load, athlon_fdiv_load_k8, athlon_fdiv_k8,
+ athlon_fpspc_load, athlon_fpspc, athlon_fcmov_load,
+ athlon_fcmov_load_k8, athlon_fcmov_k8, athlon_fcomi_load_k8,
+ athlon_fcomi, athlon_fcom_load_k8, athlon_fcom): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/i386.md (x86_sahf_1, cmpfp_i_mixed, cmpfp_i_sse,
+ cmpfp_i_i387, cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387,
+ swapsi, swaphi_1, swapqi_1, swapdi_rex64, fix_truncsfdi_sse,
+ fix_truncdfdi_sse, fix_truncsfsi_sse, fix_truncdfsi_sse,
+ x86_fldcw_1, floatsisf2_mixed, floatsisf2_sse, floatdisf2_mixed,
+ floatdisf2_sse, floatsidf2_mixed, floatsidf2_sse,
+ floatdidf2_mixed, floatdidf2_sse, muldi3_1_rex64, mulsi3_1,
+ mulsi3_1_zext, mulhi3_1, mulqi3_1, umulqihi3_1, mulqihi3_insn,
+ umulditi3_insn, umulsidi3_insn, mulditi3_insn, mulsidi3_insn,
+ umuldi3_highpart_rex64, umulsi3_highpart_insn,
+ umulsi3_highpart_zext, smuldi3_highpart_rex64,
+ smulsi3_highpart_insn, smulsi3_highpart_zext, x86_64_shld,
+ x86_shld_1, x86_64_shrd, sqrtsf2_mixed, sqrtsf2_sse,
+ sqrtsf2_i387, sqrtdf2_mixed, sqrtdf2_sse, sqrtdf2_i387,
+ sqrtextendsfdf2_i387, sqrtxf2, sqrtextendsfxf2_i387,
+ sqrtextenddfxf2_i387): Added amdfam10_decode.
+
+ * config/i386/athlon.md (athlon_idirect_amdfam10,
+ athlon_ivector_amdfam10, athlon_idirect_load_amdfam10,
+ athlon_ivector_load_amdfam10, athlon_idirect_both_amdfam10,
+ athlon_ivector_both_amdfam10, athlon_idirect_store_amdfam10,
+ athlon_ivector_store_amdfam10): New define_insn_reservation.
+ (athlon_idirect_loadmov, athlon_idirect_movstore): Added
+ amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_call_amdfam10,
+ athlon_pop_amdfam10, athlon_lea_amdfam10): New
+ define_insn_reservation.
+ (athlon_branch, athlon_push, athlon_leave_k8, athlon_imul_k8,
+ athlon_imul_k8_DI, athlon_imul_mem_k8, athlon_imul_mem_k8_DI,
+ athlon_idiv, athlon_idiv_mem, athlon_str): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_sseld_amdfam10,
+ athlon_mmxld_amdfam10, athlon_ssest_amdfam10,
+ athlon_mmxssest_short_amdfam10): New define_insn_reservation.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_sseins_amdfam10): New
+ define_insn_reservation.
+ * config/i386/i386.md (sseins): Added sseins to define_attr type
+ and define_attr unit.
+ * config/i386/sse.md: Set type attribute to sseins for insertq
+ and insertqi.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (sselog_load_amdfam10, sselog_amdfam10,
+ ssecmpvector_load_amdfam10, ssecmpvector_amdfam10,
+ ssecomi_load_amdfam10, ssecomi_amdfam10,
+ sseaddvector_load_amdfam10, sseaddvector_amdfam10): New
+ define_insn_reservation.
+ (ssecmp_load_k8, ssecmp, sseadd_load_k8, seadd): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (cvtss2sd_load_amdfam10,
+ cvtss2sd_amdfam10, cvtps2pd_load_amdfam10, cvtps2pd_amdfam10,
+ cvtsi2sd_load_amdfam10, cvtsi2ss_load_amdfam10,
+ cvtsi2sd_amdfam10, cvtsi2ss_amdfam10, cvtsd2ss_load_amdfam10,
+ cvtsd2ss_amdfam10, cvtpd2ps_load_amdfam10, cvtpd2ps_amdfam10,
+ cvtsX2si_load_amdfam10, cvtsX2si_amdfam10): New
+ define_insn_reservation.
+
+ * config/i386/sse.md (cvtsi2ss, cvtsi2ssq, cvtss2si,
+ cvtss2siq, cvttss2si, cvttss2siq, cvtsi2sd, cvtsi2sdq,
+ cvtsd2si, cvtsd2siq, cvttsd2si, cvttsd2siq,
+ cvtpd2dq, cvttpd2dq, cvtsd2ss, cvtss2sd,
+ cvtpd2ps, cvtps2pd): Added amdfam10_decode attribute.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_ssedivvector_amdfam10,
+ athlon_ssedivvector_load_amdfam10, athlon_ssemulvector_amdfam10,
+ athlon_ssemulvector_load_amdfam10): New define_insn_reservation.
+ (athlon_ssediv, athlon_ssediv_load_k8, athlon_ssemul,
+ athlon_ssemul_load_k8): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/i386.h (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL): New macro.
+ (x86_sse_unaligned_move_optimal): New variable.
+
+ * config/i386/i386.c (x86_sse_unaligned_move_optimal): Enable for
+ m_AMDFAM10.
+ (ix86_expand_vector_move_misalign): Add code to generate movupd/movups
+ for unaligned vector SSE double/single precision loads for AMDFAM10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/i386.h (TARGET_AMDFAM10): New macro.
+ (TARGET_CPU_CPP_BUILTINS): Add code for amdfam10.
+ Define TARGET_CPU_DEFAULT_amdfam10.
+ (TARGET_CPU_DEFAULT_NAMES): Add amdfam10.
+ (processor_type): Add PROCESSOR_AMDFAM10.
+
+ * config/i386/i386.md: Add amdfam10 as a new cpu attribute to match
+ processor_type in config/i386/i386.h.
+ Enable imul peepholes for TARGET_AMDFAM10.
+
+ * config.gcc: Add support for --with-cpu option for amdfam10.
+
+ * config/i386/i386.c (amdfam10_cost): New variable.
+ (m_AMDFAM10): New macro.
+ (m_ATHLON_K8_AMDFAM10): New macro.
+ (x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen,
+ x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop,
+ x86_promote_QImode, x86_integer_DFmode_moves,
+ x86_partial_reg_dependency, x86_memory_mismatch_stall,
+ x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387,
+ x86_sse_partial_reg_dependency, x86_sse_typeless_stores,
+ x86_use_ffreep, x86_use_incdec, x86_four_jump_limit,
+ x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns):
+ Enable/disable for amdfam10.
+ (override_options): Add amdfam10_cost to processor_target_table.
+ Set up PROCESSOR_AMDFAM10 for amdfam10 entry in
+ processor_alias_table.
+ (ix86_issue_rate): Add PROCESSOR_AMDFAM10.
+ (ix86_adjust_cost): Add code for amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/i386.opt: Add new Advanced Bit Manipulation (-mabm)
+ instruction set feature flag. Add new (-mpopcnt) flag for popcnt
+ instruction. Add new SSE4A (-msse4a) instruction set feature flag.
+ * config/i386/i386.h: Add builtin definition for SSE4A.
+ * config/i386/i386.md: Add support for ABM instructions
+ (popcnt and lzcnt).
+ * config/i386/sse.md: Add support for SSE4A instructions
+ (movntss, movntsd, extrq, insertq).
+ * config/i386/i386.c: Add support for ABM and SSE4A builtins.
+ Add -march=amdfam10 flag.
+ * config/i386/ammintrin.h: Add support for SSE4A intrinsics.
+ * doc/invoke.texi: Add documentation on flags for sse4a, abm, popcnt
+ and amdfam10.
+ * doc/extend.texi: Add documentation for SSE4A builtins.
+
2007-02-05 Bob Wilson <bob.wilson@acm.org>
* config/xtensa/xtensa.c (constantpool_mem_p): Skip over SUBREGs.
The results can be reproduced by building a compiler with
--enable-gather-detailed-mem-stats targeting x86-64
and compiling preprocessed combine.c or the testcase from PR8361 with:
-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
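As an illustration only, the flag list can be assembled per optimization level as in the Python sketch below. It assumes that the `x` in `-Ox` stands for the level being measured (0 to 3 in the report above); the helper name `mem_report_args` is hypothetical, and the compiler binary to run is deliberately left out.

```python
def mem_report_args(opt_level, extra=()):
    """Return the flags quoted in the report for one optimization level.

    opt_level is substituted for the "x" in "-Ox" (an assumption about
    what the placeholder means); extra holds additional flags such as "-g".
    """
    return [
        "-fmem-report",
        "--param=ggc-min-heapsize=1024",
        "--param=ggc-min-expand=1",
        f"-O{opt_level}",
        "-Q",
        *extra,
    ]


if __name__ == "__main__":
    # One flag list per configuration compared in the report.
    for level, extra in [("0", ()), ("0", ("-g",)), ("1", ()), ("2", ()), ("3", ())]:
        print(" ".join(mem_report_args(level, extra)))
```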
The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated. Peak memory consumption is
computed by taking the maximal value in the {GC XXXX -> YYYY} reports.
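The peak computation described above can be sketched in Python as follows. This is a hedged illustration, not the script's actual code. It assumes each collection appears in the dump as `{GC XXXXk -> YYYYk}`, and that the "before GGC", "after GGC" and "released in single GGC run" figures correspond to the maximum XXXX, the maximum YYYY and the largest difference. The function name `summarize_gc` and the sample dump are invented for the example.

```python
import re

# One "{GC before -> after}" entry per collection; values are in kilobytes.
GC_RE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def summarize_gc(dump):
    """Scan dump text for GC entries and return peak figures, or None if there are none."""
    runs = [(int(before), int(after)) for before, after in GC_RE.findall(dump)]
    if not runs:
        return None
    return {
        "peak_before": max(before for before, _ in runs),
        "peak_after": max(after for _, after in runs),
        "max_released": max(before - after for before, after in runs),
        "runs": len(runs),
    }


if __name__ == "__main__":
    # Invented sample text, not output from a real compiler run.
    sample = " {GC 2265k -> 1955k} {GC 2100k -> 1900k} {GC 2000k -> 1950k}"
    print(summarize_gc(sample))
```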
Your testing script.