This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



A recent patch increased GCC's memory consumption!


Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something goes wrong.

Comparing memory consumption on compilation of an empty function, combine.i,
insn-attrtab.i, generate-3.4.ii and the PR rtl-optimization/28071 testcase, I got:


comparing empty function compilation at -O0 level:
  Amount of produced GGC garbage increased from 444k to 445k, overall 0.24%
    Overall memory needed: 7383k
    Peak memory use before GGC: 2264k -> 2265k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 309k -> 310k
    Garbage: 444k -> 445k
    Leak: 2289k
    Overhead: 456k -> 456k
    GGC runs: 3

comparing empty function compilation at -O0 -g level:
  Amount of produced GGC garbage increased from 447k to 448k, overall 0.24%
    Overall memory needed: 7399k
    Peak memory use before GGC: 2292k -> 2293k
    Peak memory use after GGC: 1982k
    Maximum of released memory in single GGC run: 310k -> 311k
    Garbage: 447k -> 448k
    Leak: 2321k
    Overhead: 460k -> 461k
    GGC runs: 3

comparing empty function compilation at -O1 level:
  Amount of produced GGC garbage increased from 450k to 451k, overall 0.24%
    Overall memory needed: 7495k
    Peak memory use before GGC: 2264k -> 2265k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 309k -> 310k
    Garbage: 450k -> 451k
    Leak: 2291k
    Overhead: 456k -> 457k
    GGC runs: 4

comparing empty function compilation at -O2 level:
  Amount of produced GGC garbage increased from 453k to 454k, overall 0.23%
    Overall memory needed: 7507k
    Peak memory use before GGC: 2265k -> 2266k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 310k -> 311k
    Garbage: 453k -> 454k
    Leak: 2291k
    Overhead: 457k -> 457k
    GGC runs: 4

comparing empty function compilation at -O3 level:
  Amount of produced GGC garbage increased from 453k to 454k, overall 0.23%
    Overall memory needed: 7507k
    Peak memory use before GGC: 2265k -> 2266k
    Peak memory use after GGC: 1955k
    Maximum of released memory in single GGC run: 310k -> 311k
    Garbage: 453k -> 454k
    Leak: 2291k
    Overhead: 457k -> 457k
    GGC runs: 4

comparing combine.c compilation at -O0 level:
    Overall memory needed: 17819k -> 17731k
    Peak memory use before GGC: 9327k -> 9291k
    Peak memory use after GGC: 8890k -> 8871k
    Maximum of released memory in single GGC run: 2633k -> 2603k
    Garbage: 37259k -> 37162k
    Leak: 6538k -> 6539k
    Overhead: 4655k -> 5024k
    GGC runs: 280

comparing combine.c compilation at -O0 -g level:
    Overall memory needed: 19723k -> 19823k
    Peak memory use before GGC: 10916k -> 10897k
    Peak memory use after GGC: 10550k -> 10531k
    Maximum of released memory in single GGC run: 2393k -> 2375k
    Garbage: 37829k -> 37738k
    Leak: 9414k -> 9415k
    Overhead: 5358k -> 5727k
    GGC runs: 271 -> 270

comparing combine.c compilation at -O1 level:
    Overall memory needed: 35295k -> 35163k
    Peak memory use before GGC: 19563k -> 19412k
    Peak memory use after GGC: 19361k -> 19206k
    Maximum of released memory in single GGC run: 2216k -> 2196k
    Garbage: 58407k -> 57400k
    Leak: 6562k -> 6562k
    Overhead: 6164k -> 6359k
    GGC runs: 356 -> 349

comparing combine.c compilation at -O2 level:
    Overall memory needed: 37719k -> 37547k
    Peak memory use before GGC: 19568k -> 19448k
    Peak memory use after GGC: 19374k -> 19245k
    Maximum of released memory in single GGC run: 2208k -> 2188k
    Garbage: 70384k -> 69101k
    Leak: 6681k -> 6673k
    Overhead: 7821k -> 7972k
    GGC runs: 409 -> 406

comparing combine.c compilation at -O3 level:
    Overall memory needed: 46039k -> 45631k
    Peak memory use before GGC: 20766k -> 20534k
    Peak memory use after GGC: 19876k -> 19743k
    Maximum of released memory in single GGC run: 3154k -> 3126k
    Garbage: 105050k -> 102995k
    Leak: 6817k -> 6817k
    Overhead: 12211k -> 12402k
    GGC runs: 461 -> 458

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 104599k -> 103547k
    Peak memory use before GGC: 70356k -> 69329k
    Peak memory use after GGC: 45188k -> 44976k
    Maximum of released memory in single GGC run: 37701k -> 36886k
    Garbage: 131161k -> 130571k
    Leak: 9581k -> 9588k
    Overhead: 15666k -> 16932k
    GGC runs: 206

comparing insn-attrtab.c compilation at -O0 -g level:
    Overall memory needed: 106131k -> 105067k
    Peak memory use before GGC: 71518k -> 70490k
    Peak memory use after GGC: 46456k -> 46244k
    Maximum of released memory in single GGC run: 37702k -> 36886k
    Garbage: 132317k -> 131730k
    Leak: 11271k -> 11278k
    Overhead: 16060k -> 17326k
    GGC runs: 206

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 149499k -> 148183k
    Peak memory use before GGC: 86800k -> 86337k
    Peak memory use after GGC: 80983k -> 80543k
    Maximum of released memory in single GGC run: 33275k -> 33045k
    Garbage: 269604k -> 264606k
    Leak: 9398k -> 9404k
    Overhead: 27282k -> 27590k
    GGC runs: 226 -> 225

comparing insn-attrtab.c compilation at -O2 level:
    Overall memory needed: 196727k -> 192999k
    Peak memory use before GGC: 88116k -> 87648k
    Peak memory use after GGC: 81051k -> 80609k
    Maximum of released memory in single GGC run: 31604k -> 31388k
    Garbage: 304746k -> 299523k
    Leak: 9395k -> 9401k
    Overhead: 32895k -> 33192k
    GGC runs: 246 -> 245

comparing insn-attrtab.c compilation at -O3 level:
    Overall memory needed: 196739k -> 191667k
    Peak memory use before GGC: 88130k -> 87665k
    Peak memory use after GGC: 81064k -> 80626k
    Maximum of released memory in single GGC run: 31674k -> 31450k
    Garbage: 305458k -> 300159k
    Leak: 9400k -> 9407k
    Overhead: 33096k -> 33388k
    GGC runs: 246 -> 245

comparing Gerald's testcase PR8361 compilation at -O0 level:
  Amount of produced GGC garbage increased from 209437k to 210317k, overall 0.42%
  Amount of memory still referenced at the end of compilation increased from 49262k to 49389k, overall 0.26%
    Overall memory needed: 151647k -> 151193k
    Peak memory use before GGC: 92636k -> 92317k
    Peak memory use after GGC: 91719k -> 91394k
    Maximum of released memory in single GGC run: 18923k -> 18793k
    Garbage: 209437k -> 210317k
    Leak: 49262k -> 49389k
    Overhead: 21554k -> 23720k
    GGC runs: 409 -> 411

comparing Gerald's testcase PR8361 compilation at -O0 -g level:
  Amount of produced GGC garbage increased from 216065k to 216936k, overall 0.40%
  Amount of memory still referenced at the end of compilation increased from 72687k to 72814k, overall 0.17%
    Overall memory needed: 169807k -> 169357k
    Peak memory use before GGC: 105259k -> 104950k
    Peak memory use after GGC: 104216k -> 103910k
    Maximum of released memory in single GGC run: 19099k -> 18979k
    Garbage: 216065k -> 216936k
    Leak: 72687k -> 72814k
    Overhead: 27476k -> 29643k
    GGC runs: 383 -> 382

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Amount of memory still referenced at the end of compilation increased from 50086k to 50212k, overall 0.25%
    Overall memory needed: 145159k -> 145039k
    Peak memory use before GGC: 103343k -> 103059k
    Peak memory use after GGC: 102259k -> 102031k
    Maximum of released memory in single GGC run: 18066k -> 17981k
    Garbage: 393904k -> 391802k
    Leak: 50086k -> 50212k
    Overhead: 30256k -> 34379k
    GGC runs: 549 -> 541

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Amount of memory still referenced at the end of compilation increased from 50775k to 50901k, overall 0.25%
    Overall memory needed: 146215k -> 146087k
    Peak memory use before GGC: 103721k -> 103490k
    Peak memory use after GGC: 102694k -> 102465k
    Maximum of released memory in single GGC run: 18061k -> 17979k
    Garbage: 434398k -> 431076k
    Leak: 50775k -> 50901k
    Overhead: 35395k -> 39569k
    GGC runs: 601 -> 593

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of memory still referenced at the end of compilation increased from 50863k to 50990k, overall 0.25%
    Overall memory needed: 149403k -> 149003k
    Peak memory use before GGC: 104990k -> 104751k
    Peak memory use after GGC: 103914k -> 103682k
    Maximum of released memory in single GGC run: 18492k -> 18300k
    Garbage: 456007k -> 451163k
    Leak: 50863k -> 50990k
    Overhead: 37019k -> 41445k
    GGC runs: 611 -> 606

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 246463k -> 246995k
    Peak memory use before GGC: 82632k
    Peak memory use after GGC: 59515k -> 59514k
    Maximum of released memory in single GGC run: 45585k
    Garbage: 148104k -> 147214k
    Leak: 8082k -> 8082k
    Overhead: 24864k -> 24807k
    GGC runs: 80

comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
    Overall memory needed: 247351k -> 247847k
    Peak memory use before GGC: 83278k
    Peak memory use after GGC: 60161k -> 60160k
    Maximum of released memory in single GGC run: 45230k
    Garbage: 148323k -> 147433k
    Leak: 9338k -> 9338k
    Overhead: 25359k -> 25303k
    GGC runs: 88

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 260379k -> 258639k
    Peak memory use before GGC: 104834k -> 104566k
    Peak memory use after GGC: 101620k -> 101352k
    Maximum of released memory in single GGC run: 51846k -> 51848k
    Garbage: 241495k -> 240888k
    Leak: 25176k
    Overhead: 28735k -> 29476k
    GGC runs: 79

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 532207k -> 531915k
    Peak memory use before GGC: 104828k -> 104556k
    Peak memory use after GGC: 101615k -> 101342k
    Maximum of released memory in single GGC run: 37189k -> 37192k
    Garbage: 273333k -> 272729k
    Leak: 25605k
    Overhead: 34774k -> 35516k
    GGC runs: 91

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 1183239k -> 1182403k
    Peak memory use before GGC: 201488k -> 200610k
    Peak memory use after GGC: 189830k -> 188951k
    Maximum of released memory in single GGC run: 80890k -> 80735k
    Garbage: 373285k -> 371841k
    Leak: 45260k
    Overhead: 46802k -> 48290k
    GGC runs: 70
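The "overall" percentages in the summary lines above read as plain relative changes, (new - old) / old x 100. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the real script's formula, rounding and reporting threshold are not stated in this message.

```python
def relative_change(old_kb: int, new_kb: int) -> float:
    """Percentage change from old_kb to new_kb; positive means growth."""
    if old_kb == 0:
        raise ValueError("baseline must be non-zero")
    return (new_kb - old_kb) * 100.0 / old_kb


def summary_line(what: str, old_kb: int, new_kb: int) -> str:
    """Phrase a change in the style of the report's summary lines (hypothetical helper)."""
    if new_kb == old_kb:
        return f"Amount of {what} unchanged at {old_kb}k"
    direction = "increased" if new_kb > old_kb else "decreased"
    pct = abs(relative_change(old_kb, new_kb))
    return f"Amount of {what} {direction} from {old_kb}k to {new_kb}k, overall {pct:.2f}%"


if __name__ == "__main__":
    print(summary_line("produced GGC garbage", 209437, 210317))
```

For example, 209437k to 210317k gives 0.42%, matching the PR8361 -O0 line. Computed from the rounded kilobyte figures, 444k to 445k gives 0.23% rather than the reported 0.24%, which suggests the script works from exact byte counts.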

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2007-02-05 23:01:46.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2007-02-06 14:35:59.000000000 +0000
@@ -1,3 +1,289 @@
+2007-02-06  Paolo Bonzini  <bonzini@gnu.org>
+
+	* Makefile.in (tree-ssa-loop-ivopts.o): Add pointer-set.h dependency.
+	(tree-ssa-reassoc.o): Add pointer-set.h dependency.
+	(tree-cfg.o): Remove hashtab.h dependency.
+
+	* tree-ssa-loop-ivopts.c: Include pointer-set.h.
+	(struct ivopts_data): Change niters to pointer_map_t.
+	(struct nfe_cache_elt, nfe_hash, nfe_eq): Delete.
+	(niter_for_exit): Create pointer_map on demand.  Change for
+	pointer_map API.
+	(tree_ssa_iv_optimize_init): Initialize data->niters to NULL.
+	(free_loop_data): Destroy data->niters if created and reset field.
+	(tree_ssa_iv_optimize_finalize): Don't delete data->niters here.
+	(tree_ssa_iv_optimize_loop): Check for presence of stale data.
+
+	* tree-ssa-reassoc.c: Include pointer-set.h.
+	(bb_rank): Change to long *.
+	(operand_rank): Change to pointer_map_t.
+	(find_operand_rank): Return long, -1 if not found.  Declare as inline.
+	(insert_operand_rank): Accept long.
+	(operand_entry_hash, operand_entry_eq): Remove.
+	(get_rank): Return long.  Adjust for changes above.
+	(init_reassoc): Change rank type to long.  Adjust creation of bb_rank
+	and operand_rank.
+	(fini_reassoc): Delete operand_rank with pointer_map_destroy.
+
+	* tree-ssa-structalias.c (vi_for_tree): Change to pointer_map.
+	(struct tree_vi, tree_vi_t, tree_vi_hash, tree_vi_eq): Delete.
+	(insert_vi_for_tree): Rewrite for pointer_map API.  Assert argument
+	is not NULL.
+	(lookup_vi_for_tree): Rewrite for pointer_map API.  Return varinfo_t
+	directly since it cannot be NULL.
+	(get_vi_for_tree): Rewrite for pointer_map API.
+	(find_what_p_points_to): Adjust for change to lookup_vi_for_tree.
+	(init_alias_vars): Create vi_for_tree as pointer_map.
+	(delete_points_to_sets): Delete vi_for_tree using pointer_map_destroy.
+
+	* tree-cfg.c: Don't include hashtab.h.
+	(edge_to_cases): Declare as pointer_map.
+	(struct edge_to_cases_elt, edge_to_cases_hash, edge_to_cases_eq):
+	Delete.
+	(edge_to_cases_cleanup): Rewrite as pointer_map_traverse callback.
+	(start_recording_case_labels): Create edge_to_cases as pointer_map.
+	(end_recording_case_labels): Cleanup edge_to_cases manually before
+	destroying it.
+	(record_switch_edge): Delete.
+	(get_cases_for_edge): Adjust for pointer_map API, inline
+	record_switch_edge (rewritten for new API), remove goto.
+
+2007-02-06  Paolo Bonzini  <bonzini@gnu.org>
+
+	* Makefile.in (tree-nested.o): Add pointer-set.h dependency.
+	* tree-nested.c: Include pointer-set.h.
+	(var_map_elt, var_map_eq, var_map_hash): Delete.
+	(struct nesting_info): Remove GTY marker.  Change the two htab_t's
+	to pointer_map_t's.
+	(nesting_info_bitmap_obstack): New.
+	(lookup_field_for_decl): Adjust for pointer_map API.
+	(lookup_tramp_for_decl): Adjust for pointer_map API.
+	(get_nonlocal_debug_decl): Adjust for pointer_map API.
+	(get_local_debug_decl): Adjust for pointer_map API.
+	(convert_nl_goto_reference): Adjust for pointer_map API.
+	(convert_nl_goto_receiver): Adjust for pointer_map API.
+	(create_nesting_tree): Create outside GGC space.  Create bitmap on
+	the new obstack.  Create field_map and var_map as pointer_maps.
+	(free_nesting_tree): Adjust for changes to create_nesting_tree.
+	(root): Delete.	
+	(lower_nested_functions): Move root here, no need to NULL it.
+	Initialize and release the obstack.
+
+2007-02-06  Paolo Bonzini  <bonzini@gnu.org>
+
+        * tree.c (tree_int_map_hash, tree_int_map_eq, tree_int_map_marked_p):
+        Remove prototypes and make them non-static.
+        (struct tree_int_map): Remove.
+        * tree.h (struct tree_int_map): Move here, turning TO into an
+        unsigned int.
+        (tree_int_map_hash, tree_int_map_eq, tree_int_map_marked_p): Declare.
+
+        * tree.h (TREE_COMPLEXITY): Remove.
+        (struct tree_exp): Remove complexity field.
+        * tree.c (build1_stat): Don't set it.
+
+2007-02-06  Dorit Nuzman  <dorit@il.ibm.com>
+	    Victor Kaplansky  <victork@il.ibm.com>
+
+	* tree-vectorizer.c (vect_is_simple_use): Support induction.
+	(vect_is_simple_reduction): Support reduction with induction as
+	one of the operands.
+	(vect_is_simple_iv_evolution): Fix formatting.
+	* tree-vect-analyze.c (vect_mark_stmts_to_be_vectorized): Fix 
+	formatting.  Don't mark induction phis for vectorization.
+	(vect_analyze_scalar_cycles): Analyze all inductions, then reductions.
+	* tree-vect-transform.c (get_initial_def_for_induction): New function.
+	(vect_get_vec_def_for_operand): Support induction.
+	(vect_get_vec_def_for_stmt_copy): Fix formatting and add check for
+	induction case.
+	(vectorizable_reduction): Support reduction with induction as one of 
+	the operands. 
+	(vectorizable_type_demotion): Use def-type of stmt argument rather
+	than dummy def-type.
+
+	* tree-ssa-loop.c (gate_scev_const_prop): Return the value of
+	flag_tree_scev_cprop.
+	* common.opt (tree-scev-cprop): New flag.
+
+	* tree-vect-transform.c (vect_create_destination_var): Use 'kind' in
+	call to vect_get_new_vect_var.
+
+2007-02-06  Ira Rosen  <irar@il.ibm.com>
+
+	* tree-vect-patterns.c (vect_recog_widen_mult_pattern): Check that 
+	vectype is not NULL.
+	(vect_pattern_recog_1): Likewise.
+
+2007-02-05  Kaveh R. Ghazi  <ghazi@caip.rutgers.edu>
+
+	* fold-const.c (negate_expr_p): Handle CONJ_EXPR.
+	(fold_negate_expr): Likewise.
+
+2007-02-05  Alexandre Oliva  <aoliva@redhat.com>
+
+	PR debug/30189
+	* dwarf2out.c (modified_type_die): Follow DECL_ORIGINAL_TYPE
+	even if cv-qualification is the same.
+
+2007-02-05  Geoffrey Keating  <geoffk@apple.com>
+
+	* config/rs6000/darwin-tramp.asm (__trampoline_setup): Call
+	__enable_execute_stack on completion.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (athlon_fldxf_k8, athlon_fld_k8,
+	athlon_fstxf_k8, athlon_fst_k8, athlon_fist, athlon_fmov,
+	athlon_fadd_load, athlon_fadd_load_k8, athlon_fadd, athlon_fmul,
+	athlon_fmul_load, athlon_fmul_load_k8, athlon_fsgn,
+	athlon_fdiv_load, athlon_fdiv_load_k8, athlon_fdiv_k8,
+	athlon_fpspc_load, athlon_fpspc, athlon_fcmov_load,
+	athlon_fcmov_load_k8, athlon_fcmov_k8, athlon_fcomi_load_k8,
+	athlon_fcomi, athlon_fcom_load_k8, athlon_fcom): Added amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/i386.md (x86_sahf_1, cmpfp_i_mixed, cmpfp_i_sse,
+	cmpfp_i_i387, cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387,
+	swapsi, swaphi_1, swapqi_1, swapdi_rex64, fix_truncsfdi_sse,
+	fix_truncdfdi_sse, fix_truncsfsi_sse, fix_truncdfsi_sse,
+	x86_fldcw_1, floatsisf2_mixed, floatsisf2_sse, floatdisf2_mixed,
+	floatdisf2_sse, floatsidf2_mixed, floatsidf2_sse,
+	floatdidf2_mixed, floatdidf2_sse, muldi3_1_rex64, mulsi3_1,
+	mulsi3_1_zext, mulhi3_1, mulqi3_1, umulqihi3_1, mulqihi3_insn,
+	umulditi3_insn, umulsidi3_insn, mulditi3_insn, mulsidi3_insn,
+	umuldi3_highpart_rex64, umulsi3_highpart_insn,
+	umulsi3_highpart_zext, smuldi3_highpart_rex64,
+	smulsi3_highpart_insn, smulsi3_highpart_zext, x86_64_shld,
+	x86_shld_1, x86_64_shrd, sqrtsf2_mixed, sqrtsf2_sse,
+	sqrtsf2_i387, sqrtdf2_mixed, sqrtdf2_sse, sqrtdf2_i387,
+	sqrtextendsfdf2_i387, sqrtxf2, sqrtextendsfxf2_i387,
+	sqrtextenddfxf2_i387): Added amdfam10_decode.
+	
+	* config/i386/athlon.md (athlon_idirect_amdfam10,
+	athlon_ivector_amdfam10, athlon_idirect_load_amdfam10,
+	athlon_ivector_load_amdfam10, athlon_idirect_both_amdfam10,
+	athlon_ivector_both_amdfam10, athlon_idirect_store_amdfam10,
+	athlon_ivector_store_amdfam10): New define_insn_reservation.
+	(athlon_idirect_loadmov, athlon_idirect_movstore): Added
+	amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (athlon_call_amdfam10,
+	athlon_pop_amdfam10, athlon_lea_amdfam10): New
+	define_insn_reservation.
+	(athlon_branch, athlon_push, athlon_leave_k8, athlon_imul_k8,
+	athlon_imul_k8_DI, athlon_imul_mem_k8, athlon_imul_mem_k8_DI,
+	athlon_idiv, athlon_idiv_mem, athlon_str): Added amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (athlon_sseld_amdfam10,
+	athlon_mmxld_amdfam10, athlon_ssest_amdfam10,
+	athlon_mmxssest_short_amdfam10): New define_insn_reservation.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (athlon_sseins_amdfam10): New
+	define_insn_reservation.
+	* config/i386/i386.md (sseins): Added sseins to define_attr type
+	and define_attr unit.
+	* config/i386/sse.md: Set type attribute to sseins for insertq
+	and insertqi.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (sselog_load_amdfam10, sselog_amdfam10,
+	ssecmpvector_load_amdfam10, ssecmpvector_amdfam10,
+	ssecomi_load_amdfam10, ssecomi_amdfam10,
+	sseaddvector_load_amdfam10, sseaddvector_amdfam10): New
+	define_insn_reservation.
+	(ssecmp_load_k8, ssecmp, sseadd_load_k8, sseadd): Added amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (cvtss2sd_load_amdfam10,
+	cvtss2sd_amdfam10, cvtps2pd_load_amdfam10, cvtps2pd_amdfam10,
+	cvtsi2sd_load_amdfam10, cvtsi2ss_load_amdfam10,
+	cvtsi2sd_amdfam10, cvtsi2ss_amdfam10, cvtsd2ss_load_amdfam10,
+	cvtsd2ss_amdfam10, cvtpd2ps_load_amdfam10, cvtpd2ps_amdfam10,
+	cvtsX2si_load_amdfam10, cvtsX2si_amdfam10): New 
+	define_insn_reservation.
+
+	* config/i386/sse.md (cvtsi2ss, cvtsi2ssq, cvtss2si,
+	cvtss2siq, cvttss2si, cvttss2siq, cvtsi2sd, cvtsi2sdq,
+	cvtsd2si, cvtsd2siq, cvttsd2si, cvttsd2siq,
+	cvtpd2dq, cvttpd2dq, cvtsd2ss, cvtss2sd,
+	cvtpd2ps, cvtps2pd): Added amdfam10_decode attribute.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/athlon.md (athlon_ssedivvector_amdfam10,
+	athlon_ssedivvector_load_amdfam10, athlon_ssemulvector_amdfam10,
+	athlon_ssemulvector_load_amdfam10): New define_insn_reservation.
+	(athlon_ssediv, athlon_ssediv_load_k8, athlon_ssemul,
+	athlon_ssemul_load_k8): Added amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/i386.h (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL): New macro.
+	(x86_sse_unaligned_move_optimal): New variable.
+	
+	* config/i386/i386.c (x86_sse_unaligned_move_optimal): Enable for  
+	m_AMDFAM10.
+	(ix86_expand_vector_move_misalign): Add code to generate movupd/movups
+	for unaligned vector SSE double/single precision loads for AMDFAM10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+
+	* config/i386/i386.h (TARGET_AMDFAM10): New macro.
+	(TARGET_CPU_CPP_BUILTINS): Add code for amdfam10.
+	Define TARGET_CPU_DEFAULT_amdfam10.
+	(TARGET_CPU_DEFAULT_NAMES): Add amdfam10.
+	(processor_type): Add PROCESSOR_AMDFAM10.	
+	
+	* config/i386/i386.md: Add amdfam10 as a new cpu attribute to match
+	processor_type in config/i386/i386.h.
+	Enable imul peepholes for TARGET_AMDFAM10.
+	
+	* config.gcc: Add support for --with-cpu option for amdfam10.
+	
+	* config/i386/i386.c (amdfam10_cost): New variable.
+	(m_AMDFAM10): New macro.
+	(m_ATHLON_K8_AMDFAM10): New macro.
+	(x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen,
+	x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop,
+	x86_promote_QImode, x86_integer_DFmode_moves,
+	x86_partial_reg_dependency, x86_memory_mismatch_stall, 
+	x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387,
+	x86_sse_partial_reg_dependency, x86_sse_typeless_stores,
+	x86_use_ffreep, x86_use_incdec, x86_four_jump_limit,
+	x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns):
+	Enable/disable for amdfam10.
+	(override_options): Add amdfam10_cost to processor_target_table.
+	Set up PROCESSOR_AMDFAM10 for amdfam10 entry in 
+	processor_alias_table.
+	(ix86_issue_rate): Add PROCESSOR_AMDFAM10.
+	(ix86_adjust_cost): Add code for amdfam10.
+
+2007-02-05	Harsha Jagasia	<harsha.jagasia@amd.com>
+	
+	* config/i386/i386.opt: Add new Advanced Bit Manipulation (-mabm)
+	instruction set feature flag. Add new (-mpopcnt) flag for popcnt 
+	instruction. Add new SSE4A (-msse4a) instruction set feature flag.
+	* config/i386/i386.h: Add builtin definition for SSE4A.
+	* config/i386/i386.md: Add support for ABM instructions 
+	(popcnt and lzcnt).
+	* config/i386/sse.md: Add support for SSE4A instructions
+	(movntss, movntsd, extrq, insertq).
+	* config/i386/i386.c: Add support for ABM and SSE4A builtins.
+	Add -march=amdfam10 flag.
+	* config/i386/ammintrin.h: Add support for SSE4A intrinsics.
+	* doc/invoke.texi: Add documentation on flags for sse4a, abm, popcnt
+	and amdfam10.
+	* doc/extend.texi: Add documentation for SSE4A builtins.
+
 2007-02-05  Bob Wilson  <bob.wilson@acm.org>
 
 	* config/xtensa/xtensa.c (constantpool_mem_p): Skip over SUBREGs.


The results can be reproduced by building an x86-64 compiler configured with

--enable-gather-detailed-mem-stats

and compiling the preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
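The flags above can be assembled per optimization level as in the following sketch. It assumes a driver named `gcc` built as described and a preprocessed input `combine.i` (both placeholders); the helper `mem_report_argv` is hypothetical and only builds the argument list, it does not run the compiler. `-c` is added as an assumption so the driver stops before linking; the actual script may invoke `cc1` directly instead.

```python
from typing import List, Sequence

# Optimization settings compared in the report (the 28071 testcase
# additionally uses "-O3 -fno-tree-pre -fno-tree-fre").
LEVELS: List[List[str]] = [["-O0"], ["-O0", "-g"], ["-O1"], ["-O2"], ["-O3"]]


def mem_report_argv(source: str, level: Sequence[str], compiler: str = "gcc") -> List[str]:
    """Build one measurement command line, substituting `level` for -Ox."""
    return [
        compiler,  # placeholder: a compiler configured with --enable-gather-detailed-mem-stats
        "-fmem-report",
        "--param=ggc-min-heapsize=1024",
        "--param=ggc-min-expand=1",
        *level,
        "-Q",
        "-c",  # assumption: stop before linking; not part of the original flag list
        source,
    ]


if __name__ == "__main__":
    for level in LEVELS:
        print(" ".join(mem_report_argv("combine.i", level)))
```

Each list can be handed to `subprocess.run(argv, capture_output=True, text=True)`; the memory report and the `-Q` output are written to standard error.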

The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated.  Peak memory consumption is computed
by looking for the maximal value in the {GC XXXX -> YYYY} reports.
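A minimal sketch of that peak computation, assuming the markers look like ` {GC 2264k -> 1955k}` (sizes in kilobytes) and that the "Peak memory use before/after GGC", "Maximum of released memory in single GGC run" and "GGC runs" lines are, respectively, the largest XXXX, the largest YYYY, the largest XXXX - YYYY and the number of markers. That mapping is an inference rather than documented behaviour, though it is consistent with the empty-function run at -O0, where 2264k - 1955k = 309k matches the reported maximum release. `gc_stats` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Tuple

# Collection markers printed during compilation, e.g. " {GC 2264k -> 1955k}" (assumed format).
GC_RE = re.compile(r"\{GC\s+(\d+)k\s*->\s*(\d+)k\}")


def gc_stats(dump: str) -> Dict[str, int]:
    """Summarize every {GC XXXX -> YYYY} marker in a compiler dump (values in kB)."""
    runs: List[Tuple[int, int]] = [
        (int(before), int(after)) for before, after in GC_RE.findall(dump)
    ]
    if not runs:
        raise ValueError("no {GC ...} markers found in dump")
    return {
        # Largest heap size seen just before a collection.
        "peak_before_ggc": max(before for before, _ in runs),
        # Largest heap size left just after a collection.
        "peak_after_ggc": max(after for _, after in runs),
        # Most memory freed by any single collection.
        "max_released_single_run": max(before - after for before, after in runs),
        "ggc_runs": len(runs),
    }


if __name__ == "__main__":
    sample = "main {GC 2264k -> 1955k} foo {GC 2100k -> 1900k}"
    print(gc_stats(sample))
```

Taking the maximum over all markers, rather than the last one, matters because the heap can shrink late in compilation after peaking in an earlier pass.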

Your testing script.

