This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



A recent patch increased GCC's memory consumption in some cases!


Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something is going wrong.

Comparing memory consumption when compiling combine.i, insn-attrtab.i,
and generate-3.4.ii, I got:


comparing empty function compilation at -O0 level:
    Overall memory needed: 8801k
    Peak memory use before GGC: 1488k
    Peak memory use after GGC: 1437k
    Maximum of released memory in single GGC run: 85k
    Garbage: 218k
    Leak: 1537k
    Overhead: 187k
    GGC runs: 4
    Pre-IPA-Garbage: 210k
    Pre-IPA-Leak: 1539k
    Pre-IPA-Overhead: 186k
    Post-IPA-Garbage: 210k
    Post-IPA-Leak: 1539k
    Post-IPA-Overhead: 186k

comparing empty function compilation at -O0 -g level:
    Overall memory needed: 8825k
    Peak memory use before GGC: 1516k
    Peak memory use after GGC: 1464k
    Maximum of released memory in single GGC run: 87k
    Garbage: 219k
    Leak: 1570k
    Overhead: 192k
    GGC runs: 4
    Pre-IPA-Garbage: 210k
    Pre-IPA-Leak: 1539k
    Pre-IPA-Overhead: 186k
    Post-IPA-Garbage: 210k
    Post-IPA-Leak: 1539k
    Post-IPA-Overhead: 186k

comparing empty function compilation at -O1 level:
    Overall memory needed: 8801k
    Peak memory use before GGC: 1488k
    Peak memory use after GGC: 1437k
    Maximum of released memory in single GGC run: 90k
    Garbage: 223k
    Leak: 1537k
    Overhead: 188k
    GGC runs: 4
    Pre-IPA-Garbage: 212k
    Pre-IPA-Leak: 1540k
    Pre-IPA-Overhead: 186k
    Post-IPA-Garbage: 212k
    Post-IPA-Leak: 1540k
    Post-IPA-Overhead: 186k

comparing empty function compilation at -O2 level:
    Overall memory needed: 8941k -> 8929k
    Peak memory use before GGC: 1488k
    Peak memory use after GGC: 1437k
    Maximum of released memory in single GGC run: 90k
    Garbage: 228k
    Leak: 1537k
    Overhead: 189k
    GGC runs: 5
    Pre-IPA-Garbage: 212k
    Pre-IPA-Leak: 1540k
    Pre-IPA-Overhead: 186k
    Post-IPA-Garbage: 212k
    Post-IPA-Leak: 1540k
    Post-IPA-Overhead: 186k

comparing empty function compilation at -O3 level:
    Overall memory needed: 8933k -> 8929k
    Peak memory use before GGC: 1488k
    Peak memory use after GGC: 1437k
    Maximum of released memory in single GGC run: 90k
    Garbage: 228k
    Leak: 1537k
    Overhead: 189k
    GGC runs: 5
    Pre-IPA-Garbage: 212k
    Pre-IPA-Leak: 1540k
    Pre-IPA-Overhead: 186k
    Post-IPA-Garbage: 212k
    Post-IPA-Leak: 1540k
    Post-IPA-Overhead: 186k

comparing combine.c compilation at -O0 level:
    Overall memory needed: 31457k
    Peak memory use before GGC: 17478k
    Peak memory use after GGC: 17029k
    Maximum of released memory in single GGC run: 1911k
    Garbage: 37895k
    Leak: 7171k -> 7155k
    Overhead: 5490k -> 5491k
    GGC runs: 331
    Pre-IPA-Garbage: 12530k
    Pre-IPA-Leak: 18411k
    Pre-IPA-Overhead: 2504k
    Post-IPA-Garbage: 12530k
    Post-IPA-Leak: 18411k
    Post-IPA-Overhead: 2504k

comparing combine.c compilation at -O0 -g level:
    Overall memory needed: 33401k
    Peak memory use before GGC: 19386k
    Peak memory use after GGC: 18869k
    Maximum of released memory in single GGC run: 1920k
    Garbage: 38110k
    Leak: 10441k
    Overhead: 6303k
    GGC runs: 315
    Pre-IPA-Garbage: 12549k
    Pre-IPA-Leak: 20660k
    Pre-IPA-Overhead: 2986k
    Post-IPA-Garbage: 12549k
    Post-IPA-Leak: 20660k
    Post-IPA-Overhead: 2986k

comparing combine.c compilation at -O1 level:
  Amount of produced GGC garbage increased from 45806k to 45931k, overall 0.27%
    Overall memory needed: 31913k -> 32209k
    Peak memory use before GGC: 16555k -> 16551k
    Peak memory use after GGC: 16383k -> 16380k
    Maximum of released memory in single GGC run: 1378k
    Garbage: 45806k -> 45931k
    Leak: 7156k -> 7155k
    Overhead: 6440k -> 6449k
    GGC runs: 388
    Pre-IPA-Garbage: 13405k
    Pre-IPA-Leak: 17702k
    Pre-IPA-Overhead: 2552k
    Post-IPA-Garbage: 13405k
    Post-IPA-Leak: 17702k
    Post-IPA-Overhead: 2552k

comparing combine.c compilation at -O2 level:
  Amount of produced GGC garbage increased from 56306k to 56509k, overall 0.36%
    Overall memory needed: 32989k -> 33021k
    Peak memory use before GGC: 16628k -> 16615k
    Peak memory use after GGC: 16454k -> 16447k
    Maximum of released memory in single GGC run: 1489k
    Garbage: 56306k -> 56509k
    Leak: 7188k -> 7188k
    Overhead: 8033k -> 8083k
    GGC runs: 441 -> 443
    Pre-IPA-Garbage: 13435k
    Pre-IPA-Leak: 17724k
    Pre-IPA-Overhead: 2555k
    Post-IPA-Garbage: 13435k
    Post-IPA-Leak: 17724k
    Post-IPA-Overhead: 2555k

comparing combine.c compilation at -O3 level:
  Amount of produced GGC garbage increased from 80462k to 80674k, overall 0.26%
    Overall memory needed: 37029k -> 37149k
    Peak memory use before GGC: 16727k -> 16723k
    Peak memory use after GGC: 16557k -> 16551k
    Maximum of released memory in single GGC run: 1681k -> 1682k
    Garbage: 80462k -> 80674k
    Leak: 7249k -> 7249k
    Overhead: 11103k -> 11072k
    GGC runs: 527 -> 528
    Pre-IPA-Garbage: 13435k
    Pre-IPA-Leak: 17759k
    Pre-IPA-Overhead: 2555k
    Post-IPA-Garbage: 13435k
    Post-IPA-Leak: 17759k
    Post-IPA-Overhead: 2555k

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 152505k -> 152521k
    Peak memory use before GGC: 65254k
    Peak memory use after GGC: 52818k
    Maximum of released memory in single GGC run: 26250k
    Garbage: 128569k
    Leak: 9587k
    Overhead: 16691k
    GGC runs: 258
    Pre-IPA-Garbage: 40782k
    Pre-IPA-Leak: 51014k
    Pre-IPA-Overhead: 7761k
    Post-IPA-Garbage: 40782k
    Post-IPA-Leak: 51014k
    Post-IPA-Overhead: 7761k

comparing insn-attrtab.c compilation at -O0 -g level:
    Overall memory needed: 153817k
    Peak memory use before GGC: 66520k
    Peak memory use after GGC: 54081k
    Maximum of released memory in single GGC run: 26251k
    Garbage: 128907k
    Leak: 11219k
    Overhead: 17144k
    GGC runs: 252
    Pre-IPA-Garbage: 40791k
    Pre-IPA-Leak: 52539k
    Pre-IPA-Overhead: 8091k
    Post-IPA-Garbage: 40791k
    Post-IPA-Leak: 52539k
    Post-IPA-Overhead: 8091k

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 154545k
    Peak memory use before GGC: 54972k
    Peak memory use after GGC: 44902k
    Maximum of released memory in single GGC run: 17233k
    Garbage: 181124k -> 181126k
    Leak: 9178k
    Overhead: 23427k -> 23428k
    GGC runs: 298
    Pre-IPA-Garbage: 45256k
    Pre-IPA-Leak: 45116k
    Pre-IPA-Overhead: 7607k
    Post-IPA-Garbage: 45256k
    Post-IPA-Leak: 45116k
    Post-IPA-Overhead: 7607k

comparing insn-attrtab.c compilation at -O2 level:
    Overall memory needed: 202421k
    Peak memory use before GGC: 54418k
    Peak memory use after GGC: 44649k
    Maximum of released memory in single GGC run: 18696k
    Garbage: 211641k -> 211649k
    Leak: 9193k
    Overhead: 29298k -> 29301k
    GGC runs: 331
    Pre-IPA-Garbage: 45281k
    Pre-IPA-Leak: 45120k
    Pre-IPA-Overhead: 7609k
    Post-IPA-Garbage: 45281k
    Post-IPA-Leak: 45120k
    Post-IPA-Overhead: 7609k

comparing insn-attrtab.c compilation at -O3 level:
    Overall memory needed: 205857k -> 206109k
    Peak memory use before GGC: 54430k
    Peak memory use after GGC: 44658k
    Maximum of released memory in single GGC run: 18679k
    Garbage: 229893k -> 229989k
    Leak: 9211k
    Overhead: 31200k -> 31223k
    GGC runs: 351 -> 352
    Pre-IPA-Garbage: 45281k
    Pre-IPA-Leak: 45120k
    Pre-IPA-Overhead: 7609k
    Post-IPA-Garbage: 45281k
    Post-IPA-Leak: 45120k
    Post-IPA-Overhead: 7609k

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 146321k -> 146265k
    Peak memory use before GGC: 81868k
    Peak memory use after GGC: 81058k
    Maximum of released memory in single GGC run: 13542k
    Garbage: 192383k -> 192383k
    Leak: 55448k
    Overhead: 28823k -> 28823k
    GGC runs: 436
    Pre-IPA-Garbage: 105277k
    Pre-IPA-Leak: 84587k
    Pre-IPA-Overhead: 15579k
    Post-IPA-Garbage: 105277k
    Post-IPA-Leak: 84587k
    Post-IPA-Overhead: 15579k

comparing Gerald's testcase PR8361 compilation at -O0 -g level:
    Overall memory needed: 163689k -> 163757k
    Peak memory use before GGC: 95614k
    Peak memory use after GGC: 94667k
    Maximum of released memory in single GGC run: 13985k
    Garbage: 197353k -> 197353k
    Leak: 82400k
    Overhead: 35357k -> 35357k
    GGC runs: 409
    Pre-IPA-Garbage: 105779k
    Pre-IPA-Leak: 101036k
    Pre-IPA-Overhead: 19072k
    Post-IPA-Garbage: 105779k
    Post-IPA-Leak: 101036k
    Post-IPA-Overhead: 19072k

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Amount of produced GGC garbage increased from 269059k to 269992k, overall 0.35%
    Overall memory needed: 108649k -> 107909k
    Peak memory use before GGC: 82538k -> 81507k
    Peak memory use after GGC: 81713k -> 80699k
    Maximum of released memory in single GGC run: 13775k -> 13777k
    Garbage: 269059k -> 269992k
    Leak: 52272k -> 52256k
    Overhead: 31689k -> 31955k
    GGC runs: 525
    Pre-IPA-Garbage: 153885k -> 153091k
    Pre-IPA-Leak: 86855k -> 85821k
    Pre-IPA-Overhead: 19248k -> 19154k
    Post-IPA-Garbage: 153885k -> 153091k
    Post-IPA-Leak: 86855k -> 85821k
    Post-IPA-Overhead: 19248k -> 19154k

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Amount of produced GGC garbage increased from 304549k to 306261k, overall 0.56%
    Overall memory needed: 108129k -> 107593k
    Peak memory use before GGC: 82566k -> 81283k
    Peak memory use after GGC: 80858k -> 80157k
    Maximum of released memory in single GGC run: 13773k
    Garbage: 304549k -> 306261k
    Leak: 52369k -> 52370k
    Overhead: 36687k -> 37155k
    GGC runs: 569 -> 570
    Pre-IPA-Garbage: 156917k -> 156404k
    Pre-IPA-Leak: 85949k -> 85114k
    Pre-IPA-Overhead: 19461k -> 19383k
    Post-IPA-Garbage: 156917k -> 156404k
    Post-IPA-Leak: 85949k -> 85114k
    Post-IPA-Overhead: 19461k -> 19383k

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of produced GGC garbage increased from 335372k to 336826k, overall 0.43%
    Overall memory needed: 114521k -> 113781k
    Peak memory use before GGC: 83032k -> 81749k
    Peak memory use after GGC: 80967k -> 80157k
    Maximum of released memory in single GGC run: 13773k
    Garbage: 335372k -> 336826k
    Leak: 52415k -> 52415k
    Overhead: 40618k -> 40764k
    GGC runs: 606 -> 609
    Pre-IPA-Garbage: 156917k -> 156404k
    Pre-IPA-Leak: 85953k -> 85117k
    Pre-IPA-Overhead: 19461k -> 19383k
    Post-IPA-Garbage: 156917k -> 156404k
    Post-IPA-Leak: 85953k -> 85117k
    Post-IPA-Overhead: 19461k -> 19383k

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 358789k -> 358769k
    Peak memory use before GGC: 78173k
    Peak memory use after GGC: 49107k
    Maximum of released memory in single GGC run: 37057k
    Garbage: 140190k
    Leak: 7711k
    Overhead: 24960k
    GGC runs: 86
    Pre-IPA-Garbage: 12171k
    Pre-IPA-Leak: 18626k
    Pre-IPA-Overhead: 2403k
    Post-IPA-Garbage: 12171k
    Post-IPA-Leak: 18626k
    Post-IPA-Overhead: 2403k

comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
    Overall memory needed: 359561k -> 359521k
    Peak memory use before GGC: 78856k
    Peak memory use after GGC: 49791k
    Maximum of released memory in single GGC run: 37041k
    Garbage: 140255k
    Leak: 9707k
    Overhead: 25529k
    GGC runs: 94
    Pre-IPA-Garbage: 12173k
    Pre-IPA-Leak: 18873k
    Pre-IPA-Overhead: 2456k
    Post-IPA-Garbage: 12173k
    Post-IPA-Leak: 18873k
    Post-IPA-Overhead: 2456k

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 308773k to 347669k, overall 12.60%
  Peak amount of GGC memory allocated before garbage collecting increased from 80235k to 94211k, overall 17.42%
  Peak amount of GGC memory still allocated after garbage collecting increased from 69462k to 82935k, overall 19.40%
  Amount of produced GGC garbage increased from 224434k to 256003k, overall 14.07%
  Amount of memory still referenced at the end of compilation increased from 9462k to 9535k, overall 0.77%
    Overall memory needed: 308773k -> 347669k
    Peak memory use before GGC: 80235k -> 94211k
    Peak memory use after GGC: 69462k -> 82935k
    Maximum of released memory in single GGC run: 38514k -> 47307k
    Garbage: 224434k -> 256003k
    Leak: 9462k -> 9535k
    Overhead: 32358k -> 35484k
    GGC runs: 95 -> 97
  Amount of produced pre-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
  Amount of memory referenced pre-ipa increased from 63974k to 64580k, overall 0.95%
    Pre-IPA-Garbage: 41119k -> 42051k
    Pre-IPA-Leak: 63974k -> 64580k
    Pre-IPA-Overhead: 7105k -> 7108k
  Amount of produced post-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
  Amount of memory referenced post-ipa increased from 63974k to 64580k, overall 0.95%
    Post-IPA-Garbage: 41119k -> 42051k
    Post-IPA-Leak: 63974k -> 64580k
    Post-IPA-Overhead: 7105k -> 7108k

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
  Overall memory allocated via mmap and sbrk increased from 518213k to 666965k, overall 28.70%
  Peak amount of GGC memory allocated before garbage collecting increased from 80260k to 90301k, overall 12.51%
  Peak amount of GGC memory still allocated after garbage collecting increased from 69463k to 82936k, overall 19.40%
  Amount of produced GGC garbage increased from 266746k to 302290k, overall 13.32%
  Amount of memory still referenced at the end of compilation increased from 9463k to 11372k, overall 20.17%
    Overall memory needed: 518213k -> 666965k
    Peak memory use before GGC: 80260k -> 90301k
    Peak memory use after GGC: 69463k -> 82936k
    Maximum of released memory in single GGC run: 38750k -> 38640k
    Garbage: 266746k -> 302290k
    Leak: 9463k -> 11372k
    Overhead: 42002k -> 50851k
    GGC runs: 107
  Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
  Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
    Pre-IPA-Garbage: 90152k -> 84972k
    Pre-IPA-Leak: 80240k -> 86483k
    Pre-IPA-Overhead: 11095k -> 11064k
  Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
  Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
    Post-IPA-Garbage: 90152k -> 84972k
    Post-IPA-Leak: 80240k -> 86483k
    Post-IPA-Overhead: 11095k -> 11064k

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
  Overall memory allocated via mmap and sbrk increased from 1031281k to 1233337k, overall 19.59%
  Amount of produced GGC garbage increased from 329142k to 347506k, overall 5.58%
    Overall memory needed: 1031281k -> 1233337k
    Peak memory use before GGC: 135102k -> 133837k
    Peak memory use after GGC: 126444k -> 126291k
    Maximum of released memory in single GGC run: 54329k -> 51246k
    Garbage: 329142k -> 347506k
    Leak: 10302k -> 10289k
    Overhead: 42368k -> 43798k
    GGC runs: 106 -> 107
  Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
  Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
    Pre-IPA-Garbage: 90152k -> 84972k
    Pre-IPA-Leak: 80240k -> 86483k
    Pre-IPA-Overhead: 11095k -> 11064k
  Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
  Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
    Post-IPA-Garbage: 90152k -> 84972k
    Post-IPA-Leak: 80240k -> 86483k
    Post-IPA-Overhead: 11095k -> 11064k
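
The "overall" percentages in the summary lines above appear to be the
larger value divided by the smaller one, minus one, with a negative sign
for decreases: 308773k -> 347669k gives 12.60%, and 90152k -> 84972k gives
-6.10%.  A minimal Python sketch under that assumption follows; the names
relative_change, parse_metric and METRIC_RE are hypothetical and are not
taken from the testing script.

```python
import re

# Matches changed-metric lines such as "    Garbage: 224434k -> 256003k".
# The line layout is copied from this report; the regex is an assumption.
METRIC_RE = re.compile(
    r"^\s*(?P<name>[^:]+):\s*(?P<old>\d+)k?\s*->\s*(?P<new>\d+)k?\s*$"
)


def relative_change(old, new):
    """Signed percentage: larger/smaller - 1, negative for a decrease."""
    if old <= 0 or new <= 0:
        raise ValueError("both values must be positive")
    if new >= old:
        return (new / old - 1.0) * 100.0
    return -(old / new - 1.0) * 100.0


def parse_metric(line):
    """Return (name, old, new, percent) for a changed metric, else None."""
    m = METRIC_RE.match(line)
    if m is None:
        return None  # unchanged metric, heading, or summary line
    old, new = int(m.group("old")), int(m.group("new"))
    return m.group("name").strip(), old, new, relative_change(old, new)


if __name__ == "__main__":
    print(parse_metric("    Garbage: 224434k -> 256003k"))
    print(f"{relative_change(308773, 347669):.2f}")  # 12.60
    print(f"{relative_change(90152, 84972):.2f}")    # -6.10
```

Lines without an "old -> new" pair (unchanged metrics, section headings,
and the "increased from ... to ..." summaries) return None, so the helper
can be applied to every line of a report block.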

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-haydn-memory/x86_64/mem-result/ChangeLog	2009-05-29 07:03:50.000000000 +0000
+++ /usr/src/SpecTests/sandbox-haydn-memory/gcc/gcc/ChangeLog	2009-05-29 21:48:20.000000000 +0000
@@ -1,3 +1,310 @@
+2009-05-29  Eric Botcazou  <ebotcazou@adacore.com>
+
+	* tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
+	(force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
+	(ptr_difference_cost): Use affine combinations to compute it.
+	(difference_cost): Likewise.
+	(get_computation_cost_at): Compute more accurate cost for addresses
+	if the ratio is a multiplier allowed in addresses.
+	For non-addresses, consider that an additional offset or symbol is
+	added only once.
+
+2009-05-29  Jakub Jelinek  <jakub@redhat.com>
+
+	* config/i386/i386.c (ix86_decompose_address): Avoid useless
+	0 displacement.  Add 0 displacement if base is %[er]bp or %r13.
+
+	* config/i386/i386.md (prefix_data16, prefix_rep): Set to 0 for
+	TYPE_SSE{MULADD,4ARG,IADD1,CVT1} by default.
+	(prefix_rex): For UNIT_MMX don't imply the prefix by default
+	if MODE_DI.
+	(prefix_extra): Default to 2 for TYPE_SSE{MULADD,4ARG} and
+	to 1 for TYPE_SSE{IADD1,CVT1}.
+	(prefix_vex_imm8): Removed.
+	(length_vex): Only pass 1 as second argument to
+	ix86_attr_length_vex_default if prefix_extra is 0.
+	(modrm): For TYPE_INCDEC only set to 0 if not TARGET_64BIT.
+	(length): For prefix vex computation use length_immediate
+	attribute instead of prefix_vex_imm8.
+	(cmpqi_ext_3_insn, cmpqi_ext_3_insn_rex64,
+	addqi_ext_1, addqi_ext_1_rex64, *testqi_ext_0, andqi_ext_0,
+	*andqi_ext_0_cc, *iorqi_ext_0, *xorqi_ext_0, *xorqi_cc_ext_1,
+	*xorqi_cc_ext_1_rex64): Override modrm attribute to 1.
+	(extendsidi2_rex64, extendhidi2, extendqidi2, extendhisi2,
+	*extendhisi2_zext, extendqihi2, extendqisi2, *extendqisi2_zext): Emit
+	a space in between the operands.
+	(*anddi_1_rex64, *andsi_1): Likewise.  Override prefix_rex to 1
+	if one operand is 0xff and the other one si, di, bp or sp.
+	(*andhi_1): Override prefix_rex to 1 if one operand is 0xff and the
+	other one si, di, bp or sp.
+	(*btsq, *btrq, *btcq, *btdi_rex64, *btsi): Add mode attribute.
+	(*ffssi_1, *ffsdi_1, ctzsi2, ctzdi2): Add
+	type and mode attributes.
+	(*bsr, *bsr_rex64, *bsrhi): Add type attribute.
+	(*cmpfp_i_mixed, *cmpfp_iu_mixed): For TYPE_SSECOMI, clear
+	prefix_rep attribute and set prefix_data16 attribute iff MODE_DF.
+	(*cmpfp_i_sse, *cmpfp_iu_sse): Clear prefix_rep attribute and set
+	prefix_data16 attribute iff MODE_DF.
+	(*movsi_1): For TYPE_SSEMOV MODE_SI set prefix_data16 attribute.
+	(fix_trunc<mode>di_sse): Set prefix_rex attribute.
+	(*adddi_4_rex64, *addsi_4): Use const128_operand instead of
+	constm128_operand in length_immediate computation.
+	(*addhi_4): Likewise.  Fix mode attribute to MODE_HI.
+	(anddi_1_rex64): Use movzbl/movzwl instead of movzbq/movzwq.
+	(*avx_ashlti3, sse2_ashlti3, *avx_lshrti3, sse2_lshrti3): Set
+	length_immediate attribute to 1.
+	(x86_fnstsw_1, x86_fnstcw_1, x86_fldcw_1): Fix length attribute.
+	(*movdi_1_rex64): Override prefix_rex or prefix_data16 attributes
+	for certain alternatives.
+	(*movdf_nointeger, *movdf_integer_rex64, *movdf_integer): Override
+	prefix_data16 attribute if MODE_V1DF.
+	(*avx_setcc<mode>, *sse_setcc<mode>, *sse5_setcc<mode>): Set
+	length_immediate to 1.
+	(set_got_rex64, set_rip_rex64): Remove length attribute, set
+	length_address to 4, set mode attribute to MODE_DI.
+	(set_got_offset_rex64): Likewise.  Set length_immediate to 0.
+	(fxam<mode>2_i387): Set length attribute to 4.
+	(*prefetch_sse, *prefetch_sse_rex, *prefetch_3dnow,
+	*prefetch_3dnow_rex): Override length_address attribute.
+	(sse4_2_crc32<mode>): Override prefix_data16 and prefix_rex
+	attributes.
+	* config/i386/predicates.md (ext_QIreg_nomode_operand): New predicate.
+	(constm128_operand): Removed.
+	* config/i386/i386.c (memory_address_length): For
+	disp && !index && !base in 64-bit mode account for SIB byte if
+	print_operand_address can't optimize disp32 into disp32(%rip)
+	and UNSPEC doesn't imply (%rip) addressing.  Add 1 to length
+	for fs: or gs: segment.
+	(ix86_attr_length_immediate_default): When checking if shortform
+	is possible, truncate immediate to the length of the non-shortened
+	immediate.
+	(ix86_attr_length_address_default): Ignore MEM_P operands
+	with X constraint.
+	(ix86_attr_length_vex_default): Only check for DImode on
+	GENERAL_REG_P operands.
+	* config/i386/sse.md (<sse>_comi, <sse>_ucomi): Clear
+	prefix_rep attribute, set prefix_data16 attribute iff MODE_DF.
+	(sse_cvttps2pi): Clear prefix_rep attribute.
+	(sse2_cvttps2dq, *sse2_cvtpd2dq, sse2_cvtps2pd): Clear prefix_data16
+	attribute.
+	(*sse2_cvttpd2dq): Don't clear prefix_rep attribute.
+	(*avx_ashr<mode>3, ashr<mode>3, *avx_lshr<mode>3, lshr<mode>3,
+	*avx_ashl<mode>3, ashl<mode>3): Set length_immediate attribute to 1
+	iff operand 2 is const_int_operand.
+	(*vec_dupv4si, avx_shufpd256_1, *avx_shufpd_<mode>,
+	sse2_shufpd_<mode>): Set length_immediate attribute to 1.
+	(sse2_pshufd_1): Likewise.  Set prefix attribute to maybe_vex
+	instead of vex.
+	(sse2_pshuflw_1, sse2_pshufhw_1): Set length_immediate to 1 and clear
+	prefix_data16.
+	(sse2_unpckhpd, sse2_unpcklpd, sse2_storehpd, *vec_concatv2df): Set
+	prefix_data16 attribute for movlpd and movhpd instructions.
+	(sse2_loadhpd, sse2_loadlpd, sse2_movsd): Likewise.  Override
+	length_immediate for shufpd instruction.
+	(sse2_movntsi, sse3_lddqu): Clear prefix_data16 attribute.
+	(avx_cmpp<avxmodesuffixf2c><mode>3,
+	avx_cmps<ssemodesuffixf2c><mode>3, *avx_maskcmp<mode>3,
+	<sse>_maskcmp<mode>3, <sse>_vmmaskcmp<mode>3,
+	avx_shufps256_1, *avx_shufps_<mode>, sse_shufps_<mode>,
+	*vec_dupv4sf_avx, *vec_dupv4sf): Set
+	length_immediate attribute to 1.
+	(*avx_cvtsi2ssq, *avx_cvtsi2sdq): Set length_vex attribute to 4.
+	(sse_cvtsi2ssq, sse2_cvtsi2sdq): Set prefix_rex attribute to 1.
+	(sse2_cvtpi2pd, sse_loadlps, sse2_storelpd): Override
+	prefix_data16 attribute for the first alternative to 1.
+	(*avx_loadlps): Override length_immediate for the first alternative.
+	(*vec_concatv2sf_avx): Override length_immediate and prefix_extra
+	attributes for second alternative.
+	(*vec_concatv2sf_sse4_1): Override length_immediate and
+	prefix_data16 attributes for second alternative.
+	(*vec_setv4sf_avx, *avx_insertps, vec_extract_lo_<mode>,
+	vec_extract_hi_<mode>, vec_extract_lo_v16hi,
+	vec_extract_hi_v16hi, vec_extract_lo_v32qi,
+	vec_extract_hi_v32qi): Set prefix_extra and length_immediate to 1.
+	(*vec_setv4sf_sse4_1, sse4_1_insertps, *sse4_1_extractps): Set
+	prefix_data16 and length_immediate to 1.
+	(*avx_mulv2siv2di3, *avx_mulv4si3, sse4_2_gtv2di3): Set prefix_extra
+	to 1.
+	(*avx_<code><mode>3, *avx_eq<mode>3, *avx_gt<mode>3): Set
+	prefix_extra attribute for variants that don't have 0f prefix
+	alone.
+	(*avx_pinsr<ssevecsize>): Likewise.  Set length_immediate to 1.
+	(*sse4_1_pinsrb, *sse2_pinsrw, *sse4_1_pinsrd, *sse4_1_pextrb,
+	*sse4_1_pextrb_memory, *sse2_pextrw, *sse4_1_pextrw_memory,
+	*sse4_1_pextrd): Set length_immediate to 1.
+	(*sse4_1_pinsrd): Likewise.  Set prefix_extra to 1.
+	(*sse4_1_pinsrq, *sse4_1_pextrq): Set prefix_rex and length_immediate
+	to 1.
+	(*vec_extractv2di_1_rex64_avx, *vec_extractv2di_1_rex64,
+	*vec_extractv2di_1_avx, *vec_extractv2di_1_sse2): Override
+	length_immediate to 1 for second alternative.
+	(*vec_concatv2si_avx, *vec_concatv2di_rex64_avx): Override
+	prefix_extra and length_immediate attributes for the first
+	alternative.
+	(vec_concatv2si_sse4_1): Override length_immediate to 1 for the
+	first alternative.
+	(*vec_concatv2di_rex64_sse4_1): Likewise.  Override prefix_rex
+	to 1 for the first and third alternative.
+	(*vec_concatv2di_rex64_sse): Override prefix_rex to 1 for the second
+	alternative.
+	(*sse2_maskmovdqu, *sse2_maskmovdqu_rex64): Override length_vex
+	attribute.
+	(*sse_sfence, sse2_mfence, sse2_lfence): Override length_address
+	attribute to 0.
+	(*avx_phaddwv8hi3, *avx_phadddv4si3, *avx_phaddswv8hi3,
+	*avx_phsubwv8hi3, *avx_phsubdv4si3, *avx_phsubswv8hi,
+	*avx_pmaddubsw128, *avx_pmulhrswv8hi3, *avx_pshufbv16qi3,
+	*avx_psign<mode>3): Set prefix_extra attribute to 1.
+	(ssse3_phaddwv4hi3, ssse3_phadddv2si3, ssse3_phaddswv4hi3,
+	ssse3_phsubwv4hi3, ssse3_phsubdv2si3, ssse3_phsubswv4hi3,
+	ssse3_pmaddubsw, *ssse3_pmulhrswv4hi, ssse3_pshufbv8qi3,
+	ssse3_psign<mode>3): Override prefix_rex attribute.
+	(*avx_palignrti): Override prefix_extra and length_immediate
+	to 1.
+	(ssse3_palignrti): Override length_immediate to 1.
+	(ssse3_palignrdi): Override length_immediate to 1, override
+	prefix_rex attribute.
+	(abs<mode>2): Override prefix_rep to 0, override prefix_rex
+	attribute.
+	(sse4a_extrqi): Override length_immediate to 2.
+	(sse4a_insertqi): Likewise.  Override prefix_data16 to 0.
+	(sse4a_insertq): Override prefix_data16 to 0.
+	(avx_blendp<avxmodesuffixf2c><avxmodesuffix>,
+	avx_blendvp<avxmodesuffixf2c><avxmodesuffix>,
+	avx_dpp<avxmodesuffixf2c><avxmodesuffix>, *avx_mpsadbw,
+	*avx_pblendvb, *avx_pblendw, avx_roundp<avxmodesuffixf2c>256,
+	avx_rounds<avxmodesuffixf2c>256): Override prefix_extra
+	and length_immediate to 1.
+	(sse4_1_blendp<ssemodesuffixf2c>, sse4_1_dpp<ssemodesuffixf2c>,
+	sse4_2_pcmpestr, sse4_2_pcmpestri, sse4_2_pcmpestrm,
+	sse4_2_pcmpestr_cconly, sse4_2_pcmpistr, sse4_2_pcmpistri,
+	sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Override prefix_data16
+	and length_immediate to 1.
+	(sse4_1_blendvp<ssemodesuffixf2c>): Override prefix_data16 to 1.
+	(sse4_1_mpsadbw, sse4_1_pblendw): Override length_immediate to 1.
+	(*avx_packusdw, avx_vtestp<avxmodesuffixf2c><avxmodesuffix>,
+	avx_ptest256): Override prefix_extra to 1.
+	(sse4_1_roundp<ssemodesuffixf2c>, sse4_1_rounds<ssemodesuffixf2c>):
+	Override prefix_data16 and length_immediate to 1.
+	(sse5_pperm_zero_v16qi_v8hi, sse5_pperm_sign_v16qi_v8hi,
+	sse5_pperm_zero_v8hi_v4si, sse5_pperm_sign_v8hi_v4si,
+	sse5_pperm_zero_v4si_v2di, sse5_pperm_sign_v4si_v2di,
+	sse5_vrotl<mode>3, sse5_ashl<mode>3, sse5_lshl<mode>3): Override
+	prefix_data16 to 0 and prefix_extra to 2.
+	(sse5_rotl<mode>3, sse5_rotr<mode>3): Override length_immediate to 1.
+	(sse5_frcz<mode>2, sse5_vmfrcz<mode>2): Don't override prefix_extra
+	attribute.
+	(*sse5_vmmaskcmp<mode>3, sse5_com_tf<mode>3,
+	sse5_maskcmp<mode>3, sse5_maskcmp<mode>3, sse5_maskcmp_uns<mode>3):
+	Override prefix_data16 and prefix_rep to 0, length_immediate to 1
+	and prefix_extra to 2.
+	(sse5_maskcmp_uns2<mode>3, sse5_pcom_tf<mode>3): Override
+	prefix_data16 to 0, length_immediate to 1 and prefix_extra to 2.
+	(*avx_aesenc, *avx_aesenclast, *avx_aesdec, *avx_aesdeclast,
+	avx_vpermilvar<mode>3,
+	avx_vbroadcasts<avxmodesuffixf2c><avxmodesuffix>,
+	avx_vbroadcastss256, avx_vbroadcastf128_p<avxmodesuffixf2c>256,
+	avx_maskloadp<avxmodesuffixf2c><avxmodesuffix>,
+	avx_maskstorep<avxmodesuffixf2c><avxmodesuffix>):
+	Override prefix_extra to 1.
+	(aeskeygenassist, pclmulqdq): Override length_immediate to 1.
+	(*vpclmulqdq, avx_vpermil<mode>, avx_vperm2f128<mode>3,
+	vec_set_lo_<mode>, vec_set_hi_<mode>, vec_set_lo_v16hi,
+	vec_set_hi_v16hi, vec_set_lo_v32qi, vec_set_hi_v32qi): Override
+	prefix_extra and length_immediate to 1.
+	(*avx_vzeroall, avx_vzeroupper, avx_vzeroupper_rex64): Override
+	modrm to 0.
+	(*vec_concat<mode>_avx): Override prefix_extra and length_immediate
+	to 1 for the first alternative.
+	* config/i386/mmx.md (*mov<mode>_internal_rex64): Override
+	prefix_rep, prefix_data16 and/or prefix_rex attributes in certain
+	cases.
+	(*mov<mode>_internal_avx, *movv2sf_internal_rex64,
+	*movv2sf_internal_avx, *movv2sf_internal): Override
+	prefix_rep attribute for certain alternatives.
+	(*mov<mode>_internal): Override prefix_rep or prefix_data16
+	attributes for certain alternatives.
+	(*movv2sf_internal_rex64_avx): Override prefix_rep and length_vex
+	attributes for certain alternatives.
+	(*mmx_addv2sf3, *mmx_subv2sf3, *mmx_mulv2sf3,
+	*mmx_<code>v2sf3_finite, *mmx_<code>v2sf3, mmx_rcpv2sf2,
+	mmx_rcpit1v2sf3, mmx_rcpit2v2sf3, mmx_rsqrtv2sf2, mmx_rsqit1v2sf3,
+	mmx_haddv2sf3, mmx_hsubv2sf3, mmx_addsubv2sf3,
+	*mmx_eqv2sf3, mmx_gtv2sf3, mmx_gev2sf3, mmx_pf2id, mmx_pf2iw,
+	mmx_pi2fw, mmx_floatv2si2, mmx_pswapdv2sf2, *mmx_pmulhrwv4hi3,
+	mmx_pswapdv2si2): Set prefix_extra attribute to 1.
+	(mmx_ashr<mode>3, mmx_lshr<mode>3, mmx_ashl<mode>3): Set
+	length_immediate to 1 if operand 2 is const_int_operand.
+	(*mmx_pinsrw, mmx_pextrw, mmx_pshufw_1, *vec_dupv4hi,
+	*vec_extractv2si_1): Set length_immediate
+	attribute to 1.
+	(*mmx_uavgv8qi3): Override prefix_extra attribute to 1 if
+	using old 3DNOW insn rather than SSE/3DNOW_A.
+	(mmx_emms, mmx_femms): Clear modrm attribute.
+
+2009-05-29  Martin Jambor  <mjambor@suse.cz>
+
+	* tree-sra.c:  New implementation of SRA.
+
+	* params.def (PARAM_SRA_MAX_STRUCTURE_SIZE): Removed.
+	(PARAM_SRA_MAX_STRUCTURE_COUNT): Removed.
+	(PARAM_SRA_FIELD_STRUCTURE_RATIO): Removed.
+	* params.h (SRA_MAX_STRUCTURE_SIZE): Removed.
+	(SRA_MAX_STRUCTURE_COUNT): Removed.
+	(SRA_FIELD_STRUCTURE_RATIO): Removed.
+	* doc/invoke.texi (sra-max-structure-size): Removed.
+	(sra-field-structure-ratio): Removed.
+
+2009-05-29  Jakub Jelinek  <jakub@redhat.com>
+
+	PR middle-end/40291
+	* builtins.c (expand_builtin_memcmp): Convert len to sizetype
+	before expansion.
+
+2009-05-29  Andrey Belevantsev  <abel@ispras.ru>
+
+	PR rtl-optimization/40101
+	* sel-sched-ir.c (get_seqno_by_preds): Allow returning negative
+	seqno.	Adjust comment.
+	* sel-sched.c (find_seqno_for_bookkeeping): Assert that when 
+	inserting bookkeeping before a jump, the jump is not scheduled.
+	When no positive seqno found, provide a value.  Add comment.
+
+2009-05-29  Richard Guenther  <rguenther@suse.de>
+
+	* tree-ssa-alias.c (nonaliasing_component_refs_p): Remove
+	short-cutting on the first component.
+
+2009-05-29  Jakub Jelinek  <jakub@redhat.com>
+
+	PR middle-end/39958
+	* omp-low.c (scan_omp_1_op): Call remap_type on TREE_TYPE
+	for trees other than decls/types.
+
+2009-05-29  Richard Guenther  <rguenther@suse.de>
+
+	* tree-ssa-operands.c (get_expr_operands): Do not handle
+	INDIRECT_REFs in the handled-component case.  Remove
+	unused get_ref_base_and_extent case.
+	* tree-dfa.c (get_ref_base_and_extent): Avoid calling
+	tree_low_cst and host_integerp where possible.
+	* tree-ssa-structalias.c (equiv_class_label_eq): Check hash
+	codes for equivalence.
+	* dce.c (find_call_stack_args): Avoid redundant bitmap queries.
+
+2009-05-29  David Billinghurst <billingd@gcc.gnu.org>
+
+	* config.gcc: Add i386/t-fprules-softfp and soft-fp/t-softfp
+	to tmake_file for i[34567]86-*-cygwin*.	
+
+2009-05-29  Jakub Jelinek  <jakub@redhat.com>
+
+	PR target/40017
+	* config/rs6000/rs6000-c.c (_Bool_keyword): New variable.
+	(altivec_categorize_keyword, init_vector_keywords,
+	rs6000_cpu_cpp_builtins): Define _Bool as conditional macro
+	similar to bool.
+
 2009-05-29  Kai Tietz  <kai.tietz@onevision.com>
 
 	* tree.c (handle_dll_attribute): Check if node is


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8632 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is computed by taking the maximal value in the {GC XXXX -> YYYY} reports.
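
The peak computation described above can be sketched in Python.  This is
only an illustration under stated assumptions: the exact marker text is
assumed to look like "{GC 1488k -> 1437k}" (the report shows only the
placeholder form), the maximum XXXX is read as the peak before collection
and the maximum YYYY as the peak after it, and the names GC_RE and
peak_memory are hypothetical.

```python
import re

# Assumed marker shape, e.g. "{GC 1488k -> 1437k}".  The report only shows
# the placeholder "{GC XXXX -> YYYY}", so the "k" suffix is optional here.
GC_RE = re.compile(r"\{GC\s+(\d+)k?\s*->\s*(\d+)k?\}")


def peak_memory(dump_text):
    """Return (peak_before, peak_after) over all GC markers, or None."""
    pairs = [(int(before), int(after))
             for before, after in GC_RE.findall(dump_text)]
    if not pairs:
        return None  # no GC markers found in the dump
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after


if __name__ == "__main__":
    sample = "main {GC 1200k -> 1100k} foo {GC 1488k -> 1437k}"
    print(peak_memory(sample))  # (1488, 1437)
```

The function takes the captured compiler dump as one string, so it works
on text read from a log file as well as on output captured from a pipe.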

Your testing script.

