This is the mail archive of the gcc-regression@gcc.gnu.org mailing list for the GCC project.



Re: A recent patch increased GCC's memory consumption in some cases!


Hi,
this seems to cause over 20% more memory consumption in some cases.
My guess is that the new SRA implementation is responsible.  The
rtl-optimization/28071 testcase is very regular and easy to analyze, so it
might be worthwhile to try to figure out why we need so much more RAM.  It
might simply be that more SRA is being done.

Honza
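
The percentages in the quoted report can be rechecked from the raw figures.
What follows is a minimal Python sketch, not part of the testing script. The
function names are made up for illustration, and the rule of dividing by the
smaller value is only inferred from the printed numbers (it reproduces both
the reported increases and the reported -6.10% decrease); it is not taken
from the script's source.

```python
def relative_change(before_k: int, after_k: int) -> float:
    """Change from before_k to after_k as a percentage of before_k."""
    return (after_k - before_k) * 100.0 / before_k


def change_relative_to_smaller(before_k: int, after_k: int) -> float:
    """Same change, as a percentage of the smaller of the two values.

    Assumption: inferred from the report's figures, not from the script.
    For increases this is identical to relative_change().
    """
    return (after_k - before_k) * 100.0 / min(before_k, after_k)


# Figures copied from the PR rtl-optimization/28071 sections of the report.
samples = {
    "-O1 overall memory": (308773, 347669),
    "-O1 peak after GGC": (69462, 82935),
    "-O2 overall memory": (518213, 666965),
    "-O2 leak": (9463, 11372),
    "-O2 pre-IPA garbage": (90152, 84972),
}

for label, (before_k, after_k) in samples.items():
    pct = change_relative_to_smaller(before_k, after_k)
    print(f"{label}: {before_k}k -> {after_k}k, {pct:.2f}%")
```

With the plain before-based formula the pre-IPA garbage decrease would come
out near -5.75%, so the two functions differ only for decreases.
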
> Hi,
> 
> I am a friendly script caring about memory consumption in GCC.  Please
> contact jh@suse.cz if something is going wrong.
> 
> Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
> and generate-3.4.ii I got:
> 
> 
> comparing empty function compilation at -O0 level:
>     Overall memory needed: 8801k
>     Peak memory use before GGC: 1488k
>     Peak memory use after GGC: 1437k
>     Maximum of released memory in single GGC run: 85k
>     Garbage: 218k
>     Leak: 1537k
>     Overhead: 187k
>     GGC runs: 4
>     Pre-IPA-Garbage: 210k
>     Pre-IPA-Leak: 1539k
>     Pre-IPA-Overhead: 186k
>     Post-IPA-Garbage: 210k
>     Post-IPA-Leak: 1539k
>     Post-IPA-Overhead: 186k
> 
> comparing empty function compilation at -O0 -g level:
>     Overall memory needed: 8825k
>     Peak memory use before GGC: 1516k
>     Peak memory use after GGC: 1464k
>     Maximum of released memory in single GGC run: 87k
>     Garbage: 219k
>     Leak: 1570k
>     Overhead: 192k
>     GGC runs: 4
>     Pre-IPA-Garbage: 210k
>     Pre-IPA-Leak: 1539k
>     Pre-IPA-Overhead: 186k
>     Post-IPA-Garbage: 210k
>     Post-IPA-Leak: 1539k
>     Post-IPA-Overhead: 186k
> 
> comparing empty function compilation at -O1 level:
>     Overall memory needed: 8801k
>     Peak memory use before GGC: 1488k
>     Peak memory use after GGC: 1437k
>     Maximum of released memory in single GGC run: 90k
>     Garbage: 223k
>     Leak: 1537k
>     Overhead: 188k
>     GGC runs: 4
>     Pre-IPA-Garbage: 212k
>     Pre-IPA-Leak: 1540k
>     Pre-IPA-Overhead: 186k
>     Post-IPA-Garbage: 212k
>     Post-IPA-Leak: 1540k
>     Post-IPA-Overhead: 186k
> 
> comparing empty function compilation at -O2 level:
>     Overall memory needed: 8941k -> 8929k
>     Peak memory use before GGC: 1488k
>     Peak memory use after GGC: 1437k
>     Maximum of released memory in single GGC run: 90k
>     Garbage: 228k
>     Leak: 1537k
>     Overhead: 189k
>     GGC runs: 5
>     Pre-IPA-Garbage: 212k
>     Pre-IPA-Leak: 1540k
>     Pre-IPA-Overhead: 186k
>     Post-IPA-Garbage: 212k
>     Post-IPA-Leak: 1540k
>     Post-IPA-Overhead: 186k
> 
> comparing empty function compilation at -O3 level:
>     Overall memory needed: 8933k -> 8929k
>     Peak memory use before GGC: 1488k
>     Peak memory use after GGC: 1437k
>     Maximum of released memory in single GGC run: 90k
>     Garbage: 228k
>     Leak: 1537k
>     Overhead: 189k
>     GGC runs: 5
>     Pre-IPA-Garbage: 212k
>     Pre-IPA-Leak: 1540k
>     Pre-IPA-Overhead: 186k
>     Post-IPA-Garbage: 212k
>     Post-IPA-Leak: 1540k
>     Post-IPA-Overhead: 186k
> 
> comparing combine.c compilation at -O0 level:
>     Overall memory needed: 31457k
>     Peak memory use before GGC: 17478k
>     Peak memory use after GGC: 17029k
>     Maximum of released memory in single GGC run: 1911k
>     Garbage: 37895k
>     Leak: 7171k -> 7155k
>     Overhead: 5490k -> 5491k
>     GGC runs: 331
>     Pre-IPA-Garbage: 12530k
>     Pre-IPA-Leak: 18411k
>     Pre-IPA-Overhead: 2504k
>     Post-IPA-Garbage: 12530k
>     Post-IPA-Leak: 18411k
>     Post-IPA-Overhead: 2504k
> 
> comparing combine.c compilation at -O0 -g level:
>     Overall memory needed: 33401k
>     Peak memory use before GGC: 19386k
>     Peak memory use after GGC: 18869k
>     Maximum of released memory in single GGC run: 1920k
>     Garbage: 38110k
>     Leak: 10441k
>     Overhead: 6303k
>     GGC runs: 315
>     Pre-IPA-Garbage: 12549k
>     Pre-IPA-Leak: 20660k
>     Pre-IPA-Overhead: 2986k
>     Post-IPA-Garbage: 12549k
>     Post-IPA-Leak: 20660k
>     Post-IPA-Overhead: 2986k
> 
> comparing combine.c compilation at -O1 level:
>   Amount of produced GGC garbage increased from 45806k to 45931k, overall 0.27%
>     Overall memory needed: 31913k -> 32209k
>     Peak memory use before GGC: 16555k -> 16551k
>     Peak memory use after GGC: 16383k -> 16380k
>     Maximum of released memory in single GGC run: 1378k
>     Garbage: 45806k -> 45931k
>     Leak: 7156k -> 7155k
>     Overhead: 6440k -> 6449k
>     GGC runs: 388
>     Pre-IPA-Garbage: 13405k
>     Pre-IPA-Leak: 17702k
>     Pre-IPA-Overhead: 2552k
>     Post-IPA-Garbage: 13405k
>     Post-IPA-Leak: 17702k
>     Post-IPA-Overhead: 2552k
> 
> comparing combine.c compilation at -O2 level:
>   Amount of produced GGC garbage increased from 56306k to 56509k, overall 0.36%
>     Overall memory needed: 32989k -> 33021k
>     Peak memory use before GGC: 16628k -> 16615k
>     Peak memory use after GGC: 16454k -> 16447k
>     Maximum of released memory in single GGC run: 1489k
>     Garbage: 56306k -> 56509k
>     Leak: 7188k -> 7188k
>     Overhead: 8033k -> 8083k
>     GGC runs: 441 -> 443
>     Pre-IPA-Garbage: 13435k
>     Pre-IPA-Leak: 17724k
>     Pre-IPA-Overhead: 2555k
>     Post-IPA-Garbage: 13435k
>     Post-IPA-Leak: 17724k
>     Post-IPA-Overhead: 2555k
> 
> comparing combine.c compilation at -O3 level:
>   Amount of produced GGC garbage increased from 80462k to 80674k, overall 0.26%
>     Overall memory needed: 37029k -> 37149k
>     Peak memory use before GGC: 16727k -> 16723k
>     Peak memory use after GGC: 16557k -> 16551k
>     Maximum of released memory in single GGC run: 1681k -> 1682k
>     Garbage: 80462k -> 80674k
>     Leak: 7249k -> 7249k
>     Overhead: 11103k -> 11072k
>     GGC runs: 527 -> 528
>     Pre-IPA-Garbage: 13435k
>     Pre-IPA-Leak: 17759k
>     Pre-IPA-Overhead: 2555k
>     Post-IPA-Garbage: 13435k
>     Post-IPA-Leak: 17759k
>     Post-IPA-Overhead: 2555k
> 
> comparing insn-attrtab.c compilation at -O0 level:
>     Overall memory needed: 152505k -> 152521k
>     Peak memory use before GGC: 65254k
>     Peak memory use after GGC: 52818k
>     Maximum of released memory in single GGC run: 26250k
>     Garbage: 128569k
>     Leak: 9587k
>     Overhead: 16691k
>     GGC runs: 258
>     Pre-IPA-Garbage: 40782k
>     Pre-IPA-Leak: 51014k
>     Pre-IPA-Overhead: 7761k
>     Post-IPA-Garbage: 40782k
>     Post-IPA-Leak: 51014k
>     Post-IPA-Overhead: 7761k
> 
> comparing insn-attrtab.c compilation at -O0 -g level:
>     Overall memory needed: 153817k
>     Peak memory use before GGC: 66520k
>     Peak memory use after GGC: 54081k
>     Maximum of released memory in single GGC run: 26251k
>     Garbage: 128907k
>     Leak: 11219k
>     Overhead: 17144k
>     GGC runs: 252
>     Pre-IPA-Garbage: 40791k
>     Pre-IPA-Leak: 52539k
>     Pre-IPA-Overhead: 8091k
>     Post-IPA-Garbage: 40791k
>     Post-IPA-Leak: 52539k
>     Post-IPA-Overhead: 8091k
> 
> comparing insn-attrtab.c compilation at -O1 level:
>     Overall memory needed: 154545k
>     Peak memory use before GGC: 54972k
>     Peak memory use after GGC: 44902k
>     Maximum of released memory in single GGC run: 17233k
>     Garbage: 181124k -> 181126k
>     Leak: 9178k
>     Overhead: 23427k -> 23428k
>     GGC runs: 298
>     Pre-IPA-Garbage: 45256k
>     Pre-IPA-Leak: 45116k
>     Pre-IPA-Overhead: 7607k
>     Post-IPA-Garbage: 45256k
>     Post-IPA-Leak: 45116k
>     Post-IPA-Overhead: 7607k
> 
> comparing insn-attrtab.c compilation at -O2 level:
>     Overall memory needed: 202421k
>     Peak memory use before GGC: 54418k
>     Peak memory use after GGC: 44649k
>     Maximum of released memory in single GGC run: 18696k
>     Garbage: 211641k -> 211649k
>     Leak: 9193k
>     Overhead: 29298k -> 29301k
>     GGC runs: 331
>     Pre-IPA-Garbage: 45281k
>     Pre-IPA-Leak: 45120k
>     Pre-IPA-Overhead: 7609k
>     Post-IPA-Garbage: 45281k
>     Post-IPA-Leak: 45120k
>     Post-IPA-Overhead: 7609k
> 
> comparing insn-attrtab.c compilation at -O3 level:
>     Overall memory needed: 205857k -> 206109k
>     Peak memory use before GGC: 54430k
>     Peak memory use after GGC: 44658k
>     Maximum of released memory in single GGC run: 18679k
>     Garbage: 229893k -> 229989k
>     Leak: 9211k
>     Overhead: 31200k -> 31223k
>     GGC runs: 351 -> 352
>     Pre-IPA-Garbage: 45281k
>     Pre-IPA-Leak: 45120k
>     Pre-IPA-Overhead: 7609k
>     Post-IPA-Garbage: 45281k
>     Post-IPA-Leak: 45120k
>     Post-IPA-Overhead: 7609k
> 
> comparing Gerald's testcase PR8361 compilation at -O0 level:
>     Overall memory needed: 146321k -> 146265k
>     Peak memory use before GGC: 81868k
>     Peak memory use after GGC: 81058k
>     Maximum of released memory in single GGC run: 13542k
>     Garbage: 192383k -> 192383k
>     Leak: 55448k
>     Overhead: 28823k -> 28823k
>     GGC runs: 436
>     Pre-IPA-Garbage: 105277k
>     Pre-IPA-Leak: 84587k
>     Pre-IPA-Overhead: 15579k
>     Post-IPA-Garbage: 105277k
>     Post-IPA-Leak: 84587k
>     Post-IPA-Overhead: 15579k
> 
> comparing Gerald's testcase PR8361 compilation at -O0 -g level:
>     Overall memory needed: 163689k -> 163757k
>     Peak memory use before GGC: 95614k
>     Peak memory use after GGC: 94667k
>     Maximum of released memory in single GGC run: 13985k
>     Garbage: 197353k -> 197353k
>     Leak: 82400k
>     Overhead: 35357k -> 35357k
>     GGC runs: 409
>     Pre-IPA-Garbage: 105779k
>     Pre-IPA-Leak: 101036k
>     Pre-IPA-Overhead: 19072k
>     Post-IPA-Garbage: 105779k
>     Post-IPA-Leak: 101036k
>     Post-IPA-Overhead: 19072k
> 
> comparing Gerald's testcase PR8361 compilation at -O1 level:
>   Amount of produced GGC garbage increased from 269059k to 269992k, overall 0.35%
>     Overall memory needed: 108649k -> 107909k
>     Peak memory use before GGC: 82538k -> 81507k
>     Peak memory use after GGC: 81713k -> 80699k
>     Maximum of released memory in single GGC run: 13775k -> 13777k
>     Garbage: 269059k -> 269992k
>     Leak: 52272k -> 52256k
>     Overhead: 31689k -> 31955k
>     GGC runs: 525
>     Pre-IPA-Garbage: 153885k -> 153091k
>     Pre-IPA-Leak: 86855k -> 85821k
>     Pre-IPA-Overhead: 19248k -> 19154k
>     Post-IPA-Garbage: 153885k -> 153091k
>     Post-IPA-Leak: 86855k -> 85821k
>     Post-IPA-Overhead: 19248k -> 19154k
> 
> comparing Gerald's testcase PR8361 compilation at -O2 level:
>   Amount of produced GGC garbage increased from 304549k to 306261k, overall 0.56%
>     Overall memory needed: 108129k -> 107593k
>     Peak memory use before GGC: 82566k -> 81283k
>     Peak memory use after GGC: 80858k -> 80157k
>     Maximum of released memory in single GGC run: 13773k
>     Garbage: 304549k -> 306261k
>     Leak: 52369k -> 52370k
>     Overhead: 36687k -> 37155k
>     GGC runs: 569 -> 570
>     Pre-IPA-Garbage: 156917k -> 156404k
>     Pre-IPA-Leak: 85949k -> 85114k
>     Pre-IPA-Overhead: 19461k -> 19383k
>     Post-IPA-Garbage: 156917k -> 156404k
>     Post-IPA-Leak: 85949k -> 85114k
>     Post-IPA-Overhead: 19461k -> 19383k
> 
> comparing Gerald's testcase PR8361 compilation at -O3 level:
>   Amount of produced GGC garbage increased from 335372k to 336826k, overall 0.43%
>     Overall memory needed: 114521k -> 113781k
>     Peak memory use before GGC: 83032k -> 81749k
>     Peak memory use after GGC: 80967k -> 80157k
>     Maximum of released memory in single GGC run: 13773k
>     Garbage: 335372k -> 336826k
>     Leak: 52415k -> 52415k
>     Overhead: 40618k -> 40764k
>     GGC runs: 606 -> 609
>     Pre-IPA-Garbage: 156917k -> 156404k
>     Pre-IPA-Leak: 85953k -> 85117k
>     Pre-IPA-Overhead: 19461k -> 19383k
>     Post-IPA-Garbage: 156917k -> 156404k
>     Post-IPA-Leak: 85953k -> 85117k
>     Post-IPA-Overhead: 19461k -> 19383k
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
>     Overall memory needed: 358789k -> 358769k
>     Peak memory use before GGC: 78173k
>     Peak memory use after GGC: 49107k
>     Maximum of released memory in single GGC run: 37057k
>     Garbage: 140190k
>     Leak: 7711k
>     Overhead: 24960k
>     GGC runs: 86
>     Pre-IPA-Garbage: 12171k
>     Pre-IPA-Leak: 18626k
>     Pre-IPA-Overhead: 2403k
>     Post-IPA-Garbage: 12171k
>     Post-IPA-Leak: 18626k
>     Post-IPA-Overhead: 2403k
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
>     Overall memory needed: 359561k -> 359521k
>     Peak memory use before GGC: 78856k
>     Peak memory use after GGC: 49791k
>     Maximum of released memory in single GGC run: 37041k
>     Garbage: 140255k
>     Leak: 9707k
>     Overhead: 25529k
>     GGC runs: 94
>     Pre-IPA-Garbage: 12173k
>     Pre-IPA-Leak: 18873k
>     Pre-IPA-Overhead: 2456k
>     Post-IPA-Garbage: 12173k
>     Post-IPA-Leak: 18873k
>     Post-IPA-Overhead: 2456k
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
>   Overall memory allocated via mmap and sbrk increased from 308773k to 347669k, overall 12.60%
>   Peak amount of GGC memory allocated before garbage collecting increased from 80235k to 94211k, overall 17.42%
>   Peak amount of GGC memory still allocated after garbage collecting increased from 69462k to 82935k, overall 19.40%
>   Amount of produced GGC garbage increased from 224434k to 256003k, overall 14.07%
>   Amount of memory still referenced at the end of compilation increased from 9462k to 9535k, overall 0.77%
>     Overall memory needed: 308773k -> 347669k
>     Peak memory use before GGC: 80235k -> 94211k
>     Peak memory use after GGC: 69462k -> 82935k
>     Maximum of released memory in single GGC run: 38514k -> 47307k
>     Garbage: 224434k -> 256003k
>     Leak: 9462k -> 9535k
>     Overhead: 32358k -> 35484k
>     GGC runs: 95 -> 97
>   Amount of produced pre-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
>   Amount of memory referenced pre-ipa increased from 63974k to 64580k, overall 0.95%
>     Pre-IPA-Garbage: 41119k -> 42051k
>     Pre-IPA-Leak: 63974k -> 64580k
>     Pre-IPA-Overhead: 7105k -> 7108k
>   Amount of produced post-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
>   Amount of memory referenced post-ipa increased from 63974k to 64580k, overall 0.95%
>     Post-IPA-Garbage: 41119k -> 42051k
>     Post-IPA-Leak: 63974k -> 64580k
>     Post-IPA-Overhead: 7105k -> 7108k
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
>   Overall memory allocated via mmap and sbrk increased from 518213k to 666965k, overall 28.70%
>   Peak amount of GGC memory allocated before garbage collecting increased from 80260k to 90301k, overall 12.51%
>   Peak amount of GGC memory still allocated after garbage collecting increased from 69463k to 82936k, overall 19.40%
>   Amount of produced GGC garbage increased from 266746k to 302290k, overall 13.32%
>   Amount of memory still referenced at the end of compilation increased from 9463k to 11372k, overall 20.17%
>     Overall memory needed: 518213k -> 666965k
>     Peak memory use before GGC: 80260k -> 90301k
>     Peak memory use after GGC: 69463k -> 82936k
>     Maximum of released memory in single GGC run: 38750k -> 38640k
>     Garbage: 266746k -> 302290k
>     Leak: 9463k -> 11372k
>     Overhead: 42002k -> 50851k
>     GGC runs: 107
>   Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
>   Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
>     Pre-IPA-Garbage: 90152k -> 84972k
>     Pre-IPA-Leak: 80240k -> 86483k
>     Pre-IPA-Overhead: 11095k -> 11064k
>   Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
>   Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
>     Post-IPA-Garbage: 90152k -> 84972k
>     Post-IPA-Leak: 80240k -> 86483k
>     Post-IPA-Overhead: 11095k -> 11064k
> 
> comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
>   Overall memory allocated via mmap and sbrk increased from 1031281k to 1233337k, overall 19.59%
>   Amount of produced GGC garbage increased from 329142k to 347506k, overall 5.58%
>     Overall memory needed: 1031281k -> 1233337k
>     Peak memory use before GGC: 135102k -> 133837k
>     Peak memory use after GGC: 126444k -> 126291k
>     Maximum of released memory in single GGC run: 54329k -> 51246k
>     Garbage: 329142k -> 347506k
>     Leak: 10302k -> 10289k
>     Overhead: 42368k -> 43798k
>     GGC runs: 106 -> 107
>   Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
>   Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
>     Pre-IPA-Garbage: 90152k -> 84972k
>     Pre-IPA-Leak: 80240k -> 86483k
>     Pre-IPA-Overhead: 11095k -> 11064k
>   Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
>   Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
>     Post-IPA-Garbage: 90152k -> 84972k
>     Post-IPA-Leak: 80240k -> 86483k
>     Post-IPA-Overhead: 11095k -> 11064k
> 
> Head of the ChangeLog is:
> 
> --- /usr/src/SpecTests/sandbox-haydn-memory/x86_64/mem-result/ChangeLog	2009-05-29 07:03:50.000000000 +0000
> +++ /usr/src/SpecTests/sandbox-haydn-memory/gcc/gcc/ChangeLog	2009-05-29 21:48:20.000000000 +0000
> @@ -1,3 +1,310 @@
> +2009-05-29  Eric Botcazou  <ebotcazou@adacore.com>
> +
> +	* tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
> +	(force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
> +	(ptr_difference_cost): Use affine combinations to compute it.
> +	(difference_cost): Likewise.
> +	(get_computation_cost_at): Compute more accurate cost for addresses
> +	if the ratio is a multiplier allowed in addresses.
> +	For non-addresses, consider that an additional offset or symbol is
> +	added only once.
> +
> +2009-05-29  Jakub Jelinek  <jakub@redhat.com>
> +
> +	* config/i386/i386.c (ix86_decompose_address): Avoid useless
> +	0 displacement.  Add 0 displacement if base is %[er]bp or %r13.
> +
> +	* config/i386/i386.md (prefix_data16, prefix_rep): Set to 0 for
> +	TYPE_SSE{MULADD,4ARG,IADD1,CVT1} by default.
> +	(prefix_rex): For UNIT_MMX don't imply the prefix by default
> +	if MODE_DI.
> +	(prefix_extra): Default to 2 for TYPE_SSE{MULADD,4ARG} and
> +	to 1 for TYPE_SSE{IADD1,CVT1}.
> +	(prefix_vex_imm8): Removed.
> +	(length_vex): Only pass 1 as second argument to
> +	ix86_attr_length_vex_default if prefix_extra is 0.
> +	(modrm): For TYPE_INCDEC only set to 0 if not TARGET_64BIT.
> +	(length): For prefix vex computation use length_immediate
> +	attribute instead of prefix_vex_imm8.
> +	(cmpqi_ext_3_insn, cmpqi_ext_3_insn_rex64,
> +	addqi_ext_1, addqi_ext_1_rex64, *testqi_ext_0, andqi_ext_0,
> +	*andqi_ext_0_cc, *iorqi_ext_0, *xorqi_ext_0, *xorqi_cc_ext_1,
> +	*xorqi_cc_ext_1_rex64): Override modrm attribute to 1.
> +	(extendsidi2_rex64, extendhidi2, extendqidi2, extendhisi2,
> +	*extendhisi2_zext, extendqihi2, extendqisi2, *extendqisi2_zext): Emit
> +	a space in between the operands.
> +	(*anddi_1_rex64, *andsi_1): Likewise.  Override prefix_rex to 1
> +	if one operand is 0xff and the other one si, di, bp or sp.
> +	(*andhi_1): Override prefix_rex to 1 if one operand is 0xff and the
> +	other one si, di, bp or sp.
> +	(*btsq, *btrq, *btcq, *btdi_rex64, *btsi): Add mode attribute.
> +	(*ffssi_1, *ffsdi_1, ctzsi2, ctzdi2): Add
> +	type and mode attributes.
> +	(*bsr, *bsr_rex64, *bsrhi): Add type attribute.
> +	(*cmpfp_i_mixed, *cmpfp_iu_mixed): For TYPE_SSECOMI, clear
> +	prefix_rep attribute and set prefix_data16 attribute iff MODE_DF.
> +	(*cmpfp_i_sse, *cmpfp_iu_sse): Clear prefix_rep attribute and set
> +	prefix_data16 attribute iff MODE_DF.
> +	(*movsi_1): For TYPE_SSEMOV MODE_SI set prefix_data16 attribute.
> +	(fix_trunc<mode>di_sse): Set prefix_rex attribute.
> +	(*adddi_4_rex64, *addsi_4): Use const128_operand instead of
> +	constm128_operand in length_immediate computation.
> +	(*addhi_4): Likewise.  Fix mode attribute to MODE_HI.
> +	(anddi_1_rex64): Use movzbl/movzwl instead of movzbq/movzwq.
> +	(*avx_ashlti3, sse2_ashlti3, *avx_lshrti3, sse2_lshrti3): Set
> +	length_immediate attribute to 1.
> +	(x86_fnstsw_1, x86_fnstcw_1, x86_fldcw_1): Fix length attribute.
> +	(*movdi_1_rex64): Override prefix_rex or prefix_data16 attributes
> +	for certain alternatives.
> +	(*movdf_nointeger, *movdf_integer_rex64, *movdf_integer): Override
> +	prefix_data16 attribute if MODE_V1DF.
> +	(*avx_setcc<mode>, *sse_setcc<mode>, *sse5_setcc<mode>): Set
> +	length_immediate to 1.
> +	(set_got_rex64, set_rip_rex64): Remove length attribute, set
> +	length_address to 4, set mode attribute to MODE_DI.
> +	(set_got_offset_rex64): Likewise.  Set length_immediate to 0.
> +	(fxam<mode>2_i387): Set length attribute to 4.
> +	(*prefetch_sse, *prefetch_sse_rex, *prefetch_3dnow,
> +	*prefetch_3dnow_rex): Override length_address attribute.
> +	(sse4_2_crc32<mode>): Override prefix_data16 and prefix_rex
> +	attributes.
> +	* config/i386/predicates.md (ext_QIreg_nomode_operand): New predicate.
> +	(constm128_operand): Removed.
> +	* config/i386/i386.c (memory_address_length): For
> +	disp && !index && !base in 64-bit mode account for SIB byte if
> +	print_operand_address can't optimize disp32 into disp32(%rip)
> +	and UNSPEC doesn't imply (%rip) addressing.  Add 1 to length
> +	for fs: or gs: segment.
> +	(ix86_attr_length_immediate_default): When checking if shortform
> +	is possible, truncate immediate to the length of the non-shortened
> +	immediate.
> +	(ix86_attr_length_address_default): Ignore MEM_P operands
> +	with X constraint.
> +	(ix86_attr_length_vex_default): Only check for DImode on
> +	GENERAL_REG_P operands.
> +	* config/i386/sse.md (<sse>_comi, <sse>_ucomi): Clear
> +	prefix_rep attribute, set prefix_data16 attribute iff MODE_DF.
> +	(sse_cvttps2pi): Clear prefix_rep attribute.
> +	(sse2_cvttps2dq, *sse2_cvtpd2dq, sse2_cvtps2pd): Clear prefix_data16
> +	attribute.
> +	(*sse2_cvttpd2dq): Don't clear prefix_rep attribute.
> +	(*avx_ashr<mode>3, ashr<mode>3, *avx_lshr<mode>3, lshr<mode>3,
> +	*avx_ashl<mode>3, ashl<mode>3): Set length_immediate attribute to 1
> +	iff operand 2 is const_int_operand.
> +	(*vec_dupv4si, avx_shufpd256_1, *avx_shufpd_<mode>,
> +	sse2_shufpd_<mode>): Set length_immediate attribute to 1.
> +	(sse2_pshufd_1): Likewise.  Set prefix attribute to maybe_vex
> +	instead of vex.
> +	(sse2_pshuflw_1, sse2_pshufhw_1): Set length_immediate to 1 and clear
> +	prefix_data16.
> +	(sse2_unpckhpd, sse2_unpcklpd, sse2_storehpd, *vec_concatv2df): Set
> +	prefix_data16 attribute for movlpd and movhpd instructions.
> +	(sse2_loadhpd, sse2_loadlpd, sse2_movsd): Likewise.  Override
> +	length_immediate for shufpd instruction.
> +	(sse2_movntsi, sse3_lddqu): Clear prefix_data16 attribute.
> +	(avx_cmpp<avxmodesuffixf2c><mode>3,
> +	avx_cmps<ssemodesuffixf2c><mode>3, *avx_maskcmp<mode>3,
> +	<sse>_maskcmp<mode>3, <sse>_vmmaskcmp<mode>3,
> +	avx_shufps256_1, *avx_shufps_<mode>, sse_shufps_<mode>,
> +	*vec_dupv4sf_avx, *vec_dupv4sf): Set
> +	length_immediate attribute to 1.
> +	(*avx_cvtsi2ssq, *avx_cvtsi2sdq): Set length_vex attribute to 4.
> +	(sse_cvtsi2ssq, sse2_cvtsi2sdq): Set prefix_rex attribute to 1.
> +	(sse2_cvtpi2pd, sse_loadlps, sse2_storelpd): Override
> +	prefix_data16 attribute for the first alternative to 1.
> +	(*avx_loadlps): Override length_immediate for the first alternative.
> +	(*vec_concatv2sf_avx): Override length_immediate and prefix_extra
> +	attributes for second alternative.
> +	(*vec_concatv2sf_sse4_1): Override length_immediate and
> +	prefix_data16 attributes for second alternative.
> +	(*vec_setv4sf_avx, *avx_insertps, vec_extract_lo_<mode>,
> +	vec_extract_hi_<mode>, vec_extract_lo_v16hi,
> +	vec_extract_hi_v16hi, vec_extract_lo_v32qi,
> +	vec_extract_hi_v32qi): Set prefix_extra and length_immediate to 1.
> +	(*vec_setv4sf_sse4_1, sse4_1_insertps, *sse4_1_extractps): Set
> +	prefix_data16 and length_immediate to 1.
> +	(*avx_mulv2siv2di3, *avx_mulv4si3, sse4_2_gtv2di3): Set prefix_extra
> +	to 1.
> +	(*avx_<code><mode>3, *avx_eq<mode>3, *avx_gt<mode>3): Set
> +	prefix_extra attribute for variants that don't have 0f prefix
> +	alone.
> +	(*avx_pinsr<ssevecsize>): Likewise.  Set length_immediate to 1.
> +	(*sse4_1_pinsrb, *sse2_pinsrw, *sse4_1_pinsrd, *sse4_1_pextrb,
> +	*sse4_1_pextrb_memory, *sse2_pextrw, *sse4_1_pextrw_memory,
> +	*sse4_1_pextrd): Set length_immediate to 1.
> +	(*sse4_1_pinsrd): Likewise.  Set prefix_extra to 1.
> +	(*sse4_1_pinsrq, *sse4_1_pextrq): Set prefix_rex and length_immediate
> +	to 1.
> +	(*vec_extractv2di_1_rex64_avx, *vec_extractv2di_1_rex64,
> +	*vec_extractv2di_1_avx, *vec_extractv2di_1_sse2): Override
> +	length_immediate to 1 for second alternative.
> +	(*vec_concatv2si_avx, *vec_concatv2di_rex64_avx): Override
> +	prefix_extra and length_immediate attributes for the first
> +	alternative.
> +	(vec_concatv2si_sse4_1): Override length_immediate to 1 for the
> +	first alternative.
> +	(*vec_concatv2di_rex64_sse4_1): Likewise.  Override prefix_rex
> +	to 1 for the first and third alternative.
> +	(*vec_concatv2di_rex64_sse): Override prefix_rex to 1 for the second
> +	alternative.
> +	(*sse2_maskmovdqu, *sse2_maskmovdqu_rex64): Override length_vex
> +	attribute.
> +	(*sse_sfence, sse2_mfence, sse2_lfence): Override length_address
> +	attribute to 0.
> +	(*avx_phaddwv8hi3, *avx_phadddv4si3, *avx_phaddswv8hi3,
> +	*avx_phsubwv8hi3, *avx_phsubdv4si3, *avx_phsubswv8hi,
> +	*avx_pmaddubsw128, *avx_pmulhrswv8hi3, *avx_pshufbv16qi3,
> +	*avx_psign<mode>3): Set prefix_extra attribute to 1.
> +	(ssse3_phaddwv4hi3, ssse3_phadddv2si3, ssse3_phaddswv4hi3,
> +	ssse3_phsubwv4hi3, ssse3_phsubdv2si3, ssse3_phsubswv4hi3,
> +	ssse3_pmaddubsw, *ssse3_pmulhrswv4hi, ssse3_pshufbv8qi3,
> +	ssse3_psign<mode>3): Override prefix_rex attribute.
> +	(*avx_palignrti): Override prefix_extra and length_immediate
> +	to 1.
> +	(ssse3_palignrti): Override length_immediate to 1.
> +	(ssse3_palignrdi): Override length_immediate to 1, override
> +	prefix_rex attribute.
> +	(abs<mode>2): Override prefix_rep to 0, override prefix_rex
> +	attribute.
> +	(sse4a_extrqi): Override length_immediate to 2.
> +	(sse4a_insertqi): Likewise.  Override prefix_data16 to 0.
> +	(sse4a_insertq): Override prefix_data16 to 0.
> +	(avx_blendp<avxmodesuffixf2c><avxmodesuffix>,
> +	avx_blendvp<avxmodesuffixf2c><avxmodesuffix>,
> +	avx_dpp<avxmodesuffixf2c><avxmodesuffix>, *avx_mpsadbw,
> +	*avx_pblendvb, *avx_pblendw, avx_roundp<avxmodesuffixf2c>256,
> +	avx_rounds<avxmodesuffixf2c>256): Override prefix_extra
> +	and length_immediate to 1.
> +	(sse4_1_blendp<ssemodesuffixf2c>, sse4_1_dpp<ssemodesuffixf2c>,
> +	sse4_2_pcmpestr, sse4_2_pcmpestri, sse4_2_pcmpestrm,
> +	sse4_2_pcmpestr_cconly, sse4_2_pcmpistr, sse4_2_pcmpistri,
> +	sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Override prefix_data16
> +	and length_immediate to 1.
> +	(sse4_1_blendvp<ssemodesuffixf2c>): Override prefix_data16 to 1.
> +	(sse4_1_mpsadbw, sse4_1_pblendw): Override length_immediate to 1.
> +	(*avx_packusdw, avx_vtestp<avxmodesuffixf2c><avxmodesuffix>,
> +	avx_ptest256): Override prefix_extra to 1.
> +	(sse4_1_roundp<ssemodesuffixf2c>, sse4_1_rounds<ssemodesuffixf2c>):
> +	Override prefix_data16 and length_immediate to 1.
> +	(sse5_pperm_zero_v16qi_v8hi, sse5_pperm_sign_v16qi_v8hi,
> +	sse5_pperm_zero_v8hi_v4si, sse5_pperm_sign_v8hi_v4si,
> +	sse5_pperm_zero_v4si_v2di, sse5_pperm_sign_v4si_v2di,
> +	sse5_vrotl<mode>3, sse5_ashl<mode>3, sse5_lshl<mode>3): Override
> +	prefix_data16 to 0 and prefix_extra to 2.
> +	(sse5_rotl<mode>3, sse5_rotr<mode>3): Override length_immediate to 1.
> +	(sse5_frcz<mode>2, sse5_vmfrcz<mode>2): Don't override prefix_extra
> +	attribute.
> +	(*sse5_vmmaskcmp<mode>3, sse5_com_tf<mode>3,
> +	sse5_maskcmp<mode>3, sse5_maskcmp<mode>3, sse5_maskcmp_uns<mode>3):
> +	Override prefix_data16 and prefix_rep to 0, length_immediate to 1
> +	and prefix_extra to 2.
> +	(sse5_maskcmp_uns2<mode>3, sse5_pcom_tf<mode>3): Override
> +	prefix_data16 to 0, length_immediate to 1 and prefix_extra to 2.
> +	(*avx_aesenc, *avx_aesenclast, *avx_aesdec, *avx_aesdeclast,
> +	avx_vpermilvar<mode>3,
> +	avx_vbroadcasts<avxmodesuffixf2c><avxmodesuffix>,
> +	avx_vbroadcastss256, avx_vbroadcastf128_p<avxmodesuffixf2c>256,
> +	avx_maskloadp<avxmodesuffixf2c><avxmodesuffix>,
> +	avx_maskstorep<avxmodesuffixf2c><avxmodesuffix>):
> +	Override prefix_extra to 1.
> +	(aeskeygenassist, pclmulqdq): Override length_immediate to 1.
> +	(*vpclmulqdq, avx_vpermil<mode>, avx_vperm2f128<mode>3,
> +	vec_set_lo_<mode>, vec_set_hi_<mode>, vec_set_lo_v16hi,
> +	vec_set_hi_v16hi, vec_set_lo_v32qi, vec_set_hi_v32qi): Override
> +	prefix_extra and length_immediate to 1.
> +	(*avx_vzeroall, avx_vzeroupper, avx_vzeroupper_rex64): Override
> +	modrm to 0.
> +	(*vec_concat<mode>_avx): Override prefix_extra and length_immediate
> +	to 1 for the first alternative.
> +	* config/i386/mmx.md (*mov<mode>_internal_rex64): Override
> +	prefix_rep, prefix_data16 and/or prefix_rex attributes in certain
> +	cases.
> +	(*mov<mode>_internal_avx, *movv2sf_internal_rex64,
> +	*movv2sf_internal_avx, *movv2sf_internal): Override
> +	prefix_rep attribute for certain alternatives.
> +	(*mov<mode>_internal): Override prefix_rep or prefix_data16
> +	attributes for certain alternatives.
> +	(*movv2sf_internal_rex64_avx): Override prefix_rep and length_vex
> +	attributes for certain alternatives.
> +	(*mmx_addv2sf3, *mmx_subv2sf3, *mmx_mulv2sf3,
> +	*mmx_<code>v2sf3_finite, *mmx_<code>v2sf3, mmx_rcpv2sf2,
> +	mmx_rcpit1v2sf3, mmx_rcpit2v2sf3, mmx_rsqrtv2sf2, mmx_rsqit1v2sf3,
> +	mmx_haddv2sf3, mmx_hsubv2sf3, mmx_addsubv2sf3,
> +	*mmx_eqv2sf3, mmx_gtv2sf3, mmx_gev2sf3, mmx_pf2id, mmx_pf2iw,
> +	mmx_pi2fw, mmx_floatv2si2, mmx_pswapdv2sf2, *mmx_pmulhrwv4hi3,
> +	mmx_pswapdv2si2): Set prefix_extra attribute to 1.
> +	(mmx_ashr<mode>3, mmx_lshr<mode>3, mmx_ashl<mode>3): Set
> +	length_immediate to 1 if operand 2 is const_int_operand.
> +	(*mmx_pinsrw, mmx_pextrw, mmx_pshufw_1, *vec_dupv4hi,
> +	*vec_extractv2si_1): Set length_immediate
> +	attribute to 1.
> +	(*mmx_uavgv8qi3): Override prefix_extra attribute to 1 if
> +	using old 3DNOW insn rather than SSE/3DNOW_A.
> +	(mmx_emms, mmx_femms): Clear modrm attribute.
> +
> +2009-05-29  Martin Jambor  <mjambor@suse.cz>
> +
> +	* tree-sra.c:  New implementation of SRA.
> +
> +	* params.def (PARAM_SRA_MAX_STRUCTURE_SIZE): Removed.
> +	(PARAM_SRA_MAX_STRUCTURE_COUNT): Removed.
> +	(PARAM_SRA_FIELD_STRUCTURE_RATIO): Removed.
> +	* params.h (SRA_MAX_STRUCTURE_SIZE): Removed.
> +	(SRA_MAX_STRUCTURE_COUNT): Removed.
> +	(SRA_FIELD_STRUCTURE_RATIO): Removed.
> +	* doc/invoke.texi (sra-max-structure-size): Removed.
> +	(sra-field-structure-ratio): Removed.
> +
> +2009-05-29  Jakub Jelinek  <jakub@redhat.com>
> +
> +	PR middle-end/40291
> +	* builtins.c (expand_builtin_memcmp): Convert len to sizetype
> +	before expansion.
> +
> +2009-05-29  Andrey Belevantsev  <abel@ispras.ru>
> +
> +	PR rtl-optimization/40101
> +	* sel-sched-ir.c (get_seqno_by_preds): Allow returning negative
> +	seqno.	Adjust comment.
> +	* sel-sched.c (find_seqno_for_bookkeeping): Assert that when 
> +	inserting bookkeeping before a jump, the jump is not scheduled.
> +	When no positive seqno found, provide a value.  Add comment.
> +
> +2009-05-29  Richard Guenther  <rguenther@suse.de>
> +
> +	* tree-ssa-alias.c (nonaliasing_component_refs_p): Remove
> +	short-cutting on the first component.
> +
> +2009-05-29  Jakub Jelinek  <jakub@redhat.com>
> +
> +	PR middle-end/39958
> +	* omp-low.c (scan_omp_1_op): Call remap_type on TREE_TYPE
> +	for trees other than decls/types.
> +
> +2009-05-29  Richard Guenther  <rguenther@suse.de>
> +
> +	* tree-ssa-operands.c (get_expr_operands): Do not handle
> +	INDIRECT_REFs in the handled-component case.  Remove
> +	unused get_ref_base_and_extent case.
> +	* tree-dfa.c (get_ref_base_and_extent): Avoid calling
> +	tree_low_cst and host_integerp where possible.
> +	* tree-ssa-structalias.c (equiv_class_label_eq): Check hash
> +	codes for equivalence.
> +	* dce.c (find_call_stack_args): Avoid redundant bitmap queries.
> +
> +2009-05-29  David Billinghurst <billingd@gcc.gnu.org>
> +
> +	* config.gcc: Add i386/t-fprules-softfp and soft-fp/t-softfp
> +	to tmake_file for i[34567]86-*-cygwin*.	
> +
> +2009-05-29  Jakub Jelinek  <jakub@redhat.com>
> +
> +	PR target/40017
> +	* config/rs6000/rs6000-c.c (_Bool_keyword): New variable.
> +	(altivec_categorize_keyword, init_vector_keywords,
> +	rs6000_cpu_cpp_builtins): Define _Bool as conditional macro
> +	similar to bool.
> +
>  2009-05-29  Kai Tietz  <kai.tietz@onevision.com>
>  
>  	* tree.c (handle_dll_attribute): Check if node is
> 
> 
> The results can be reproduced by building a compiler with
> 
> --enable-gather-detailed-mem-stats targeting x86-64
> 
> and compiling preprocessed combine.c or the testcase from PR8361 with:
> 
> -fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
> 
> The memory consumption summary appears in the dump after the detailed listing
> of the places where the memory is allocated.  Peak memory consumption is
> computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
> 
> Your testing script.
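
The peak computation described at the end of the quoted report can be
sketched in Python as follows. This is an illustration, not the testing
script itself. It assumes the compiler's output has been captured to a string
and that each collection is reported as `{GC XXXX -> YYYY}`, with sizes that
may carry a `k` suffix as in the figures above. The function name and the
sample text are invented for the example.

```python
import re

# Matches entries such as "{GC 1500k -> 1400k}"; the "k" suffix is optional.
GC_ENTRY = re.compile(r"\{GC\s+(\d+)k?\s*->\s*(\d+)k?\}")


def peak_gc_usage(dump_text):
    """Return (peak_before, peak_after, entries) over all GC entries.

    peak_before is the largest XXXX, peak_after the largest YYYY, and
    entries the number of matches.  Returns None if no entry is found.
    """
    pairs = [(int(before), int(after))
             for before, after in GC_ENTRY.findall(dump_text)]
    if not pairs:
        return None
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after, len(pairs)


# Invented sample text, not real compiler output.
sample = " f1 {GC 1200k -> 1100k} f2 {GC 1500k -> 1400k} f3 {GC 1450k -> 1420k}"
print(peak_gc_usage(sample))
```

The two maxima are taken independently, so they may come from different
collections, as they do in the sample.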

