This is the mail archive of the
gcc-regression@gcc.gnu.org
mailing list for the GCC project.
A recent patch increased GCC's memory consumption in some cases!
- From: gcctest at suse dot de
- To: jh at suse dot cz, hubicka at ucw dot cz, gcc-regression at gcc dot gnu dot org
- Date: Sat, 30 May 2009 00:31:14 +0000
- Subject: A recent patch increased GCC's memory consumption in some cases!
Hi,
I am a friendly script caring about memory consumption in GCC. Please
contact jh@suse.cz if something is going wrong.
Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
and generate-3.4.ii I got:
comparing empty function compilation at -O0 level:
Overall memory needed: 8801k
Peak memory use before GGC: 1488k
Peak memory use after GGC: 1437k
Maximum of released memory in single GGC run: 85k
Garbage: 218k
Leak: 1537k
Overhead: 187k
GGC runs: 4
Pre-IPA-Garbage: 210k
Pre-IPA-Leak: 1539k
Pre-IPA-Overhead: 186k
Post-IPA-Garbage: 210k
Post-IPA-Leak: 1539k
Post-IPA-Overhead: 186k
comparing empty function compilation at -O0 -g level:
Overall memory needed: 8825k
Peak memory use before GGC: 1516k
Peak memory use after GGC: 1464k
Maximum of released memory in single GGC run: 87k
Garbage: 219k
Leak: 1570k
Overhead: 192k
GGC runs: 4
Pre-IPA-Garbage: 210k
Pre-IPA-Leak: 1539k
Pre-IPA-Overhead: 186k
Post-IPA-Garbage: 210k
Post-IPA-Leak: 1539k
Post-IPA-Overhead: 186k
comparing empty function compilation at -O1 level:
Overall memory needed: 8801k
Peak memory use before GGC: 1488k
Peak memory use after GGC: 1437k
Maximum of released memory in single GGC run: 90k
Garbage: 223k
Leak: 1537k
Overhead: 188k
GGC runs: 4
Pre-IPA-Garbage: 212k
Pre-IPA-Leak: 1540k
Pre-IPA-Overhead: 186k
Post-IPA-Garbage: 212k
Post-IPA-Leak: 1540k
Post-IPA-Overhead: 186k
comparing empty function compilation at -O2 level:
Overall memory needed: 8941k -> 8929k
Peak memory use before GGC: 1488k
Peak memory use after GGC: 1437k
Maximum of released memory in single GGC run: 90k
Garbage: 228k
Leak: 1537k
Overhead: 189k
GGC runs: 5
Pre-IPA-Garbage: 212k
Pre-IPA-Leak: 1540k
Pre-IPA-Overhead: 186k
Post-IPA-Garbage: 212k
Post-IPA-Leak: 1540k
Post-IPA-Overhead: 186k
comparing empty function compilation at -O3 level:
Overall memory needed: 8933k -> 8929k
Peak memory use before GGC: 1488k
Peak memory use after GGC: 1437k
Maximum of released memory in single GGC run: 90k
Garbage: 228k
Leak: 1537k
Overhead: 189k
GGC runs: 5
Pre-IPA-Garbage: 212k
Pre-IPA-Leak: 1540k
Pre-IPA-Overhead: 186k
Post-IPA-Garbage: 212k
Post-IPA-Leak: 1540k
Post-IPA-Overhead: 186k
comparing combine.c compilation at -O0 level:
Overall memory needed: 31457k
Peak memory use before GGC: 17478k
Peak memory use after GGC: 17029k
Maximum of released memory in single GGC run: 1911k
Garbage: 37895k
Leak: 7171k -> 7155k
Overhead: 5490k -> 5491k
GGC runs: 331
Pre-IPA-Garbage: 12530k
Pre-IPA-Leak: 18411k
Pre-IPA-Overhead: 2504k
Post-IPA-Garbage: 12530k
Post-IPA-Leak: 18411k
Post-IPA-Overhead: 2504k
comparing combine.c compilation at -O0 -g level:
Overall memory needed: 33401k
Peak memory use before GGC: 19386k
Peak memory use after GGC: 18869k
Maximum of released memory in single GGC run: 1920k
Garbage: 38110k
Leak: 10441k
Overhead: 6303k
GGC runs: 315
Pre-IPA-Garbage: 12549k
Pre-IPA-Leak: 20660k
Pre-IPA-Overhead: 2986k
Post-IPA-Garbage: 12549k
Post-IPA-Leak: 20660k
Post-IPA-Overhead: 2986k
comparing combine.c compilation at -O1 level:
Amount of produced GGC garbage increased from 45806k to 45931k, overall 0.27%
Overall memory needed: 31913k -> 32209k
Peak memory use before GGC: 16555k -> 16551k
Peak memory use after GGC: 16383k -> 16380k
Maximum of released memory in single GGC run: 1378k
Garbage: 45806k -> 45931k
Leak: 7156k -> 7155k
Overhead: 6440k -> 6449k
GGC runs: 388
Pre-IPA-Garbage: 13405k
Pre-IPA-Leak: 17702k
Pre-IPA-Overhead: 2552k
Post-IPA-Garbage: 13405k
Post-IPA-Leak: 17702k
Post-IPA-Overhead: 2552k
comparing combine.c compilation at -O2 level:
Amount of produced GGC garbage increased from 56306k to 56509k, overall 0.36%
Overall memory needed: 32989k -> 33021k
Peak memory use before GGC: 16628k -> 16615k
Peak memory use after GGC: 16454k -> 16447k
Maximum of released memory in single GGC run: 1489k
Garbage: 56306k -> 56509k
Leak: 7188k -> 7188k
Overhead: 8033k -> 8083k
GGC runs: 441 -> 443
Pre-IPA-Garbage: 13435k
Pre-IPA-Leak: 17724k
Pre-IPA-Overhead: 2555k
Post-IPA-Garbage: 13435k
Post-IPA-Leak: 17724k
Post-IPA-Overhead: 2555k
comparing combine.c compilation at -O3 level:
Amount of produced GGC garbage increased from 80462k to 80674k, overall 0.26%
Overall memory needed: 37029k -> 37149k
Peak memory use before GGC: 16727k -> 16723k
Peak memory use after GGC: 16557k -> 16551k
Maximum of released memory in single GGC run: 1681k -> 1682k
Garbage: 80462k -> 80674k
Leak: 7249k -> 7249k
Overhead: 11103k -> 11072k
GGC runs: 527 -> 528
Pre-IPA-Garbage: 13435k
Pre-IPA-Leak: 17759k
Pre-IPA-Overhead: 2555k
Post-IPA-Garbage: 13435k
Post-IPA-Leak: 17759k
Post-IPA-Overhead: 2555k
comparing insn-attrtab.c compilation at -O0 level:
Overall memory needed: 152505k -> 152521k
Peak memory use before GGC: 65254k
Peak memory use after GGC: 52818k
Maximum of released memory in single GGC run: 26250k
Garbage: 128569k
Leak: 9587k
Overhead: 16691k
GGC runs: 258
Pre-IPA-Garbage: 40782k
Pre-IPA-Leak: 51014k
Pre-IPA-Overhead: 7761k
Post-IPA-Garbage: 40782k
Post-IPA-Leak: 51014k
Post-IPA-Overhead: 7761k
comparing insn-attrtab.c compilation at -O0 -g level:
Overall memory needed: 153817k
Peak memory use before GGC: 66520k
Peak memory use after GGC: 54081k
Maximum of released memory in single GGC run: 26251k
Garbage: 128907k
Leak: 11219k
Overhead: 17144k
GGC runs: 252
Pre-IPA-Garbage: 40791k
Pre-IPA-Leak: 52539k
Pre-IPA-Overhead: 8091k
Post-IPA-Garbage: 40791k
Post-IPA-Leak: 52539k
Post-IPA-Overhead: 8091k
comparing insn-attrtab.c compilation at -O1 level:
Overall memory needed: 154545k
Peak memory use before GGC: 54972k
Peak memory use after GGC: 44902k
Maximum of released memory in single GGC run: 17233k
Garbage: 181124k -> 181126k
Leak: 9178k
Overhead: 23427k -> 23428k
GGC runs: 298
Pre-IPA-Garbage: 45256k
Pre-IPA-Leak: 45116k
Pre-IPA-Overhead: 7607k
Post-IPA-Garbage: 45256k
Post-IPA-Leak: 45116k
Post-IPA-Overhead: 7607k
comparing insn-attrtab.c compilation at -O2 level:
Overall memory needed: 202421k
Peak memory use before GGC: 54418k
Peak memory use after GGC: 44649k
Maximum of released memory in single GGC run: 18696k
Garbage: 211641k -> 211649k
Leak: 9193k
Overhead: 29298k -> 29301k
GGC runs: 331
Pre-IPA-Garbage: 45281k
Pre-IPA-Leak: 45120k
Pre-IPA-Overhead: 7609k
Post-IPA-Garbage: 45281k
Post-IPA-Leak: 45120k
Post-IPA-Overhead: 7609k
comparing insn-attrtab.c compilation at -O3 level:
Overall memory needed: 205857k -> 206109k
Peak memory use before GGC: 54430k
Peak memory use after GGC: 44658k
Maximum of released memory in single GGC run: 18679k
Garbage: 229893k -> 229989k
Leak: 9211k
Overhead: 31200k -> 31223k
GGC runs: 351 -> 352
Pre-IPA-Garbage: 45281k
Pre-IPA-Leak: 45120k
Pre-IPA-Overhead: 7609k
Post-IPA-Garbage: 45281k
Post-IPA-Leak: 45120k
Post-IPA-Overhead: 7609k
comparing Gerald's testcase PR8361 compilation at -O0 level:
Overall memory needed: 146321k -> 146265k
Peak memory use before GGC: 81868k
Peak memory use after GGC: 81058k
Maximum of released memory in single GGC run: 13542k
Garbage: 192383k -> 192383k
Leak: 55448k
Overhead: 28823k -> 28823k
GGC runs: 436
Pre-IPA-Garbage: 105277k
Pre-IPA-Leak: 84587k
Pre-IPA-Overhead: 15579k
Post-IPA-Garbage: 105277k
Post-IPA-Leak: 84587k
Post-IPA-Overhead: 15579k
comparing Gerald's testcase PR8361 compilation at -O0 -g level:
Overall memory needed: 163689k -> 163757k
Peak memory use before GGC: 95614k
Peak memory use after GGC: 94667k
Maximum of released memory in single GGC run: 13985k
Garbage: 197353k -> 197353k
Leak: 82400k
Overhead: 35357k -> 35357k
GGC runs: 409
Pre-IPA-Garbage: 105779k
Pre-IPA-Leak: 101036k
Pre-IPA-Overhead: 19072k
Post-IPA-Garbage: 105779k
Post-IPA-Leak: 101036k
Post-IPA-Overhead: 19072k
comparing Gerald's testcase PR8361 compilation at -O1 level:
Amount of produced GGC garbage increased from 269059k to 269992k, overall 0.35%
Overall memory needed: 108649k -> 107909k
Peak memory use before GGC: 82538k -> 81507k
Peak memory use after GGC: 81713k -> 80699k
Maximum of released memory in single GGC run: 13775k -> 13777k
Garbage: 269059k -> 269992k
Leak: 52272k -> 52256k
Overhead: 31689k -> 31955k
GGC runs: 525
Pre-IPA-Garbage: 153885k -> 153091k
Pre-IPA-Leak: 86855k -> 85821k
Pre-IPA-Overhead: 19248k -> 19154k
Post-IPA-Garbage: 153885k -> 153091k
Post-IPA-Leak: 86855k -> 85821k
Post-IPA-Overhead: 19248k -> 19154k
comparing Gerald's testcase PR8361 compilation at -O2 level:
Amount of produced GGC garbage increased from 304549k to 306261k, overall 0.56%
Overall memory needed: 108129k -> 107593k
Peak memory use before GGC: 82566k -> 81283k
Peak memory use after GGC: 80858k -> 80157k
Maximum of released memory in single GGC run: 13773k
Garbage: 304549k -> 306261k
Leak: 52369k -> 52370k
Overhead: 36687k -> 37155k
GGC runs: 569 -> 570
Pre-IPA-Garbage: 156917k -> 156404k
Pre-IPA-Leak: 85949k -> 85114k
Pre-IPA-Overhead: 19461k -> 19383k
Post-IPA-Garbage: 156917k -> 156404k
Post-IPA-Leak: 85949k -> 85114k
Post-IPA-Overhead: 19461k -> 19383k
comparing Gerald's testcase PR8361 compilation at -O3 level:
Amount of produced GGC garbage increased from 335372k to 336826k, overall 0.43%
Overall memory needed: 114521k -> 113781k
Peak memory use before GGC: 83032k -> 81749k
Peak memory use after GGC: 80967k -> 80157k
Maximum of released memory in single GGC run: 13773k
Garbage: 335372k -> 336826k
Leak: 52415k -> 52415k
Overhead: 40618k -> 40764k
GGC runs: 606 -> 609
Pre-IPA-Garbage: 156917k -> 156404k
Pre-IPA-Leak: 85953k -> 85117k
Pre-IPA-Overhead: 19461k -> 19383k
Post-IPA-Garbage: 156917k -> 156404k
Post-IPA-Leak: 85953k -> 85117k
Post-IPA-Overhead: 19461k -> 19383k
comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
Overall memory needed: 358789k -> 358769k
Peak memory use before GGC: 78173k
Peak memory use after GGC: 49107k
Maximum of released memory in single GGC run: 37057k
Garbage: 140190k
Leak: 7711k
Overhead: 24960k
GGC runs: 86
Pre-IPA-Garbage: 12171k
Pre-IPA-Leak: 18626k
Pre-IPA-Overhead: 2403k
Post-IPA-Garbage: 12171k
Post-IPA-Leak: 18626k
Post-IPA-Overhead: 2403k
comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
Overall memory needed: 359561k -> 359521k
Peak memory use before GGC: 78856k
Peak memory use after GGC: 49791k
Maximum of released memory in single GGC run: 37041k
Garbage: 140255k
Leak: 9707k
Overhead: 25529k
GGC runs: 94
Pre-IPA-Garbage: 12173k
Pre-IPA-Leak: 18873k
Pre-IPA-Overhead: 2456k
Post-IPA-Garbage: 12173k
Post-IPA-Leak: 18873k
Post-IPA-Overhead: 2456k
comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory allocated via mmap and sbrk increased from 308773k to 347669k, overall 12.60%
Peak amount of GGC memory allocated before garbage collecting increased from 80235k to 94211k, overall 17.42%
Peak amount of GGC memory still allocated after garbage collecting increased from 69462k to 82935k, overall 19.40%
Amount of produced GGC garbage increased from 224434k to 256003k, overall 14.07%
Amount of memory still referenced at the end of compilation increased from 9462k to 9535k, overall 0.77%
Overall memory needed: 308773k -> 347669k
Peak memory use before GGC: 80235k -> 94211k
Peak memory use after GGC: 69462k -> 82935k
Maximum of released memory in single GGC run: 38514k -> 47307k
Garbage: 224434k -> 256003k
Leak: 9462k -> 9535k
Overhead: 32358k -> 35484k
GGC runs: 95 -> 97
Amount of produced pre-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
Amount of memory referenced pre-ipa increased from 63974k to 64580k, overall 0.95%
Pre-IPA-Garbage: 41119k -> 42051k
Pre-IPA-Leak: 63974k -> 64580k
Pre-IPA-Overhead: 7105k -> 7108k
Amount of produced post-ipa-GGC garbage increased from 41119k to 42051k, overall 2.27%
Amount of memory referenced post-ipa increased from 63974k to 64580k, overall 0.95%
Post-IPA-Garbage: 41119k -> 42051k
Post-IPA-Leak: 63974k -> 64580k
Post-IPA-Overhead: 7105k -> 7108k
comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory allocated via mmap and sbrk increased from 518213k to 666965k, overall 28.70%
Peak amount of GGC memory allocated before garbage collecting increased from 80260k to 90301k, overall 12.51%
Peak amount of GGC memory still allocated after garbage collecting increased from 69463k to 82936k, overall 19.40%
Amount of produced GGC garbage increased from 266746k to 302290k, overall 13.32%
Amount of memory still referenced at the end of compilation increased from 9463k to 11372k, overall 20.17%
Overall memory needed: 518213k -> 666965k
Peak memory use before GGC: 80260k -> 90301k
Peak memory use after GGC: 69463k -> 82936k
Maximum of released memory in single GGC run: 38750k -> 38640k
Garbage: 266746k -> 302290k
Leak: 9463k -> 11372k
Overhead: 42002k -> 50851k
GGC runs: 107
Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
Pre-IPA-Garbage: 90152k -> 84972k
Pre-IPA-Leak: 80240k -> 86483k
Pre-IPA-Overhead: 11095k -> 11064k
Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
Post-IPA-Garbage: 90152k -> 84972k
Post-IPA-Leak: 80240k -> 86483k
Post-IPA-Overhead: 11095k -> 11064k
comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
Overall memory allocated via mmap and sbrk increased from 1031281k to 1233337k, overall 19.59%
Amount of produced GGC garbage increased from 329142k to 347506k, overall 5.58%
Overall memory needed: 1031281k -> 1233337k
Peak memory use before GGC: 135102k -> 133837k
Peak memory use after GGC: 126444k -> 126291k
Maximum of released memory in single GGC run: 54329k -> 51246k
Garbage: 329142k -> 347506k
Leak: 10302k -> 10289k
Overhead: 42368k -> 43798k
GGC runs: 106 -> 107
Amount of produced pre-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
Amount of memory referenced pre-ipa increased from 80240k to 86483k, overall 7.78%
Pre-IPA-Garbage: 90152k -> 84972k
Pre-IPA-Leak: 80240k -> 86483k
Pre-IPA-Overhead: 11095k -> 11064k
Amount of produced post-ipa-GGC garbage decreased from 90152k to 84972k, overall -6.10%
Amount of memory referenced post-ipa increased from 80240k to 86483k, overall 7.78%
Post-IPA-Garbage: 90152k -> 84972k
Post-IPA-Leak: 80240k -> 86483k
Post-IPA-Overhead: 11095k -> 11064k
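The percentages in the summary lines above appear to be measured against the smaller of the two values: increases are relative to the old figure (308773k -> 347669k gives 12.60%), while decreases are relative to the new one (90152k -> 84972k gives -6.10%). The Python sketch below reproduces that arithmetic. The rule is inferred from the numbers in this report, not taken from the testing script, and the function name is hypothetical.

```python
def percent_change(old_kb, new_kb):
    """Signed change in percent, relative to the smaller of the two values.

    Assumption: this rule is inferred from the figures printed in the
    report; it is not the testing script's actual implementation.
    Both values are expected to be positive (sizes in kilobytes).
    """
    base = min(old_kb, new_kb)
    return (new_kb - old_kb) / base * 100.0


# Pairs taken from the PR rtl-optimization/28071 sections of the report.
for old_kb, new_kb in [(308773, 347669), (9463, 11372), (90152, 84972)]:
    change = percent_change(old_kb, new_kb)
    print(f"{old_kb}k -> {new_kb}k, overall {change:.2f}%")
```

Measuring against the smaller value makes a decrease from A to B report the same magnitude as an increase from B to A, which may be why the figures look this way.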
Head of the ChangeLog is:
--- /usr/src/SpecTests/sandbox-haydn-memory/x86_64/mem-result/ChangeLog 2009-05-29 07:03:50.000000000 +0000
+++ /usr/src/SpecTests/sandbox-haydn-memory/gcc/gcc/ChangeLog 2009-05-29 21:48:20.000000000 +0000
@@ -1,3 +1,310 @@
+2009-05-29 Eric Botcazou <ebotcazou@adacore.com>
+
+ * tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
+ (force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
+ (ptr_difference_cost): Use affine combinations to compute it.
+ (difference_cost): Likewise.
+ (get_computation_cost_at): Compute more accurate cost for addresses
+ if the ratio is a multiplier allowed in addresses.
+ For non-addresses, consider that an additional offset or symbol is
+ added only once.
+
+2009-05-29 Jakub Jelinek <jakub@redhat.com>
+
+ * config/i386/i386.c (ix86_decompose_address): Avoid useless
+ 0 displacement. Add 0 displacement if base is %[er]bp or %r13.
+
+ * config/i386/i386.md (prefix_data16, prefix_rep): Set to 0 for
+ TYPE_SSE{MULADD,4ARG,IADD1,CVT1} by default.
+ (prefix_rex): For UNIT_MMX don't imply the prefix by default
+ if MODE_DI.
+ (prefix_extra): Default to 2 for TYPE_SSE{MULADD,4ARG} and
+ to 1 for TYPE_SSE{IADD1,CVT1}.
+ (prefix_vex_imm8): Removed.
+ (length_vex): Only pass 1 as second argument to
+ ix86_attr_length_vex_default if prefix_extra is 0.
+ (modrm): For TYPE_INCDEC only set to 0 if not TARGET_64BIT.
+ (length): For prefix vex computation use length_immediate
+ attribute instead of prefix_vex_imm8.
+ (cmpqi_ext_3_insn, cmpqi_ext_3_insn_rex64,
+ addqi_ext_1, addqi_ext_1_rex64, *testqi_ext_0, andqi_ext_0,
+ *andqi_ext_0_cc, *iorqi_ext_0, *xorqi_ext_0, *xorqi_cc_ext_1,
+ *xorqi_cc_ext_1_rex64): Override modrm attribute to 1.
+ (extendsidi2_rex64, extendhidi2, extendqidi2, extendhisi2,
+ *extendhisi2_zext, extendqihi2, extendqisi2, *extendqisi2_zext): Emit
+ a space in between the operands.
+ (*anddi_1_rex64, *andsi_1): Likewise. Override prefix_rex to 1
+ if one operand is 0xff and the other one si, di, bp or sp.
+ (*andhi_1): Override prefix_rex to 1 if one operand is 0xff and the
+ other one si, di, bp or sp.
+ (*btsq, *btrq, *btcq, *btdi_rex64, *btsi): Add mode attribute.
+ (*ffssi_1, *ffsdi_1, ctzsi2, ctzdi2): Add
+ type and mode attributes.
+ (*bsr, *bsr_rex64, *bsrhi): Add type attribute.
+ (*cmpfp_i_mixed, *cmpfp_iu_mixed): For TYPE_SSECOMI, clear
+ prefix_rep attribute and set prefix_data16 attribute iff MODE_DF.
+ (*cmpfp_i_sse, *cmpfp_iu_sse): Clear prefix_rep attribute and set
+ prefix_data16 attribute iff MODE_DF.
+ (*movsi_1): For TYPE_SSEMOV MODE_SI set prefix_data16 attribute.
+ (fix_trunc<mode>di_sse): Set prefix_rex attribute.
+ (*adddi_4_rex64, *addsi_4): Use const128_operand instead of
+ constm128_operand in length_immediate computation.
+ (*addhi_4): Likewise. Fix mode attribute to MODE_HI.
+ (anddi_1_rex64): Use movzbl/movzwl instead of movzbq/movzwq.
+ (*avx_ashlti3, sse2_ashlti3, *avx_lshrti3, sse2_lshrti3): Set
+ length_immediate attribute to 1.
+ (x86_fnstsw_1, x86_fnstcw_1, x86_fldcw_1): Fix length attribute.
+ (*movdi_1_rex64): Override prefix_rex or prefix_data16 attributes
+ for certain alternatives.
+ (*movdf_nointeger, *movdf_integer_rex64, *movdf_integer): Override
+ prefix_data16 attribute if MODE_V1DF.
+ (*avx_setcc<mode>, *sse_setcc<mode>, *sse5_setcc<mode>): Set
+ length_immediate to 1.
+ (set_got_rex64, set_rip_rex64): Remove length attribute, set
+ length_address to 4, set mode attribute to MODE_DI.
+ (set_got_offset_rex64): Likewise. Set length_immediate to 0.
+ (fxam<mode>2_i387): Set length attribute to 4.
+ (*prefetch_sse, *prefetch_sse_rex, *prefetch_3dnow,
+ *prefetch_3dnow_rex): Override length_address attribute.
+ (sse4_2_crc32<mode>): Override prefix_data16 and prefix_rex
+ attributes.
+ * config/i386/predicates.md (ext_QIreg_nomode_operand): New predicate.
+ (constm128_operand): Removed.
+ * config/i386/i386.c (memory_address_length): For
+ disp && !index && !base in 64-bit mode account for SIB byte if
+ print_operand_address can't optimize disp32 into disp32(%rip)
+ and UNSPEC doesn't imply (%rip) addressing. Add 1 to length
+ for fs: or gs: segment.
+ (ix86_attr_length_immediate_default): When checking if shortform
+ is possible, truncate immediate to the length of the non-shortened
+ immediate.
+ (ix86_attr_length_address_default): Ignore MEM_P operands
+ with X constraint.
+ (ix86_attr_length_vex_default): Only check for DImode on
+ GENERAL_REG_P operands.
+ * config/i386/sse.md (<sse>_comi, <sse>_ucomi): Clear
+ prefix_rep attribute, set prefix_data16 attribute iff MODE_DF.
+ (sse_cvttps2pi): Clear prefix_rep attribute.
+ (sse2_cvttps2dq, *sse2_cvtpd2dq, sse2_cvtps2pd): Clear prefix_data16
+ attribute.
+ (*sse2_cvttpd2dq): Don't clear prefix_rep attribute.
+ (*avx_ashr<mode>3, ashr<mode>3, *avx_lshr<mode>3, lshr<mode>3,
+ *avx_ashl<mode>3, ashl<mode>3): Set length_immediate attribute to 1
+ iff operand 2 is const_int_operand.
+ (*vec_dupv4si, avx_shufpd256_1, *avx_shufpd_<mode>,
+ sse2_shufpd_<mode>): Set length_immediate attribute to 1.
+ (sse2_pshufd_1): Likewise. Set prefix attribute to maybe_vex
+ instead of vex.
+ (sse2_pshuflw_1, sse2_pshufhw_1): Set length_immediate to 1 and clear
+ prefix_data16.
+ (sse2_unpckhpd, sse2_unpcklpd, sse2_storehpd, *vec_concatv2df): Set
+ prefix_data16 attribute for movlpd and movhpd instructions.
+ (sse2_loadhpd, sse2_loadlpd, sse2_movsd): Likewise. Override
+ length_immediate for shufpd instruction.
+ (sse2_movntsi, sse3_lddqu): Clear prefix_data16 attribute.
+ (avx_cmpp<avxmodesuffixf2c><mode>3,
+ avx_cmps<ssemodesuffixf2c><mode>3, *avx_maskcmp<mode>3,
+ <sse>_maskcmp<mode>3, <sse>_vmmaskcmp<mode>3,
+ avx_shufps256_1, *avx_shufps_<mode>, sse_shufps_<mode>,
+ *vec_dupv4sf_avx, *vec_dupv4sf): Set
+ length_immediate attribute to 1.
+ (*avx_cvtsi2ssq, *avx_cvtsi2sdq): Set length_vex attribute to 4.
+ (sse_cvtsi2ssq, sse2_cvtsi2sdq): Set prefix_rex attribute to 1.
+ (sse2_cvtpi2pd, sse_loadlps, sse2_storelpd): Override
+ prefix_data16 attribute for the first alternative to 1.
+ (*avx_loadlps): Override length_immediate for the first alternative.
+ (*vec_concatv2sf_avx): Override length_immediate and prefix_extra
+ attributes for second alternative.
+ (*vec_concatv2sf_sse4_1): Override length_immediate and
+ prefix_data16 attributes for second alternative.
+ (*vec_setv4sf_avx, *avx_insertps, vec_extract_lo_<mode>,
+ vec_extract_hi_<mode>, vec_extract_lo_v16hi,
+ vec_extract_hi_v16hi, vec_extract_lo_v32qi,
+ vec_extract_hi_v32qi): Set prefix_extra and length_immediate to 1.
+ (*vec_setv4sf_sse4_1, sse4_1_insertps, *sse4_1_extractps): Set
+ prefix_data16 and length_immediate to 1.
+ (*avx_mulv2siv2di3, *avx_mulv4si3, sse4_2_gtv2di3): Set prefix_extra
+ to 1.
+ (*avx_<code><mode>3, *avx_eq<mode>3, *avx_gt<mode>3): Set
+ prefix_extra attribute for variants that don't have 0f prefix
+ alone.
+ (*avx_pinsr<ssevecsize>): Likewise. Set length_immediate to 1.
+ (*sse4_1_pinsrb, *sse2_pinsrw, *sse4_1_pinsrd, *sse4_1_pextrb,
+ *sse4_1_pextrb_memory, *sse2_pextrw, *sse4_1_pextrw_memory,
+ *sse4_1_pextrd): Set length_immediate to 1.
+ (*sse4_1_pinsrd): Likewise. Set prefix_extra to 1.
+ (*sse4_1_pinsrq, *sse4_1_pextrq): Set prefix_rex and length_immediate
+ to 1.
+ (*vec_extractv2di_1_rex64_avx, *vec_extractv2di_1_rex64,
+ *vec_extractv2di_1_avx, *vec_extractv2di_1_sse2): Override
+ length_immediate to 1 for second alternative.
+ (*vec_concatv2si_avx, *vec_concatv2di_rex64_avx): Override
+ prefix_extra and length_immediate attributes for the first
+ alternative.
+ (vec_concatv2si_sse4_1): Override length_immediate to 1 for the
+ first alternative.
+ (*vec_concatv2di_rex64_sse4_1): Likewise. Override prefix_rex
+ to 1 for the first and third alternative.
+ (*vec_concatv2di_rex64_sse): Override prefix_rex to 1 for the second
+ alternative.
+ (*sse2_maskmovdqu, *sse2_maskmovdqu_rex64): Override length_vex
+ attribute.
+ (*sse_sfence, sse2_mfence, sse2_lfence): Override length_address
+ attribute to 0.
+ (*avx_phaddwv8hi3, *avx_phadddv4si3, *avx_phaddswv8hi3,
+ *avx_phsubwv8hi3, *avx_phsubdv4si3, *avx_phsubswv8hi,
+ *avx_pmaddubsw128, *avx_pmulhrswv8hi3, *avx_pshufbv16qi3,
+ *avx_psign<mode>3): Set prefix_extra attribute to 1.
+ (ssse3_phaddwv4hi3, ssse3_phadddv2si3, ssse3_phaddswv4hi3,
+ ssse3_phsubwv4hi3, ssse3_phsubdv2si3, ssse3_phsubswv4hi3,
+ ssse3_pmaddubsw, *ssse3_pmulhrswv4hi, ssse3_pshufbv8qi3,
+ ssse3_psign<mode>3): Override prefix_rex attribute.
+ (*avx_palignrti): Override prefix_extra and length_immediate
+ to 1.
+ (ssse3_palignrti): Override length_immediate to 1.
+ (ssse3_palignrdi): Override length_immediate to 1, override
+ prefix_rex attribute.
+ (abs<mode>2): Override prefix_rep to 0, override prefix_rex
+ attribute.
+ (sse4a_extrqi): Override length_immediate to 2.
+ (sse4a_insertqi): Likewise. Override prefix_data16 to 0.
+ (sse4a_insertq): Override prefix_data16 to 0.
+ (avx_blendp<avxmodesuffixf2c><avxmodesuffix>,
+ avx_blendvp<avxmodesuffixf2c><avxmodesuffix>,
+ avx_dpp<avxmodesuffixf2c><avxmodesuffix>, *avx_mpsadbw,
+ *avx_pblendvb, *avx_pblendw, avx_roundp<avxmodesuffixf2c>256,
+ avx_rounds<avxmodesuffixf2c>256): Override prefix_extra
+ and length_immediate to 1.
+ (sse4_1_blendp<ssemodesuffixf2c>, sse4_1_dpp<ssemodesuffixf2c>,
+ sse4_2_pcmpestr, sse4_2_pcmpestri, sse4_2_pcmpestrm,
+ sse4_2_pcmpestr_cconly, sse4_2_pcmpistr, sse4_2_pcmpistri,
+ sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Override prefix_data16
+ and length_immediate to 1.
+ (sse4_1_blendvp<ssemodesuffixf2c>): Override prefix_data16 to 1.
+ (sse4_1_mpsadbw, sse4_1_pblendw): Override length_immediate to 1.
+ (*avx_packusdw, avx_vtestp<avxmodesuffixf2c><avxmodesuffix>,
+ avx_ptest256): Override prefix_extra to 1.
+ (sse4_1_roundp<ssemodesuffixf2c>, sse4_1_rounds<ssemodesuffixf2c>):
+ Override prefix_data16 and length_immediate to 1.
+ (sse5_pperm_zero_v16qi_v8hi, sse5_pperm_sign_v16qi_v8hi,
+ sse5_pperm_zero_v8hi_v4si, sse5_pperm_sign_v8hi_v4si,
+ sse5_pperm_zero_v4si_v2di, sse5_pperm_sign_v4si_v2di,
+ sse5_vrotl<mode>3, sse5_ashl<mode>3, sse5_lshl<mode>3): Override
+ prefix_data16 to 0 and prefix_extra to 2.
+ (sse5_rotl<mode>3, sse5_rotr<mode>3): Override length_immediate to 1.
+ (sse5_frcz<mode>2, sse5_vmfrcz<mode>2): Don't override prefix_extra
+ attribute.
+ (*sse5_vmmaskcmp<mode>3, sse5_com_tf<mode>3,
+ sse5_maskcmp<mode>3, sse5_maskcmp<mode>3, sse5_maskcmp_uns<mode>3):
+ Override prefix_data16 and prefix_rep to 0, length_immediate to 1
+ and prefix_extra to 2.
+ (sse5_maskcmp_uns2<mode>3, sse5_pcom_tf<mode>3): Override
+ prefix_data16 to 0, length_immediate to 1 and prefix_extra to 2.
+ (*avx_aesenc, *avx_aesenclast, *avx_aesdec, *avx_aesdeclast,
+ avx_vpermilvar<mode>3,
+ avx_vbroadcasts<avxmodesuffixf2c><avxmodesuffix>,
+ avx_vbroadcastss256, avx_vbroadcastf128_p<avxmodesuffixf2c>256,
+ avx_maskloadp<avxmodesuffixf2c><avxmodesuffix>,
+ avx_maskstorep<avxmodesuffixf2c><avxmodesuffix>):
+ Override prefix_extra to 1.
+ (aeskeygenassist, pclmulqdq): Override length_immediate to 1.
+ (*vpclmulqdq, avx_vpermil<mode>, avx_vperm2f128<mode>3,
+ vec_set_lo_<mode>, vec_set_hi_<mode>, vec_set_lo_v16hi,
+ vec_set_hi_v16hi, vec_set_lo_v32qi, vec_set_hi_v32qi): Override
+ prefix_extra and length_immediate to 1.
+ (*avx_vzeroall, avx_vzeroupper, avx_vzeroupper_rex64): Override
+ modrm to 0.
+ (*vec_concat<mode>_avx): Override prefix_extra and length_immediate
+ to 1 for the first alternative.
+ * config/i386/mmx.md (*mov<mode>_internal_rex64): Override
+ prefix_rep, prefix_data16 and/or prefix_rex attributes in certain
+ cases.
+ (*mov<mode>_internal_avx, *movv2sf_internal_rex64,
+ *movv2sf_internal_avx, *movv2sf_internal): Override
+ prefix_rep attribute for certain alternatives.
+ (*mov<mode>_internal): Override prefix_rep or prefix_data16
+ attributes for certain alternatives.
+ (*movv2sf_internal_rex64_avx): Override prefix_rep and length_vex
+ attributes for certain alternatives.
+ (*mmx_addv2sf3, *mmx_subv2sf3, *mmx_mulv2sf3,
+ *mmx_<code>v2sf3_finite, *mmx_<code>v2sf3, mmx_rcpv2sf2,
+ mmx_rcpit1v2sf3, mmx_rcpit2v2sf3, mmx_rsqrtv2sf2, mmx_rsqit1v2sf3,
+ mmx_haddv2sf3, mmx_hsubv2sf3, mmx_addsubv2sf3,
+ *mmx_eqv2sf3, mmx_gtv2sf3, mmx_gev2sf3, mmx_pf2id, mmx_pf2iw,
+ mmx_pi2fw, mmx_floatv2si2, mmx_pswapdv2sf2, *mmx_pmulhrwv4hi3,
+ mmx_pswapdv2si2): Set prefix_extra attribute to 1.
+ (mmx_ashr<mode>3, mmx_lshr<mode>3, mmx_ashl<mode>3): Set
+ length_immediate to 1 if operand 2 is const_int_operand.
+ (*mmx_pinsrw, mmx_pextrw, mmx_pshufw_1, *vec_dupv4hi,
+ *vec_extractv2si_1): Set length_immediate
+ attribute to 1.
+ (*mmx_uavgv8qi3): Override prefix_extra attribute to 1 if
+ using old 3DNOW insn rather than SSE/3DNOW_A.
+ (mmx_emms, mmx_femms): Clear modrm attribute.
+
+2009-05-29 Martin Jambor <mjambor@suse.cz>
+
+ * tree-sra.c: New implementation of SRA.
+
+ * params.def (PARAM_SRA_MAX_STRUCTURE_SIZE): Removed.
+ (PARAM_SRA_MAX_STRUCTURE_COUNT): Removed.
+ (PARAM_SRA_FIELD_STRUCTURE_RATIO): Removed.
+ * params.h (SRA_MAX_STRUCTURE_SIZE): Removed.
+ (SRA_MAX_STRUCTURE_COUNT): Removed.
+ (SRA_FIELD_STRUCTURE_RATIO): Removed.
+ * doc/invoke.texi (sra-max-structure-size): Removed.
+ (sra-field-structure-ratio): Removed.
+
+2009-05-29 Jakub Jelinek <jakub@redhat.com>
+
+ PR middle-end/40291
+ * builtins.c (expand_builtin_memcmp): Convert len to sizetype
+ before expansion.
+
+2009-05-29 Andrey Belevantsev <abel@ispras.ru>
+
+ PR rtl-optimization/40101
+ * sel-sched-ir.c (get_seqno_by_preds): Allow returning negative
+ seqno. Adjust comment.
+ * sel-sched.c (find_seqno_for_bookkeeping): Assert that when
+ inserting bookkeeping before a jump, the jump is not scheduled.
+ When no positive seqno found, provide a value. Add comment.
+
+2009-05-29 Richard Guenther <rguenther@suse.de>
+
+ * tree-ssa-alias.c (nonaliasing_component_refs_p): Remove
+ short-cutting on the first component.
+
+2009-05-29 Jakub Jelinek <jakub@redhat.com>
+
+ PR middle-end/39958
+ * omp-low.c (scan_omp_1_op): Call remap_type on TREE_TYPE
+ for trees other than decls/types.
+
+2009-05-29 Richard Guenther <rguenther@suse.de>
+
+ * tree-ssa-operands.c (get_expr_operands): Do not handle
+ INDIRECT_REFs in the handled-component case. Remove
+ unused get_ref_base_and_extent case.
+ * tree-dfa.c (get_ref_base_and_extent): Avoid calling
+ tree_low_cst and host_integerp where possible.
+ * tree-ssa-structalias.c (equiv_class_label_eq): Check hash
+ codes for equivalence.
+ * dce.c (find_call_stack_args): Avoid redundant bitmap queries.
+
+2009-05-29 David Billinghurst <billingd@gcc.gnu.org>
+
+ * config.gcc: Add i386/t-fprules-softfp and soft-fp/t-softfp
+ to tmake_file for i[34567]86-*-cygwin*.
+
+2009-05-29 Jakub Jelinek <jakub@redhat.com>
+
+ PR target/40017
+ * config/rs6000/rs6000-c.c (_Bool_keyword): New variable.
+ (altivec_categorize_keyword, init_vector_keywords,
+ rs6000_cpu_cpp_builtins): Define _Bool as conditional macro
+ similar to bool.
+
2009-05-29 Kai Tietz <kai.tietz@onevision.com>
* tree.c (handle_dll_attribute): Check if node is
The results can be reproduced by building a compiler with
--enable-gather-detailed-mem-stats targeting x86-64
and compiling preprocessed combine.c or the testcase from PR8361 with:
-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
The memory consumption summary appears in the dump after the detailed listing
of the places where memory is allocated. Peak memory consumption is
computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
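The peak computation described above can be sketched in Python as follows. This is a minimal illustration, not the testing script's code: it assumes the collector's markers look like `{GC 17478k -> 17029k}` in the captured dump, and the function and variable names are hypothetical.

```python
import re

# Assumed shape of the collector's markers in the captured dump,
# e.g. "{GC 17478k -> 17029k}"; adjust the pattern if your dump differs.
GC_MARKER = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_ggc_memory(dump_text):
    """Return (peak_before_kb, peak_after_kb), or None if no marker is found."""
    pairs = [(int(before), int(after))
             for before, after in GC_MARKER.findall(dump_text)]
    if not pairs:
        return None
    peak_before = max(before for before, _ in pairs)
    peak_after = max(after for _, after in pairs)
    return peak_before, peak_after


# Made-up sample text standing in for a real dump.
sample = "... {GC 1300k -> 1210k} ... {GC 1488k -> 1437k} ..."
print(peak_ggc_memory(sample))
```

The two maxima are taken independently, so the "before" and "after" peaks need not come from the same collection run.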
Your testing script.