This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] fwprop, updated patch and SPEC results


> This is a reissue of the fwprop patch from the 4.2 timeframe.  The new 
> pass is unchanged from that code.  I did fix the FSF mailing address in 
> fwprop.c.  :-)
> 
> Given that Steven decided to deal himself with path following rather 
> than getting rid of it, there are some differences in the opts.c hunks, 
> and we don't need an additional GCSE pass as was in last May's patches.

Hi,
it is really nice to see fwprop in ;).  There seems to be a memory
consumption issue on huge CFGs, however.  There is some weird noise in
the mmap/sbrk numbers that I haven't yet had a chance to analyze, but the
45% increase on PR28071 looks well outside the noise.
That testcase consists of a very large CFG made of expanded MIN/MAX
functions (many diamonds)...

Honza

comparing combine.c compilation at -O0 level:
    Overall memory needed: 28370k -> 28378k
    Peak memory use before GGC: 9293k
    Peak memory use after GGC: 8832k
    Maximum of released memory in single GGC run: 2666k
    Garbage: 36856k
    Leak: 6441k
    Overhead: 4860k
    GGC runs: 280

comparing combine.c compilation at -O1 level:
  Amount of produced GGC garbage increased from 57445k to 57582k, overall 0.24%
    Overall memory needed: 40210k -> 40214k
    Peak memory use before GGC: 17281k
    Peak memory use after GGC: 17106k
    Maximum of released memory in single GGC run: 2382k -> 2363k
    Garbage: 57445k -> 57582k
    Leak: 6505k -> 6495k
    Overhead: 6200k -> 6224k
    GGC runs: 355

comparing combine.c compilation at -O2 level:
  Amount of memory still referenced at the end of compilation increased from 6593k to 6603k, overall 0.15%
    Overall memory needed: 29790k
    Peak memory use before GGC: 17277k
    Peak memory use after GGC: 17106k
    Maximum of released memory in single GGC run: 2883k -> 2803k
    Garbage: 76254k -> 74898k
    Leak: 6593k -> 6603k
    Overhead: 8744k -> 8470k
    GGC runs: 420 -> 413

comparing combine.c compilation at -O3 level:
    Overall memory needed: 28894k
    Peak memory use before GGC: 18217k -> 18218k
    Peak memory use after GGC: 17833k -> 17834k
    Maximum of released memory in single GGC run: 4104k
    Garbage: 106198k -> 104230k
    Leak: 6668k
    Overhead: 12303k -> 11907k
    GGC runs: 469 -> 462

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 88230k
    Peak memory use before GGC: 69777k
    Peak memory use after GGC: 44187k
    Maximum of released memory in single GGC run: 36963k
    Garbage: 129065k
    Leak: 9501k
    Overhead: 16993k
    GGC runs: 216

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 114174k -> 115034k
    Peak memory use before GGC: 90363k
    Peak memory use after GGC: 83725k
    Maximum of released memory in single GGC run: 31806k -> 31852k
    Garbage: 277740k -> 277769k
    Leak: 9343k -> 9343k
    Overhead: 29775k -> 29778k
    GGC runs: 223

comparing insn-attrtab.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk decreased from 134058k to 120390k, overall -11.35%
    Overall memory needed: 134058k -> 120390k
    Peak memory use before GGC: 92593k
    Peak memory use after GGC: 84705k
    Maximum of released memory in single GGC run: 30380k -> 30394k
    Garbage: 319045k -> 317192k
    Leak: 9345k
    Overhead: 36716k -> 36353k
    GGC runs: 247 -> 246

comparing insn-attrtab.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 115570k to 134218k, overall 16.14%
    Overall memory needed: 115570k -> 134218k
    Peak memory use before GGC: 92618k
    Peak memory use after GGC: 84731k
    Maximum of released memory in single GGC run: 30570k -> 30584k
    Garbage: 319697k -> 317844k
    Leak: 9348k
    Overhead: 36914k -> 36551k
    GGC runs: 250

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 119538k
    Peak memory use before GGC: 92680k
    Peak memory use after GGC: 91760k
    Maximum of released memory in single GGC run: 19314k
    Garbage: 205600k
    Leak: 47677k
    Overhead: 20817k
    GGC runs: 402

comparing Gerald's testcase PR8361 compilation at -O1 level:
    Overall memory needed: 119278k
    Peak memory use before GGC: 97848k
    Peak memory use after GGC: 95638k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 444357k -> 444206k
    Leak: 50010k -> 50011k
    Overhead: 32820k -> 32784k
    GGC runs: 552

comparing Gerald's testcase PR8361 compilation at -O2 level:
    Overall memory needed: 119286k
    Peak memory use before GGC: 97848k
    Peak memory use after GGC: 95638k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 506005k -> 503957k
    Leak: 50715k -> 50716k
    Overhead: 40490k -> 40089k
    GGC runs: 610 -> 609

comparing Gerald's testcase PR8361 compilation at -O3 level:
    Overall memory needed: 118930k
    Peak memory use before GGC: 97894k
    Peak memory use after GGC: 96924k
    Maximum of released memory in single GGC run: 18847k
    Garbage: 525605k -> 523592k
    Leak: 50291k -> 50291k
    Overhead: 40993k -> 40599k
    GGC runs: 623 -> 622

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 137946k
    Peak memory use before GGC: 81898k
    Peak memory use after GGC: 58777k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 147195k
    Leak: 7522k
    Overhead: 25300k
    GGC runs: 83

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 424310k -> 423006k
    Peak memory use before GGC: 205260k
    Peak memory use after GGC: 201036k
    Maximum of released memory in single GGC run: 101716k -> 101714k
    Garbage: 271708k -> 271706k
    Leak: 47588k
    Overhead: 30829k -> 30829k
    GGC runs: 101

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
  Amount of produced GGC garbage increased from 350433k to 351905k, overall 0.42%
    Overall memory needed: 351126k -> 352334k
    Peak memory use before GGC: 206011k -> 206001k
    Peak memory use after GGC: 201787k -> 201777k
    Maximum of released memory in single GGC run: 108042k -> 108617k
    Garbage: 350433k -> 351905k
    Leak: 48171k -> 48171k
    Overhead: 46275k -> 46573k
    GGC runs: 108 -> 110

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
  Overall memory allocated via mmap and sbrk increased from 535350k to 781042k, overall 45.89%
  Amount of produced GGC garbage increased from 491202k to 494299k, overall 0.63%
    Overall memory needed: 535350k -> 781042k
    Peak memory use before GGC: 314918k -> 314916k
    Peak memory use after GGC: 293261k -> 293259k
    Maximum of released memory in single GGC run: 163448k -> 165331k
    Garbage: 491202k -> 494299k
    Leak: 65503k -> 65503k
    Overhead: 59091k -> 59714k
    GGC runs: 95 -> 98
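The summary lines in the report quote a relative change for each flagged figure. The tester's exact formula is not shown in this message; as an assumption, the sketch below divides the change by the smaller of the old and new values, which happens to reproduce every percentage quoted above (for example -11.35% for insn-attrtab.c at -O2 and 45.89% for PR28071 at -O3). The function names are hypothetical.

```python
def relative_change(old_k: int, new_k: int) -> float:
    """Percentage change between two memory figures given in kB.

    Assumption: the difference is divided by the smaller of the two
    values; this matches all summary percentages in the report.
    """
    base = min(old_k, new_k)
    return 100.0 * (new_k - old_k) / base


def summary_line(label: str, old_k: int, new_k: int) -> str:
    """Format a summary line in the same shape as the tester's output."""
    verb = "increased" if new_k > old_k else "decreased"
    return (f"{label} {verb} from {old_k}k to {new_k}k, "
            f"overall {relative_change(old_k, new_k):.2f}%")


# The PR28071 -O3 regression discussed in the reply.
print(summary_line("Overall memory allocated via mmap and sbrk", 535350, 781042))
```

Dividing by the old value instead would give -10.20% for the insn-attrtab.c -O2 decrease rather than the -11.35% reported, which is why the smaller value is used as the base here.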

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2006-11-04 05:20:54.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2006-11-04 08:57:00.000000000 +0000
@@ -1,3 +1,32 @@
+2006-11-03  Paolo Bonzini  <bonzini@gnu.org>
+            Steven Bosscher  <stevenb.gcc@gmail.com>
+
+        * fwprop.c: New file.
+        * Makefile.in: Add fwprop.o.
+        * tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
+        * passes.c (init_optimization_passes): Schedule forward propagation.
+        * rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
+        parameter.
+        * timevar.def (TV_FWPROP): New.
+        * common.opt (-fforward-propagate): New.
+        * opts.c (decode_options): Enable forward propagation at -O2.
+        * gcse.c (one_cprop_pass): Do not run local cprop unless touching jumps.
+        * cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
+        canon_for_address, table_size): Remove.
+        (new_basic_block, insert, remove_from_table): Remove references to
+        table_size.
+        (fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
+        simplification loop more straightforward by not calling fold_rtx
+        recursively.
+        (equiv_constant): Move here a small part of fold_rtx_subreg,
+        do not call fold_rtx.  Call avoid_constant_pool_reference
+        to process MEMs.
+        * recog.c (canonicalize_change_group): New.
+        * recog.h (canonicalize_change_group): New.
+
+        * doc/invoke.texi (Optimization Options): Document fwprop.
+        * doc/passes.texi (RTL passes): Document fwprop.
+
 2006-11-03  Geoffrey Keating  <geoffk@apple.com>
 
 	* c-decl.c (WANT_C99_INLINE_SEMANTICS): New, set to 1.
@@ -23,7 +52,6 @@
 
 2006-11-03  Paul Brook  <paul@codesourcery.com>
 
-	gcc/
 	* config/arm/arm.c (arm_file_start): New function.
 	(TARGET_ASM_FILE_START): Define.
 	(arm_default_cpu): New variable.


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8632 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
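The recipe above can be scripted roughly as follows. This is a minimal sketch assuming a compiler configured with --enable-gather-detailed-mem-stats is available on PATH as `gcc`; the helper names, the `combine.i` file name, and the added `-S -o /dev/null` (so that only compilation runs) are assumptions, not part of the tester.

```python
import os
import subprocess

# Flags copied from the command line above; "-Ox" is supplied per run.
MEM_REPORT_FLAGS = [
    "-fmem-report",
    "--param=ggc-min-heapsize=1024",
    "--param=ggc-min-expand=1",
    "-Q",
]

# Optimization levels used in the report; the last one is the PR28071 run.
LEVELS = [["-O0"], ["-O1"], ["-O2"], ["-O3"],
          ["-O3", "-fno-tree-pre", "-fno-tree-fre"]]


def mem_report_argv(gcc: str, source: str, opt_flags: list[str]) -> list[str]:
    """Build the command line for one optimization level.

    Assumption: -S -o /dev/null stops after compilation, so the
    preprocessed file is neither assembled nor linked.
    """
    return [gcc, "-S", "-o", os.devnull, *MEM_REPORT_FLAGS, *opt_flags, source]


def run_mem_report(gcc: str, source: str, opt_flags: list[str]) -> str:
    """Run the compiler and return stderr, where -fmem-report and -Q write."""
    result = subprocess.run(mem_report_argv(gcc, source, opt_flags),
                            capture_output=True, text=True, check=True)
    return result.stderr


# Show the command lines without running them.
for flags in LEVELS:
    print(" ".join(mem_report_argv("gcc", "combine.i", flags)))
```

The returned stderr text is what the summary and the {GC ...} notes are read from.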

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
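The peak computation described above can be sketched as follows. It assumes each collection appears on stderr as a note of the form {GC 12345k -> 6789k} (the trailing "k" is treated as optional), takes "before GGC" and "after GGC" as the maxima on the left and right of the arrow, and assumes "GGC runs" is simply the number of such notes. The function name is hypothetical.

```python
import re

# Matches notes like "{GC 9293k -> 8832k}"; the unit suffix is optional.
GC_REPORT = re.compile(r"\{GC (\d+)k? -> (\d+)k?\}")


def gc_peaks(stderr_text: str) -> tuple[int, int, int]:
    """Return (peak before GGC, peak after GGC, number of GGC runs).

    Each collection is reported as {GC before -> after}; the peaks are
    the maximal values seen on each side of the arrow.
    """
    runs = [(int(b), int(a)) for b, a in GC_REPORT.findall(stderr_text)]
    if not runs:
        return 0, 0, 0
    peak_before = max(b for b, _ in runs)
    peak_after = max(a for _, a in runs)
    return peak_before, peak_after, len(runs)


sample = "main {GC 9293k -> 8832k} foo {GC 4000k -> 3000k}"
print(gc_peaks(sample))
```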

Your testing script.

