[PATCH]: Fix PR tree-optimization/27755, add partial anticipation

Jan Hubicka hubicka@ucw.cz
Wed Nov 15 01:45:00 GMT 2006


> Besides this algorithmic fix, i've also added support for partial
> anticipation.  Partial anticipation is what happens when the value can
> be made live on some paths leading to a block, instead of all paths
> leading to a block.  Supporting this is just an optimality issue, and
> because it requires calculation of another dataflow problem (IE slows
> down PRE by about 20-30%), i've made it only enabled at -O3.  This is
> not really a knob that is worth adding, the amount of partial
> anticipation related redundancies that occurs in real programs is
> small (<10%).  I've yet to find a case where it truly matters, which
> is why it is relegated to -O3 :).  It is more for completeness than
> anything else.
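
To make the quoted description concrete, here is a minimal C sketch of a partially anticipated expression. It is a hypothetical example, not taken from the patch or from PR 27755. The names before() and after() are made up, and after() is a hand-written illustration of what an insertion could look like, not a claim about what tree-ssa-pre.c actually emits for this input.

```c
#include <assert.h>

/* Hypothetical input: a + b is computed when c1 is true and is
   wanted again only when c2 is true.  At the second "if" the
   expression is therefore anticipated on some outgoing paths only
   (partially anticipated), and available on some incoming paths
   only (partially available).  */
unsigned before(unsigned a, unsigned b, int c1, int c2)
{
    unsigned r = 0;
    if (c1)
        r += a + b;
    if (c2)
        r += a + b;     /* redundant only when c1 was true */
    return r;
}

/* Hand-transformed variant (an assumption about the shape of the
   result): the value is inserted on the path where it was missing,
   so the second use reads a temporary.  The insertion is
   speculative, since it is wasted work when c2 is false.  */
unsigned after(unsigned a, unsigned b, int c1, int c2)
{
    unsigned r = 0;
    unsigned t;
    if (c1) {
        t = a + b;
        r += t;
    } else {
        t = a + b;      /* inserted computation */
    }
    if (c2)
        r += t;         /* no recomputation here */
    assert(r == before(a, b, c1, c2));
    return r;
}
```

A dataflow pass that tracks only full anticipation would not see a + b as a candidate at the second "if", because the path with c2 false never uses it.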

Hi,
this also seems to add about 8% to memory consumption (GGC garbage) on
combine.c at -O3.
I am not sure whether we want to mix compile time and code growth
implications in -O3.  We already have -fexpensive-optimizations
enabled at -O2; perhaps we could have a separate knob for these, such as
-freally-expensive-optimizations ;))

Honza

comparing empty function compilation at -O0 level:
    Overall memory needed: 18219k -> 18223k
    Peak memory use before GGC: 2229k
    Peak memory use after GGC: 1936k
    Maximum of released memory in single GGC run: 293k
    Garbage: 421k
    Leak: 2266k
    Overhead: 445k
    GGC runs: 3

comparing empty function compilation at -O0 -g level:
    Overall memory needed: 18235k -> 18239k
    Peak memory use before GGC: 2254k
    Peak memory use after GGC: 1961k
    Maximum of released memory in single GGC run: 293k
    Garbage: 424k
    Leak: 2296k
    Overhead: 448k
    GGC runs: 3

comparing empty function compilation at -O1 level:
    Overall memory needed: 18323k -> 18327k
    Peak memory use before GGC: 2229k
    Peak memory use after GGC: 1936k
    Maximum of released memory in single GGC run: 293k
    Garbage: 427k
    Leak: 2269k
    Overhead: 445k
    GGC runs: 4

comparing empty function compilation at -O2 level:
    Overall memory needed: 18335k -> 18339k
    Peak memory use before GGC: 2229k
    Peak memory use after GGC: 1936k
    Maximum of released memory in single GGC run: 293k
    Garbage: 430k
    Leak: 2269k
    Overhead: 446k
    GGC runs: 4

comparing empty function compilation at -O3 level:
    Overall memory needed: 18335k -> 18339k
    Peak memory use before GGC: 2229k
    Peak memory use after GGC: 1936k
    Maximum of released memory in single GGC run: 293k
    Garbage: 430k
    Leak: 2269k
    Overhead: 446k
    GGC runs: 4

comparing combine.c compilation at -O0 level:
    Overall memory needed: 28403k -> 28407k
    Peak memory use before GGC: 9304k
    Peak memory use after GGC: 8843k
    Maximum of released memory in single GGC run: 2666k
    Garbage: 36845k
    Leak: 6454k
    Overhead: 4862k
    GGC runs: 280

comparing combine.c compilation at -O0 -g level:
    Overall memory needed: 30463k -> 30467k
    Peak memory use before GGC: 10818k
    Peak memory use after GGC: 10448k
    Maximum of released memory in single GGC run: 2420k
    Garbage: 37420k
    Leak: 9175k
    Overhead: 5484k
    GGC runs: 270

comparing combine.c compilation at -O1 level:
    Overall memory needed: 40247k -> 40251k
    Peak memory use before GGC: 17292k
    Peak memory use after GGC: 17117k
    Maximum of released memory in single GGC run: 2332k
    Garbage: 57470k
    Leak: 6508k
    Overhead: 6220k
    GGC runs: 356

comparing combine.c compilation at -O2 level:
    Overall memory needed: 29802k
    Peak memory use before GGC: 17288k
    Peak memory use after GGC: 17117k
    Maximum of released memory in single GGC run: 2868k -> 2869k
    Garbage: 74930k -> 74938k
    Leak: 6614k -> 6614k
    Overhead: 8475k -> 8476k
    GGC runs: 413

comparing combine.c compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 18229k to 18417k, overall 1.03%
  Amount of produced GGC garbage increased from 103713k to 112681k, overall 8.65%
  Amount of memory still referenced at the end of compilation increased from 6676k to 6684k, overall 0.12%
    Overall memory needed: 28902k
    Peak memory use before GGC: 18229k -> 18417k
    Peak memory use after GGC: 17845k -> 17846k
    Maximum of released memory in single GGC run: 4105k -> 4106k
    Garbage: 103713k -> 112681k
    Leak: 6676k -> 6684k
    Overhead: 11824k -> 13027k
    GGC runs: 462 -> 463
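
The "overall N%" figures in the summary lines can be reproduced by dividing the difference by the smaller of the two values. That rule is inferred from the numbers in this report, not from the tester's source, and the function name percent_change() is made up for this sketch.

```c
#include <assert.h>

/* Relative change in percent, measured against the smaller of the
   two values.  Assumption: this matches the tester's rule; it was
   chosen because it reproduces the figures quoted in this report
   (8.65%, 1.03%, 0.12%, 0.33% and -3.71%).  */
double percent_change(long before_k, long after_k)
{
    long smaller = before_k < after_k ? before_k : after_k;
    assert(smaller > 0);    /* avoid dividing by zero */
    return 100.0 * (double)(after_k - before_k) / (double)smaller;
}
```

For instance, percent_change(103713, 112681) is about 8.65 and percent_change(134222, 129418) is about -3.71. Dividing by the old value in the second case would give roughly -3.58 instead.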

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 88242k
    Peak memory use before GGC: 69788k
    Peak memory use after GGC: 44198k
    Maximum of released memory in single GGC run: 36963k
    Garbage: 129062k
    Leak: 9514k
    Overhead: 16996k
    GGC runs: 216

comparing insn-attrtab.c compilation at -O0 -g level:
    Overall memory needed: 89406k
    Peak memory use before GGC: 70910k
    Peak memory use after GGC: 45426k
    Maximum of released memory in single GGC run: 36965k
    Garbage: 130490k
    Leak: 10889k
    Overhead: 17344k
    GGC runs: 212

comparing insn-attrtab.c compilation at -O1 level:
    Overall memory needed: 114174k -> 112882k
    Peak memory use before GGC: 90374k
    Peak memory use after GGC: 83736k
    Maximum of released memory in single GGC run: 31852k
    Garbage: 277771k
    Leak: 9357k
    Overhead: 29782k
    GGC runs: 222

comparing insn-attrtab.c compilation at -O2 level:
    Overall memory needed: 120402k -> 119758k
    Peak memory use before GGC: 92604k
    Peak memory use after GGC: 84716k
    Maximum of released memory in single GGC run: 30394k
    Garbage: 317208k
    Leak: 9359k
    Overhead: 36365k
    GGC runs: 245

comparing insn-attrtab.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk decreased from 134222k to 129418k, overall -3.71%
    Overall memory needed: 134222k -> 129418k
    Peak memory use before GGC: 92629k
    Peak memory use after GGC: 84742k
    Maximum of released memory in single GGC run: 30580k -> 30581k
    Garbage: 317837k -> 318070k
    Leak: 9362k
    Overhead: 36562k -> 36601k
    GGC runs: 249

comparing Gerald's testcase PR8361 compilation at -O0 level:
    Overall memory needed: 119550k
    Peak memory use before GGC: 92691k
    Peak memory use after GGC: 91771k
    Maximum of released memory in single GGC run: 19314k
    Garbage: 205599k
    Leak: 47691k
    Overhead: 20819k
    GGC runs: 401

comparing Gerald's testcase PR8361 compilation at -O0 -g level:
    Overall memory needed: 132422k
    Peak memory use before GGC: 105067k
    Peak memory use after GGC: 104026k
    Maximum of released memory in single GGC run: 19474k
    Garbage: 212185k
    Leak: 70052k
    Overhead: 26134k
    GGC runs: 377

comparing Gerald's testcase PR8361 compilation at -O1 level:
    Overall memory needed: 119302k
    Peak memory use before GGC: 97860k
    Peak memory use after GGC: 95650k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 443668k
    Leak: 50024k
    Overhead: 32734k
    GGC runs: 551

comparing Gerald's testcase PR8361 compilation at -O2 level:
    Overall memory needed: 119294k
    Peak memory use before GGC: 97860k
    Peak memory use after GGC: 95650k
    Maximum of released memory in single GGC run: 18600k
    Garbage: 502917k -> 501735k
    Leak: 50729k -> 50707k
    Overhead: 39974k -> 39818k
    GGC runs: 606 -> 605

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of produced GGC garbage increased from 522136k to 523878k, overall 0.33%
    Overall memory needed: 118926k
    Peak memory use before GGC: 97906k
    Peak memory use after GGC: 96936k
    Maximum of released memory in single GGC run: 18847k
    Garbage: 522136k -> 523878k
    Leak: 50305k -> 50282k
    Overhead: 40502k -> 40826k
    GGC runs: 617 -> 620

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 137958k
    Peak memory use before GGC: 81909k
    Peak memory use after GGC: 58788k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 147243k
    Leak: 7536k
    Overhead: 25302k
    GGC runs: 82

comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
    Overall memory needed: 138130k
    Peak memory use before GGC: 82526k
    Peak memory use after GGC: 59405k
    Maximum of released memory in single GGC run: 45558k
    Garbage: 147414k
    Leak: 9178k
    Overhead: 25734k
    GGC runs: 88

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 424422k
    Peak memory use before GGC: 205229k
    Peak memory use after GGC: 201005k
    Maximum of released memory in single GGC run: 101903k
    Garbage: 271986k
    Leak: 47601k
    Overhead: 31280k
    GGC runs: 101

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 352290k -> 352034k
    Peak memory use before GGC: 206002k
    Peak memory use after GGC: 201778k
    Maximum of released memory in single GGC run: 108808k
    Garbage: 352211k
    Leak: 48184k
    Overhead: 47025k
    GGC runs: 110

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 781306k -> 781466k
    Peak memory use before GGC: 314925k
    Peak memory use after GGC: 293268k
    Maximum of released memory in single GGC run: 165331k
    Garbage: 494373k
    Leak: 65517k
    Overhead: 59915k
    GGC runs: 98

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2006-11-14 02:45:56.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2006-11-14 19:16:34.000000000 +0000
@@ -1,3 +1,41 @@
+2006-11-14  Daniel Berlin  <dberlin@dberlin.org>
+
+	Fix PR tree-optimization/27755
+
+	* tree-ssa-pre.c: Update comments.
+	(bb_bitmap_sets): Add pa_in and  deferred member.
+	(BB_DEFERRED): New macro.
+	(maximal_set): New variable.
+	(pre_stats): Add pa_insert member.
+	(bitmap_set_and): Short circuit orig == dest.
+	(bitmap_set_subtract_values): New function.
+	(bitmap_set_contains_expr): Ditto.
+	(translate_vuses_through_block): Add phiblock argument.
+	(dependent_clean): New function.
+	(compute_antic_aux): Update for maximal_set changes.
+	(compute_partial_antic_aux): New function.
+	(compute_antic): Handle partial anticipation.
+	(do_partial_partial_insertion): New function.
+	(insert_aux): Handle partial anticipation.
+	(add_to_sets): Add to maximal set.
+	(compute_avail): Ditto.
+	(init_pre): Initialize maximal_set.
+	(execute_pre): Do partial anticipation if -O3+.
+
+2006-11-14  Paolo Bonzini  <bonzini@gnu.org>
+
+	PR rtl-optimization/29798
+
+	* fwprop.c (use_killed_between): Check that DEF_INSN dominates
+	TARGET_INSN before any other check.
+	(fwprop_init): Always calculate dominators.
+	(fwprop_done): Always free them.
+
+2006-11-14  Kaveh R. Ghazi  <ghazi@caip.rutgers.edu>
+
+	* fold-const.c (fold_strip_sign_ops): Handle COMPOUND_EXPR and
+	COND_EXPR.
+
 2006-11-13  DJ Delorie  <dj@redhat.com>
 
 	* config/m32c/m32c.c (m32c_prepare_shift): Use a separate


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or testcase from PR8632 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is actually computed by looking for the maximal value in the
{GC XXXX -> YYYY} reports.
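
The peak computation described above can be sketched in C as follows. The exact line format ("{GC 9304k -> 8843k}") is an assumption based on the placeholder shown here, as is the reading that XXXX gives the figure before collection and YYYY the figure after it. The names gc_peaks and scan_gc_peaks() are made up for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Largest values seen on either side of the arrow, in kilobytes.  */
struct gc_peaks {
    unsigned long before_k;   /* max XXXX: assumed "before GGC" */
    unsigned long after_k;    /* max YYYY: assumed "after GGC"  */
};

/* Scan a compiler dump held in memory for "{GC XXXXk -> YYYYk}"
   records and return the maxima.  Records that do not parse are
   skipped.  */
struct gc_peaks scan_gc_peaks(const char *log)
{
    struct gc_peaks peaks = { 0, 0 };
    const char *p = log;

    assert(log != NULL);
    while ((p = strstr(p, "{GC ")) != NULL) {
        unsigned long before, after;
        if (sscanf(p, "{GC %luk -> %luk}", &before, &after) == 2) {
            if (before > peaks.before_k)
                peaks.before_k = before;
            if (after > peaks.after_k)
                peaks.after_k = after;
        }
        p += 4;               /* step past this "{GC " marker */
    }
    return peaks;
}
```

Feeding it the text of a dump produced with the options above should give the two "Peak memory use" lines, provided the format assumption holds.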

Your testing script.


