[PATCH]: Fix PR tree-optimization/27755, add partial anticipation
Jan Hubicka
hubicka@ucw.cz
Wed Nov 15 01:45:00 GMT 2006
> Besides this algorithmic fix, I've also added support for partial
> anticipation. Partial anticipation is what happens when the value can
> be made live on some paths leading to a block, instead of all paths
> leading to a block. Supporting this is just an optimality issue, and
> because it requires calculation of another dataflow problem (i.e. slows
> down PRE by about 20-30%), I've made it only enabled at -O3. This is
> not really a knob that is worth adding, the amount of partial
> anticipation related redundancies that occur in real programs is
> small (<10%). I've yet to find a case where it truly matters, which
> is why it is relegated to -O3 :). It is more for completeness than
> anything else.
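For readers unfamiliar with the term, here is a minimal C sketch of the
distinction. It assumes the usual dataflow reading of anticipation (an
expression is anticipated at a point if the paths leaving that point
evaluate it before its operands change). The function names are
hypothetical and nothing here is taken from the patch or from PR 27755.

```c
/* Fully anticipated: every path leaving point P evaluates a * b,
   so computing it once at P never adds work to any path.  */
int fully_anticipated(int a, int b, int c)
{
    /* point P */
    if (c)
        return a * b + 1;
    return a * b - 1;
}

/* The same function with a * b hoisted to P by hand.  */
int fully_anticipated_hoisted(int a, int b, int c)
{
    int t = a * b;      /* single evaluation at P */
    if (c)
        return t + 1;
    return t - 1;
}

/* Partially anticipated: only the c != 0 path leaving P evaluates
   a * b.  Hoisting it to P would add a multiplication to the
   c == 0 path, which did not have one before.  */
int partially_anticipated(int a, int b, int c)
{
    /* point P */
    if (c)
        return a * b + 1;
    return 0;
}
```

The second case is why handling partial anticipation needs extra care:
moving the expression is only a win when it does not lengthen the
paths that never needed it.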
Hi,
this also seems to cost about 8% in memory consumption on combine.c at -O3.
I am not sure whether we want to mix compile-time and code-growth
implications in -O3. We already have -fexpensive-optimizations
enabled at -O2; perhaps we could have a separate knob for this, such as
-freally-expensive-optimizations ;))
Honza
comparing empty function compilation at -O0 level:
Overall memory needed: 18219k -> 18223k
Peak memory use before GGC: 2229k
Peak memory use after GGC: 1936k
Maximum of released memory in single GGC run: 293k
Garbage: 421k
Leak: 2266k
Overhead: 445k
GGC runs: 3
comparing empty function compilation at -O0 -g level:
Overall memory needed: 18235k -> 18239k
Peak memory use before GGC: 2254k
Peak memory use after GGC: 1961k
Maximum of released memory in single GGC run: 293k
Garbage: 424k
Leak: 2296k
Overhead: 448k
GGC runs: 3
comparing empty function compilation at -O1 level:
Overall memory needed: 18323k -> 18327k
Peak memory use before GGC: 2229k
Peak memory use after GGC: 1936k
Maximum of released memory in single GGC run: 293k
Garbage: 427k
Leak: 2269k
Overhead: 445k
GGC runs: 4
comparing empty function compilation at -O2 level:
Overall memory needed: 18335k -> 18339k
Peak memory use before GGC: 2229k
Peak memory use after GGC: 1936k
Maximum of released memory in single GGC run: 293k
Garbage: 430k
Leak: 2269k
Overhead: 446k
GGC runs: 4
comparing empty function compilation at -O3 level:
Overall memory needed: 18335k -> 18339k
Peak memory use before GGC: 2229k
Peak memory use after GGC: 1936k
Maximum of released memory in single GGC run: 293k
Garbage: 430k
Leak: 2269k
Overhead: 446k
GGC runs: 4
comparing combine.c compilation at -O0 level:
Overall memory needed: 28403k -> 28407k
Peak memory use before GGC: 9304k
Peak memory use after GGC: 8843k
Maximum of released memory in single GGC run: 2666k
Garbage: 36845k
Leak: 6454k
Overhead: 4862k
GGC runs: 280
comparing combine.c compilation at -O0 -g level:
Overall memory needed: 30463k -> 30467k
Peak memory use before GGC: 10818k
Peak memory use after GGC: 10448k
Maximum of released memory in single GGC run: 2420k
Garbage: 37420k
Leak: 9175k
Overhead: 5484k
GGC runs: 270
comparing combine.c compilation at -O1 level:
Overall memory needed: 40247k -> 40251k
Peak memory use before GGC: 17292k
Peak memory use after GGC: 17117k
Maximum of released memory in single GGC run: 2332k
Garbage: 57470k
Leak: 6508k
Overhead: 6220k
GGC runs: 356
comparing combine.c compilation at -O2 level:
Overall memory needed: 29802k
Peak memory use before GGC: 17288k
Peak memory use after GGC: 17117k
Maximum of released memory in single GGC run: 2868k -> 2869k
Garbage: 74930k -> 74938k
Leak: 6614k -> 6614k
Overhead: 8475k -> 8476k
GGC runs: 413
comparing combine.c compilation at -O3 level:
Peak amount of GGC memory allocated before garbage collecting increased from 18229k to 18417k, overall 1.03%
Amount of produced GGC garbage increased from 103713k to 112681k, overall 8.65%
Amount of memory still referenced at the end of compilation increased from 6676k to 6684k, overall 0.12%
Overall memory needed: 28902k
Peak memory use before GGC: 18229k -> 18417k
Peak memory use after GGC: 17845k -> 17846k
Maximum of released memory in single GGC run: 4105k -> 4106k
Garbage: 103713k -> 112681k
Leak: 6676k -> 6684k
Overhead: 11824k -> 13027k
GGC runs: 462 -> 463
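As a cross-check on the three percentages reported for combine.c at
-O3, here is a small C sketch that computes the change relative to the
old value. That the tester uses exactly this formula is an assumption;
it merely reproduces these three lines. The function name
percent_change is hypothetical.

```c
/* Relative change from old_k to new_k, in percent of old_k.
   Assumption: the report's "overall N%" for these lines is the
   change measured against the old value.  */
double percent_change(unsigned long old_k, unsigned long new_k)
{
    return ((double)new_k - (double)old_k) * 100.0 / (double)old_k;
}
```

With the numbers above, percent_change(18229, 18417) is about 1.03,
percent_change(103713, 112681) is about 8.65, and
percent_change(6676, 6684) is about 0.12.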
comparing insn-attrtab.c compilation at -O0 level:
Overall memory needed: 88242k
Peak memory use before GGC: 69788k
Peak memory use after GGC: 44198k
Maximum of released memory in single GGC run: 36963k
Garbage: 129062k
Leak: 9514k
Overhead: 16996k
GGC runs: 216
comparing insn-attrtab.c compilation at -O0 -g level:
Overall memory needed: 89406k
Peak memory use before GGC: 70910k
Peak memory use after GGC: 45426k
Maximum of released memory in single GGC run: 36965k
Garbage: 130490k
Leak: 10889k
Overhead: 17344k
GGC runs: 212
comparing insn-attrtab.c compilation at -O1 level:
Overall memory needed: 114174k -> 112882k
Peak memory use before GGC: 90374k
Peak memory use after GGC: 83736k
Maximum of released memory in single GGC run: 31852k
Garbage: 277771k
Leak: 9357k
Overhead: 29782k
GGC runs: 222
comparing insn-attrtab.c compilation at -O2 level:
Overall memory needed: 120402k -> 119758k
Peak memory use before GGC: 92604k
Peak memory use after GGC: 84716k
Maximum of released memory in single GGC run: 30394k
Garbage: 317208k
Leak: 9359k
Overhead: 36365k
GGC runs: 245
comparing insn-attrtab.c compilation at -O3 level:
Overall memory allocated via mmap and sbrk decreased from 134222k to 129418k, overall -3.71%
Overall memory needed: 134222k -> 129418k
Peak memory use before GGC: 92629k
Peak memory use after GGC: 84742k
Maximum of released memory in single GGC run: 30580k -> 30581k
Garbage: 317837k -> 318070k
Leak: 9362k
Overhead: 36562k -> 36601k
GGC runs: 249
comparing Gerald's testcase PR8361 compilation at -O0 level:
Overall memory needed: 119550k
Peak memory use before GGC: 92691k
Peak memory use after GGC: 91771k
Maximum of released memory in single GGC run: 19314k
Garbage: 205599k
Leak: 47691k
Overhead: 20819k
GGC runs: 401
comparing Gerald's testcase PR8361 compilation at -O0 -g level:
Overall memory needed: 132422k
Peak memory use before GGC: 105067k
Peak memory use after GGC: 104026k
Maximum of released memory in single GGC run: 19474k
Garbage: 212185k
Leak: 70052k
Overhead: 26134k
GGC runs: 377
comparing Gerald's testcase PR8361 compilation at -O1 level:
Overall memory needed: 119302k
Peak memory use before GGC: 97860k
Peak memory use after GGC: 95650k
Maximum of released memory in single GGC run: 18600k
Garbage: 443668k
Leak: 50024k
Overhead: 32734k
GGC runs: 551
comparing Gerald's testcase PR8361 compilation at -O2 level:
Overall memory needed: 119294k
Peak memory use before GGC: 97860k
Peak memory use after GGC: 95650k
Maximum of released memory in single GGC run: 18600k
Garbage: 502917k -> 501735k
Leak: 50729k -> 50707k
Overhead: 39974k -> 39818k
GGC runs: 606 -> 605
comparing Gerald's testcase PR8361 compilation at -O3 level:
Amount of produced GGC garbage increased from 522136k to 523878k, overall 0.33%
Overall memory needed: 118926k
Peak memory use before GGC: 97906k
Peak memory use after GGC: 96936k
Maximum of released memory in single GGC run: 18847k
Garbage: 522136k -> 523878k
Leak: 50305k -> 50282k
Overhead: 40502k -> 40826k
GGC runs: 617 -> 620
comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
Overall memory needed: 137958k
Peak memory use before GGC: 81909k
Peak memory use after GGC: 58788k
Maximum of released memory in single GGC run: 45493k
Garbage: 147243k
Leak: 7536k
Overhead: 25302k
GGC runs: 82
comparing PR rtl-optimization/28071 testcase compilation at -O0 -g level:
Overall memory needed: 138130k
Peak memory use before GGC: 82526k
Peak memory use after GGC: 59405k
Maximum of released memory in single GGC run: 45558k
Garbage: 147414k
Leak: 9178k
Overhead: 25734k
GGC runs: 88
comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory needed: 424422k
Peak memory use before GGC: 205229k
Peak memory use after GGC: 201005k
Maximum of released memory in single GGC run: 101903k
Garbage: 271986k
Leak: 47601k
Overhead: 31280k
GGC runs: 101
comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory needed: 352290k -> 352034k
Peak memory use before GGC: 206002k
Peak memory use after GGC: 201778k
Maximum of released memory in single GGC run: 108808k
Garbage: 352211k
Leak: 48184k
Overhead: 47025k
GGC runs: 110
comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
Overall memory needed: 781306k -> 781466k
Peak memory use before GGC: 314925k
Peak memory use after GGC: 293268k
Maximum of released memory in single GGC run: 165331k
Garbage: 494373k
Leak: 65517k
Overhead: 59915k
GGC runs: 98
Head of the ChangeLog is:
--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog 2006-11-14 02:45:56.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog 2006-11-14 19:16:34.000000000 +0000
@@ -1,3 +1,41 @@
+2006-11-14 Daniel Berlin <dberlin@dberlin.org>
+
+ Fix PR tree-optimization/27755
+
+ * tree-ssa-pre.c: Update comments.
+ (bb_bitmap_sets): Add pa_in and deferred member.
+ (BB_DEFERRED): New macro.
+ (maximal_set): New variable.
+ (pre_stats): Add pa_insert member.
+ (bitmap_set_and): Short circuit orig == dest.
+ (bitmap_set_subtract_values): New function.
+ (bitmap_set_contains_expr): Ditto.
+ (translate_vuses_through_block): Add phiblock argument.
+ (dependent_clean): New function.
+ (compute_antic_aux): Update for maximal_set changes.
+ (compute_partial_antic_aux): New function.
+ (compute_antic): Handle partial anticipation.
+ (do_partial_partial_insertion): New function.
+ (insert_aux): Handle partial anticipation.
+ (add_to_sets): Add to maximal set.
+ (compute_avail): Ditto.
+ (init_pre): Initialize maximal_set.
+ (execute_pre): Do partial anticipation if -O3+.
+
+2006-11-14 Paolo Bonzini <bonzini@gnu.org>
+
+ PR rtl-optimization/29798
+
+ * fwprop.c (use_killed_between): Check that DEF_INSN dominates
+ TARGET_INSN before any other check.
+ (fwprop_init): Always calculate dominators.
+ (fwprop_done): Always free them.
+
+2006-11-14 Kaveh R. Ghazi <ghazi@caip.rutgers.edu>
+
+ * fold-const.c (fold_strip_sign_ops): Handle COMPOUND_EXPR and
+ COND_EXPR.
+
2006-11-13 DJ Delorie <dj@redhat.com>
* config/m32c/m32c.c (m32c_prepare_shift): Use a separate
The results can be reproduced by building a compiler with
--enable-gather-detailed-mem-stats targeting x86-64
and compiling preprocessed combine.c or the testcase from PR8632 with:
-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
The memory consumption summary appears in the dump after the detailed listing
of the places where the memory is allocated. Peak memory consumption is actually
computed by looking for the maximal value in the {GC XXXX -> YYYY} reports.
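The peak computation described above can be sketched in C as follows.
This is not the tester's actual script: the function name gc_peaks is
hypothetical, and the exact marker format "{GC 1234k -> 567k}" (values
with a trailing k) is an assumption about what -Q prints.

```c
#include <stdio.h>
#include <string.h>

/* Scan a compiler dump held in TEXT for "{GC XXXXk -> YYYYk}" markers
   and store the largest XXXX in *peak_before and the largest YYYY in
   *peak_after.  Both are set to 0 when no marker is found.  */
void gc_peaks(const char *text,
              unsigned long *peak_before, unsigned long *peak_after)
{
    const char *p = text;

    *peak_before = 0;
    *peak_after = 0;
    while ((p = strstr(p, "{GC ")) != NULL) {
        unsigned long before, after;

        /* Accept the marker only if both numbers parse.  */
        if (sscanf(p, "{GC %luk -> %luk}", &before, &after) == 2) {
            if (before > *peak_before)
                *peak_before = before;
            if (after > *peak_after)
                *peak_after = after;
        }
        p += 4;     /* step past this "{GC " and keep searching */
    }
}
```

Fed the whole dump as one string, the two results would correspond to
the "Peak memory use before GGC" and "Peak memory use after GGC" lines
in the tables above, under the format assumption stated.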
Your testing script.