This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 3/5] IPA ICF pass


On 10/11/2014 10:19 AM, Jan Hubicka wrote:

After a few days of measurement and tuning, I was able to get the numbers into the following shape:
Execution times (seconds)
  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1412 kB ( 0%) ggc
  phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 1028813 kB (24%) ggc
  phase stream in         :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 3246453 kB (76%) ggc
  phase stream out        :   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall       2 kB ( 0%) ggc
  callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall      40 kB ( 0%) ggc
  ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall       0 kB ( 0%) ggc
  ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall      21 kB ( 0%) ggc
  ipa devirtualization    :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   13704 kB ( 0%) ggc
  ipa cp                  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  188558 kB ( 4%) ggc
  ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  494738 kB (12%) ggc
  ipa comdats             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
  ipa lto gimple in       :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  537970 kB (13%) ggc
  ipa lto gimple out      :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall       2 kB ( 0%) ggc
  ipa lto decl in         :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 2441687 kB (57%) ggc
  ipa lto decl out        :   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall       0 kB ( 0%) ggc
  ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall   15692 kB ( 0%) ggc
  ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
  ipa lto cgraph I/O      :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  407182 kB (10%) ggc
  ipa lto decl merge      :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall    8220 kB ( 0%) ggc
  ipa lto cgraph merge    :   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall   14605 kB ( 0%) ggc
  whopr wpa               :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall       1 kB ( 0%) ggc
  whopr wpa I/O           :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall       0 kB ( 0%) ggc
  whopr partitioning      :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall    4943 kB ( 0%) ggc
  ipa reference           :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall       0 kB ( 0%) ggc
  ipa profile             :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
  ipa pure const          :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall       0 kB ( 0%) ggc
  ipa icf                 :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall    1352 kB ( 0%) ggc
  inline parameters       :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
  tree SSA rewrite        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall   18919 kB ( 0%) ggc
  tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  tree SSA incremental    :   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall   11325 kB ( 0%) ggc
  tree operand scan       :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  116283 kB ( 3%) ggc
  dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
  dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
  varconst                :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  loop fini               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
  unaccounted todo        :   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall       0 kB ( 0%) ggc
  TOTAL                 :  47.49             3.48            77.46            4276682 kB

and I was able to reduce the function bodies loaded in WPA to 35% (from the previous 55%). The main problem

35% means that 35% of all function bodies are compared with something else? That feels pretty high,
but the overall numbers are not so terrible.

Currently, the pass is able to merge 32K functions. As you know, we group functions into so-called classes.
According to the stats, the average non-singular class contains 7.39 candidates at the end of comparison, and we
have 5K such classes. Because we load the body of each candidate in such groups, this gives us the minimum number
of loaded bodies: 37K. As we load 70K functions, we still have room to improve. But I guess the WPA body-less
comparison is quite efficient.
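
As a rough check of the lower bound quoted above, assuming the 5K figure counts non-singular classes and using the rounded values from the stats (so the result is only approximate):

```latex
N_{\min} \approx 5\,000 \times 7.39 = 36\,950 \approx 37\mathrm{K},
\qquad
\frac{N_{\min}}{N_{\text{loaded}}} \approx \frac{37\mathrm{K}}{70\mathrm{K}} \approx 53\%
```

Under that reading, roughly 33K of the 70K loaded bodies are above the minimum.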


with speed was hidden in the work list for congruence classes, where hash_set was used. I chose that data
structure because it supports the delete operation, but it was really slow. Thus, hash_set was replaced with a linked list,
and a flag is used to identify whether a set has been removed.

Interesting, I would not expect a bottleneck in congruence solving :)

The problem was just hash_set, which turned out to be a slow data structure for the set of operations needed
in congruence solving.
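
For illustration, the linked-list-plus-flag idea might look roughly like the sketch below. This is not the code from the patch: CongruenceClass, Worklist and their members are hypothetical names, and std::list stands in for whatever list type the pass actually uses. Deletion only clears a flag, and stale entries are skipped when they reach the front of the list.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a congruence class; only the flag matters here.
struct CongruenceClass {
  explicit CongruenceClass(int id) : id(id), in_worklist(false) {}
  int id;
  bool in_worklist;  // true while the class is logically queued
};

// Work list with lazy deletion (assumed design, not the actual GCC code).
class Worklist {
 public:
  // Queue a class unless it is already logically queued.
  void push(CongruenceClass *cls) {
    assert(cls != nullptr);
    if (cls->in_worklist)
      return;
    cls->in_worklist = true;
    items_.push_back(cls);
  }

  // O(1) logical removal: the list node stays in place until it is popped.
  void remove(CongruenceClass *cls) { cls->in_worklist = false; }

  // Return the next live class, or nullptr when none is left.
  CongruenceClass *pop() {
    while (!items_.empty()) {
      CongruenceClass *cls = items_.front();
      items_.pop_front();
      if (cls->in_worklist) {
        cls->in_worklist = false;
        return cls;
      }
      // Stale entry: the class was removed after being pushed, so skip it.
    }
    return nullptr;
  }

 private:
  std::list<CongruenceClass *> items_;
};
```

The trade-off in this sketch is that removed entries keep occupying list nodes until they are popped, in exchange for constant-time push and remove without any hashing.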


I have no clue how complicated it would be to turn the release_body function into an operation that
really releases the memory.

I suppose one can keep the caches from the streamer and free the trees that were read.  Freeing
gimple statements and the cfg should be relatively easy.

Let's however first try to tune the implementation rather than try to get this hack
implemented. Explicit ggc_free calls have traditionally tended to cause some negative
reactions because of memory fragmentation concerns.

I agree with the suggested approach.



Markus' problem with -fprofile-use has been resolved: IPA-ICF now precedes the devirtualization pass. I hope that is fine?

Yes, I think devirtualization should actually work better with identical
virtual methods merged.  We just need to be sure it sees through the newly
introduced aliases (there should be no thunks for virtual methods).

Thanks,
Martin


Honza


