This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 3/5] IPA ICF pass
- From: Martin Liška <mliska at suse dot cz>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 13 Oct 2014 15:06:13 +0200
- Subject: Re: [PATCH 3/5] IPA ICF pass
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot LSU dot 2 dot 11 dot 1407052337210 dot 30120 at tuna dot site> <20140705225351 dot GK16837 at kam dot mff dot cuni dot cz> <53C7E626 dot 8080400 at suse dot cz> <54255A09 dot 1090305 at suse dot cz> <20140926144441 dot GA4266 at x4> <20140926232713 dot GC7334 at kam dot mff dot cuni dot cz> <20140927055921 dot GA299 at x4> <5426940B dot 2060300 at suse dot cz> <20140928022057 dot GB21582 at atrey dot karlin dot mff dot cuni dot cz> <54387267 dot 3030106 at suse dot cz> <20141011081944 dot GD5172 at kam dot mff dot cuni dot cz>
On 10/11/2014 10:19 AM, Jan Hubicka wrote:
After a few days of measurement and tuning, I was able to get the numbers into the following shape:
Execution times (seconds)
phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1412 kB ( 0%) ggc
phase opt and generate : 27.83 (59%) usr 0.66 (19%) sys 28.52 (37%) wall 1028813 kB (24%) ggc
phase stream in : 16.90 (36%) usr 0.63 (18%) sys 17.60 (23%) wall 3246453 kB (76%) ggc
phase stream out : 2.76 ( 6%) usr 2.19 (63%) sys 31.34 (40%) wall 2 kB ( 0%) ggc
callgraph optimization : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 40 kB ( 0%) ggc
ipa dead code removal : 3.31 ( 7%) usr 0.01 ( 0%) sys 3.25 ( 4%) wall 0 kB ( 0%) ggc
ipa virtual call target : 3.69 ( 8%) usr 0.03 ( 1%) sys 3.80 ( 5%) wall 21 kB ( 0%) ggc
ipa devirtualization : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 13704 kB ( 0%) ggc
ipa cp : 1.11 ( 2%) usr 0.07 ( 2%) sys 1.17 ( 2%) wall 188558 kB ( 4%) ggc
ipa inlining heuristics : 8.17 (17%) usr 0.14 ( 4%) sys 8.27 (11%) wall 494738 kB (12%) ggc
ipa comdats : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
ipa lto gimple in : 1.86 ( 4%) usr 0.40 (11%) sys 2.20 ( 3%) wall 537970 kB (13%) ggc
ipa lto gimple out : 0.19 ( 0%) usr 0.08 ( 2%) sys 0.27 ( 0%) wall 2 kB ( 0%) ggc
ipa lto decl in : 12.20 (26%) usr 0.37 (11%) sys 12.64 (16%) wall 2441687 kB (57%) ggc
ipa lto decl out : 2.51 ( 5%) usr 0.21 ( 6%) sys 2.71 ( 3%) wall 0 kB ( 0%) ggc
ipa lto constructors in : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 15692 kB ( 0%) ggc
ipa lto constructors out: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
ipa lto cgraph I/O : 0.54 ( 1%) usr 0.09 ( 3%) sys 0.63 ( 1%) wall 407182 kB (10%) ggc
ipa lto decl merge : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.34 ( 2%) wall 8220 kB ( 0%) ggc
ipa lto cgraph merge : 1.00 ( 2%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall 14605 kB ( 0%) ggc
whopr wpa : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall 1 kB ( 0%) ggc
whopr wpa I/O : 0.01 ( 0%) usr 1.90 (55%) sys 28.31 (37%) wall 0 kB ( 0%) ggc
whopr partitioning : 2.81 ( 6%) usr 0.01 ( 0%) sys 2.83 ( 4%) wall 4943 kB ( 0%) ggc
ipa reference : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.35 ( 2%) wall 0 kB ( 0%) ggc
ipa profile : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
ipa pure const : 1.62 ( 3%) usr 0.00 ( 0%) sys 1.63 ( 2%) wall 0 kB ( 0%) ggc
ipa icf : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.68 ( 3%) wall 1352 kB ( 0%) ggc
inline parameters : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
tree SSA rewrite : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 18919 kB ( 0%) ggc
tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree SSA incremental : 0.24 ( 1%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall 11325 kB ( 0%) ggc
tree operand scan : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.18 ( 0%) wall 116283 kB ( 3%) ggc
dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc
varconst : 0.01 ( 0%) usr 0.02 ( 1%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
loop fini : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
unaccounted todo : 0.55 ( 1%) usr 0.00 ( 0%) sys 0.56 ( 1%) wall 0 kB ( 0%) ggc
TOTAL : 47.49 3.48 77.46 4276682 kB
and I was able to reduce the function bodies loaded in WPA to 35% (from the previous 55%). The main problem
35% means that 35% of all function bodies are compared with something else? That feels pretty high.
but overall numbers are not so terrible.
Currently, the pass is able to merge 32K functions. As you know, we group functions into so-called classes.
According to the stats, the average non-singular class contains 7.39 candidates at the end of comparison, and we
have 5K such classes. Because we load the body of each candidate in such groups, this gives us a minimum number
of loaded bodies: 37K. As we load 70K functions, we still have room to improve. But I guess the WPA body-less
comparison is quite efficient.
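To make the estimate above concrete: reading the 5K figure as the number of non-singular classes, 5000 x 7.39 is roughly 37K bodies that must be loaded at minimum, against the 70K actually loaded. Below is a minimal sketch of that bookkeeping. It assumes a hypothetical body-less hash as the class key; the names func_summary and min_bodies_to_load are illustrative only and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for what is known about a function without its body.
struct func_summary
{
  std::uint64_t hash; // assumed body-less hash used as the class key
};

// Count the members of all non-singular classes, i.e. the minimum number
// of bodies that must be loaded to compare candidates with each other.
std::size_t
min_bodies_to_load (const std::vector<func_summary> &funcs)
{
  std::unordered_map<std::uint64_t, std::size_t> class_size;
  for (const func_summary &f : funcs)
    ++class_size[f.hash];

  std::size_t bodies = 0;
  for (const auto &entry : class_size)
    if (entry.second > 1) // singleton classes need no body comparison
      bodies += entry.second;

  assert (bodies <= funcs.size ()); // cannot load more bodies than exist
  return bodies;
}
```

Any bodies loaded beyond this count are ones whose class later turned out to be singular, which is where the remaining room for improvement would come from.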
with speed was hidden in the work list for congruence classes, where hash_set was used. I chose the data
structure to support the delete operation, but it was really slow. Thus, hash_set was replaced with a linked list,
and a flag is used to identify whether a set is removed or not.
Interesting, I would not expect a bottleneck in congruence solving :)
The problem was just the hash_set, which turned out to be a slow data structure for the set of operations needed
in congruence solving.
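The linked-list-plus-flag idea described above can be sketched roughly as follows. This is only an illustration of lazy deletion under assumed names (congruence_class, worklist, in_worklist are hypothetical), not the code from the patch: removal just clears a flag, and stale entries are skipped when they reach the front of the list.

```cpp
#include <cassert>
#include <list>

// Hypothetical congruence class; only the work-list flag matters here.
struct congruence_class
{
  unsigned id;
  bool in_worklist; // true while the class is logically queued
};

// Work list backed by a linked list with lazy deletion.
class worklist
{
public:
  void push (congruence_class *cls)
  {
    assert (cls != nullptr);
    if (cls->in_worklist)
      return; // already queued, avoid a duplicate live entry
    cls->in_worklist = true;
    items.push_back (cls);
  }

  // O(1) logical delete: the list itself is not searched or modified.
  void remove (congruence_class *cls) { cls->in_worklist = false; }

  // Return the next live class, or nullptr when nothing is left.
  congruence_class *pop ()
  {
    while (!items.empty ())
      {
        congruence_class *cls = items.front ();
        items.pop_front ();
        if (cls->in_worklist)
          {
            cls->in_worklist = false;
            return cls;
          }
        // Otherwise the entry was removed earlier; skip it.
      }
    return nullptr;
  }

private:
  std::list<congruence_class *> items;
};
```

The trade-off in this sketch is that removed entries stay in the list until they are popped, so memory is released a little later in exchange for constant-time push, remove, and amortized constant-time pop.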
I have no clue how complicated it would be to turn the release_body function into an operation that
really releases the memory?
I suppose one can keep the caches from the streamer and free the trees read. Freeing
gimple statements and the cfg should be relatively easy.
Let's, however, first try to tune the implementation rather than trying to get this hack
implemented. Explicit ggc_free calls traditionally tended to cause some negative
reactions wrt memory fragmentation concerns.
I agree with the suggested approach.
Markus' problem with -fprofile-use has been removed; IPA-ICF now precedes the devirtualization pass. I hope it is fine?
Yes, I think devirtualization should actually work better with identical
virtual methods merged. We just need to be sure it sees through the newly
introduced aliases (there should be no thunks for virtual methods).
Thanks,
Martin
Honza