This is the mail archive of the
mailing list for the GCC project.
Re: ipa-devirt.c TLC
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org, dodji at seketeli dot org
- Date: Thu, 22 Aug 2013 16:45:11 +0200
- Subject: Re: ipa-devirt.c TLC
- References: <20130821230831 dot GD16124 at kam dot mff dot cuni dot cz> <6178967d-e63d-4aba-b36c-d78e0c162523 at email dot android dot com>
> Btw, how does this all scale with the worst testcase? That is, a very deep
> inheritance hierarchy with each level virtually overriding all methods and
> very many duplicate input files? Is that really the worst case? We should
> make sure not to introduce quadratic or worse algorithms. (I didn't see you
> limiting the number of possible call targets)
Very many duplicate input files are not an issue, since I identify the ODR
types and handle them only by leaders.
To get things working well in practice (not in worst case) I use the query
cache. The purpose of query cache is to re-use the target lists when same
virutal call appears in program many times. The cache tokens passed to users
allows them to not walk the same list many times when the work depends only on
callees (as it does for dead code removal).
So to trigger the qudaratic case you need to introduce something like 1000
classes inheriting each other and each overwriting the method and then call
with type of each of them (so you have 1000 different calls). Then the
call target query cache will have 1+2+..1000 entries.
Worse yet, make the first baseclass to have 1000 methods and you can trigger
linear walks in the gimple_get_virt_method_for_binfo linear slowness. Of course
here you will generate quadratically sized vtables into the binary.
I made simple script constructing this and we died in FE (binfos are quadratic
already and we have non-linear walks on them). So I declared it premature
optimization and decided to get things correct first.
The simple way around is to start returning empty list once call target cache
is, say 100*number of virtual methods in unit.
The mainline version with patch enabling it on LTO (I wait for approval of the
ODR patch that makes use of VTABLE manglings) I get the overall overhead of
call target analysis and walks in unreachable code removal on Firefox under 2%.
Vast majority goes to constructor folding that is called from
gimple_get_virt_method_for_binfo. Sensible speedup would be to reconstruct
vectors of virtual methods from vtables instead of walking the vtables
everywhere, but for that I would like to change all the binfo walks we have in
middle-end, so I would like to cleanup and fix existing code a bit first.
Finally the cache can be made smaller (linear to vtable size in case of single)
inheritance by simply making it to return partial vectors and have list of
outer type to contain the list of inner type. It means that I will need to
invent my own type for return value of possible_polymorphic_call_targets and
iterators on that + some smarter way to tell passes calling me when they
do redundant work.
If you preffer to have the limit on cache size now, I can push out a patch.