This is the mail archive of the
mailing list for the GCC project.
Re: Optimize type streaming
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Richard Biener <rguenther at suse dot de>, Dominique Dhumieres <dominiq at lps dot ens dot fr>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 9 Jul 2014 10:58:57 +0200
- Subject: Re: Optimize type streaming
- References: <20140629111311 dot A2B96105 at mailhost dot lps dot ens dot fr> <20140629195303 dot GA14692 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1407071002000 dot 5753 at zhemvz dot fhfr dot qr> <20140708083414 dot GA27367 at kam dot mff dot cuni dot cz>
Perhaps I could write a bit more about my longer-term plans. At the moment, 30% of Firefox WPA time
is spent streaming trees and roughly another 30% in the inliner. That is a bit annoying, but the inliner
is relatively easy to optimize; trees represent the bigger problem.
According to the statistics, the average tree is streamed in 20 times, and according to perf we spend
about a quarter of that time unpacking the sections; the rest is dominated by the actual reading of
fields and by SCC unification. At the low level, tree streaming is already pretty well optimized.
I started to look into the following:
1) Putting types & decls on a diet
I started moving individual fields into more fitting locations, getting rid of
fields that are reused for many different purposes. I am trying to do this incrementally,
at a pace of about one field per week. Currently I am stuck at:
(moving DECL_ARGUMENTS). The plan is to:
- Get rid of decl_with_vis. I have removed all fields except the symbol table pointer
(which will stay) and some flags I plan to handle soon: comdat, weak and visibility.
The last one is harder because the C++ FE uses it on type declarations, but it is almost done.
The remaining flags (a few variable/function-specific items that have nothing
to do with visibility) can go into decl_common, where there is enough space.
- Get rid of decl_non_common.
Here I need to move arguments and results; I have patches for both.
- Do the same on the type side: decompose TYPE_NON_COMMON in favour of explicit type
- Experiment with getting rid of the RTL pointer.
I plan to test moving DECL_RTL into side tables (one global table for
global RTLs and one local table for per-function RTLs). This should get us closer to moving RTL
into per-function storage again and make RTL easier to reclaim.
- Once done with these, I can recast the inheritance to have DATA_TYPE and DATA_DECL
as the common base of types/decls that have data associated with them. Those can
carry the mode, sizes and alias info, which are not needed for functions, labels or type declarations.
I also wonder whether we need canonical types for FUNCTION_TYPE/METHOD_TYPE and other things
that are not associated with readable data.
This has a bit of a multiple-inheritance issue (which I do not want to introduce),
since we have decls with a symbol table entry and decls with data. I think a simple union
for that single symtab pointer will do. In fact, I have already tested restricting
DECL_SIZE & friends to decls with data, but there is a lot of front-end updating to do,
as these fields are overridden for many of the FE declarations. (That is the reason why I
added FE machinery to allow custom memory storage for newly added decls in the patch above.)
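One possible reading of the "simple union" remark, sketched in C under assumptions: decls with data and function decls keep their kind-specific fields in a union whose arms all begin with the symtab pointer, so a single accessor reaches it without a separate decl_with_vis level, while a DECL_SIZE-like accessor is restricted to decls with data. All struct, enum and function names here are hypothetical and do not mirror GCC's actual tree layout; labels and type declarations, which need neither arm, are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol-table entry (not GCC's symtab_node).  */
struct symtab_node { const char *asm_name; };

enum decl_kind { VAR_DECL_K, FUNCTION_DECL_K };

/* Fields of decls with data: mode, size and alias info.  */
struct data_fields {
  struct symtab_node *symtab;   /* first member in every union arm */
  int mode;
  unsigned long size;
  int alias_set;
};

/* Fields of function decls: no mode/size/alias info needed.  */
struct function_fields {
  struct symtab_node *symtab;   /* same position as in data_fields */
  void *arguments;
  void *result;
};

struct decl {
  enum decl_kind kind;
  const char *name;
  union {
    struct data_fields data;
    struct function_fields fn;
  } u;
};

/* Both union arms start with the symtab pointer, so C's common initial
   sequence rule lets one accessor read it through either arm; no separate
   "decl with visibility" base is needed.  */
static struct symtab_node *decl_symtab (const struct decl *d)
{
  return d->u.fn.symtab;
}

/* DECL_SIZE-like accessor, valid only for decls with data.  */
static unsigned long decl_size (const struct decl *d)
{
  assert (d->kind == VAR_DECL_K);
  return d->u.data.size;
}
```

The union keeps each decl as large as its largest arm; the gain in this sketch is in the type discipline (size and alias info are simply not reachable on function decls), not in memory.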
Naturally this is good from a maintenance point of view. It also has the potential to reduce memory
footprint and streaming size, to improve mergeability of trees (if definitions and external declarations
look the same in tree decls, we will merge more type variants; currently we keep class types
in two copies, one for the unit defining them and another for the units using them) and to avoid
streaming stale pointers. But progress is slow and the direct benefits are limited.
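Returning to the DECL_RTL item in the plan above, here is a minimal C sketch of the side-table idea, assuming a simple pointer-keyed map: a decl's RTL is looked up either in a global table or in the table of the function currently being compiled, so freeing a function's table drops all of its mappings at once. The types `decl` and `rtx_stub` and the functions `rtl_table_put`, `set_decl_rtl_side` and `decl_rtl_side` are hypothetical illustrations, not GCC's API; real code would use GCC's own hash tables and garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a declaration and its RTL (not GCC types).  */
typedef struct decl { int is_global; } decl;
typedef struct rtx_stub { int regno; } rtx_stub;

/* Fixed-capacity open-addressing map from decl pointer to RTL pointer.  */
typedef struct rtl_table {
  size_t cap;              /* must be a power of two */
  size_t count;
  const decl **keys;
  rtx_stub **vals;
} rtl_table;

static void rtl_table_init (rtl_table *t, size_t cap)
{
  assert (cap >= 2 && (cap & (cap - 1)) == 0);
  t->cap = cap;
  t->count = 0;
  t->keys = calloc (cap, sizeof *t->keys);
  t->vals = calloc (cap, sizeof *t->vals);
  assert (t->keys && t->vals);
}

/* Find D's slot, or the empty slot where it would go (linear probing).  */
static size_t rtl_table_slot (const rtl_table *t, const decl *d)
{
  size_t i = ((uintptr_t) d >> 4) & (t->cap - 1);
  while (t->keys[i] && t->keys[i] != d)
    i = (i + 1) & (t->cap - 1);
  return i;
}

static void rtl_table_put (rtl_table *t, const decl *d, rtx_stub *r)
{
  size_t i = rtl_table_slot (t, d);
  if (!t->keys[i])
    {
      assert (t->count + 1 < t->cap);   /* keep one slot empty so probing ends */
      t->keys[i] = d;
      t->count++;
    }
  t->vals[i] = r;
}

static rtx_stub *rtl_table_get (const rtl_table *t, const decl *d)
{
  size_t i = rtl_table_slot (t, d);
  return t->keys[i] ? t->vals[i] : NULL;
}

/* Dropping a per-function table releases the whole map at once.  */
static void rtl_table_free (rtl_table *t)
{
  free (t->keys);
  free (t->vals);
  t->keys = NULL;
  t->vals = NULL;
  t->cap = t->count = 0;
}

static rtl_table global_rtl;              /* RTL of global decls */
static rtl_table *current_function_rtl;   /* RTL local to the current function */

static void set_decl_rtl_side (const decl *d, rtx_stub *r)
{
  rtl_table_put (d->is_global ? &global_rtl : current_function_rtl, d, r);
}

static rtx_stub *decl_rtl_side (const decl *d)
{
  return rtl_table_get (d->is_global ? &global_rtl : current_function_rtl, d);
}
```

The design choice being illustrated is that no decl carries an RTL field at all; whether the lookup cost is acceptable in practice is exactly what the planned experiment would have to measure.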
2) Put BINFOs on a diet
BINFOs are currently attached to every class type. We can drop them when they hold
no useful information for either devirtualization or debug info. This is now
quite well defined. The main offender is ipa-prop, which still uses get_binfo_at_offset
and walks BINFOs it should not. I am working on it.
3) ODR type merging
I have patches for this, but want to proceed a bit carefully: I need to discuss
anonymous types with Jason and get the code for checking ODR violations working well.
Basically, for ODR types I can merge variant lists, which results in leaner debug info
and a bit less streaming from WPA to ltrans.
It is also important for type propagation, and I have a prototype that handles canonical types
of ODR and anonymous types specially.
This actually increases the (uncompressed) LTO stream size by about 6%, because the explicit
mangled names have to be streamed. My 4.10 with the patch is still faster than 4.9, but I would
definitely be happier if there were an easier way around this.
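A hypothetical C sketch of ODR-based merging: under the One Definition Rule, two non-anonymous C++ types with the same mangled name are the same type, so the first one seen can serve as the leader, and the variant lists of duplicates can be spliced onto it, reusing variants the leader already has. The sketch assumes that "anonymous" types are unit-local and therefore never merged. The `struct type` record, `odr_merge`, `find_variant` and `merge_variant_lists` are illustrative names only, and the linear scan stands in for a real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type record (not GCC's tree).  */
struct type {
  const char *mangled_name;   /* streamed explicitly for ODR types */
  int anonymous;              /* unit-local type: never merged */
  int quals;                  /* distinguishes variants, 0 for the main variant */
  struct type *main_variant;
  struct type *next_variant;  /* variant list hanging off the main variant */
  struct type *odr_leader;    /* representative chosen by merging */
};

#define MAX_ODR_TYPES 64
static struct type *odr_types[MAX_ODR_TYPES];
static size_t n_odr_types;

/* Return the representative of main variant T.  Non-anonymous types with
   equal mangled names are the same type, so the first one seen leads.  */
static struct type *odr_merge (struct type *t)
{
  if (t->anonymous || !t->mangled_name)
    return t->odr_leader = t;
  for (size_t i = 0; i < n_odr_types; i++)   /* a real table would hash */
    if (strcmp (odr_types[i]->mangled_name, t->mangled_name) == 0)
      return t->odr_leader = odr_types[i];
  assert (n_odr_types < MAX_ODR_TYPES);
  odr_types[n_odr_types++] = t;
  return t->odr_leader = t;
}

/* Find the variant of main variant MV with qualifiers QUALS, if any.  */
static struct type *find_variant (struct type *mv, int quals)
{
  for (struct type *v = mv; v; v = v->next_variant)
    if (v->quals == quals)
      return v;
  return NULL;
}

/* Move the variants of duplicate DUP onto LEADER's list.  Variants that
   LEADER already has are reused, so one list describes all of them.  */
static void merge_variant_lists (struct type *leader, struct type *dup)
{
  if (leader == dup)
    return;
  struct type *v = dup->next_variant;
  dup->next_variant = NULL;
  while (v)
    {
      struct type *next = v->next_variant;
      struct type *existing = find_variant (leader, v->quals);
      v->next_variant = NULL;
      if (existing)
        v->odr_leader = existing;   /* duplicate variant: use the existing one */
      else
        {
          v->main_variant = leader; /* new variant: move it to LEADER's list */
          v->odr_leader = v;
          v->next_variant = leader->next_variant;
          leader->next_variant = v;
        }
      v = next;
    }
}
```

Keying on the mangled name is what makes the names worth streaming explicitly: without them there is nothing unit-independent to compare.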
4) Reduce the size of LTO streams
This is what I was aiming for with the variant streaming (in addition to having a sanity checker
for 3, as bugs there may quite easily turn types into a crazy soup).
Types and decls are the most common things to stream, and 50% of types are variants, so not streaming
duplicated data in variants could save about 30-40% of type storage.
Decls inherit some data from their types (99% of the time), such as DECL_SIZE and friends.
In my tests the compression ratio went from over 3 to 2.1 while the gzipped size stayed about the same,
so this speeds up unpacking and rebuilding trees, since direct copies are faster than
LTO streamer table lookups.
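A simplified C sketch of variant streaming, assuming a toy word stream and a table of already-read types: a main variant is written in full, while a variant writes only a reference to its main variant plus the fields that may differ (here just the qualifiers), and the reader copies the shared fields. The same trick is applied to a decl whose size equals its type's size. The records and the `write_type`/`read_type`/`write_decl_size`/`read_decl_size` functions are hypothetical and not the LTO streamer's real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified records (not GCC's trees).  */
struct type {
  int quals;                 /* may differ between variants */
  unsigned long size;        /* shared by all variants */
  unsigned long align;       /* shared */
  int mode;                  /* shared */
  struct type *main_variant;
};

struct decl {
  struct type *type;
  unsigned long size;        /* usually equal to the type's size */
};

/* Toy stream of words standing in for the LTO bitstream.  */
struct stream { unsigned long buf[64]; size_t len, pos; };

static void put (struct stream *s, unsigned long v)
{
  assert (s->len < 64);
  s->buf[s->len++] = v;
}

static unsigned long get (struct stream *s)
{
  assert (s->pos < s->len);
  return s->buf[s->pos++];
}

/* A main variant is written in full; a variant writes only the index of
   its main variant (MAIN_IDX) and the fields that may differ.  */
static void write_type (struct stream *s, const struct type *t, unsigned long main_idx)
{
  int is_variant = t->main_variant != t;
  put (s, is_variant);
  if (is_variant)
    put (s, main_idx);
  else
    {
      put (s, t->size);
      put (s, t->align);
      put (s, (unsigned long) t->mode);
    }
  put (s, (unsigned long) t->quals);
}

/* TABLE holds the types read so far; a variant copies the shared fields
   from its main variant instead of reading them.  */
static void read_type (struct stream *s, struct type *out, struct type *table)
{
  if (get (s))
    {
      struct type *mv = &table[get (s)];
      out->size = mv->size;
      out->align = mv->align;
      out->mode = mv->mode;
      out->main_variant = mv;
    }
  else
    {
      out->size = get (s);
      out->align = get (s);
      out->mode = (int) get (s);
      out->main_variant = out;
    }
  out->quals = (int) get (s);
}

/* A decl's size is written only when it differs from its type's size.  */
static void write_decl_size (struct stream *s, const struct decl *d)
{
  int same = d->size == d->type->size;
  put (s, same);
  if (!same)
    put (s, d->size);
}

static void read_decl_size (struct stream *s, struct decl *d)
{
  d->size = get (s) ? d->type->size : get (s);
}
```

With these toy records a variant costs 3 words against 5 for a full type; the saving grows with the number of fields a variant shares with its main variant.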
5) Avoid merging of unmergeable things
This is the patch that drops the hashtable to 1 for things we know we do not want to merge.
This is needed for correctness of ODR types, and it also improves the compression ratio of the
streams, as SCC hashes are hard to gzip.
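One possible reading of item 5, assumed here rather than taken from the patch: SCCs known to be unmergeable are streamed with a fixed placeholder hash, real hashes are kept distinct from it, and the reader skips the unification lookup entirely for such SCCs. A repeated constant also compresses far better than pseudo-random hash values. `UNMERGEABLE_SCC_HASH`, `scc_hash_to_stream` and `unify_scc` are hypothetical names, and the linear table is only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder hash for SCCs that must never be merged.  */
#define UNMERGEABLE_SCC_HASH 1u

/* Writer side: the value streamed as an SCC's hash.  */
static uint32_t scc_hash_to_stream (uint32_t computed, int mergeable)
{
  if (!mergeable)
    return UNMERGEABLE_SCC_HASH;
  /* Keep real hashes distinct from the placeholder.  */
  return computed == UNMERGEABLE_SCC_HASH ? computed + 1 : computed;
}

/* Reader side: a toy unification table (linear scan for brevity).
   CONTENT stands in for comparing the SCC's trees.  */
struct scc { uint32_t hash; int content; };

#define MAX_SCCS 64
static const struct scc *seen_sccs[MAX_SCCS];
static size_t n_seen_sccs;
static size_t n_compares;   /* table comparisons performed */

static const struct scc *unify_scc (const struct scc *s)
{
  if (s->hash == UNMERGEABLE_SCC_HASH)
    return s;                       /* never merged: skip the table entirely */
  for (size_t i = 0; i < n_seen_sccs; i++)
    {
      n_compares++;
      if (seen_sccs[i]->hash == s->hash && seen_sccs[i]->content == s->content)
        return seen_sccs[i];
    }
  assert (n_seen_sccs < MAX_SCCS);
  seen_sccs[n_seen_sccs++] = s;
  return s;
}
```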
6) Put variable initializers into named sections (like function bodies)
This is supposed to help vtables, but I am always too lazy to dive into the details of our
ugly low-level section API.
7) Improve streaming of locations, as discussed several times. Again I am a bit discouraged,
since it needs an extra section etc. Location lookup still shows up high in the profile.
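A minimal C sketch of delta-encoding locations, assuming locations are expanded to (file, line, column) with the file already mapped to an integer index: each location is written as a word of change flags followed only by the components that differ from the previously streamed one, and the reader mirrors that state. The stream layout and the `write_loc`/`read_loc` names are hypothetical illustrations, not GCC's actual location streaming.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expanded location; FILE is an index into a file table.  */
struct xloc { int file, line, col; };

/* Toy stream of words standing in for the LTO bitstream.  */
struct stream { int v[128]; size_t n, pos; };

static void put (struct stream *s, int x)
{
  assert (s->n < 128);
  s->v[s->n++] = x;
}

static int get (struct stream *s)
{
  assert (s->pos < s->n);
  return s->v[s->pos++];
}

/* Write L relative to *PREV: a word of change flags (bit 0 file,
   bit 1 line, bit 2 column), then only the components that changed.  */
static void write_loc (struct stream *s, struct xloc *prev, struct xloc l)
{
  int flags = (l.file != prev->file)
              | ((l.line != prev->line) << 1)
              | ((l.col != prev->col) << 2);
  put (s, flags);
  if (flags & 1)
    put (s, l.file);
  if (flags & 2)
    put (s, l.line);
  if (flags & 4)
    put (s, l.col);
  *prev = l;
}

/* The reader keeps its own copy of the previous location and updates
   only the components flagged as changed.  */
static struct xloc read_loc (struct stream *s, struct xloc *prev)
{
  int flags = get (s);
  if (flags & 1)
    prev->file = get (s);
  if (flags & 2)
    prev->line = get (s);
  if (flags & 4)
    prev->col = get (s);
  return *prev;
}
```

When consecutive trees share a file and line, a location shrinks to the flag word plus at most one component; the same "previous location" state could also serve as a one-entry cache in front of the more expensive location lookup on the reader side.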
So those are some of my immediate plans.