This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Optimize type streaming


Hello,
perhaps I could write a bit more on my longer-term plans.  At the moment 30% of Firefox WPA time is taken
by streaming trees and roughly another 30% is taken by the inliner.  It is a bit annoying but relatively
easy to optimize the inliner, but trees represent a bigger problem.

According to the stats, the average tree is streamed in 20 times, and according to perf we spend about 1/4
of the time unpacking the sections; after that, the actual reading of fields & SCC unification dominate.
At the low level, tree streaming is already pretty well optimized.

I started to look into the following:

1) Putting types & decls on a diet
 
   I started to move individual fields into more fitting locations, getting rid of
   single fields that are reused for many different purposes.  I am trying to do this
   incrementally, keeping a flow of about one field per week.  Currently I am stuck at:

   https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01969.html

   (moving DECL_ARGUMENTS).  The plan is to:
   - get rid of decl_with_vis.  I removed all fields except for the symbol table pointer
     (that will stay) and some flags I plan to handle soon: comdat, weak and visibility
     (the last one is harder because the C++ FE uses it on type declarations, but it is almost done).
     The rest of the flags (a few variable/function-specific items that have nothing
     to do with visibility) can go into decl_common, where there is enough space.
   - get rid of decl_non_common
     Here I need to move arguments and results.  I have patches for both.
   - I plan to do the same on the type side: decompose TYPE_NON_COMMON in favour of an explicit
     type hierarchy.
   - experiment with getting rid of the RTL pointer

     I plan to test moving DECL_RTL into on-the-side tables (one global for
     global RTLs and one local for per-function RTLs).  This should get us closer to moving RTL
     into per-function storage again and make RTL easier to reclaim.
   - Once done with these I can recast the inheritance to have DATA_TYPE and DATA_DECL
     as the common bases of types/decls that do have data associated with them.  Those can
     carry the mode, sizes and alias info that are not needed for functions, labels, type
     declarations etc.

     I also wonder if we need canonical types for FUNCTION_TYPE/METHOD_TYPE and other things
     that are not associated with readable data.

     This has a bit of a multiple inheritance issue (which I do not want to introduce),
     since we have decls with symbol table and decls with data.  I think a simple union
     for that single symtab pointer will do.  In fact I already tested restricting
     DECL_SIZE & friends to decls with data, but there is a lot of frontend updating to do,
     as these fields are overridden for many of the FE declarations.  (This is the reason why I
     added FE machinery to allow custom memory storage for newly added decls in the patch above.)

   Naturally this is good from a maintenance point of view.  It has the potential to reduce memory
   footprint and streaming size, improve mergeability of trees (if definitions and external declarations
   look the same in tree decls, we will merge more type variants, because currently we keep class types
   in two copies, one for the unit defining them and another for units using them) and also avoid
   streaming of stale pointers, but it is slow progress and the direct benefits are limited.
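
   The on-the-side RTL tables mentioned above can be pictured with a small toy model.  This is
   only a sketch of the idea, not GCC code: the class RtlSideTables, its method names and the
   string stand-ins for decls and RTL are all hypothetical.

```python
class RtlSideTables:
    """Toy model: RTL lives in side tables instead of a pointer in every decl."""

    def __init__(self):
        self.global_rtl = {}    # decl -> RTL for global decls
        self.function_rtl = {}  # function -> {decl -> RTL} for per-function decls

    def set_rtl(self, decl, rtl, function=None):
        # Hypothetical API: function=None means the decl is global.
        if function is None:
            self.global_rtl[decl] = rtl
        else:
            self.function_rtl.setdefault(function, {})[decl] = rtl

    def get_rtl(self, decl, function=None):
        # Look in the per-function table first, then fall back to the global one.
        if function is not None:
            local = self.function_rtl.get(function, {})
            if decl in local:
                return local[decl]
        return self.global_rtl.get(decl)

    def release_function(self, function):
        # Reclaim all RTL of one function at once; returns the number of entries dropped.
        return len(self.function_rtl.pop(function, {}))
```

   In this model, decls that never get RTL cost nothing, and reclaiming the RTL of a finished
   function is a single table drop rather than a walk over its decls.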

2) Put BINFOs on a diet

   BINFOs are currently added to every class type.  We can drop them in case they do
   not hold useful information for either devirtualization or debug info.  This is now
   quite well defined.  The main offender is ipa-prop, which still uses get_binfo_at_offset
   and walks BINFOs it should not.  I am working on it.

3) ODR type merging

   I have patches for this, but want to go a bit carefully: I need to discuss the anonymous
   types with Jason and get the code for checking ODR violations working well.

   Basically, for ODR types I can merge variant lists, which results in leaner debug info
   and a bit less streaming WPA->ltrans.
   It is also important for type propagation, and I have a prototype to handle canonical types
   of ODR and anonymous types specially.

   This actually increases LTO stream sizes (uncompressed) by about 6% to stream explicit
   mangled names.  My 4.10 with the patch is still faster than 4.9, but I definitely would be
   happier if there was an easier way around.
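
   As a rough illustration of merging by mangled name, here is a toy model.  It is a sketch
   under loud assumptions: types are plain dicts, the function merge_odr_types is hypothetical,
   and comparing sizes is only a stand-in for real ODR violation checking.  Anonymous types are
   not modelled at all.

```python
def merge_odr_types(types):
    """Merge types that share a mangled name; collect names whose copies disagree.

    Each type is a dict with 'mangled_name', 'size' and 'variants' (hypothetical layout).
    """
    merged = {}
    violations = []
    for t in types:
        name = t["mangled_name"]
        if name not in merged:
            # First copy seen becomes the leader that all later copies merge into.
            merged[name] = {
                "mangled_name": name,
                "size": t["size"],
                "variants": list(t["variants"]),
            }
            continue
        leader = merged[name]
        if leader["size"] != t["size"]:
            # Stand-in check: same name but different layout is reported.
            violations.append(name)
        for v in t["variants"]:
            if v not in leader["variants"]:
                leader["variants"].append(v)
    return merged, violations
```

   The point of the model is only that one leader per name ends up carrying a single combined
   variant list, and that the same pass is a natural place to notice mismatching copies.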

4) Reduce size of LTO streams

   This is what I was shooting for with the variant streaming (in addition to having a sanity checker
   for 3, as bugs in these may turn types into a crazy soup quite easily).
   Types and decls are the most common things to stream and 50% of types are variants, so not streaming
   duplicated data in variants has a chance to save about 30-40% of type storage.
   Decls inherit some stuff from types (99% of the time), like DECL_SIZE and friends.

   In my tests I went from a compression ratio of over 3 to 2.1 while keeping about the same
   gzipped data size, so this speeds up unpacking & rebuilding trees, since direct copies are
   faster than LTO streamer table lookups.
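
   The idea of not streaming duplicated data in variants can be sketched as follows.  This is a
   toy model, not the streamer: types are plain dicts, pack_variant and unpack_variant are
   hypothetical names, and a variant is assumed to have the same set of fields as its main variant.

```python
def pack_variant(main, variant):
    """Keep only the fields in which the variant differs from its main variant."""
    return {k: v for k, v in variant.items() if main.get(k) != v}


def unpack_variant(main, delta):
    """Rebuild the variant: direct copy of the main variant, then apply the delta."""
    rebuilt = dict(main)
    rebuilt.update(delta)
    return rebuilt
```

   For a const-qualified variant of a struct only the qualifier would be written out, and reading
   it back is one copy plus one field store.  Note also that with the gzipped size held fixed,
   going from a ratio of 3 to 2.1 means the uncompressed stream shrinks to 2.1/3 = 70% of its
   former size.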

5) Avoid merging of unmergeable things

   This is the patch that drops the hash value to 1 for things where we know we do not want to merge.
   This is needed for correctness of ODR types and it also improves the compression ratio of the
   streams, as SCC hashes are hard to gzip.
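
   Why a constant hash helps compression can be seen in a small experiment.  This is a sketch
   only: scc_hash is a hypothetical stand-in that uses CRC32 in place of the real SCC hash, and
   zlib stands in for the stream compressor.

```python
import struct
import zlib


def scc_hash(payload: bytes, mergeable: bool) -> int:
    """Stand-in hash: a content hash for mergeable things, constant 1 otherwise."""
    return zlib.crc32(payload) if mergeable else 1


def compressed_size(hashes):
    """Size in bytes of the hashes packed as 32-bit words and then deflated."""
    raw = b"".join(struct.pack("<I", h) for h in hashes)
    return len(zlib.compress(raw))


payloads = [str(i).encode() for i in range(1000)]
mergeable_hashes = [scc_hash(p, True) for p in payloads]
unmergeable_hashes = [scc_hash(p, False) for p in payloads]
```

   Comparing compressed_size(mergeable_hashes) with compressed_size(unmergeable_hashes) shows
   the effect: well-mixed hash words look like noise to the compressor, while a run of identical
   words collapses to almost nothing.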

6) Put variable initializers into named sections (as function bodies)

   This is supposed to help vtables, but I am always too lazy to dive into the details of our
   ugly low-level section API.

7) Improve streaming of locations, as discussed several times.  Again I am a bit discouraged
   by the need to make an extra section etc.  Location lookup still shows high in the profile.
   
So these are some of my immediate plans.
Honza

