This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Teaching SCC merging about unit local trees


On Thu, Jun 12, 2014 at 12:29 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jun 12, 2014 at 10:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Richard,
>> as briefly discussed before, I would like to teach LTO type merging to not merge
>> types that was declared in anonymous namespaces and use C++ ODR type names
>> (stored in DECL_ASSEMBLER_NAME of the TYPE_DECL) to break down canonical types
>> by their names.
>>
>> First thing I need to arrange IMO is to not merge two anonymous types from
>> two different units.  While looking into it I noticed that the current code
>> in unify_scc that refuses to merge local decls produces conflicts and seems
>> useless excercise to do.
>>
>> This patch introduces special hash code 1 that specify that given SCC is known
>> to be local and should bypass the merging logic. This is propagated down and
>> seems to quite noticeably reduce size of SCC hash:
>>
>> [WPA] read 10190717 SCCs of average size 1.980409
>> [WPA] 20181785 tree bodies read in total
>> [WPA] tree SCC table: size 4194301, 1882700 elements, collision ratio: 0.815497
>> [WPA] tree SCC max chain length 140 (size 1)
>> [WPA] Compared 3392363 SCCs, 2718822 collisions (0.801454)
>> [WPA] Merged 3314075 SCCs
>> [WPA] Merged 9693632 tree bodies
>> [WPA] Merged 2467704 types
>> [WPA] 1783262 types prevailed (4491218 associated trees)
>> [WPA] GIMPLE canonical type table: size 131071, 94867 elements, 1783347 searches, 737056 collisions (ratio: 0.413299)
>> [WPA] GIMPLE canonical type pointer-map: 94867 elements, 3973875 searches
>> [WPA] Compression: 282828785 input bytes, 831186147 uncompressed bytes (ratio: 2.938832)
>> [WPA] Size of mmap'd section decls: 282828785 bytes
>>
>> to:
>>
>> [WPA] read 10172291 SCCs of average size 1.982162
>> [WPA] 20163124 tree bodies read in total
>> [WPA] tree SCC table: size 2097143, 988764 elements, collision ratio: 0.684967
>> [WPA] tree SCC max chain length 140 (size 1)
>> [WPA] Compared 3060932 SCCs, 2405009 collisions (0.785711)
>> [WPA] Merged 3040565 SCCs
>> [WPA] Merged 9246482 tree bodies
>> [WPA] Merged 2382312 types
>> [WPA] 1868611 types prevailed (4728465 associated trees)
>> [WPA] GIMPLE canonical type table: size 131071, 94910 elements, 1868696 searches, 790939 collisions (ratio: 0.423257)
>> [WPA] GIMPLE canonical type pointer-map: 94910 elements, 4216423 searches
>> [WPA] Compression: 273322455 input bytes, 824178095 uncompressed bytes (ratio: 3.015406)
>>
>> We merge less, but not by much and I think we was not right not merge in that cases.
>
> If we merge things we may not merge then the fix is to compare_tree_sccs_1,
> not introducing special cases like you propose.
>
> That is, if we are not allowed to merge anonymous namespaces then
> make sure we don't.  We already should not merge types with
> TYPE_CONTEXT == such namespace by means of
>
>       /* ???  Global types from different TUs have non-matching
>          TRANSLATION_UNIT_DECLs.  Still merge them if they are otherwise
>          equal.  */
>       if (TYPE_FILE_SCOPE_P (t1) && TYPE_FILE_SCOPE_P (t2))
>         ;
>       else
>         compare_tree_edges (TYPE_CONTEXT (t1), TYPE_CONTEXT (t2));
>
> but we possibly merge a subset of decl kinds from "different" namespaces :
>
>       /* ???  Global decls from different TUs have non-matching
>          TRANSLATION_UNIT_DECLs.  Only consider a small set of
>          decls equivalent, we should not end up merging others.  */
>       if ((code == TYPE_DECL
>            || code == NAMESPACE_DECL
>            || code == IMPORTED_DECL
>            || code == CONST_DECL
>            || (VAR_OR_FUNCTION_DECL_P (t1)
>                && (TREE_PUBLIC (t1) || DECL_EXTERNAL (t1))))
>           && DECL_FILE_SCOPE_P (t1) && DECL_FILE_SCOPE_P (t2))
>         ;
>       else
>         compare_tree_edges (DECL_CONTEXT (t1), DECL_CONTEXT (t2));
>
> Not sure what we end up doing for NAMESPACE_DECL itself (and what
> fields we stream for it).  It would be interesting to check that.
>
> Thus, make sure we don't merge namespace {} and namespace {} from
> two different units.
>
> But effectively you say we have two classes of "global" trees, first
> those that are mergeable across TUs and second those that are not.
> This IMHO means we want to separate those to two different LTO
> sections and simply skip all the merging code for the second (instead
> of adding hacks to the merging code).

As that also restricts the "pointers" we can have.  Mergeable stuff
may not refer to non-mergeable stuff.  Breaks down for initializers:

static int x;
int *p = &x;

though you could say that as p is initialized (thus not DECL_COMMON)
this instance cannot be merged with anything else - other entities
are 'extern int *p' (tree merging is different from symtab merging).

Thus int *p = &x; is also non-mergeable (everything that has tree
pointers refer to sth not mergeable is not mergeable).

We have similar "issues" with tree_is_indexable and pointers violating
constraints (like those variadic return types).

Similar as I had the idea to compute "indexability" during the SCC walk
we can compute "mergability" during it as well.

Richard.

> Richard.
>
>>
>> Would something like this make sense? (I am not saying my definition of unit_local_tree_p
>> is most polished one :)
>>
>> I think next step could be to make anonymous types to bypass the canonical type
>> merging (i.e. simply save the chains as they comde from frontends forthose) and
>> then look into computing the type names in free lang data, using odr name hash instaed
>> of canonical type hash for those named types + link them to canonical type hash
>> entries and if we end up with unnamed type in canonical type hash, then make its
>> alias class to conflict with all the named types.
>>
>> Honza
>>
>> Index: lto-streamer-out.c
>> ===================================================================
>> --- lto-streamer-out.c  (revision 211488)
>> +++ lto-streamer-out.c  (working copy)
>> @@ -54,6 +54,47 @@ along with GCC; see the file COPYING3.
>>  #include "cfgloop.h"
>>  #include "builtins.h"
>>
>> +/* Return if T can never be shared across units.  */
>> +static bool
>> +unit_local_tree_p (tree t)
>> +{
>> +  switch (TREE_CODE (t))
>> +    {
>> +      case VAR_DECL:
>> +       /* Automatic variables are always unit local.  */
>> +       if (!TREE_STATIC (t) && !DECL_EXTERNAL (t)
>> +           && !DECL_HARD_REGISTER (t))
>> +         return true;
>> +       /* ... fall through ... */
>> +
>> +      case FUNCTION_DECL:
>> +       /* Non-public declarations are alwyas local. */
>> +       if (!TREE_PUBLIC (t))
>> +         return true;
>> +
>> +       /* Public definitions that would cause linker error if
>> +          appeared in other unit.  */
>> +       if (TREE_PUBLIC (t)
>> +           && !DECL_EXTERNAL (t)
>> +           && !DECL_WEAK (t))
>> +         return true;
>> +       return false;
>> +      case NAMESPACE_DECL:
>> +       return !TREE_PUBLIC (t);
>> +      case TRANSLATION_UNIT_DECL:
>> +       return true;
>> +      case PARM_DECL:
>> +      case RESULT_DECL:
>> +      case LABEL_DECL:
>> +      case SSA_NAME:
>> +       return true;
>> +      default:
>> +       if (TYPE_P (t)
>> +           && type_in_anonymous_namespace_p (t))
>> +         return true;
>> +       return false;
>> +    }
>> +}
>>
>>  static void lto_write_tree (struct output_block*, tree, bool);
>>
>> @@ -686,7 +727,9 @@ DFS_write_tree_body (struct output_block
>>  #undef DFS_follow_tree_edge
>>  }
>>
>> -/* Return a hash value for the tree T.  */
>> +/* Return a hash value for the tree T.
>> +   If T is local to unit or refers anything local to unit, return 1.
>> +   Otherwise return non-1.  */
>>
>>  static hashval_t
>>  hash_tree (struct streamer_tree_cache_d *cache, tree t)
>> @@ -694,10 +737,19 @@ hash_tree (struct streamer_tree_cache_d
>>  #define visit(SIBLING) \
>>    do { \
>>      unsigned ix; \
>> +    hashval_t h; \
>>      if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
>> -      v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \
>> +      { \
>> +        h = streamer_tree_cache_get_hash (cache, ix); \
>> +        if (h == STREAMER_TREE_CACHE_HASH_LOCAL) \
>> +         return STREAMER_TREE_CACHE_HASH_LOCAL; \
>> +        v = iterative_hash_hashval_t (h, v); \
>> +      } \
>>    } while (0)
>>
>> +  if (unit_local_tree_p (t))
>> +    return STREAMER_TREE_CACHE_HASH_LOCAL;
>> +
>>    /* Hash TS_BASE.  */
>>    enum tree_code code = TREE_CODE (t);
>>    hashval_t v = iterative_hash_host_wide_int (code, 0);
>> @@ -911,7 +963,11 @@ hash_tree (struct streamer_tree_cache_d
>>           hashval_t x;
>>           unsigned ix;
>>           if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix))
>> -           x = streamer_tree_cache_get_hash (cache, ix);
>> +           {
>> +             x = streamer_tree_cache_get_hash (cache, ix);
>> +             if (x == STREAMER_TREE_CACHE_HASH_LOCAL)
>> +               return STREAMER_TREE_CACHE_HASH_LOCAL;
>> +           }
>>           else
>>             x = hash_tree (cache, TREE_TYPE (t));
>>           v = iterative_hash_hashval_t (x, v);
>> @@ -1101,7 +1157,7 @@ hash_tree (struct streamer_tree_cache_d
>>        visit (OMP_CLAUSE_CHAIN (t));
>>      }
>>
>> -  return v;
>> +  return v + (v == STREAMER_TREE_CACHE_HASH_LOCAL);
>>
>>  #undef visit
>>  }
>> @@ -1121,14 +1177,25 @@ scc_entry_compare (const void *p1_, cons
>>  }
>>
>>  /* Return a hash value for the SCC on the SCC stack from FIRST with
>> -   size SIZE.  */
>> +   size SIZE.
>> +   If STREAMER_TREE_CACHE_HASH_LOCAL is returned, we know the SCC will never
>> +   get merged with other unit.  */
>>
>>  static hashval_t
>>  hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size)
>>  {
>>    /* Compute hash values for the SCC members.  */
>>    for (unsigned i = 0; i < size; ++i)
>> -    sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>> +    {
>> +      sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>> +      /* If any member of SCC is local, whole SCC is.  */
>> +      if (sccstack[first+i].hash == STREAMER_TREE_CACHE_HASH_LOCAL)
>> +       {
>> +         for (unsigned i = 0; i < size; ++i)
>> +           sccstack[first+i].hash = STREAMER_TREE_CACHE_HASH_LOCAL;
>> +         return STREAMER_TREE_CACHE_HASH_LOCAL;
>> +       }
>> +    }
>>
>>    if (size == 1)
>>      return sccstack[first].hash;
>> @@ -1161,7 +1228,7 @@ hash_scc (struct streamer_tree_cache_d *
>>        sccstack[first+i].hash = tem[i];
>>        scc_hash = iterative_hash_hashval_t (tem[i], scc_hash);
>>      }
>> -  return scc_hash;
>> +  return scc_hash + (scc_hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>  }
>>
>>  /* DFS walk EXPR and stream SCCs of tree bodies if they are not
>> @@ -1242,26 +1309,29 @@ DFS_write_tree (struct output_block *ob,
>>               scc_hash = hash_scc (ob->writer_cache, first, size);
>>
>>               /* Put the entries with the least number of collisions first.  */
>> -             unsigned entry_start = 0;
>> -             scc_entry_len = size + 1;
>> -             for (unsigned i = 0; i < size;)
>> +             if (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL)
>>                 {
>> -                 unsigned from = i;
>> -                 for (i = i + 1; i < size
>> -                      && (sccstack[first + i].hash
>> -                          == sccstack[first + from].hash); ++i)
>> -                   ;
>> -                 if (i - from < scc_entry_len)
>> +                 unsigned entry_start = 0;
>> +                 scc_entry_len = size + 1;
>> +                 for (unsigned i = 0; i < size;)
>>                     {
>> -                     scc_entry_len = i - from;
>> -                     entry_start = from;
>> +                     unsigned from = i;
>> +                     for (i = i + 1; i < size
>> +                          && (sccstack[first + i].hash
>> +                              == sccstack[first + from].hash); ++i)
>> +                       ;
>> +                     if (i - from < scc_entry_len)
>> +                       {
>> +                         scc_entry_len = i - from;
>> +                         entry_start = from;
>> +                       }
>> +                   }
>> +                 for (unsigned i = 0; i < scc_entry_len; ++i)
>> +                   {
>> +                     scc_entry tem = sccstack[first + i];
>> +                     sccstack[first + i] = sccstack[first + entry_start + i];
>> +                     sccstack[first + entry_start + i] = tem;
>>                     }
>> -               }
>> -             for (unsigned i = 0; i < scc_entry_len; ++i)
>> -               {
>> -                 scc_entry tem = sccstack[first + i];
>> -                 sccstack[first + i] = sccstack[first + entry_start + i];
>> -                 sccstack[first + entry_start + i] = tem;
>>                 }
>>             }
>>
>> @@ -1271,7 +1341,7 @@ DFS_write_tree (struct output_block *ob,
>>           streamer_write_uhwi (ob, scc_hash);
>>
>>           /* Write size-1 SCCs without wrapping them inside SCC bundles.
>> -            All INTEGER_CSTs need to be handled this way as we need
>> +            All INTEGER_CSTs need thiso be handled this way as we need
>>              their type to materialize them.  Also builtins are handled
>>              this way.
>>              ???  We still wrap these in LTO_tree_scc so at the
>> @@ -1291,6 +1361,8 @@ DFS_write_tree (struct output_block *ob,
>>                   tree t = sccstack[first+i].t;
>>                   bool exists_p = streamer_tree_cache_insert (ob->writer_cache,
>>                                                               t, hash, &ix);
>> +                 gcc_assert (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>> +                             || hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>                   gcc_assert (!exists_p);
>>
>>                   if (!lto_is_streamable (t))
>> Index: lto/lto.c
>> ===================================================================
>> --- lto/lto.c   (revision 211488)
>> +++ lto/lto.c   (working copy)
>> @@ -1896,7 +1897,7 @@ lto_read_decls (struct lto_file_decl_dat
>>             continue;
>>
>>           /* Try to unify the SCC with already existing ones.  */
>> -         if (!flag_ltrans
>> +         if (!flag_ltrans && scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>>               && unify_scc (data_in->reader_cache, from,
>>                             len, scc_entry_len, scc_hash))
>>             continue;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]