[RFC] Teaching SCC merging about unit local trees

Richard Biener richard.guenther@gmail.com
Thu Jun 12 10:46:00 GMT 2014


On Thu, Jun 12, 2014 at 12:34 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jun 12, 2014 at 12:29 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jun 12, 2014 at 10:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Richard,
>>> as briefly discussed before, I would like to teach LTO type merging to not merge
>>> types that was declared in anonymous namespaces and use C++ ODR type names
>>> (stored in DECL_ASSEMBLER_NAME of the TYPE_DECL) to break down canonical types
>>> by their names.
>>>
>>> First thing I need to arrange IMO is to not merge two anonymous types from
>>> two different units.  While looking into it I noticed that the current code
>>> in unify_scc that refuses to merge local decls produces conflicts and seems
>>> useless excercise to do.
>>>
>>> This patch introduces special hash code 1 that specify that given SCC is known
>>> to be local and should bypass the merging logic. This is propagated down and
>>> seems to quite noticeably reduce size of SCC hash:
>>>
>>> [WPA] read 10190717 SCCs of average size 1.980409
>>> [WPA] 20181785 tree bodies read in total
>>> [WPA] tree SCC table: size 4194301, 1882700 elements, collision ratio: 0.815497
>>> [WPA] tree SCC max chain length 140 (size 1)
>>> [WPA] Compared 3392363 SCCs, 2718822 collisions (0.801454)
>>> [WPA] Merged 3314075 SCCs
>>> [WPA] Merged 9693632 tree bodies
>>> [WPA] Merged 2467704 types
>>> [WPA] 1783262 types prevailed (4491218 associated trees)
>>> [WPA] GIMPLE canonical type table: size 131071, 94867 elements, 1783347 searches, 737056 collisions (ratio: 0.413299)
>>> [WPA] GIMPLE canonical type pointer-map: 94867 elements, 3973875 searches
>>> [WPA] Compression: 282828785 input bytes, 831186147 uncompressed bytes (ratio: 2.938832)
>>> [WPA] Size of mmap'd section decls: 282828785 bytes
>>>
>>> to:
>>>
>>> [WPA] read 10172291 SCCs of average size 1.982162
>>> [WPA] 20163124 tree bodies read in total
>>> [WPA] tree SCC table: size 2097143, 988764 elements, collision ratio: 0.684967
>>> [WPA] tree SCC max chain length 140 (size 1)
>>> [WPA] Compared 3060932 SCCs, 2405009 collisions (0.785711)
>>> [WPA] Merged 3040565 SCCs
>>> [WPA] Merged 9246482 tree bodies
>>> [WPA] Merged 2382312 types
>>> [WPA] 1868611 types prevailed (4728465 associated trees)
>>> [WPA] GIMPLE canonical type table: size 131071, 94910 elements, 1868696 searches, 790939 collisions (ratio: 0.423257)
>>> [WPA] GIMPLE canonical type pointer-map: 94910 elements, 4216423 searches
>>> [WPA] Compression: 273322455 input bytes, 824178095 uncompressed bytes (ratio: 3.015406)
>>>
>>> We merge less, but not by much and I think we was not right not merge in that cases.
>>
>> If we merge things we may not merge then the fix is to compare_tree_sccs_1,
>> not introducing special cases like you propose.
>>
>> That is, if we are not allowed to merge anonymous namespaces then
>> make sure we don't.  We already should not merge types with
>> TYPE_CONTEXT == such namespace by means of
>>
>>       /* ???  Global types from different TUs have non-matching
>>          TRANSLATION_UNIT_DECLs.  Still merge them if they are otherwise
>>          equal.  */
>>       if (TYPE_FILE_SCOPE_P (t1) && TYPE_FILE_SCOPE_P (t2))
>>         ;
>>       else
>>         compare_tree_edges (TYPE_CONTEXT (t1), TYPE_CONTEXT (t2));
>>
>> but we possibly merge a subset of decl kinds from "different" namespaces :
>>
>>       /* ???  Global decls from different TUs have non-matching
>>          TRANSLATION_UNIT_DECLs.  Only consider a small set of
>>          decls equivalent, we should not end up merging others.  */
>>       if ((code == TYPE_DECL
>>            || code == NAMESPACE_DECL
>>            || code == IMPORTED_DECL
>>            || code == CONST_DECL
>>            || (VAR_OR_FUNCTION_DECL_P (t1)
>>                && (TREE_PUBLIC (t1) || DECL_EXTERNAL (t1))))
>>           && DECL_FILE_SCOPE_P (t1) && DECL_FILE_SCOPE_P (t2))
>>         ;
>>       else
>>         compare_tree_edges (DECL_CONTEXT (t1), DECL_CONTEXT (t2));
>>
>> Not sure what we end up doing for NAMESPACE_DECL itself (and what
>> fields we stream for it).  It would be interesting to check that.
>>
>> Thus, make sure we don't merge namespace {} and namespace {} from
>> two different units.
>>
>> But effectively you say we have two classes of "global" trees, first
>> those that are mergeable across TUs and second those that are not.
>> This IMHO means we want to separate those to two different LTO
>> sections and simply skip all the merging code for the second (instead
>> of adding hacks to the merging code).
>
> As that also restricts the "pointers" we can have.  Mergeable stuff
> may not refer to non-mergeable stuff.  Breaks down for initializers:
>
> static int x;
> int *p = &x;
>
> though you could say that as p is initialized (thus not DECL_COMMON)
> this instance cannot be merged with anything else - other entities
> are 'extern int *p' (tree merging is different from symtab merging).
>
> Thus int *p = &x; is also non-mergeable (everything that has tree
> pointers refer to sth not mergeable is not mergeable).
>
> We have similar "issues" with tree_is_indexable and pointers violating
> constraints (like those variadic return types).
>
> Similar as I had the idea to compute "indexability" during the SCC walk
> we can compute "mergability" during it as well.

That is, have a tree_may_be_mergeable_p (), call it during the DFS
walk storing it alongside the visited edges and thus obtain a result
for each SCC, stream that as a flag (a special hash value is ugly,
but well ... I guess it works).  The important part is to make an SCC
!tree_may_be_mergeable_p () if any of the outgoing edges from an SCC
are !tree_may_be_mergeable_p ().  You seem to miss this.

Richard.

> Richard.
>
>> Richard.
>>
>>>
>>> Would something like this make sense? (I am not saying my definition of unit_local_tree_p
>>> is most polished one :)
>>>
>>> I think next step could be to make anonymous types to bypass the canonical type
>>> merging (i.e. simply save the chains as they comde from frontends forthose) and
>>> then look into computing the type names in free lang data, using odr name hash instaed
>>> of canonical type hash for those named types + link them to canonical type hash
>>> entries and if we end up with unnamed type in canonical type hash, then make its
>>> alias class to conflict with all the named types.
>>>
>>> Honza
>>>
>>> Index: lto-streamer-out.c
>>> ===================================================================
>>> --- lto-streamer-out.c  (revision 211488)
>>> +++ lto-streamer-out.c  (working copy)
>>> @@ -54,6 +54,47 @@ along with GCC; see the file COPYING3.
>>>  #include "cfgloop.h"
>>>  #include "builtins.h"
>>>
>>> +/* Return if T can never be shared across units.  */
>>> +static bool
>>> +unit_local_tree_p (tree t)
>>> +{
>>> +  switch (TREE_CODE (t))
>>> +    {
>>> +      case VAR_DECL:
>>> +       /* Automatic variables are always unit local.  */
>>> +       if (!TREE_STATIC (t) && !DECL_EXTERNAL (t)
>>> +           && !DECL_HARD_REGISTER (t))
>>> +         return true;
>>> +       /* ... fall through ... */
>>> +
>>> +      case FUNCTION_DECL:
>>> +       /* Non-public declarations are alwyas local. */
>>> +       if (!TREE_PUBLIC (t))
>>> +         return true;
>>> +
>>> +       /* Public definitions that would cause linker error if
>>> +          appeared in other unit.  */
>>> +       if (TREE_PUBLIC (t)
>>> +           && !DECL_EXTERNAL (t)
>>> +           && !DECL_WEAK (t))
>>> +         return true;
>>> +       return false;
>>> +      case NAMESPACE_DECL:
>>> +       return !TREE_PUBLIC (t);
>>> +      case TRANSLATION_UNIT_DECL:
>>> +       return true;
>>> +      case PARM_DECL:
>>> +      case RESULT_DECL:
>>> +      case LABEL_DECL:
>>> +      case SSA_NAME:
>>> +       return true;
>>> +      default:
>>> +       if (TYPE_P (t)
>>> +           && type_in_anonymous_namespace_p (t))
>>> +         return true;
>>> +       return false;
>>> +    }
>>> +}
>>>
>>>  static void lto_write_tree (struct output_block*, tree, bool);
>>>
>>> @@ -686,7 +727,9 @@ DFS_write_tree_body (struct output_block
>>>  #undef DFS_follow_tree_edge
>>>  }
>>>
>>> -/* Return a hash value for the tree T.  */
>>> +/* Return a hash value for the tree T.
>>> +   If T is local to unit or refers anything local to unit, return 1.
>>> +   Otherwise return non-1.  */
>>>
>>>  static hashval_t
>>>  hash_tree (struct streamer_tree_cache_d *cache, tree t)
>>> @@ -694,10 +737,19 @@ hash_tree (struct streamer_tree_cache_d
>>>  #define visit(SIBLING) \
>>>    do { \
>>>      unsigned ix; \
>>> +    hashval_t h; \
>>>      if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
>>> -      v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \
>>> +      { \
>>> +        h = streamer_tree_cache_get_hash (cache, ix); \
>>> +        if (h == STREAMER_TREE_CACHE_HASH_LOCAL) \
>>> +         return STREAMER_TREE_CACHE_HASH_LOCAL; \
>>> +        v = iterative_hash_hashval_t (h, v); \
>>> +      } \
>>>    } while (0)
>>>
>>> +  if (unit_local_tree_p (t))
>>> +    return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> +
>>>    /* Hash TS_BASE.  */
>>>    enum tree_code code = TREE_CODE (t);
>>>    hashval_t v = iterative_hash_host_wide_int (code, 0);
>>> @@ -911,7 +963,11 @@ hash_tree (struct streamer_tree_cache_d
>>>           hashval_t x;
>>>           unsigned ix;
>>>           if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix))
>>> -           x = streamer_tree_cache_get_hash (cache, ix);
>>> +           {
>>> +             x = streamer_tree_cache_get_hash (cache, ix);
>>> +             if (x == STREAMER_TREE_CACHE_HASH_LOCAL)
>>> +               return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> +           }
>>>           else
>>>             x = hash_tree (cache, TREE_TYPE (t));
>>>           v = iterative_hash_hashval_t (x, v);
>>> @@ -1101,7 +1157,7 @@ hash_tree (struct streamer_tree_cache_d
>>>        visit (OMP_CLAUSE_CHAIN (t));
>>>      }
>>>
>>> -  return v;
>>> +  return v + (v == STREAMER_TREE_CACHE_HASH_LOCAL);
>>>
>>>  #undef visit
>>>  }
>>> @@ -1121,14 +1177,25 @@ scc_entry_compare (const void *p1_, cons
>>>  }
>>>
>>>  /* Return a hash value for the SCC on the SCC stack from FIRST with
>>> -   size SIZE.  */
>>> +   size SIZE.
>>> +   If STREAMER_TREE_CACHE_HASH_LOCAL is returned, we know the SCC will never
>>> +   get merged with other unit.  */
>>>
>>>  static hashval_t
>>>  hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size)
>>>  {
>>>    /* Compute hash values for the SCC members.  */
>>>    for (unsigned i = 0; i < size; ++i)
>>> -    sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>>> +    {
>>> +      sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>>> +      /* If any member of SCC is local, whole SCC is.  */
>>> +      if (sccstack[first+i].hash == STREAMER_TREE_CACHE_HASH_LOCAL)
>>> +       {
>>> +         for (unsigned i = 0; i < size; ++i)
>>> +           sccstack[first+i].hash = STREAMER_TREE_CACHE_HASH_LOCAL;
>>> +         return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> +       }
>>> +    }
>>>
>>>    if (size == 1)
>>>      return sccstack[first].hash;
>>> @@ -1161,7 +1228,7 @@ hash_scc (struct streamer_tree_cache_d *
>>>        sccstack[first+i].hash = tem[i];
>>>        scc_hash = iterative_hash_hashval_t (tem[i], scc_hash);
>>>      }
>>> -  return scc_hash;
>>> +  return scc_hash + (scc_hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>>  }
>>>
>>>  /* DFS walk EXPR and stream SCCs of tree bodies if they are not
>>> @@ -1242,26 +1309,29 @@ DFS_write_tree (struct output_block *ob,
>>>               scc_hash = hash_scc (ob->writer_cache, first, size);
>>>
>>>               /* Put the entries with the least number of collisions first.  */
>>> -             unsigned entry_start = 0;
>>> -             scc_entry_len = size + 1;
>>> -             for (unsigned i = 0; i < size;)
>>> +             if (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL)
>>>                 {
>>> -                 unsigned from = i;
>>> -                 for (i = i + 1; i < size
>>> -                      && (sccstack[first + i].hash
>>> -                          == sccstack[first + from].hash); ++i)
>>> -                   ;
>>> -                 if (i - from < scc_entry_len)
>>> +                 unsigned entry_start = 0;
>>> +                 scc_entry_len = size + 1;
>>> +                 for (unsigned i = 0; i < size;)
>>>                     {
>>> -                     scc_entry_len = i - from;
>>> -                     entry_start = from;
>>> +                     unsigned from = i;
>>> +                     for (i = i + 1; i < size
>>> +                          && (sccstack[first + i].hash
>>> +                              == sccstack[first + from].hash); ++i)
>>> +                       ;
>>> +                     if (i - from < scc_entry_len)
>>> +                       {
>>> +                         scc_entry_len = i - from;
>>> +                         entry_start = from;
>>> +                       }
>>> +                   }
>>> +                 for (unsigned i = 0; i < scc_entry_len; ++i)
>>> +                   {
>>> +                     scc_entry tem = sccstack[first + i];
>>> +                     sccstack[first + i] = sccstack[first + entry_start + i];
>>> +                     sccstack[first + entry_start + i] = tem;
>>>                     }
>>> -               }
>>> -             for (unsigned i = 0; i < scc_entry_len; ++i)
>>> -               {
>>> -                 scc_entry tem = sccstack[first + i];
>>> -                 sccstack[first + i] = sccstack[first + entry_start + i];
>>> -                 sccstack[first + entry_start + i] = tem;
>>>                 }
>>>             }
>>>
>>> @@ -1271,7 +1341,7 @@ DFS_write_tree (struct output_block *ob,
>>>           streamer_write_uhwi (ob, scc_hash);
>>>
>>>           /* Write size-1 SCCs without wrapping them inside SCC bundles.
>>> -            All INTEGER_CSTs need to be handled this way as we need
>>> +            All INTEGER_CSTs need thiso be handled this way as we need
>>>              their type to materialize them.  Also builtins are handled
>>>              this way.
>>>              ???  We still wrap these in LTO_tree_scc so at the
>>> @@ -1291,6 +1361,8 @@ DFS_write_tree (struct output_block *ob,
>>>                   tree t = sccstack[first+i].t;
>>>                   bool exists_p = streamer_tree_cache_insert (ob->writer_cache,
>>>                                                               t, hash, &ix);
>>> +                 gcc_assert (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>>> +                             || hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>>                   gcc_assert (!exists_p);
>>>
>>>                   if (!lto_is_streamable (t))
>>> Index: lto/lto.c
>>> ===================================================================
>>> --- lto/lto.c   (revision 211488)
>>> +++ lto/lto.c   (working copy)
>>> @@ -1896,7 +1897,7 @@ lto_read_decls (struct lto_file_decl_dat
>>>             continue;
>>>
>>>           /* Try to unify the SCC with already existing ones.  */
>>> -         if (!flag_ltrans
>>> +         if (!flag_ltrans && scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>>>               && unify_scc (data_in->reader_cache, from,
>>>                             len, scc_entry_len, scc_hash))
>>>             continue;



More information about the Gcc-patches mailing list