[RFC] Teaching SCC merging about unit local trees
Richard Biener
richard.guenther@gmail.com
Thu Jun 12 10:46:00 GMT 2014
On Thu, Jun 12, 2014 at 12:34 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jun 12, 2014 at 12:29 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jun 12, 2014 at 10:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Richard,
>>> as briefly discussed before, I would like to teach LTO type merging to not merge
>>> types that was declared in anonymous namespaces and use C++ ODR type names
>>> (stored in DECL_ASSEMBLER_NAME of the TYPE_DECL) to break down canonical types
>>> by their names.
>>>
>>> First thing I need to arrange IMO is to not merge two anonymous types from
>>> two different units. While looking into it I noticed that the current code
>>> in unify_scc that refuses to merge local decls produces conflicts and seems
>>> useless excercise to do.
>>>
>>> This patch introduces special hash code 1 that specify that given SCC is known
>>> to be local and should bypass the merging logic. This is propagated down and
>>> seems to quite noticeably reduce size of SCC hash:
>>>
>>> [WPA] read 10190717 SCCs of average size 1.980409
>>> [WPA] 20181785 tree bodies read in total
>>> [WPA] tree SCC table: size 4194301, 1882700 elements, collision ratio: 0.815497
>>> [WPA] tree SCC max chain length 140 (size 1)
>>> [WPA] Compared 3392363 SCCs, 2718822 collisions (0.801454)
>>> [WPA] Merged 3314075 SCCs
>>> [WPA] Merged 9693632 tree bodies
>>> [WPA] Merged 2467704 types
>>> [WPA] 1783262 types prevailed (4491218 associated trees)
>>> [WPA] GIMPLE canonical type table: size 131071, 94867 elements, 1783347 searches, 737056 collisions (ratio: 0.413299)
>>> [WPA] GIMPLE canonical type pointer-map: 94867 elements, 3973875 searches
>>> [WPA] Compression: 282828785 input bytes, 831186147 uncompressed bytes (ratio: 2.938832)
>>> [WPA] Size of mmap'd section decls: 282828785 bytes
>>>
>>> to:
>>>
>>> [WPA] read 10172291 SCCs of average size 1.982162
>>> [WPA] 20163124 tree bodies read in total
>>> [WPA] tree SCC table: size 2097143, 988764 elements, collision ratio: 0.684967
>>> [WPA] tree SCC max chain length 140 (size 1)
>>> [WPA] Compared 3060932 SCCs, 2405009 collisions (0.785711)
>>> [WPA] Merged 3040565 SCCs
>>> [WPA] Merged 9246482 tree bodies
>>> [WPA] Merged 2382312 types
>>> [WPA] 1868611 types prevailed (4728465 associated trees)
>>> [WPA] GIMPLE canonical type table: size 131071, 94910 elements, 1868696 searches, 790939 collisions (ratio: 0.423257)
>>> [WPA] GIMPLE canonical type pointer-map: 94910 elements, 4216423 searches
>>> [WPA] Compression: 273322455 input bytes, 824178095 uncompressed bytes (ratio: 3.015406)
>>>
>>> We merge less, but not by much and I think we was not right not merge in that cases.
>>
>> If we merge things we may not merge then the fix is to compare_tree_sccs_1,
>> not introducing special cases like you propose.
>>
>> That is, if we are not allowed to merge anonymous namespaces then
>> make sure we don't. We already should not merge types with
>> TYPE_CONTEXT == such namespace by means of
>>
>> /* ??? Global types from different TUs have non-matching
>> TRANSLATION_UNIT_DECLs. Still merge them if they are otherwise
>> equal. */
>> if (TYPE_FILE_SCOPE_P (t1) && TYPE_FILE_SCOPE_P (t2))
>> ;
>> else
>> compare_tree_edges (TYPE_CONTEXT (t1), TYPE_CONTEXT (t2));
>>
>> but we possibly merge a subset of decl kinds from "different" namespaces :
>>
>> /* ??? Global decls from different TUs have non-matching
>> TRANSLATION_UNIT_DECLs. Only consider a small set of
>> decls equivalent, we should not end up merging others. */
>> if ((code == TYPE_DECL
>> || code == NAMESPACE_DECL
>> || code == IMPORTED_DECL
>> || code == CONST_DECL
>> || (VAR_OR_FUNCTION_DECL_P (t1)
>> && (TREE_PUBLIC (t1) || DECL_EXTERNAL (t1))))
>> && DECL_FILE_SCOPE_P (t1) && DECL_FILE_SCOPE_P (t2))
>> ;
>> else
>> compare_tree_edges (DECL_CONTEXT (t1), DECL_CONTEXT (t2));
>>
>> Not sure what we end up doing for NAMESPACE_DECL itself (and what
>> fields we stream for it). It would be interesting to check that.
>>
>> Thus, make sure we don't merge namespace {} and namespace {} from
>> two different units.
>>
>> But effectively you say we have two classes of "global" trees, first
>> those that are mergeable across TUs and second those that are not.
>> This IMHO means we want to separate those to two different LTO
>> sections and simply skip all the merging code for the second (instead
>> of adding hacks to the merging code).
>
> As that also restricts the "pointers" we can have. Mergeable stuff
> may not refer to non-mergeable stuff. Breaks down for initializers:
>
> static int x;
> int *p = &x;
>
> though you could say that as p is initialized (thus not DECL_COMMON)
> this instance cannot be merged with anything else - other entities
> are 'extern int *p' (tree merging is different from symtab merging).
>
> Thus int *p = &x; is also non-mergeable (everything that has tree
> pointers refer to sth not mergeable is not mergeable).
>
> We have similar "issues" with tree_is_indexable and pointers violating
> constraints (like those variadic return types).
>
> Similar as I had the idea to compute "indexability" during the SCC walk
> we can compute "mergability" during it as well.
That is, have a tree_may_be_mergeable_p (), call it during the DFS
walk storing it alongside the visited edges and thus obtain a result
for each SCC, stream that as a flag (a special hash value is ugly,
but well ... I guess it works). The important part is to make an SCC
!tree_may_be_mergeable_p () if any of the outgoing edges from an SCC
are !tree_may_be_mergeable_p (). You seem to miss this.
Richard.
> Richard.
>
>> Richard.
>>
>>>
>>> Would something like this make sense? (I am not saying my definition of unit_local_tree_p
>>> is most polished one :)
>>>
>>> I think next step could be to make anonymous types to bypass the canonical type
>>> merging (i.e. simply save the chains as they comde from frontends forthose) and
>>> then look into computing the type names in free lang data, using odr name hash instaed
>>> of canonical type hash for those named types + link them to canonical type hash
>>> entries and if we end up with unnamed type in canonical type hash, then make its
>>> alias class to conflict with all the named types.
>>>
>>> Honza
>>>
>>> Index: lto-streamer-out.c
>>> ===================================================================
>>> --- lto-streamer-out.c (revision 211488)
>>> +++ lto-streamer-out.c (working copy)
>>> @@ -54,6 +54,47 @@ along with GCC; see the file COPYING3.
>>> #include "cfgloop.h"
>>> #include "builtins.h"
>>>
>>> +/* Return if T can never be shared across units. */
>>> +static bool
>>> +unit_local_tree_p (tree t)
>>> +{
>>> + switch (TREE_CODE (t))
>>> + {
>>> + case VAR_DECL:
>>> + /* Automatic variables are always unit local. */
>>> + if (!TREE_STATIC (t) && !DECL_EXTERNAL (t)
>>> + && !DECL_HARD_REGISTER (t))
>>> + return true;
>>> + /* ... fall through ... */
>>> +
>>> + case FUNCTION_DECL:
>>> + /* Non-public declarations are alwyas local. */
>>> + if (!TREE_PUBLIC (t))
>>> + return true;
>>> +
>>> + /* Public definitions that would cause linker error if
>>> + appeared in other unit. */
>>> + if (TREE_PUBLIC (t)
>>> + && !DECL_EXTERNAL (t)
>>> + && !DECL_WEAK (t))
>>> + return true;
>>> + return false;
>>> + case NAMESPACE_DECL:
>>> + return !TREE_PUBLIC (t);
>>> + case TRANSLATION_UNIT_DECL:
>>> + return true;
>>> + case PARM_DECL:
>>> + case RESULT_DECL:
>>> + case LABEL_DECL:
>>> + case SSA_NAME:
>>> + return true;
>>> + default:
>>> + if (TYPE_P (t)
>>> + && type_in_anonymous_namespace_p (t))
>>> + return true;
>>> + return false;
>>> + }
>>> +}
>>>
>>> static void lto_write_tree (struct output_block*, tree, bool);
>>>
>>> @@ -686,7 +727,9 @@ DFS_write_tree_body (struct output_block
>>> #undef DFS_follow_tree_edge
>>> }
>>>
>>> -/* Return a hash value for the tree T. */
>>> +/* Return a hash value for the tree T.
>>> + If T is local to unit or refers anything local to unit, return 1.
>>> + Otherwise return non-1. */
>>>
>>> static hashval_t
>>> hash_tree (struct streamer_tree_cache_d *cache, tree t)
>>> @@ -694,10 +737,19 @@ hash_tree (struct streamer_tree_cache_d
>>> #define visit(SIBLING) \
>>> do { \
>>> unsigned ix; \
>>> + hashval_t h; \
>>> if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
>>> - v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \
>>> + { \
>>> + h = streamer_tree_cache_get_hash (cache, ix); \
>>> + if (h == STREAMER_TREE_CACHE_HASH_LOCAL) \
>>> + return STREAMER_TREE_CACHE_HASH_LOCAL; \
>>> + v = iterative_hash_hashval_t (h, v); \
>>> + } \
>>> } while (0)
>>>
>>> + if (unit_local_tree_p (t))
>>> + return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> +
>>> /* Hash TS_BASE. */
>>> enum tree_code code = TREE_CODE (t);
>>> hashval_t v = iterative_hash_host_wide_int (code, 0);
>>> @@ -911,7 +963,11 @@ hash_tree (struct streamer_tree_cache_d
>>> hashval_t x;
>>> unsigned ix;
>>> if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix))
>>> - x = streamer_tree_cache_get_hash (cache, ix);
>>> + {
>>> + x = streamer_tree_cache_get_hash (cache, ix);
>>> + if (x == STREAMER_TREE_CACHE_HASH_LOCAL)
>>> + return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> + }
>>> else
>>> x = hash_tree (cache, TREE_TYPE (t));
>>> v = iterative_hash_hashval_t (x, v);
>>> @@ -1101,7 +1157,7 @@ hash_tree (struct streamer_tree_cache_d
>>> visit (OMP_CLAUSE_CHAIN (t));
>>> }
>>>
>>> - return v;
>>> + return v + (v == STREAMER_TREE_CACHE_HASH_LOCAL);
>>>
>>> #undef visit
>>> }
>>> @@ -1121,14 +1177,25 @@ scc_entry_compare (const void *p1_, cons
>>> }
>>>
>>> /* Return a hash value for the SCC on the SCC stack from FIRST with
>>> - size SIZE. */
>>> + size SIZE.
>>> + If STREAMER_TREE_CACHE_HASH_LOCAL is returned, we know the SCC will never
>>> + get merged with other unit. */
>>>
>>> static hashval_t
>>> hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size)
>>> {
>>> /* Compute hash values for the SCC members. */
>>> for (unsigned i = 0; i < size; ++i)
>>> - sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>>> + {
>>> + sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
>>> + /* If any member of SCC is local, whole SCC is. */
>>> + if (sccstack[first+i].hash == STREAMER_TREE_CACHE_HASH_LOCAL)
>>> + {
>>> + for (unsigned i = 0; i < size; ++i)
>>> + sccstack[first+i].hash = STREAMER_TREE_CACHE_HASH_LOCAL;
>>> + return STREAMER_TREE_CACHE_HASH_LOCAL;
>>> + }
>>> + }
>>>
>>> if (size == 1)
>>> return sccstack[first].hash;
>>> @@ -1161,7 +1228,7 @@ hash_scc (struct streamer_tree_cache_d *
>>> sccstack[first+i].hash = tem[i];
>>> scc_hash = iterative_hash_hashval_t (tem[i], scc_hash);
>>> }
>>> - return scc_hash;
>>> + return scc_hash + (scc_hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>> }
>>>
>>> /* DFS walk EXPR and stream SCCs of tree bodies if they are not
>>> @@ -1242,26 +1309,29 @@ DFS_write_tree (struct output_block *ob,
>>> scc_hash = hash_scc (ob->writer_cache, first, size);
>>>
>>> /* Put the entries with the least number of collisions first. */
>>> - unsigned entry_start = 0;
>>> - scc_entry_len = size + 1;
>>> - for (unsigned i = 0; i < size;)
>>> + if (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL)
>>> {
>>> - unsigned from = i;
>>> - for (i = i + 1; i < size
>>> - && (sccstack[first + i].hash
>>> - == sccstack[first + from].hash); ++i)
>>> - ;
>>> - if (i - from < scc_entry_len)
>>> + unsigned entry_start = 0;
>>> + scc_entry_len = size + 1;
>>> + for (unsigned i = 0; i < size;)
>>> {
>>> - scc_entry_len = i - from;
>>> - entry_start = from;
>>> + unsigned from = i;
>>> + for (i = i + 1; i < size
>>> + && (sccstack[first + i].hash
>>> + == sccstack[first + from].hash); ++i)
>>> + ;
>>> + if (i - from < scc_entry_len)
>>> + {
>>> + scc_entry_len = i - from;
>>> + entry_start = from;
>>> + }
>>> + }
>>> + for (unsigned i = 0; i < scc_entry_len; ++i)
>>> + {
>>> + scc_entry tem = sccstack[first + i];
>>> + sccstack[first + i] = sccstack[first + entry_start + i];
>>> + sccstack[first + entry_start + i] = tem;
>>> }
>>> - }
>>> - for (unsigned i = 0; i < scc_entry_len; ++i)
>>> - {
>>> - scc_entry tem = sccstack[first + i];
>>> - sccstack[first + i] = sccstack[first + entry_start + i];
>>> - sccstack[first + entry_start + i] = tem;
>>> }
>>> }
>>>
>>> @@ -1271,7 +1341,7 @@ DFS_write_tree (struct output_block *ob,
>>> streamer_write_uhwi (ob, scc_hash);
>>>
>>> /* Write size-1 SCCs without wrapping them inside SCC bundles.
>>> - All INTEGER_CSTs need to be handled this way as we need
>>> + All INTEGER_CSTs need thiso be handled this way as we need
>>> their type to materialize them. Also builtins are handled
>>> this way.
>>> ??? We still wrap these in LTO_tree_scc so at the
>>> @@ -1291,6 +1361,8 @@ DFS_write_tree (struct output_block *ob,
>>> tree t = sccstack[first+i].t;
>>> bool exists_p = streamer_tree_cache_insert (ob->writer_cache,
>>> t, hash, &ix);
>>> + gcc_assert (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>>> + || hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>>> gcc_assert (!exists_p);
>>>
>>> if (!lto_is_streamable (t))
>>> Index: lto/lto.c
>>> ===================================================================
>>> --- lto/lto.c (revision 211488)
>>> +++ lto/lto.c (working copy)
>>> @@ -1896,7 +1897,7 @@ lto_read_decls (struct lto_file_decl_dat
>>> continue;
>>>
>>> /* Try to unify the SCC with already existing ones. */
>>> - if (!flag_ltrans
>>> + if (!flag_ltrans && scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>>> && unify_scc (data_in->reader_cache, from,
>>> len, scc_entry_len, scc_hash))
>>> continue;
More information about the Gcc-patches
mailing list