This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Teaching SCC merging about unit local trees


On Thu, Jun 12, 2014 at 10:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Richard,
> as briefly discussed before, I would like to teach LTO type merging to not merge
> types that was declared in anonymous namespaces and use C++ ODR type names
> (stored in DECL_ASSEMBLER_NAME of the TYPE_DECL) to break down canonical types
> by their names.
>
> First thing I need to arrange IMO is to not merge two anonymous types from
> two different units.  While looking into it I noticed that the current code
> in unify_scc that refuses to merge local decls produces conflicts and seems
> useless excercise to do.
>
> This patch introduces special hash code 1 that specify that given SCC is known
> to be local and should bypass the merging logic. This is propagated down and
> seems to quite noticeably reduce size of SCC hash:
>
> [WPA] read 10190717 SCCs of average size 1.980409
> [WPA] 20181785 tree bodies read in total
> [WPA] tree SCC table: size 4194301, 1882700 elements, collision ratio: 0.815497
> [WPA] tree SCC max chain length 140 (size 1)
> [WPA] Compared 3392363 SCCs, 2718822 collisions (0.801454)
> [WPA] Merged 3314075 SCCs
> [WPA] Merged 9693632 tree bodies
> [WPA] Merged 2467704 types
> [WPA] 1783262 types prevailed (4491218 associated trees)
> [WPA] GIMPLE canonical type table: size 131071, 94867 elements, 1783347 searches, 737056 collisions (ratio: 0.413299)
> [WPA] GIMPLE canonical type pointer-map: 94867 elements, 3973875 searches
> [WPA] Compression: 282828785 input bytes, 831186147 uncompressed bytes (ratio: 2.938832)
> [WPA] Size of mmap'd section decls: 282828785 bytes
>
> to:
>
> [WPA] read 10172291 SCCs of average size 1.982162
> [WPA] 20163124 tree bodies read in total
> [WPA] tree SCC table: size 2097143, 988764 elements, collision ratio: 0.684967
> [WPA] tree SCC max chain length 140 (size 1)
> [WPA] Compared 3060932 SCCs, 2405009 collisions (0.785711)
> [WPA] Merged 3040565 SCCs
> [WPA] Merged 9246482 tree bodies
> [WPA] Merged 2382312 types
> [WPA] 1868611 types prevailed (4728465 associated trees)
> [WPA] GIMPLE canonical type table: size 131071, 94910 elements, 1868696 searches, 790939 collisions (ratio: 0.423257)
> [WPA] GIMPLE canonical type pointer-map: 94910 elements, 4216423 searches
> [WPA] Compression: 273322455 input bytes, 824178095 uncompressed bytes (ratio: 3.015406)
>
> We merge less, but not by much and I think we was not right not merge in that cases.

If we merge things we may not merge then the fix is to compare_tree_sccs_1,
not introducing special cases like you propose.

That is, if we are not allowed to merge anonymous namespaces then
make sure we don't.  We already should not merge types with
TYPE_CONTEXT == such namespace by means of

      /* ???  Global types from different TUs have non-matching
         TRANSLATION_UNIT_DECLs.  Still merge them if they are otherwise
         equal.  */
      if (TYPE_FILE_SCOPE_P (t1) && TYPE_FILE_SCOPE_P (t2))
        ;
      else
        compare_tree_edges (TYPE_CONTEXT (t1), TYPE_CONTEXT (t2));

but we possibly merge a subset of decl kinds from "different" namespaces :

      /* ???  Global decls from different TUs have non-matching
         TRANSLATION_UNIT_DECLs.  Only consider a small set of
         decls equivalent, we should not end up merging others.  */
      if ((code == TYPE_DECL
           || code == NAMESPACE_DECL
           || code == IMPORTED_DECL
           || code == CONST_DECL
           || (VAR_OR_FUNCTION_DECL_P (t1)
               && (TREE_PUBLIC (t1) || DECL_EXTERNAL (t1))))
          && DECL_FILE_SCOPE_P (t1) && DECL_FILE_SCOPE_P (t2))
        ;
      else
        compare_tree_edges (DECL_CONTEXT (t1), DECL_CONTEXT (t2));

Not sure what we end up doing for NAMESPACE_DECL itself (and what
fields we stream for it).  It would be interesting to check that.

Thus, make sure we don't merge namespace {} and namespace {} from
two different units.

But effectively you say we have two classes of "global" trees, first
those that are mergeable across TUs and second those that are not.
This IMHO means we want to separate those to two different LTO
sections and simply skip all the merging code for the second (instead
of adding hacks to the merging code).

Richard.

>
> Would something like this make sense? (I am not saying my definition of unit_local_tree_p
> is most polished one :)
>
> I think next step could be to make anonymous types to bypass the canonical type
> merging (i.e. simply save the chains as they comde from frontends forthose) and
> then look into computing the type names in free lang data, using odr name hash instaed
> of canonical type hash for those named types + link them to canonical type hash
> entries and if we end up with unnamed type in canonical type hash, then make its
> alias class to conflict with all the named types.
>
> Honza
>
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c  (revision 211488)
> +++ lto-streamer-out.c  (working copy)
> @@ -54,6 +54,47 @@ along with GCC; see the file COPYING3.
>  #include "cfgloop.h"
>  #include "builtins.h"
>
> +/* Return if T can never be shared across units.  */
> +static bool
> +unit_local_tree_p (tree t)
> +{
> +  switch (TREE_CODE (t))
> +    {
> +      case VAR_DECL:
> +       /* Automatic variables are always unit local.  */
> +       if (!TREE_STATIC (t) && !DECL_EXTERNAL (t)
> +           && !DECL_HARD_REGISTER (t))
> +         return true;
> +       /* ... fall through ... */
> +
> +      case FUNCTION_DECL:
> +       /* Non-public declarations are alwyas local. */
> +       if (!TREE_PUBLIC (t))
> +         return true;
> +
> +       /* Public definitions that would cause linker error if
> +          appeared in other unit.  */
> +       if (TREE_PUBLIC (t)
> +           && !DECL_EXTERNAL (t)
> +           && !DECL_WEAK (t))
> +         return true;
> +       return false;
> +      case NAMESPACE_DECL:
> +       return !TREE_PUBLIC (t);
> +      case TRANSLATION_UNIT_DECL:
> +       return true;
> +      case PARM_DECL:
> +      case RESULT_DECL:
> +      case LABEL_DECL:
> +      case SSA_NAME:
> +       return true;
> +      default:
> +       if (TYPE_P (t)
> +           && type_in_anonymous_namespace_p (t))
> +         return true;
> +       return false;
> +    }
> +}
>
>  static void lto_write_tree (struct output_block*, tree, bool);
>
> @@ -686,7 +727,9 @@ DFS_write_tree_body (struct output_block
>  #undef DFS_follow_tree_edge
>  }
>
> -/* Return a hash value for the tree T.  */
> +/* Return a hash value for the tree T.
> +   If T is local to unit or refers anything local to unit, return 1.
> +   Otherwise return non-1.  */
>
>  static hashval_t
>  hash_tree (struct streamer_tree_cache_d *cache, tree t)
> @@ -694,10 +737,19 @@ hash_tree (struct streamer_tree_cache_d
>  #define visit(SIBLING) \
>    do { \
>      unsigned ix; \
> +    hashval_t h; \
>      if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> -      v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \
> +      { \
> +        h = streamer_tree_cache_get_hash (cache, ix); \
> +        if (h == STREAMER_TREE_CACHE_HASH_LOCAL) \
> +         return STREAMER_TREE_CACHE_HASH_LOCAL; \
> +        v = iterative_hash_hashval_t (h, v); \
> +      } \
>    } while (0)
>
> +  if (unit_local_tree_p (t))
> +    return STREAMER_TREE_CACHE_HASH_LOCAL;
> +
>    /* Hash TS_BASE.  */
>    enum tree_code code = TREE_CODE (t);
>    hashval_t v = iterative_hash_host_wide_int (code, 0);
> @@ -911,7 +963,11 @@ hash_tree (struct streamer_tree_cache_d
>           hashval_t x;
>           unsigned ix;
>           if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix))
> -           x = streamer_tree_cache_get_hash (cache, ix);
> +           {
> +             x = streamer_tree_cache_get_hash (cache, ix);
> +             if (x == STREAMER_TREE_CACHE_HASH_LOCAL)
> +               return STREAMER_TREE_CACHE_HASH_LOCAL;
> +           }
>           else
>             x = hash_tree (cache, TREE_TYPE (t));
>           v = iterative_hash_hashval_t (x, v);
> @@ -1101,7 +1157,7 @@ hash_tree (struct streamer_tree_cache_d
>        visit (OMP_CLAUSE_CHAIN (t));
>      }
>
> -  return v;
> +  return v + (v == STREAMER_TREE_CACHE_HASH_LOCAL);
>
>  #undef visit
>  }
> @@ -1121,14 +1177,25 @@ scc_entry_compare (const void *p1_, cons
>  }
>
>  /* Return a hash value for the SCC on the SCC stack from FIRST with
> -   size SIZE.  */
> +   size SIZE.
> +   If STREAMER_TREE_CACHE_HASH_LOCAL is returned, we know the SCC will never
> +   get merged with other unit.  */
>
>  static hashval_t
>  hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size)
>  {
>    /* Compute hash values for the SCC members.  */
>    for (unsigned i = 0; i < size; ++i)
> -    sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
> +    {
> +      sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
> +      /* If any member of SCC is local, whole SCC is.  */
> +      if (sccstack[first+i].hash == STREAMER_TREE_CACHE_HASH_LOCAL)
> +       {
> +         for (unsigned i = 0; i < size; ++i)
> +           sccstack[first+i].hash = STREAMER_TREE_CACHE_HASH_LOCAL;
> +         return STREAMER_TREE_CACHE_HASH_LOCAL;
> +       }
> +    }
>
>    if (size == 1)
>      return sccstack[first].hash;
> @@ -1161,7 +1228,7 @@ hash_scc (struct streamer_tree_cache_d *
>        sccstack[first+i].hash = tem[i];
>        scc_hash = iterative_hash_hashval_t (tem[i], scc_hash);
>      }
> -  return scc_hash;
> +  return scc_hash + (scc_hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>  }
>
>  /* DFS walk EXPR and stream SCCs of tree bodies if they are not
> @@ -1242,26 +1309,29 @@ DFS_write_tree (struct output_block *ob,
>               scc_hash = hash_scc (ob->writer_cache, first, size);
>
>               /* Put the entries with the least number of collisions first.  */
> -             unsigned entry_start = 0;
> -             scc_entry_len = size + 1;
> -             for (unsigned i = 0; i < size;)
> +             if (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL)
>                 {
> -                 unsigned from = i;
> -                 for (i = i + 1; i < size
> -                      && (sccstack[first + i].hash
> -                          == sccstack[first + from].hash); ++i)
> -                   ;
> -                 if (i - from < scc_entry_len)
> +                 unsigned entry_start = 0;
> +                 scc_entry_len = size + 1;
> +                 for (unsigned i = 0; i < size;)
>                     {
> -                     scc_entry_len = i - from;
> -                     entry_start = from;
> +                     unsigned from = i;
> +                     for (i = i + 1; i < size
> +                          && (sccstack[first + i].hash
> +                              == sccstack[first + from].hash); ++i)
> +                       ;
> +                     if (i - from < scc_entry_len)
> +                       {
> +                         scc_entry_len = i - from;
> +                         entry_start = from;
> +                       }
> +                   }
> +                 for (unsigned i = 0; i < scc_entry_len; ++i)
> +                   {
> +                     scc_entry tem = sccstack[first + i];
> +                     sccstack[first + i] = sccstack[first + entry_start + i];
> +                     sccstack[first + entry_start + i] = tem;
>                     }
> -               }
> -             for (unsigned i = 0; i < scc_entry_len; ++i)
> -               {
> -                 scc_entry tem = sccstack[first + i];
> -                 sccstack[first + i] = sccstack[first + entry_start + i];
> -                 sccstack[first + entry_start + i] = tem;
>                 }
>             }
>
> @@ -1271,7 +1341,7 @@ DFS_write_tree (struct output_block *ob,
>           streamer_write_uhwi (ob, scc_hash);
>
>           /* Write size-1 SCCs without wrapping them inside SCC bundles.
> -            All INTEGER_CSTs need to be handled this way as we need
> +            All INTEGER_CSTs need thiso be handled this way as we need
>              their type to materialize them.  Also builtins are handled
>              this way.
>              ???  We still wrap these in LTO_tree_scc so at the
> @@ -1291,6 +1361,8 @@ DFS_write_tree (struct output_block *ob,
>                   tree t = sccstack[first+i].t;
>                   bool exists_p = streamer_tree_cache_insert (ob->writer_cache,
>                                                               t, hash, &ix);
> +                 gcc_assert (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
> +                             || hash == STREAMER_TREE_CACHE_HASH_LOCAL);
>                   gcc_assert (!exists_p);
>
>                   if (!lto_is_streamable (t))
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c   (revision 211488)
> +++ lto/lto.c   (working copy)
> @@ -1896,7 +1897,7 @@ lto_read_decls (struct lto_file_decl_dat
>             continue;
>
>           /* Try to unify the SCC with already existing ones.  */
> -         if (!flag_ltrans
> +         if (!flag_ltrans && scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL
>               && unify_scc (data_in->reader_cache, from,
>                             len, scc_entry_len, scc_hash))
>             continue;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]