RFC: variant and ODR based type merging during LTO streaming

Richard Biener rguenther@suse.de
Mon Oct 1 10:08:00 GMT 2018


On Fri, 28 Sep 2018, Jan Hubicka wrote:

> Hi,
> this is a proof-of-concept patch for type merging during LTO streaming. It
> does two things
> 1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list
>    This is useful at compilation time because frontends produce more variants
>    than needed. The effect of this is about 2% of decl stream
> 2) replace ODR types by their prevailing variant if ODR violation was not
>    detected.  This is useful at WPA time and saves about 50% of global decl
>    stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.
> 
> Here are stats of number of nodes hitting global stream during WPA->ltrans
> streaming of firefox:
> 
> Before  after
>    7354    7280 namespace_decl
>   46593   14084 union_type
>   18496   15405 translation_unit_decl
>   22816   21548 integer_type
>  107047   31681 enumeral_type
>   67072   46390 array_type
>  541220   99572 pointer_plus_expr
>  542360  100712 addr_expr
>  167657  154019 var_decl
>  864769  182691 tree_binfo
>  240200  206783 reference_type
>  410403  316877 function_type
> 1862985  522524 type_decl
> 1954864  652664 record_type
> 1070333  880209 method_type
> 1582055 1014270 pointer_type
> 1495367 1406670 tree_list
> 7926385 1483545 field_decl
> 1715133 1725612 function_decl
> 3384810 3191406 identifier_node
> 
> So largest savings are in field_decls (to 18% and they are not even merged
> fully by the patch), pointer tpes (to 64%), record types (to 30%) and type
> decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to
> reffer to vtables to 21%). Patch bootstraps, regtestes and seems to
> work reliably.
> 
> The merging is currently done during streaming by simply looking up prevailing
> type and populating the cache (so we do not get too many repeated lookups).
> Similar tricks can be added to checksum calculation.
> 
> I am not sure this is best possible place, especially for 2) since it provides
> really quite major reduction in the IL size.  I am quite sure we do not want
> to do that at stream in time in parallel to SCC merging because it affects the
> SCC components and also we do not want to merge this way types containing ODR
> violations as QOI issue. (merging two completely different types would lead
> to ICEs but merging two TBAA incompatible types is probably equally bad).
> 
> So we need to do following
>  a) read in the the types & do tree mering
>  b) populate odr type hash, do ODR violation checks and decide on what types
>     are consistent
>  c) do the ODR based merging sometime before the trees hit ltrans stream.
> 
> I was thinking that perhaps c) can also be added to mentions_vars_p machinery
> which would make it somewhat more costy but we could free those duplicates and
> save some more memory during WPA.  I am also not usre if longer term we
> would not want to have mode that bypasses WPA->ltrans stream completely and
> just compile out of WPA process in threads or forks.
> Drawback is that building vector recording every single type pointer is going
> to also consume quite some memory.
> 
> Patch is also not complete because it does not merge field decls. Analogous
> tricks can be added to streamer, but I just tought I would like to discuss
> alternative implementations first.
> 
> WPA compile time improves by about 11%, so merging benefits overcome the extra
> complexity in sreamer.
> 
> I hope that with this patch I will be able to increase default number of partitions
> since the global stream is getting less percentage of the ltrans files. 33%
> is still quite a lot, but far better than what we had previously. Hopefully
> we could still dramatically cut this down in followup.
> 
> Also obvious drawback is that this helps to C++ only. I looked into stats from
> C programs and also Toon's stats on Fortran and I tend to believe that C++
> is really much worse than those two.

The ODR savings really look good but as you say the implementation is
somewhat "tricky".

I'd like to see the type-variant done separately (of course) and
also differently.  This is because when enabling 
free-lang-data by default we should be able to get benefits for
non-LTO compilations as well.  Specifically I'd like us to
(in free-lang-data) "canonicalize" existing variants applying the
rules you derived for the lazier compare.  At the point you then
drop non-fld-walked variants we could weed out the resulting
duplicates, keeping only a prevailing one in the list and (ab-)using
GC to fixup references.  Like with ggc_register_fixup (void *from, void 
*to) which would when following edges at mark time, adjust them
according to this map (I'm not sure if we'd want to do it manually
and record a vector of possibly affected TREE_TYPE uses during the fld
walk).  We could also (easily?) optimize the streaming itself by
streaming only differences to the main variant of a type (but then
we could change our tree data structure to make that trivial).

I wonder how much "ODR" merging we'd get for free when we canonicalize
types some more - that is, how do ODR equal but tree unequal types
usually differ?  I guess it might be most of the time fields with
pointer to incomplete vs. complete type?

Thanks,
Richard.


> Honza
> 
> Index: ipa-devirt.c
> ===================================================================
> --- ipa-devirt.c	(revision 264689)
> +++ ipa-devirt.c	(working copy)
> @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
>    return val;
>  }
>  
> +/* Return the main variant of the odr type.  This is used for straming out
> +   to reduce number of type duplicates hitting the WPA->LTRANS streams.
> +   Do not do so when ODR violation was detected since the type may be
> +   structurally different then.  */
> +
> +tree
> +prevailing_odr_type (tree t)
> +{
> +  t = TYPE_MAIN_VARIANT (t);
> +  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> +     We can not merge these because this does not honnor precision and
> +     signedness.  */
> +  if (!type_with_linkage_p (t)
> +      || type_in_anonymous_namespace_p (t)
> +      || TREE_CODE (t) == INTEGER_TYPE
> +      || !COMPLETE_TYPE_P (t))
> +    return t;
> +  odr_type ot = get_odr_type (t, true);
> +  if (!ot || !ot->odr_violated)
> +    return ot->type;
> +  return t;
> +}
> +
>  /* Add TYPE od ODR type hash.  */
>  
>  void
> Index: ipa-utils.h
> ===================================================================
> --- ipa-utils.h	(revision 264689)
> +++ ipa-utils.h	(working copy)
> @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
>  			  location_t loc2 = UNKNOWN_LOCATION);
>  bool odr_or_derived_type_p (const_tree t);
>  bool odr_types_equivalent_p (tree type1, tree type2);
> +tree prevailing_odr_type (tree t);
>  
>  /* Return vector containing possible targets of polymorphic call E.
>     If COMPLETEP is non-NULL, store true if the list is complete. 
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 264689)
> +++ lto/lto.c	(working copy)
> @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
>  static void
>  gimple_register_canonical_type (tree t)
>  {
> +  if (flag_checking)
> +    verify_type (t);
>    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
>        || !canonical_type_used_p (t))
>      return;
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 264689)
> +++ lto-streamer-out.c	(working copy)
> @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
>  #include "debug.h"
>  #include "omp-offload.h"
>  #include "print-tree.h"
> +#include "attribs.h"
> +#include "ipa-utils.h"
>  
>  
>  static void lto_write_tree (struct output_block*, tree, bool);
> @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
>    free (ob);
>  }
>  
> +/* Do same comparsion as check_qualified_type skipping lang part of type
> +   and be more permissive about type names: we only care that names are
> +   same (for diagnostics) and that ODR names are the same.  */
>  
> -/* Look up NODE in the type table and write the index for it to OB.  */
> +static bool
> +types_equal_p (tree t, tree v)
> +{
> +  if (TYPE_QUALS (t) != TYPE_QUALS (v))
> +    return false;
> +
> +  if (TYPE_NAME (t) != TYPE_NAME (v)
> +      && (!TYPE_NAME (t) || !TYPE_NAME (v)
> +	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> +	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> +	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> +	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> +	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> +     return false;
> +
> +  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> +    return false;
> +
> +  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> +			     TYPE_ATTRIBUTES (v)))
> +     return false;
> +
> +  /* Do not replace complete type by incomplete.  */
> +  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> +    return false;
> +
> +  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> +    return false;
> +
> +  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> +
> +  /* For pointer types and array types we also care about the type they
> +     reffer to.  */
> +  if (TREE_TYPE (t))
> +    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> +
> +  return true;
> +}
> +
> +/* Search list of type variants and look up first one that looks same.
> +   At compile time, this removes duplicates of types created by front-ends.
> +   At WPA time we also merge duplicated ODR types.  */
> +
> +static tree
> +first_compatible_type_variant (tree t)
> +{
> +  tree first = NULL;
> +  if (POINTER_TYPE_P (t))
> +    {
> +      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> +      if (t2 != TREE_TYPE (t))
> +	{
> +	  if (TREE_CODE (t) == POINTER_TYPE)
> +	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> +						TYPE_REF_CAN_ALIAS_ALL (t));
> +	  else
> +	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> +						TYPE_REF_CAN_ALIAS_ALL (t));
> +	}
> +      else
> +	first = TYPE_MAIN_VARIANT (t);
> +      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> +    }
> +  else
> +    first = prevailing_odr_type (t);
> +
> +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> +
> +  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> +    if (types_equal_p (t, v))
> +      {
> +	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> +	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> +	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> +        return v;
> +      }
> +
> +  /* Most of the time we will find v itself in the list.  With ODR type merging
> +     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> +     have corresponding type in the list.  */
> +  tree v = build_variant_type_copy (first);
> +  TYPE_READONLY (v) = TYPE_READONLY (t);
> +  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> +  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> +  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> +  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> +  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> +  TYPE_NAME (v) = TYPE_NAME (t);
> +  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> +  TREE_TYPE (v) = TREE_TYPE (t);
> +  gcc_checking_assert (types_equal_p (t, v));
> +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> +  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> +  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> +  return v;
> +}
> +
> +
> +/* Look up NODE in the type table and write the index for it to OB.
> +   Eliminate unnecesary type variants.  */
>  
>  static void
>  output_type_ref (struct output_block *ob, tree node)
>  {
> +  bool existed_p;
> +  unsigned int &index
> +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> +       get_or_insert (node, &existed_p);
>    streamer_write_record_start (ob, LTO_type_ref);
> -  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> +
> +  if (existed_p)
> +    {
> +      streamer_write_uhwi_stream (ob->main_stream, index);
> +      return;
> +    }
> +
> +  tree node2 = first_compatible_type_variant (node);
> +
> +  /* If NODE is first variant, just add it into encoder.  */
> +  if (node2 == node)
> +    {
> +      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> +      if (streamer_dump_file)
> +	{
> +	  print_node_brief (streamer_dump_file,
> +			    "    Encoding indexable ", node, 4);
> +	  fprintf (streamer_dump_file, "  as %i\n", index);
> +	}
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> +      streamer_write_uhwi_stream (ob->main_stream, index);
> +      return;
> +    }
> +
> +  index = 0xdead;
> +
> +  if (streamer_dump_file)
> +    {
> +      print_node_brief (streamer_dump_file, "    Type ", node, 4);
> +      fprintf (streamer_dump_file, "  prevailed by ");
> +      print_node_brief (streamer_dump_file, "", node2, 4);
> +      fprintf (streamer_dump_file, "\n");
> +    }
> +
> +  /* See if we already assgined one to the first variant.  If that is the case
> +     only duplicate the entry in the hashtable so we do not need to call
> +     first_compatible_type_variant redundantly.  */
> +  unsigned int &index2
> +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> +	get_or_insert (node2, &existed_p);
> +  /* The reference may be changed by hash table expansion.  */
> +  unsigned int i = index2;
> +
> +  if (existed_p)
> +    {
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> +      streamer_write_uhwi_stream (ob->main_stream, i);
> +      return;
> +    }
> +  else
> +    {
> +      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> +      i = index2;
> +      if (streamer_dump_file)
> +	{
> +	  print_node_brief (streamer_dump_file,
> +			    "    Encoding indexable ", node2, 4);
> +	  fprintf (streamer_dump_file, "  as %i \n", i);
> +	}
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> +      /* We need to search again to watch hash table resize.  */
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> +      gcc_assert (existed_p);
> +      streamer_write_uhwi_stream (ob->main_stream, i);
> +    }
> +  return;
>  }
>  
>  
> @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
>  #define visit(SIBLING) \
>    do { \
>      unsigned ix; \
> -    if (!SIBLING) \
> +    tree t2 = SIBLING; \
> +    if (t2 && TYPE_P (t2)) \
> +      t2 = first_compatible_type_variant (t2); \
> +    if (!t2) \
>        hstate.add_int (0); \
> -    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> +    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
>        hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
>      else if (map) \
> -      hstate.add_int (*map->get (SIBLING)); \
> +      hstate.add_int (*map->get (t2)); \
>      else \
>        hstate.add_int (1); \
>    } while (0)
> @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
>    if (this_ref_p && tree_is_indexable (expr))
>      return;
>  
> +  /* Replace type by its prevailing variant.  */
> +  if (TYPE_P (expr))
> +    expr = first_compatible_type_variant (expr);
> +
>    /* Check if we already streamed EXPR.  */
>    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
>      return;
> @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
>        return;
>      }
>  
> +  if (TYPE_P (expr))
> +    expr = first_compatible_type_variant (expr);
> +
> +  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> +
>    existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
>    if (existed_p)
>      {
> @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
>        gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
>        encoder->trees.reserve_exact (n);
>        for (j = 0; j < n; j++)
> -	encoder->trees.safe_push ((*trees)[j]);
> +	{
> +	  tree t = (*trees)[j];
> +	  if (TYPE_P (t))
> +	    t = first_compatible_type_variant (t);
> +	  if (streamer_dump_file)
> +	    {
> +	      print_node_brief (streamer_dump_file,
> +				"  Adding reference to ", t, 4);
> +	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
> +	    }
> +	  encoder->trees.safe_push (t);
> +	}
>      }
>  
>    lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
>  	  print_node_brief (streamer_dump_file, "", t, 4);
>            fprintf (streamer_dump_file, "\n");
>  	}
> +      gcc_checking_assert (!TYPE_P (t)
> +			   || t == first_compatible_type_variant (t));
>        if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
>  	stream_write_tree (ob, t, false);
>      }
> @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
>        t = lto_tree_ref_encoder_get_tree (encoder, index);
>        streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
>        gcc_assert (slot_num != (unsigned)-1);
> +      gcc_checking_assert (!TYPE_P (t)
> +			   || t == first_compatible_type_variant (t));
>        data[index + 1] = slot_num;
>      }
>  
> Index: tree-ssa-alias.c
> ===================================================================
> --- tree-ssa-alias.c	(revision 264689)
> +++ tree-ssa-alias.c	(working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-dfa.h"
>  #include "ipa-reference.h"
>  #include "varasm.h"
> +#include "ipa-utils.h"
>  
>  /* Broad overview of how alias analysis on gimple works:
>  
> @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
>  	     ??? Bitfields can overlap at RTL level so punt on them.  */
>  	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>  	    return false;
> +	  /* In LTO we may end up merging different variants of ODR types
> +	     but keep corresponding field decls unmerged.  */
> +	  if (in_lto_p
> +	      && type_with_linkage_p (type1)
> +	      && DECL_NAME (field1) == DECL_NAME (field2))
> +	    return false;
>  	  return true;
>  	}
>      }
> @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
>  	{
>  	  /* We're left with accessing different fields of a structure,
>  	     no possible overlap.  */
> -	  if (fieldx != fieldy)
> +	  if (fieldx != fieldy
> +	      && (in_lto_p || !type_with_linkage_p (typex)
> +		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
>  	    {
>  	      /* A field and its representative need to be considered the
>  		 same.  */
> Index: tree.c
> ===================================================================
> --- tree.c	(revision 264689)
> +++ tree.c	(working copy)
> @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
>  
>    /* Traverse every type found freeing its language data.  */
>    FOR_EACH_VEC_ELT (fld.types, i, t)
> -    free_lang_data_in_type (t);
> +    {
> +      free_lang_data_in_type (t);
> +      while (TYPE_NEXT_VARIANT (t)
> +	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> +	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> +    }
>    if (flag_checking)
>      {
>        FOR_EACH_VEC_ELT (fld.types, i, t)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)



More information about the Gcc-patches mailing list