RFC: variant and ODR based type merging during LTO streaming
Richard Biener
rguenther@suse.de
Mon Oct 1 10:08:00 GMT 2018
On Fri, 28 Sep 2018, Jan Hubicka wrote:
> Hi,
> this is a proof-of-concept patch for type merging during LTO streaming. It
> does two things
> 1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list
> This is useful at compilation time because frontends produce more variants
> than needed. The effect of this is about 2% of decl stream
> 2) replace ODR types by their prevailing variant if ODR violation was not
> detected. This is useful at WPA time and saves about 50% of global decl
> stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.
>
> Here are stats of number of nodes hitting global stream during WPA->ltrans
> streaming of firefox:
>
> Before after
> 7354 7280 namespace_decl
> 46593 14084 union_type
> 18496 15405 translation_unit_decl
> 22816 21548 integer_type
> 107047 31681 enumeral_type
> 67072 46390 array_type
> 541220 99572 pointer_plus_expr
> 542360 100712 addr_expr
> 167657 154019 var_decl
> 864769 182691 tree_binfo
> 240200 206783 reference_type
> 410403 316877 function_type
> 1862985 522524 type_decl
> 1954864 652664 record_type
> 1070333 880209 method_type
> 1582055 1014270 pointer_type
> 1495367 1406670 tree_list
> 7926385 1483545 field_decl
> 1715133 1725612 function_decl
> 3384810 3191406 identifier_node
>
> So largest savings are in field_decls (to 18% and they are not even merged
> fully by the patch), pointer tpes (to 64%), record types (to 30%) and type
> decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to
> reffer to vtables to 21%). Patch bootstraps, regtestes and seems to
> work reliably.
>
> The merging is currently done during streaming by simply looking up prevailing
> type and populating the cache (so we do not get too many repeated lookups).
> Similar tricks can be added to checksum calculation.
>
> I am not sure this is best possible place, especially for 2) since it provides
> really quite major reduction in the IL size. I am quite sure we do not want
> to do that at stream in time in parallel to SCC merging because it affects the
> SCC components and also we do not want to merge this way types containing ODR
> violations as QOI issue. (merging two completely different types would lead
> to ICEs but merging two TBAA incompatible types is probably equally bad).
>
> So we need to do following
> a) read in the the types & do tree mering
> b) populate odr type hash, do ODR violation checks and decide on what types
> are consistent
> c) do the ODR based merging sometime before the trees hit ltrans stream.
>
> I was thinking that perhaps c) can also be added to mentions_vars_p machinery
> which would make it somewhat more costy but we could free those duplicates and
> save some more memory during WPA. I am also not usre if longer term we
> would not want to have mode that bypasses WPA->ltrans stream completely and
> just compile out of WPA process in threads or forks.
> Drawback is that building vector recording every single type pointer is going
> to also consume quite some memory.
>
> Patch is also not complete because it does not merge field decls. Analogous
> tricks can be added to streamer, but I just tought I would like to discuss
> alternative implementations first.
>
> WPA compile time improves by about 11%, so merging benefits overcome the extra
> complexity in sreamer.
>
> I hope that with this patch I will be able to increase default number of partitions
> since the global stream is getting less percentage of the ltrans files. 33%
> is still quite a lot, but far better than what we had previously. Hopefully
> we could still dramatically cut this down in followup.
>
> Also obvious drawback is that this helps to C++ only. I looked into stats from
> C programs and also Toon's stats on Fortran and I tend to believe that C++
> is really much worse than those two.
The ODR savings really look good but as you say the implementation is
somewhat "tricky".
I'd like to see the type-variant done separately (of course) and
also differently. This is because when enabling
free-lang-data by default we should be able to get benefits for
non-LTO compilations as well. Specifically I'd like us to
(in free-lang-data) "canonicalize" existing variants applying the
rules you derived for the lazier compare. At the point you then
drop non-fld-walked variants we could weed out the resulting
duplicates, keeping only a prevailing one in the list and (ab-)using
GC to fixup references. Like with ggc_register_fixup (void *from, void
*to) which would when following edges at mark time, adjust them
according to this map (I'm not sure if we'd want to do it manually
and record a vector of possibly affected TREE_TYPE uses during the fld
walk). We could also (easily?) optimize the streaming itself by
streaming only differences to the main variant of a type (but then
we could change our tree data structure to make that trivial).
I wonder how much "ODR" merging we'd get for free when we canonicalize
types some more - that is, how do ODR equal but tree unequal types
usually differ? I guess it might be most of the time fields with
pointer to incomplete vs. complete type?
Thanks,
Richard.
> Honza
>
> Index: ipa-devirt.c
> ===================================================================
> --- ipa-devirt.c (revision 264689)
> +++ ipa-devirt.c (working copy)
> @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
> return val;
> }
>
> +/* Return the main variant of the odr type. This is used for straming out
> + to reduce number of type duplicates hitting the WPA->LTRANS streams.
> + Do not do so when ODR violation was detected since the type may be
> + structurally different then. */
> +
> +tree
> +prevailing_odr_type (tree t)
> +{
> + t = TYPE_MAIN_VARIANT (t);
> + /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> + We can not merge these because this does not honnor precision and
> + signedness. */
> + if (!type_with_linkage_p (t)
> + || type_in_anonymous_namespace_p (t)
> + || TREE_CODE (t) == INTEGER_TYPE
> + || !COMPLETE_TYPE_P (t))
> + return t;
> + odr_type ot = get_odr_type (t, true);
> + if (!ot || !ot->odr_violated)
> + return ot->type;
> + return t;
> +}
> +
> /* Add TYPE od ODR type hash. */
>
> void
> Index: ipa-utils.h
> ===================================================================
> --- ipa-utils.h (revision 264689)
> +++ ipa-utils.h (working copy)
> @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
> location_t loc2 = UNKNOWN_LOCATION);
> bool odr_or_derived_type_p (const_tree t);
> bool odr_types_equivalent_p (tree type1, tree type2);
> +tree prevailing_odr_type (tree t);
>
> /* Return vector containing possible targets of polymorphic call E.
> If COMPLETEP is non-NULL, store true if the list is complete.
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c (revision 264689)
> +++ lto/lto.c (working copy)
> @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
> static void
> gimple_register_canonical_type (tree t)
> {
> + if (flag_checking)
> + verify_type (t);
> if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
> || !canonical_type_used_p (t))
> return;
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c (revision 264689)
> +++ lto-streamer-out.c (working copy)
> @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
> #include "debug.h"
> #include "omp-offload.h"
> #include "print-tree.h"
> +#include "attribs.h"
> +#include "ipa-utils.h"
>
>
> static void lto_write_tree (struct output_block*, tree, bool);
> @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
> free (ob);
> }
>
> +/* Do same comparsion as check_qualified_type skipping lang part of type
> + and be more permissive about type names: we only care that names are
> + same (for diagnostics) and that ODR names are the same. */
>
> -/* Look up NODE in the type table and write the index for it to OB. */
> +static bool
> +types_equal_p (tree t, tree v)
> +{
> + if (TYPE_QUALS (t) != TYPE_QUALS (v))
> + return false;
> +
> + if (TYPE_NAME (t) != TYPE_NAME (v)
> + && (!TYPE_NAME (t) || !TYPE_NAME (v)
> + || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> + || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> + || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> + != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> + || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> + return false;
> +
> + if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> + return false;
> +
> + if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> + TYPE_ATTRIBUTES (v)))
> + return false;
> +
> + /* Do not replace complete type by incomplete. */
> + if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> + return false;
> +
> + if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> + return false;
> +
> + gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> +
> + /* For pointer types and array types we also care about the type they
> + reffer to. */
> + if (TREE_TYPE (t))
> + return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> +
> + return true;
> +}
> +
> +/* Search list of type variants and look up first one that looks same.
> + At compile time, this removes duplicates of types created by front-ends.
> + At WPA time we also merge duplicated ODR types. */
> +
> +static tree
> +first_compatible_type_variant (tree t)
> +{
> + tree first = NULL;
> + if (POINTER_TYPE_P (t))
> + {
> + tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> + if (t2 != TREE_TYPE (t))
> + {
> + if (TREE_CODE (t) == POINTER_TYPE)
> + first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> + TYPE_REF_CAN_ALIAS_ALL (t));
> + else
> + first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> + TYPE_REF_CAN_ALIAS_ALL (t));
> + }
> + else
> + first = TYPE_MAIN_VARIANT (t);
> + gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> + }
> + else
> + first = prevailing_odr_type (t);
> +
> + gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> +
> + for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> + if (types_equal_p (t, v))
> + {
> + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> + gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> + return v;
> + }
> +
> + /* Most of the time we will find v itself in the list. With ODR type merging
> + we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> + have corresponding type in the list. */
> + tree v = build_variant_type_copy (first);
> + TYPE_READONLY (v) = TYPE_READONLY (t);
> + TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> + TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> + TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> + TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> + SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> + TYPE_NAME (v) = TYPE_NAME (t);
> + TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> + TREE_TYPE (v) = TREE_TYPE (t);
> + gcc_checking_assert (types_equal_p (t, v));
> + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> + gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> + return v;
> +}
> +
> +
> +/* Look up NODE in the type table and write the index for it to OB.
> + Eliminate unnecesary type variants. */
>
> static void
> output_type_ref (struct output_block *ob, tree node)
> {
> + bool existed_p;
> + unsigned int &index
> + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> + get_or_insert (node, &existed_p);
> streamer_write_record_start (ob, LTO_type_ref);
> - lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> +
> + if (existed_p)
> + {
> + streamer_write_uhwi_stream (ob->main_stream, index);
> + return;
> + }
> +
> + tree node2 = first_compatible_type_variant (node);
> +
> + /* If NODE is first variant, just add it into encoder. */
> + if (node2 == node)
> + {
> + index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> + if (streamer_dump_file)
> + {
> + print_node_brief (streamer_dump_file,
> + " Encoding indexable ", node, 4);
> + fprintf (streamer_dump_file, " as %i\n", index);
> + }
> + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> + streamer_write_uhwi_stream (ob->main_stream, index);
> + return;
> + }
> +
> + index = 0xdead;
> +
> + if (streamer_dump_file)
> + {
> + print_node_brief (streamer_dump_file, " Type ", node, 4);
> + fprintf (streamer_dump_file, " prevailed by ");
> + print_node_brief (streamer_dump_file, "", node2, 4);
> + fprintf (streamer_dump_file, "\n");
> + }
> +
> + /* See if we already assgined one to the first variant. If that is the case
> + only duplicate the entry in the hashtable so we do not need to call
> + first_compatible_type_variant redundantly. */
> + unsigned int &index2
> + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> + get_or_insert (node2, &existed_p);
> + /* The reference may be changed by hash table expansion. */
> + unsigned int i = index2;
> +
> + if (existed_p)
> + {
> + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> + tree_hash_table->get_or_insert (node, &existed_p) = i;
> + streamer_write_uhwi_stream (ob->main_stream, i);
> + return;
> + }
> + else
> + {
> + index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> + i = index2;
> + if (streamer_dump_file)
> + {
> + print_node_brief (streamer_dump_file,
> + " Encoding indexable ", node2, 4);
> + fprintf (streamer_dump_file, " as %i \n", i);
> + }
> + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> + /* We need to search again to watch hash table resize. */
> + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> + tree_hash_table->get_or_insert (node, &existed_p) = i;
> + gcc_assert (existed_p);
> + streamer_write_uhwi_stream (ob->main_stream, i);
> + }
> + return;
> }
>
>
> @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
> #define visit(SIBLING) \
> do { \
> unsigned ix; \
> - if (!SIBLING) \
> + tree t2 = SIBLING; \
> + if (t2 && TYPE_P (t2)) \
> + t2 = first_compatible_type_variant (t2); \
> + if (!t2) \
> hstate.add_int (0); \
> - else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> + else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
> hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
> else if (map) \
> - hstate.add_int (*map->get (SIBLING)); \
> + hstate.add_int (*map->get (t2)); \
> else \
> hstate.add_int (1); \
> } while (0)
> @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
> if (this_ref_p && tree_is_indexable (expr))
> return;
>
> + /* Replace type by its prevailing variant. */
> + if (TYPE_P (expr))
> + expr = first_compatible_type_variant (expr);
> +
> /* Check if we already streamed EXPR. */
> if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> return;
> @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
> return;
> }
>
> + if (TYPE_P (expr))
> + expr = first_compatible_type_variant (expr);
> +
> + gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> +
> existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> if (existed_p)
> {
> @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
> gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
> encoder->trees.reserve_exact (n);
> for (j = 0; j < n; j++)
> - encoder->trees.safe_push ((*trees)[j]);
> + {
> + tree t = (*trees)[j];
> + if (TYPE_P (t))
> + t = first_compatible_type_variant (t);
> + if (streamer_dump_file)
> + {
> + print_node_brief (streamer_dump_file,
> + " Adding reference to ", t, 4);
> + fprintf (streamer_dump_file, " as %i \n", (int)j);
> + }
> + encoder->trees.safe_push (t);
> + }
> }
>
> lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
> print_node_brief (streamer_dump_file, "", t, 4);
> fprintf (streamer_dump_file, "\n");
> }
> + gcc_checking_assert (!TYPE_P (t)
> + || t == first_compatible_type_variant (t));
> if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
> stream_write_tree (ob, t, false);
> }
> @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
> t = lto_tree_ref_encoder_get_tree (encoder, index);
> streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
> gcc_assert (slot_num != (unsigned)-1);
> + gcc_checking_assert (!TYPE_P (t)
> + || t == first_compatible_type_variant (t));
> data[index + 1] = slot_num;
> }
>
> Index: tree-ssa-alias.c
> ===================================================================
> --- tree-ssa-alias.c (revision 264689)
> +++ tree-ssa-alias.c (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
> #include "tree-dfa.h"
> #include "ipa-reference.h"
> #include "varasm.h"
> +#include "ipa-utils.h"
>
> /* Broad overview of how alias analysis on gimple works:
>
> @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
> ??? Bitfields can overlap at RTL level so punt on them. */
> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
> return false;
> + /* In LTO we may end up merging different variants of ODR types
> + but keep corresponding field decls unmerged. */
> + if (in_lto_p
> + && type_with_linkage_p (type1)
> + && DECL_NAME (field1) == DECL_NAME (field2))
> + return false;
> return true;
> }
> }
> @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
> {
> /* We're left with accessing different fields of a structure,
> no possible overlap. */
> - if (fieldx != fieldy)
> + if (fieldx != fieldy
> + && (in_lto_p || !type_with_linkage_p (typex)
> + || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
> {
> /* A field and its representative need to be considered the
> same. */
> Index: tree.c
> ===================================================================
> --- tree.c (revision 264689)
> +++ tree.c (working copy)
> @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
>
> /* Traverse every type found freeing its language data. */
> FOR_EACH_VEC_ELT (fld.types, i, t)
> - free_lang_data_in_type (t);
> + {
> + free_lang_data_in_type (t);
> + while (TYPE_NEXT_VARIANT (t)
> + && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> + TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> + }
> if (flag_checking)
> {
> FOR_EACH_VEC_ELT (fld.types, i, t)
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
More information about the Gcc-patches
mailing list