This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Fix call graph partitioning in WHOPR
- From: Richard Guenther <rguenther at suse dot de>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org, dnovillo at google dot com
- Date: Tue, 20 Apr 2010 16:41:22 +0200 (CEST)
- Subject: Re: Fix call graph partitioning in WHOPR
- References: <20100420132915.GB6750@kam.mff.cuni.cz>
On Tue, 20 Apr 2010, Jan Hubicka wrote:
> Hi,
> this patch makes whopr call graph slicing to work well enough so I can buid GCC
> from SPEC2000 with -O3 -fwhole-program (i.e. with the cherry picking put on the
> sterods dragging all functions called once from unit to unit). Resulting
> binary works and is just few hundred bytes longer than -flto one with the
> corresponding IPA passes disabled. Resulting binary works, I did not benchmark
> it. So I believe that WHOR now works with inlining and whole program resonably
> wel.
>
> The current implementation seems confused. It is trying to make calls to
> functions from current partition to new partition to look external, inline
> functions picked from other units look extern inline and otherwise it does not
> care.
>
> First problem is that with -fwhole-program most functions are not externally
> visible and unreachable function removal performed by ltrans will promptly
> remove whole unit. This can be cured by making function called from other
> partitions externally visible, but such mocking of flags leads only to further
> problems. All this has just worked by accident because we partition based on
> source files and usually calls crossing source file are leading to externally
> visible functions anyway, at least without -fwhole-program.
>
> This patch makes cgraph code aware that it can be partitioned into ltrans unit
> by adding two new flags to cgraph node, in_other_partition and
> reachable_from_other_partition. The idea is to keep visibility flags as they
> are in WPA stage (so proper IPA passes don't need to be ready for fact that
> functions can completely change visibility from propagation to transformation)
> but make small ipa code running at ltrans aware that it don't necesarily need
> to see everything.
>
> At the moment the only such late small IPA pass is unreachable function removal
> (and the actual code in cgraphunit producing assembly) so it is easy job to do.
> I see that it adds need for extra check in some cases of small IPA passes, but
> I think it is unavoidable. Those functions accessed in parallel units are
> different anyway: for example we still want to use register passing conventions
> on them on x86 compilation.
>
> The patch thus removes flags mocking in lto-cgraph and all code dealing
> with forced_extern_inline. I also fixed way clones are output in lto-cgraph
> so they are reconstructed correctly (at mainline if clone appears in list before
> its original, the resulting callgraph is all wrong since we eventually overwrite
> the clone by the original). More fixes will be needed to make clone streaming
> work in general (i.e. for clones that are not inline clones and have different
> declaration than the former), but that should be handled incrementally.
>
> There is one side corner where ltrans can leak function body till end of
> compilation (and get ICE in code checking that all bodies are gone). Callgraph
> node removal is responsible for removing bodies when they are needed, but with
> inline clone in current partition, but offline clone reachbale from current
> partition won't end up in removal. After we remove all the inline clones, the
> presence of node in callgraph will keep body around. This needs to be cured by
> not mocing "analyzed" flag that we do currently. I will handle this later -
> there are more important issues to track for now.
>
> Bootstrapped/regtested x86_64-linux. OK?
Ok.
Thanks,
Richard.
> * cgraph.c (cgraph_remove_node): Kill bodies in other partitoin.
> (dump_cgraph_node): Dump new flags.
> * cgraph.h (struct cgraph_node): Add flags reachable_from_other_partition
> and in_other_partition.
> (cgraph_can_remove_if_no_direct_calls_p): Functions used by other partition
> can not be removed.
> * cgraphunit.c (cgraph_mark_functions_to_output): Functions used by the other
> partition must be output; silence sanity checking on leaking functions
> bodies from other paritition.
> * lto-cgraph.c (reachable_from_other_partition_p): New function.
> (lto_output_node): Output new flags; do not sanity check that inline
> clones are output; drop lto_forced_extern_inline_p code; do not mock
> visibility flags at partition boundaries.
> (add_node_to): New function.
> (output_cgraph): Use it to sort functions so masters appear before
> clones.
> (input_overwrite_node): Input new flags.
> * passes.c (ipa_write_summaries): Do not call
> lto_new_extern_inline_states.
> * lto-section-out.c (forced_extern_inline, lto_new_extern_inline_states,
> lto_delete_extern_inline_states, lto_force_functions_extern_inline,
> lto_forced_extern_inline_p): Kill.
> * lto-streamer.h (lto_new_extern_inline_states,
> * lto_delete_extern_inline_states, lto_force_functions_extern_inline,
> lto_forced_extern_inline_p): Kill.
>
> * lto.c (lto_add_inline_clones): Do not track inlined_decls.
> (lto_add_all_inlinees): Likewise.
> (lto_wpa_write_files): Likewise.
>
>
> Index: cgraph.c
> ===================================================================
> --- cgraph.c (revision 158542)
> +++ cgraph.c (working copy)
> @@ -1467,7 +1467,8 @@ cgraph_remove_node (struct cgraph_node *
> struct cgraph_node *n = (struct cgraph_node *) *slot;
> if (!n->clones && !n->clone_of && !n->global.inlined_to
> && (cgraph_global_info_ready
> - && (TREE_ASM_WRITTEN (n->decl) || DECL_EXTERNAL (n->decl))))
> + && (TREE_ASM_WRITTEN (n->decl) || DECL_EXTERNAL (n->decl)
> + || n->in_other_partition)))
> kill_body = true;
> }
> if (assembler_name_hash)
> @@ -1639,6 +1640,10 @@ dump_cgraph_node (FILE *f, struct cgraph
> if (cgraph_function_flags_ready)
> fprintf (f, " availability:%s",
> cgraph_availability_names [cgraph_function_body_availability (node)]);
> + if (node->analyzed)
> + fprintf (f, " analyzed");
> + if (node->in_other_partition)
> + fprintf (f, " in_other_partition");
> if (node->count)
> fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
> (HOST_WIDEST_INT)node->count);
> @@ -1666,6 +1671,8 @@ dump_cgraph_node (FILE *f, struct cgraph
> fprintf (f, " address_taken");
> else if (node->reachable)
> fprintf (f, " reachable");
> + else if (node->reachable_from_other_partition)
> + fprintf (f, " reachable_from_other_partition");
> if (gimple_has_body_p (node->decl))
> fprintf (f, " body");
> if (node->process)
> Index: cgraph.h
> ===================================================================
> --- cgraph.h (revision 158542)
> +++ cgraph.h (working copy)
> @@ -249,11 +249,15 @@ struct GTY((chain_next ("%h.next"), chai
> cgraph_remove_unreachable_nodes cgraph still can contain unreachable
> nodes when they are needed for virtual clone instantiation. */
> unsigned reachable : 1;
> + /* Set when function is reachable by call from other LTRANS partition. */
> + unsigned reachable_from_other_partition : 1;
> /* Set once the function is lowered (i.e. its CFG is built). */
> unsigned lowered : 1;
> /* Set once the function has been instantiated and its callee
> lists created. */
> unsigned analyzed : 1;
> + /* Set when function is available in the other LTO partition. */
> + unsigned in_other_partition : 1;
> /* Set when function is scheduled to be processed by local passes. */
> unsigned process : 1;
> /* Set for aliases once they got through assemble_alias. */
> @@ -723,7 +727,7 @@ cgraph_only_called_directly_p (struct cg
> static inline bool
> cgraph_can_remove_if_no_direct_calls_p (struct cgraph_node *node)
> {
> - return (!node->needed
> + return (!node->needed && !node->reachable_from_other_partition
> && (DECL_COMDAT (node->decl) || !node->local.externally_visible));
> }
>
> Index: cgraphunit.c
> ===================================================================
> --- cgraphunit.c (revision 158542)
> +++ cgraphunit.c (working copy)
> @@ -1130,7 +1130,7 @@ cgraph_mark_functions_to_output (void)
> outside the current compilation unit. */
> if (node->analyzed
> && !node->global.inlined_to
> - && (node->needed
> + && (node->needed || node->reachable_from_other_partition
> || (e && node->reachable))
> && !TREE_ASM_WRITTEN (decl)
> && !DECL_EXTERNAL (decl))
> @@ -1157,6 +1157,10 @@ cgraph_mark_functions_to_output (void)
> #ifdef ENABLE_CHECKING
> if (!node->global.inlined_to
> && gimple_has_body_p (decl)
> + /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
> + are inside partition, we can end up not removing the body since we no longer
> + have analyzed node pointing to it. */
> + && !node->in_other_partition
> && !DECL_EXTERNAL (decl))
> {
> dump_cgraph_node (stderr, node);
> @@ -1165,6 +1169,7 @@ cgraph_mark_functions_to_output (void)
> #endif
> gcc_assert (node->global.inlined_to
> || !gimple_has_body_p (decl)
> + || node->in_other_partition
> || DECL_EXTERNAL (decl));
>
> }
> @@ -1178,6 +1183,10 @@ cgraph_mark_functions_to_output (void)
> tree decl = node->decl;
> if (!node->global.inlined_to
> && gimple_has_body_p (decl)
> + /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
> + are inside partition, we can end up not removing the body since we no longer
> + have analyzed node pointing to it. */
> + && !node->in_other_partition
> && !DECL_EXTERNAL (decl))
> {
> dump_cgraph_node (stderr, node);
> Index: lto-cgraph.c
> ===================================================================
> --- lto-cgraph.c (revision 158542)
> +++ lto-cgraph.c (working copy)
> @@ -164,6 +164,23 @@ lto_output_edge (struct lto_simple_outpu
> bitpack_delete (bp);
> }
>
> +/* Return true when node is reachable from other partition. */
> +
> +static bool
> +reachable_from_other_partition_p (struct cgraph_node *node, cgraph_node_set set)
> +{
> + struct cgraph_edge *e;
> + if (node->needed)
> + return true;
> + if (!node->analyzed)
> + return false;
> + if (node->global.inlined_to)
> + return false;
> + for (e = node->callers; e; e = e->next_caller)
> + if (!cgraph_node_in_set_p (e->caller, set))
> + return true;
> + return false;
> +}
>
> /* Output the cgraph NODE to OB. ENCODER is used to find the
> reference number of NODE->inlined_to. SET is the set of nodes we
> @@ -183,6 +200,7 @@ lto_output_node (struct lto_simple_outpu
> unsigned local, externally_visible, inlinable, analyzed;
> bool boundary_p, wrote_decl_p;
> intptr_t ref;
> + bool in_other_partition = false;
>
> boundary_p = !cgraph_node_in_set_p (node, set);
> wrote_decl_p = bitmap_bit_p (written_decls, DECL_UID (node->decl));
> @@ -228,19 +246,17 @@ lto_output_node (struct lto_simple_outpu
> local static nodes to prevent clashes with other local statics. */
> if (boundary_p)
> {
> - /* Inline clones can not be part of boundary. */
> - gcc_assert (!node->global.inlined_to);
> - local = 0;
> - externally_visible = 1;
> - inlinable = 0;
> + /* Inline clones can not be part of boundary.
> + gcc_assert (!node->global.inlined_to);
> +
> + FIXME: At the moment they can be, when partition contains an inline
> + clone that is clone of inline clone from outside partition. We can
> + reshape the clone tree and make other tree to be the root, but it
> + needs a bit extra work and will be promplty done by cgraph_remove_node
> + after reading back. */
> + in_other_partition = 1;
> analyzed = 0;
> }
> - else if (lto_forced_extern_inline_p (node->decl))
> - {
> - local = 1;
> - externally_visible = 0;
> - inlinable = 1;
> - }
>
> lto_output_uleb128_stream (ob->main_stream, wrote_decl_p);
>
> @@ -263,8 +279,10 @@ lto_output_node (struct lto_simple_outpu
> bp_pack_value (bp, node->address_taken, 1);
> bp_pack_value (bp, node->abstract_and_needed, 1);
> bp_pack_value (bp, node->reachable, 1);
> + bp_pack_value (bp, analyzed && reachable_from_other_partition_p (node, set), 1);
> bp_pack_value (bp, node->lowered, 1);
> bp_pack_value (bp, analyzed, 1);
> + bp_pack_value (bp, in_other_partition, 1);
> bp_pack_value (bp, node->process, 1);
> bp_pack_value (bp, node->alias, 1);
> bp_pack_value (bp, node->finalized_by_frontend, 1);
> @@ -371,6 +389,15 @@ output_profile_summary (struct lto_simpl
> lto_output_uleb128_stream (ob->main_stream, 0);
> }
>
> +/* Add NODE into encoder as well as nodes it is cloned from.
> + Do it in a way so clones appear first. */
> +static void
> +add_node_to (lto_cgraph_encoder_t encoder, struct cgraph_node *node)
> +{
> + if (node->clone_of)
> + add_node_to (encoder, node->clone_of);
> + lto_cgraph_encoder_encode (encoder, node);
> +}
>
> /* Output the part of the cgraph in SET. */
>
> @@ -404,7 +431,7 @@ output_cgraph (cgraph_node_set set)
> for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
> {
> node = csi_node (csi);
> - lto_cgraph_encoder_encode (encoder, node);
> + add_node_to (encoder, node);
> }
>
> /* Go over all the nodes again to include callees that are not in
> @@ -419,7 +446,7 @@ output_cgraph (cgraph_node_set set)
> {
> /* We should have moved all the inlines. */
> gcc_assert (!callee->global.inlined_to);
> - lto_cgraph_encoder_encode (encoder, callee);
> + add_node_to (encoder, callee);
> /* Also with each included function include all other functions
> in the same comdat group. */
> if (callee->same_comdat_group)
> @@ -429,7 +456,7 @@ output_cgraph (cgraph_node_set set)
> next != callee;
> next = next->same_comdat_group)
> if (!cgraph_node_in_set_p (next, set))
> - lto_cgraph_encoder_encode (encoder, next);
> + add_node_to (encoder, next);
> }
> }
> }
> @@ -442,11 +469,13 @@ output_cgraph (cgraph_node_set set)
> next != node;
> next = next->same_comdat_group)
> if (!cgraph_node_in_set_p (next, set))
> - lto_cgraph_encoder_encode (encoder, next);
> + add_node_to (encoder, next);
> }
> }
>
> - /* Write out the nodes. */
> + /* Write out the nodes. We must first output a node and then its clones,
> + otherwise at a time reading back the node there would be nothing to clone
> + from. */
> n_nodes = lto_cgraph_encoder_size (encoder);
> for (i = 0; i < n_nodes; i++)
> {
> @@ -530,8 +559,10 @@ input_overwrite_node (struct lto_file_de
> node->address_taken = bp_unpack_value (bp, 1);
> node->abstract_and_needed = bp_unpack_value (bp, 1);
> node->reachable = bp_unpack_value (bp, 1);
> + node->reachable_from_other_partition = bp_unpack_value (bp, 1);
> node->lowered = bp_unpack_value (bp, 1);
> node->analyzed = bp_unpack_value (bp, 1);
> + node->in_other_partition = bp_unpack_value (bp, 1);
> node->process = bp_unpack_value (bp, 1);
> node->alias = bp_unpack_value (bp, 1);
> node->finalized_by_frontend = bp_unpack_value (bp, 1);
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c (revision 158542)
> +++ lto/lto.c (working copy)
> @@ -593,22 +593,19 @@ finish:
>
> static void
> lto_add_inline_clones (cgraph_node_set set, struct cgraph_node *node,
> - bitmap original_decls, bitmap inlined_decls)
> + bitmap original_decls)
> {
> struct cgraph_node *callee;
> struct cgraph_edge *edge;
>
> cgraph_node_set_add (set, node);
>
> - if (!bitmap_bit_p (original_decls, DECL_UID (node->decl)))
> - bitmap_set_bit (inlined_decls, DECL_UID (node->decl));
> -
> /* Check to see if NODE has any inlined callee. */
> for (edge = node->callees; edge != NULL; edge = edge->next_callee)
> {
> callee = edge->callee;
> if (callee->global.inlined_to != NULL)
> - lto_add_inline_clones (set, callee, original_decls, inlined_decls);
> + lto_add_inline_clones (set, callee, original_decls);
> }
> }
>
> @@ -616,14 +613,13 @@ lto_add_inline_clones (cgraph_node_set s
> information in the callgraph. Returns a bitmap of decls that have
> been inlined into SET indexed by UID. */
>
> -static bitmap
> +static void
> lto_add_all_inlinees (cgraph_node_set set)
> {
> cgraph_node_set_iterator csi;
> struct cgraph_node *node;
> bitmap original_nodes = lto_bitmap_alloc ();
> bitmap original_decls = lto_bitmap_alloc ();
> - bitmap inlined_decls = lto_bitmap_alloc ();
> bool changed;
>
> /* We are going to iterate SET while adding to it, mark all original
> @@ -663,19 +659,17 @@ lto_add_all_inlinees (cgraph_node_set se
> }
> while (changed);
>
> - /* Transitively add to SET all the inline clones for every node that
> - has been inlined. */
> - for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
> - {
> - node = csi_node (csi);
> - if (bitmap_bit_p (original_nodes, node->uid))
> - lto_add_inline_clones (set, node, original_decls, inlined_decls);
> - }
> + /* Transitively add to SET all the inline clones for every node that
> + has been inlined. */
> + for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
> + {
> + node = csi_node (csi);
> + if (bitmap_bit_p (original_nodes, node->uid))
> + lto_add_inline_clones (set, node, original_decls);
> + }
>
> lto_bitmap_free (original_nodes);
> lto_bitmap_free (original_decls);
> -
> - return inlined_decls;
> }
>
> /* Owing to inlining, we may need to promote a file-scope variable
> @@ -1004,8 +998,6 @@ lto_wpa_write_files (void)
> unsigned i, n_sets, last_out_file_ix, num_out_files;
> lto_file *file;
> cgraph_node_set set;
> - bitmap decls;
> - VEC(bitmap,heap) *inlined_decls = NULL;
>
> timevar_push (TV_WHOPR_WPA);
>
> @@ -1015,8 +1007,7 @@ lto_wpa_write_files (void)
> compiled by LTRANS. */
> for (i = 0; VEC_iterate (cgraph_node_set, lto_cgraph_node_sets, i, set); i++)
> {
> - decls = lto_add_all_inlinees (set);
> - VEC_safe_push (bitmap, heap, inlined_decls, decls);
> + lto_add_all_inlinees (set);
> lto_stats.num_output_cgraph_nodes += VEC_length (cgraph_node_ptr,
> set->nodes);
> }
> @@ -1053,13 +1044,8 @@ lto_wpa_write_files (void)
> fatal_error ("lto_elf_file_open() failed");
>
> lto_set_current_out_file (file);
> - lto_new_extern_inline_states ();
> -
> - decls = VEC_index (bitmap, inlined_decls, i);
> - lto_force_functions_extern_inline (decls);
>
> ipa_write_summaries_of_cgraph_node_set (set);
> - lto_delete_extern_inline_states ();
>
> lto_set_current_out_file (NULL);
> lto_elf_file_close (file);
> @@ -1072,10 +1058,6 @@ lto_wpa_write_files (void)
>
> output_files[last_out_file_ix] = NULL;
>
> - for (i = 0; VEC_iterate (bitmap, inlined_decls, i, decls); i++)
> - lto_bitmap_free (decls);
> - VEC_free (bitmap, heap, inlined_decls);
> -
> timevar_pop (TV_WHOPR_WPA_IO);
>
> return output_files;
> Index: passes.c
> ===================================================================
> --- passes.c (revision 158542)
> +++ passes.c (working copy)
> @@ -1695,7 +1695,6 @@ ipa_write_summaries (void)
> if (!flag_generate_lto || errorcount || sorrycount)
> return;
>
> - lto_new_extern_inline_states ();
> set = cgraph_node_set_new ();
>
> /* Create the callgraph set in the same order used in
> @@ -1726,7 +1725,6 @@ ipa_write_summaries (void)
> }
>
> ipa_write_summaries_1 (set);
> - lto_delete_extern_inline_states ();
>
> free (order);
> ggc_free (set);
> Index: lto-section-out.c
> ===================================================================
> --- lto-section-out.c (revision 158542)
> +++ lto-section-out.c (working copy)
> @@ -50,48 +50,6 @@ static VEC(lto_out_decl_state_ptr, heap)
> generate the decl directory later. */
>
> VEC(lto_out_decl_state_ptr, heap) *lto_function_decl_states;
> -
> -/* Bitmap indexed by DECL_UID to indicate if a function needs to be
> - forced extern inline. */
> -static bitmap forced_extern_inline;
> -
> -/* Initialize states for determining which function decls to be ouput
> - as extern inline, regardless of the decls' own attributes. */
> -
> -void
> -lto_new_extern_inline_states (void)
> -{
> - forced_extern_inline = lto_bitmap_alloc ();
> -}
> -
> -/* Releasing resources use for states to determine which function decls
> - to be ouput as extern inline */
> -
> -void
> -lto_delete_extern_inline_states (void)
> -{
> - lto_bitmap_free (forced_extern_inline);
> - forced_extern_inline = NULL;
> -}
> -
> -/* Force all the functions in DECLS to be output as extern inline.
> - DECLS is a bitmap indexed by DECL_UID. */
> -
> -void
> -lto_force_functions_extern_inline (bitmap decls)
> -{
> - bitmap_ior_into (forced_extern_inline, decls);
> -}
> -
> -/* Return true if FN_DECL is a function which should be emitted as
> - extern inline. */
> -
> -bool
> -lto_forced_extern_inline_p (tree fn_decl)
> -{
> - return bitmap_bit_p (forced_extern_inline, DECL_UID (fn_decl));
> -}
> -
> /* Returns a hash code for P. */
>
> hashval_t
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h (revision 158542)
> +++ lto-streamer.h (working copy)
> @@ -770,10 +770,6 @@ extern void lto_push_out_decl_state (str
> extern struct lto_out_decl_state *lto_pop_out_decl_state (void);
> extern void lto_record_function_out_decl_state (tree,
> struct lto_out_decl_state *);
> -extern void lto_new_extern_inline_states (void);
> -extern void lto_delete_extern_inline_states (void);
> -extern void lto_force_functions_extern_inline (bitmap decls);
> -extern bool lto_forced_extern_inline_p (tree fn_decl);
>
>
> /* In lto-streamer.c. */
>
>
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex