This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix call graph partitioning in WHOPR


On Tue, 20 Apr 2010, Jan Hubicka wrote:

> Hi,
> this patch makes whopr call graph slicing to work well enough so I can buid GCC
> from SPEC2000 with -O3 -fwhole-program (i.e. with the cherry picking put on the
> sterods dragging all functions called once from unit to unit).  Resulting
> binary works and is just few hundred bytes longer than -flto one with the
> corresponding IPA passes disabled.  Resulting binary works, I did not benchmark
> it.  So I believe that WHOR now works with inlining and whole program resonably
> wel.
> 
> The current implementation seems confused.  It is trying to make calls to
> functions from current partition to new partition to look external, inline
> functions picked from other units look extern inline and otherwise it does not
> care.
> 
> First problem is that with -fwhole-program most functions are not externally
> visible and unreachable function removal performed by ltrans will promptly
> remove whole unit.  This can be cured by making function called from other
> partitions externally visible, but such mocking of flags leads only to further
> problems.  All this has just worked by accident because we partition based on
> source files and usually calls crossing source file are leading to externally
> visible functions anyway, at least without -fwhole-program.
> 
> This patch makes cgraph code aware that it can be partitioned into ltrans unit
> by adding two new flags to cgraph node, in_other_partition and
> reachable_from_other_partition. The idea is to keep visibility flags as they
> are in WPA stage (so proper IPA passes don't need to be ready for fact that
> functions can completely change visibility from propagation to transformation)
> but make small ipa code running at ltrans aware that it don't necesarily need
> to see everything.
> 
> At the moment the only such late small IPA pass is unreachable function removal
> (and the actual code in cgraphunit producing assembly) so it is easy job to do.
> I see that it adds need for extra check in some cases of small IPA passes, but
> I think it is unavoidable.  Those functions accessed in parallel units are
> different anyway: for example we still want to use register passing conventions
> on them on x86 compilation.
> 
> The patch thus removes flags mocking in lto-cgraph and all code dealing
> with forced_extern_inline.  I also fixed way clones are output in lto-cgraph
> so they are reconstructed correctly (at mainline if clone appears in list before
> its original, the resulting callgraph is all wrong since we eventually overwrite
> the clone by the original).  More fixes will be needed to make clone streaming
> work in general (i.e. for clones that are not inline clones and have different
> declaration than the former), but that should be handled incrementally.
> 
> There is one side corner where ltrans can leak function body till end of
> compilation (and get ICE in code checking that all bodies are gone).  Callgraph
> node removal is responsible for removing bodies when they are needed, but with
> inline clone in current partition, but offline clone reachbale from current
> partition won't end up in removal. After we remove all the inline clones, the
> presence of node in callgraph will keep body around.  This needs to be cured by
> not mocing "analyzed" flag that we do currently.  I will handle this later -
> there are more important issues to track for now.
> 
> Bootstrapped/regtested x86_64-linux. OK?

Ok.

Thanks,
Richard.

> 	* cgraph.c (cgraph_remove_node): Kill bodies in other partitoin.
> 	(dump_cgraph_node): Dump new flags.
> 	* cgraph.h (struct cgraph_node): Add flags reachable_from_other_partition
> 	and in_other_partition.
> 	(cgraph_can_remove_if_no_direct_calls_p): Functions used by other partition
> 	can not be removed.
> 	* cgraphunit.c (cgraph_mark_functions_to_output): Functions used by the other
> 	partition must be output; silence sanity checking on leaking functions
> 	bodies from other paritition.
> 	* lto-cgraph.c (reachable_from_other_partition_p): New function.
> 	(lto_output_node): Output new flags; do not sanity check that inline
> 	clones are output; drop lto_forced_extern_inline_p code; do not mock
> 	visibility flags at partition boundaries.
> 	(add_node_to): New function.
> 	(output_cgraph): Use it to sort functions so masters appear before
> 	clones.
> 	(input_overwrite_node): Input new flags.
> 	* passes.c (ipa_write_summaries): Do not call
> 	lto_new_extern_inline_states.
> 	* lto-section-out.c (forced_extern_inline, lto_new_extern_inline_states,
> 	lto_delete_extern_inline_states, lto_force_functions_extern_inline,
> 	lto_forced_extern_inline_p): Kill.
> 	* lto-streamer.h (lto_new_extern_inline_states,
> 	* lto_delete_extern_inline_states, lto_force_functions_extern_inline,
> 	lto_forced_extern_inline_p): Kill.
> 
> 	* lto.c (lto_add_inline_clones): Do not track inlined_decls.
> 	(lto_add_all_inlinees): Likewise.
> 	(lto_wpa_write_files): Likewise.
> 
> 
> Index: cgraph.c
> ===================================================================
> --- cgraph.c	(revision 158542)
> +++ cgraph.c	(working copy)
> @@ -1467,7 +1467,8 @@ cgraph_remove_node (struct cgraph_node *
>        struct cgraph_node *n = (struct cgraph_node *) *slot;
>        if (!n->clones && !n->clone_of && !n->global.inlined_to
>  	  && (cgraph_global_info_ready
> -	      && (TREE_ASM_WRITTEN (n->decl) || DECL_EXTERNAL (n->decl))))
> +	      && (TREE_ASM_WRITTEN (n->decl) || DECL_EXTERNAL (n->decl)
> +		  || n->in_other_partition)))
>  	kill_body = true;
>      }
>    if (assembler_name_hash)
> @@ -1639,6 +1640,10 @@ dump_cgraph_node (FILE *f, struct cgraph
>    if (cgraph_function_flags_ready)
>      fprintf (f, " availability:%s",
>  	     cgraph_availability_names [cgraph_function_body_availability (node)]);
> +  if (node->analyzed)
> +    fprintf (f, " analyzed");
> +  if (node->in_other_partition)
> +    fprintf (f, " in_other_partition");
>    if (node->count)
>      fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
>  	     (HOST_WIDEST_INT)node->count);
> @@ -1666,6 +1671,8 @@ dump_cgraph_node (FILE *f, struct cgraph
>      fprintf (f, " address_taken");
>    else if (node->reachable)
>      fprintf (f, " reachable");
> +  else if (node->reachable_from_other_partition)
> +    fprintf (f, " reachable_from_other_partition");
>    if (gimple_has_body_p (node->decl))
>      fprintf (f, " body");
>    if (node->process)
> Index: cgraph.h
> ===================================================================
> --- cgraph.h	(revision 158542)
> +++ cgraph.h	(working copy)
> @@ -249,11 +249,15 @@ struct GTY((chain_next ("%h.next"), chai
>       cgraph_remove_unreachable_nodes cgraph still can contain unreachable
>       nodes when they are needed for virtual clone instantiation.  */
>    unsigned reachable : 1;
> +  /* Set when function is reachable by call from other LTRANS partition.  */
> +  unsigned reachable_from_other_partition : 1;
>    /* Set once the function is lowered (i.e. its CFG is built).  */
>    unsigned lowered : 1;
>    /* Set once the function has been instantiated and its callee
>       lists created.  */
>    unsigned analyzed : 1;
> +  /* Set when function is available in the other LTO partition.  */
> +  unsigned in_other_partition : 1;
>    /* Set when function is scheduled to be processed by local passes.  */
>    unsigned process : 1;
>    /* Set for aliases once they got through assemble_alias.  */
> @@ -723,7 +727,7 @@ cgraph_only_called_directly_p (struct cg
>  static inline bool
>  cgraph_can_remove_if_no_direct_calls_p (struct cgraph_node *node)
>  {
> -  return (!node->needed
> +  return (!node->needed && !node->reachable_from_other_partition
>    	  && (DECL_COMDAT (node->decl) || !node->local.externally_visible));
>  }
>  
> Index: cgraphunit.c
> ===================================================================
> --- cgraphunit.c	(revision 158542)
> +++ cgraphunit.c	(working copy)
> @@ -1130,7 +1130,7 @@ cgraph_mark_functions_to_output (void)
>  	 outside the current compilation unit.  */
>        if (node->analyzed
>  	  && !node->global.inlined_to
> -	  && (node->needed
> +	  && (node->needed || node->reachable_from_other_partition
>  	      || (e && node->reachable))
>  	  && !TREE_ASM_WRITTEN (decl)
>  	  && !DECL_EXTERNAL (decl))
> @@ -1157,6 +1157,10 @@ cgraph_mark_functions_to_output (void)
>  #ifdef ENABLE_CHECKING
>  	  if (!node->global.inlined_to
>  	      && gimple_has_body_p (decl)
> +	      /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
> +		 are inside partition, we can end up not removing the body since we no longer
> +		 have analyzed node pointing to it.  */
> +	      && !node->in_other_partition
>  	      && !DECL_EXTERNAL (decl))
>  	    {
>  	      dump_cgraph_node (stderr, node);
> @@ -1165,6 +1169,7 @@ cgraph_mark_functions_to_output (void)
>  #endif
>  	  gcc_assert (node->global.inlined_to
>  		      || !gimple_has_body_p (decl)
> +		      || node->in_other_partition
>  		      || DECL_EXTERNAL (decl));
>  
>  	}
> @@ -1178,6 +1183,10 @@ cgraph_mark_functions_to_output (void)
>  	  tree decl = node->decl;
>  	  if (!node->global.inlined_to
>  	      && gimple_has_body_p (decl)
> +	      /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
> +		 are inside partition, we can end up not removing the body since we no longer
> +		 have analyzed node pointing to it.  */
> +	      && !node->in_other_partition
>  	      && !DECL_EXTERNAL (decl))
>  	    {
>  	      dump_cgraph_node (stderr, node);
> Index: lto-cgraph.c
> ===================================================================
> --- lto-cgraph.c	(revision 158542)
> +++ lto-cgraph.c	(working copy)
> @@ -164,6 +164,23 @@ lto_output_edge (struct lto_simple_outpu
>    bitpack_delete (bp);
>  }
>  
> +/* Return true when node is reachable from other partition.  */
> +
> +static bool
> +reachable_from_other_partition_p (struct cgraph_node *node, cgraph_node_set set)
> +{
> +  struct cgraph_edge *e;
> +  if (node->needed)
> +    return true;
> +  if (!node->analyzed)
> +    return false;
> +  if (node->global.inlined_to)
> +    return false;
> +  for (e = node->callers; e; e = e->next_caller)
> +    if (!cgraph_node_in_set_p (e->caller, set))
> +      return true;
> +  return false;
> +}
>  
>  /* Output the cgraph NODE to OB.  ENCODER is used to find the
>     reference number of NODE->inlined_to.  SET is the set of nodes we
> @@ -183,6 +200,7 @@ lto_output_node (struct lto_simple_outpu
>    unsigned local, externally_visible, inlinable, analyzed;
>    bool boundary_p, wrote_decl_p;
>    intptr_t ref;
> +  bool in_other_partition = false;
>  
>    boundary_p = !cgraph_node_in_set_p (node, set);
>    wrote_decl_p = bitmap_bit_p (written_decls, DECL_UID (node->decl));
> @@ -228,19 +246,17 @@ lto_output_node (struct lto_simple_outpu
>       local static nodes to prevent clashes with other local statics.  */
>    if (boundary_p)
>      {
> -      /* Inline clones can not be part of boundary.  */
> -      gcc_assert (!node->global.inlined_to);
> -      local = 0;
> -      externally_visible = 1;
> -      inlinable = 0;
> +      /* Inline clones can not be part of boundary.  
> +         gcc_assert (!node->global.inlined_to);  
> +
> +	 FIXME: At the moment they can be, when partition contains an inline
> +	 clone that is clone of inline clone from outside partition.  We can
> +	 reshape the clone tree and make other tree to be the root, but it
> +	 needs a bit extra work and will be promplty done by cgraph_remove_node
> +	 after reading back.  */
> +      in_other_partition = 1;
>        analyzed = 0;
>      }
> -  else if (lto_forced_extern_inline_p (node->decl))
> -    {
> -      local = 1;
> -      externally_visible = 0;
> -      inlinable = 1;
> -    }
>  
>    lto_output_uleb128_stream (ob->main_stream, wrote_decl_p);
>  
> @@ -263,8 +279,10 @@ lto_output_node (struct lto_simple_outpu
>    bp_pack_value (bp, node->address_taken, 1);
>    bp_pack_value (bp, node->abstract_and_needed, 1);
>    bp_pack_value (bp, node->reachable, 1);
> +  bp_pack_value (bp, analyzed && reachable_from_other_partition_p (node, set), 1);
>    bp_pack_value (bp, node->lowered, 1);
>    bp_pack_value (bp, analyzed, 1);
> +  bp_pack_value (bp, in_other_partition, 1);
>    bp_pack_value (bp, node->process, 1);
>    bp_pack_value (bp, node->alias, 1);
>    bp_pack_value (bp, node->finalized_by_frontend, 1);
> @@ -371,6 +389,15 @@ output_profile_summary (struct lto_simpl
>      lto_output_uleb128_stream (ob->main_stream, 0);
>  }
>  
> +/* Add NODE into encoder as well as nodes it is cloned from.
> +   Do it in a way so clones appear first.  */
> +static void
> +add_node_to (lto_cgraph_encoder_t encoder, struct cgraph_node *node)
> +{
> +  if (node->clone_of)
> +    add_node_to (encoder, node->clone_of);
> +  lto_cgraph_encoder_encode (encoder, node);
> +}
>  
>  /* Output the part of the cgraph in SET.  */
>  
> @@ -404,7 +431,7 @@ output_cgraph (cgraph_node_set set)
>    for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
>      {
>        node = csi_node (csi);
> -      lto_cgraph_encoder_encode (encoder, node);
> +      add_node_to (encoder, node);
>      }
>  
>    /* Go over all the nodes again to include callees that are not in
> @@ -419,7 +446,7 @@ output_cgraph (cgraph_node_set set)
>  	    {
>  	      /* We should have moved all the inlines.  */
>  	      gcc_assert (!callee->global.inlined_to);
> -	      lto_cgraph_encoder_encode (encoder, callee);
> +	      add_node_to (encoder, callee);
>  	      /* Also with each included function include all other functions
>  		 in the same comdat group.  */
>  	      if (callee->same_comdat_group)
> @@ -429,7 +456,7 @@ output_cgraph (cgraph_node_set set)
>  		       next != callee;
>  		       next = next->same_comdat_group)
>  		    if (!cgraph_node_in_set_p (next, set))
> -		      lto_cgraph_encoder_encode (encoder, next);
> +		      add_node_to (encoder, next);
>  		}
>  	    }
>  	}
> @@ -442,11 +469,13 @@ output_cgraph (cgraph_node_set set)
>  	       next != node;
>  	       next = next->same_comdat_group)
>  	    if (!cgraph_node_in_set_p (next, set))
> -	      lto_cgraph_encoder_encode (encoder, next);
> +	      add_node_to (encoder, next);
>  	}
>      }
>  
> -  /* Write out the nodes.  */
> +  /* Write out the nodes.  We must first output a node and then its clones,
> +     otherwise at a time reading back the node there would be nothing to clone
> +     from.  */
>    n_nodes = lto_cgraph_encoder_size (encoder);
>    for (i = 0; i < n_nodes; i++)
>      {
> @@ -530,8 +559,10 @@ input_overwrite_node (struct lto_file_de
>    node->address_taken = bp_unpack_value (bp, 1);
>    node->abstract_and_needed = bp_unpack_value (bp, 1);
>    node->reachable = bp_unpack_value (bp, 1);
> +  node->reachable_from_other_partition = bp_unpack_value (bp, 1);
>    node->lowered = bp_unpack_value (bp, 1);
>    node->analyzed = bp_unpack_value (bp, 1);
> +  node->in_other_partition = bp_unpack_value (bp, 1);
>    node->process = bp_unpack_value (bp, 1);
>    node->alias = bp_unpack_value (bp, 1);
>    node->finalized_by_frontend = bp_unpack_value (bp, 1);
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 158542)
> +++ lto/lto.c	(working copy)
> @@ -593,22 +593,19 @@ finish:
>  
>  static void
>  lto_add_inline_clones (cgraph_node_set set, struct cgraph_node *node,
> -		       bitmap original_decls, bitmap inlined_decls)
> +		       bitmap original_decls)
>  {
>     struct cgraph_node *callee;
>     struct cgraph_edge *edge;
>  
>     cgraph_node_set_add (set, node);
>  
> -   if (!bitmap_bit_p (original_decls, DECL_UID (node->decl)))
> -     bitmap_set_bit (inlined_decls, DECL_UID (node->decl));
> -
>     /* Check to see if NODE has any inlined callee.  */
>     for (edge = node->callees; edge != NULL; edge = edge->next_callee)
>       {
>  	callee = edge->callee;
>  	if (callee->global.inlined_to != NULL)
> -	  lto_add_inline_clones (set, callee, original_decls, inlined_decls);
> +	  lto_add_inline_clones (set, callee, original_decls);
>       }
>  }
>  
> @@ -616,14 +613,13 @@ lto_add_inline_clones (cgraph_node_set s
>     information in the callgraph.  Returns a bitmap of decls that have
>     been inlined into SET indexed by UID.  */
>  
> -static bitmap
> +static void
>  lto_add_all_inlinees (cgraph_node_set set)
>  {
>    cgraph_node_set_iterator csi;
>    struct cgraph_node *node;
>    bitmap original_nodes = lto_bitmap_alloc ();
>    bitmap original_decls = lto_bitmap_alloc ();
> -  bitmap inlined_decls = lto_bitmap_alloc ();
>    bool changed;
>  
>    /* We are going to iterate SET while adding to it, mark all original
> @@ -663,19 +659,17 @@ lto_add_all_inlinees (cgraph_node_set se
>      }
>    while (changed);
>  
> -  /* Transitively add to SET all the inline clones for every node that
> -     has been inlined.  */
> -  for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
> -    {
> -      node = csi_node (csi);
> -      if (bitmap_bit_p (original_nodes, node->uid))
> -	lto_add_inline_clones (set, node, original_decls, inlined_decls);
> -    }
> + /* Transitively add to SET all the inline clones for every node that
> +    has been inlined.  */
> + for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
> +   {
> +     node = csi_node (csi);
> +     if (bitmap_bit_p (original_nodes, node->uid))
> +      lto_add_inline_clones (set, node, original_decls);
> +   }
>  
>    lto_bitmap_free (original_nodes);
>    lto_bitmap_free (original_decls);
> -
> -  return inlined_decls;
>  }
>  
>  /* Owing to inlining, we may need to promote a file-scope variable
> @@ -1004,8 +998,6 @@ lto_wpa_write_files (void)
>    unsigned i, n_sets, last_out_file_ix, num_out_files;
>    lto_file *file;
>    cgraph_node_set set;
> -  bitmap decls;
> -  VEC(bitmap,heap) *inlined_decls = NULL;
>  
>    timevar_push (TV_WHOPR_WPA);
>  
> @@ -1015,8 +1007,7 @@ lto_wpa_write_files (void)
>       compiled by LTRANS.  */
>    for (i = 0; VEC_iterate (cgraph_node_set, lto_cgraph_node_sets, i, set); i++)
>      {
> -      decls = lto_add_all_inlinees (set);
> -      VEC_safe_push (bitmap, heap, inlined_decls, decls);
> +      lto_add_all_inlinees (set);
>        lto_stats.num_output_cgraph_nodes += VEC_length (cgraph_node_ptr,
>  						       set->nodes);
>      }
> @@ -1053,13 +1044,8 @@ lto_wpa_write_files (void)
>  	    fatal_error ("lto_elf_file_open() failed");
>  
>  	  lto_set_current_out_file (file);
> -	  lto_new_extern_inline_states ();
> -
> -	  decls = VEC_index (bitmap, inlined_decls, i);
> -	  lto_force_functions_extern_inline (decls);
>  
>  	  ipa_write_summaries_of_cgraph_node_set (set);
> -	  lto_delete_extern_inline_states ();
>  
>  	  lto_set_current_out_file (NULL);
>  	  lto_elf_file_close (file);
> @@ -1072,10 +1058,6 @@ lto_wpa_write_files (void)
>  
>    output_files[last_out_file_ix] = NULL;
>  
> -  for (i = 0; VEC_iterate (bitmap, inlined_decls, i, decls); i++)
> -    lto_bitmap_free (decls);
> -  VEC_free (bitmap, heap, inlined_decls);
> -
>    timevar_pop (TV_WHOPR_WPA_IO);
>  
>    return output_files;
> Index: passes.c
> ===================================================================
> --- passes.c	(revision 158542)
> +++ passes.c	(working copy)
> @@ -1695,7 +1695,6 @@ ipa_write_summaries (void)
>    if (!flag_generate_lto || errorcount || sorrycount)
>      return;
>  
> -  lto_new_extern_inline_states ();
>    set = cgraph_node_set_new ();
>  
>    /* Create the callgraph set in the same order used in
> @@ -1726,7 +1725,6 @@ ipa_write_summaries (void)
>      }
>  
>    ipa_write_summaries_1 (set);
> -  lto_delete_extern_inline_states ();
>  
>    free (order);
>    ggc_free (set);
> Index: lto-section-out.c
> ===================================================================
> --- lto-section-out.c	(revision 158542)
> +++ lto-section-out.c	(working copy)
> @@ -50,48 +50,6 @@ static VEC(lto_out_decl_state_ptr, heap)
>     generate the decl directory later. */
>  
>  VEC(lto_out_decl_state_ptr, heap) *lto_function_decl_states;
> -
> -/* Bitmap indexed by DECL_UID to indicate if a function needs to be
> -   forced extern inline. */
> -static bitmap forced_extern_inline;
> -
> -/* Initialize states for determining which function decls to be ouput
> -   as extern inline, regardless of the decls' own attributes.  */
> -
> -void
> -lto_new_extern_inline_states (void)
> -{
> -  forced_extern_inline = lto_bitmap_alloc ();
> -}
> -
> -/* Releasing resources use for states to determine which function decls
> -   to be ouput as extern inline */
> -
> -void
> -lto_delete_extern_inline_states (void)
> -{
> -  lto_bitmap_free (forced_extern_inline);
> -  forced_extern_inline = NULL;
> -}
> -
> -/* Force all the functions in DECLS to be output as extern inline.
> -   DECLS is a bitmap indexed by DECL_UID. */
> -
> -void
> -lto_force_functions_extern_inline (bitmap decls)
> -{
> -  bitmap_ior_into (forced_extern_inline, decls);
> -}
> -
> -/* Return true if FN_DECL is a function which should be emitted as
> -   extern inline.  */
> -
> -bool
> -lto_forced_extern_inline_p (tree fn_decl)
> -{
> -  return bitmap_bit_p (forced_extern_inline, DECL_UID (fn_decl));
> -}
> -
>  /* Returns a hash code for P.  */
>  
>  hashval_t
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 158542)
> +++ lto-streamer.h	(working copy)
> @@ -770,10 +770,6 @@ extern void lto_push_out_decl_state (str
>  extern struct lto_out_decl_state *lto_pop_out_decl_state (void);
>  extern void lto_record_function_out_decl_state (tree,
>  						struct lto_out_decl_state *);
> -extern void lto_new_extern_inline_states (void);
> -extern void lto_delete_extern_inline_states (void);
> -extern void lto_force_functions_extern_inline (bitmap decls);
> -extern bool lto_forced_extern_inline_p (tree fn_decl);
>  
>  
>  /* In lto-streamer.c.  */
> 
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]