This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Merge profiles of duplicated COMDAT functions


On Fri, 30 Aug 2013, Jan Hubicka wrote:

> Hi,
> this patch makes COMDAT profiles right with LTO. (made jointly with Martin Liska)
> 
> To recap the issue:  COMDAT functions are produced many times. Each copy gets
> its own set of counters and depending on inlining and linker decision, one 
> or more of copies of a given COMDAT function will get executed on runtime.
> After reading profile in, the profiles can be wrong when inlining/linker
> decision differs at -fprofile-use stage.
> 
> This has nasty effect of optimizing important stuff for size.
> 
> Now with LTO the problem is solved, since early inlining things works magically
> well.  Catch is that for large projects, we tend to explode with
> -fprofile-generate -flto and we explicitely ask users to not use -flto when
> generating profile.  This brings the problem back.
> 
> This patch makes profiles of multiple copies of given comdat to be merged during
> LTO symtab merging.  This is done in the busy way: both functions are read into
> memory, compared if CFGs match, profiles are merged, cgraph profile is updated
> based on CFG profile and one of the bodies is released from memory.  The other
> body will then be streamed by WPA as if the function was born during WPA time.
> 
> This is not terribly cheap by design (we load COMDATs into WPA memory), but
> reading happens only when COMDAT gets merged and more than one of them has
> non-0 count from profile feedback. Moreover this whole path is executed only
> when -fno-lto is used for -fprofile-generate.
> 
> The compile time/memory impact seems under 1% both on GCC and firefox.  For GCC
> profiledbootstrap we merge 1600 functions, mostly vector accestors and tree
> checking (I tried checking enabled build). 
> 
> To make things even ugier, there is nasty special case where we already merged
> function declarations, but we absolutely need to read two different bodies of
> the same function.  I did so by copying the declaration of function I am going
> to remove.  This provoke checking failure on labels that are in global stream
> and thus have wrong context pointers in one of the bodies.  This is harmless
> since we are going to throw it away, but it requires special case silencing of
> the sanity check for LTO stream in.  We may want to avoid streaming in gimple
> statements completely, but we need to merge statement histograms so that will
> require a bit of massaging of the on-disk format to make this possible.
> (histograms are currently stored into statement section that needs to be
> changed). I plan to look into this incrementally.
> 
> Now for longer term, we want to make function CFGs independent of gimple body
> and we want to decide on instrumentation at linktime, so we won't require user
> to rebuild with -fprofile-use, just relink.  I plan to work on this, but 
> not for 4.9.  Thus I hope we can go with this fix - the COMDAT profile loss
> issue has proved itself to be very difficult to deal with and very common
> for C++ programs.
> 
> Bootstrapped/regtested x86_64-liux, profiledbootstrapped and tested
> on firefox.
> 
> If there will be no real opposition, I will go ahead with this patch during
> weekend, so it is picked by periodic testers.
> 
> Honza
> 
> 	* lto-symtab.c (lto_cgraph_replace_node): Merge function profiles.
> 	* cgraph.c (cgraph_get_body): Update call of input_function_body.
> 	* ipa-utils.c: Include lto-streamer.h and ipa-inline.h
> 	(ipa_merge_profiles): New function.
> 	* ipa-utils.h (ipa_merge_profiles): Declare.
> 	* lto-streamer-in.c (lto_input_function_body): Change to use
> 	cgraph_node as parameter.
> 	(lto_read_body): Take cgraph node as parameter.
> Index: lto-symtab.c
> ===================================================================
> --- lto-symtab.c	(revision 202099)
> +++ lto-symtab.c	(working copy)
> @@ -80,6 +82,7 @@
>    /* Redirect incomming references.  */
>    ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
>  
> +  ipa_merge_profiles (prevailing_node, node);
>    lto_free_function_in_decl_state_for_node ((symtab_node)node);
>  
>    if (node->symbol.decl != prevailing_node->symbol.decl)
> Index: cgraph.c
> ===================================================================
> --- cgraph.c	(revision 202099)
> +++ cgraph.c	(working copy)
> @@ -3109,7 +3109,7 @@
>  
>    gcc_assert (DECL_STRUCT_FUNCTION (decl) == NULL);
>  
> -  lto_input_function_body (file_data, node->symbol.decl, data);
> +  lto_input_function_body (file_data, node, data);
>    lto_stats.num_function_bodies++;
>    lto_free_section_data (file_data, LTO_section_function_body, name,
>  			 data, len);
> Index: ipa-utils.c
> ===================================================================
> --- ipa-utils.c	(revision 202099)
> +++ ipa-utils.c	(working copy)
> @@ -37,6 +37,8 @@
>  #include "flags.h"
>  #include "diagnostic.h"
>  #include "langhooks.h"
> +#include "lto-streamer.h"
> +#include "ipa-inline.h"
>  
>  /* Debugging function for postorder and inorder code. NOTE is a string
>     that is printed before the nodes are printed.  ORDER is an array of
> @@ -618,3 +620,174 @@
>  {
>    dump_varpool_node_set (stderr, set);
>  }
> +
> +
> +/* SRC and DST are going to be merged.  Take SRC's profile and merge it into
> +   DST so it is not going to be lost.  Destroy SRC's body on the way.  */
> +
> +void
> +ipa_merge_profiles (struct cgraph_node *dst,
> +		    struct cgraph_node *src)
> +{
> +  tree oldsrcdecl = src->symbol.decl;
> +  struct function *srccfun, *dstcfun;
> +  bool match = true;
> +
> +  if (!src->symbol.definition
> +      || !dst->symbol.definition)
> +    return;
> +  if (src->frequency < dst->frequency)
> +    src->frequency = dst->frequency;
> +  if (!dst->count)
> +    return;
> +  if (cgraph_dump_file)
> +    {
> +      fprintf (cgraph_dump_file, "Merging profiles of %s/%i to %s/%i\n",
> +	       xstrdup (cgraph_node_name (src)), src->symbol.order,
> +	       xstrdup (cgraph_node_name (dst)), dst->symbol.order);
> +    }
> +  dst->count += src->count;
> +
> +  /* This is ugly.  We need to get both function bodies into memory.
> +     If declaration is merged, we need to duplicate it to be able
> +     to load body that is being replaced.  This makes symbol table
> +     temporarily inconsistent.  */
> +  if (src->symbol.decl == dst->symbol.decl)
> +    {
> +      void **slot;
> +      struct lto_in_decl_state temp;
> +      struct lto_in_decl_state *state;
> +
> +      /* We are going to move the decl, we want to remove its file decl data.
> +	 and link these with the new decl. */
> +      temp.fn_decl = src->symbol.decl;
> +      slot = htab_find_slot (src->symbol.lto_file_data->function_decl_states,
> +			     &temp, NO_INSERT);
> +      state = (lto_in_decl_state *)*slot;
> +      htab_clear_slot (src->symbol.lto_file_data->function_decl_states, slot);
> +      gcc_assert (state);
> +
> +      /* Duplicate the decl and be sure it does not link into body of DST.  */
> +      src->symbol.decl = copy_node (src->symbol.decl);
> +      DECL_STRUCT_FUNCTION (src->symbol.decl) = NULL;
> +      DECL_ARGUMENTS (src->symbol.decl) = NULL;
> +      DECL_INITIAL (src->symbol.decl) = NULL;
> +      DECL_RESULT (src->symbol.decl) = NULL;
> +
> +      /* Associate the decl state with new declaration, so LTO streamer
> + 	 can look it up.  */
> +      state->fn_decl = src->symbol.decl;
> +      slot = htab_find_slot (src->symbol.lto_file_data->function_decl_states,
> +			     state, INSERT);
> +      gcc_assert (!*slot);
> +      *slot = state;
> +    }
> +  cgraph_get_body (src);
> +  cgraph_get_body (dst);
> +  srccfun = DECL_STRUCT_FUNCTION (src->symbol.decl);
> +  dstcfun = DECL_STRUCT_FUNCTION (dst->symbol.decl);
> +  if (n_basic_blocks_for_function (srccfun)
> +      != n_basic_blocks_for_function (dstcfun))
> +    {
> +      if (cgraph_dump_file)
> +	fprintf (cgraph_dump_file,
> +		 "Giving up; number of basic block mismatch.\n");
> +      match = false;
> +    }
> +  else if (last_basic_block_for_function (srccfun)
> +	   != last_basic_block_for_function (dstcfun))
> +    {
> +      if (cgraph_dump_file)
> +	fprintf (cgraph_dump_file,
> +		 "Giving up; last block mismatch.\n");
> +      match = false;
> +    }
> +  else 
> +    {
> +      basic_block srcbb, dstbb;
> +
> +      FOR_ALL_BB_FN (srcbb, srccfun)
> +	{
> +	  unsigned int i;
> +
> +	  dstbb = BASIC_BLOCK_FOR_FUNCTION (dstcfun, srcbb->index);
> +	  if (dstbb == NULL)
> +	    {
> +	      if (cgraph_dump_file)
> +		fprintf (cgraph_dump_file,
> +			 "No matching block for bb %i.\n",
> +			 srcbb->index);
> +	      match = false;
> +	      break;
> +	    }
> +	  if (EDGE_COUNT (srcbb->succs) != EDGE_COUNT (dstbb->succs))
> +	    {
> +	      if (cgraph_dump_file)
> +		fprintf (cgraph_dump_file,
> +			 "Edge count mistmatch for bb %i.\n",
> +			 srcbb->index);
> +	      match = false;
> +	      break;
> +	    }
> +	  for (i = 0; i < EDGE_COUNT (srcbb->succs); i++)
> +	    {
> +	      edge srce = EDGE_SUCC (srcbb, i);
> +	      edge dste = EDGE_SUCC (dstbb, i);
> +	      if (srce->dest->index != dste->dest->index)
> +		{
> +		  if (cgraph_dump_file)
> +		    fprintf (cgraph_dump_file,
> +			     "Succ edge mistmatch for bb %i.\n",
> +			     srce->dest->index);
> +		  match = false;
> +		  break;
> +		}
> +	    }
> +	}
> +    }
> +  if (match)
> +    {
> +      struct cgraph_edge *e;
> +      basic_block srcbb, dstbb;
> +
> +      /* TODO: merge also statement histograms.  */
> +      FOR_ALL_BB_FN (srcbb, srccfun)
> +	{
> +	  unsigned int i;
> +
> +	  dstbb = BASIC_BLOCK_FOR_FUNCTION (dstcfun, srcbb->index);
> +	  dstbb->count += srcbb->count;
> +	  for (i = 0; i < EDGE_COUNT (srcbb->succs); i++)
> +	    {
> +	      edge srce = EDGE_SUCC (srcbb, i);
> +	      edge dste = EDGE_SUCC (dstbb, i);
> +	      dste->count += srce->count;
> +	    }
> +	}
> +      push_cfun (dstcfun);
> +      counts_to_freqs ();
> +      compute_function_frequency ();
> +      pop_cfun ();
> +      for (e = dst->callees; e; e = e->next_callee)
> +	{
> +	  gcc_assert (!e->speculative);
> +	  e->count = gimple_bb (e->call_stmt)->count;
> +	  e->frequency = compute_call_stmt_bb_frequency
> +			     (dst->symbol.decl,
> +			      gimple_bb (e->call_stmt));
> +	}
> +      for (e = dst->indirect_calls; e; e = e->next_callee)
> +	{
> +	  gcc_assert (!e->speculative);
> +	  e->count = gimple_bb (e->call_stmt)->count;
> +	  e->frequency = compute_call_stmt_bb_frequency
> +			     (dst->symbol.decl,
> +			      gimple_bb (e->call_stmt));
> +	}
> +      cgraph_release_function_body (src);
> +      inline_update_overall_summary (dst);
> +    }
> +  /* TODO: if there is no match, we can scale up.  */
> +  src->symbol.decl = oldsrcdecl;
> +}
> +
> Index: ipa-utils.h
> ===================================================================
> --- ipa-utils.h	(revision 202099)
> +++ ipa-utils.h	(working copy)
> @@ -44,6 +44,8 @@
>  vec<cgraph_node_ptr> ipa_get_nodes_in_cycle (struct cgraph_node *);
>  int ipa_reverse_postorder (struct cgraph_node **);
>  tree get_base_var (tree);
> +void ipa_merge_profiles (struct cgraph_node *dst,
> +			 struct cgraph_node *src);
>  
>  /* In ipa-devirt.c  */
>  
> Index: gimple-streamer-in.c
> ===================================================================
> --- gimple-streamer-in.c	(revision 202099)
> +++ gimple-streamer-in.c	(working copy)
> @@ -284,7 +284,11 @@
>      }
>    else if (code == GIMPLE_LABEL)
>      gcc_assert (emit_label_in_global_context_p (gimple_label_label (stmt))
> -	        || DECL_CONTEXT (gimple_label_label (stmt)) == fn->decl);
> +	        || DECL_CONTEXT (gimple_label_label (stmt)) == fn->decl
> +		/* Do not sanity check before decl merging is completed.
> +		   This is needed for profile merging during symtab
> +		   resolution.  */
> +		|| cgraph_state == CGRAPH_LTO_STREAMING);

Please instead remove this assert and put the checking into
tree-cfg.c:verify_gimple_label where it should need no special-casing
on cgraph_state.

Otherwise the non-profile bits look ok.

Thanks,
Richard.

>    else if (code == GIMPLE_ASM)
>      {
>        unsigned i;
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 202099)
> +++ lto-streamer.h	(working copy)
> @@ -834,7 +834,8 @@
>  /* In lto-streamer-in.c */
>  extern void lto_input_cgraph (struct lto_file_decl_data *, const char *);
>  extern void lto_reader_init (void);
> -extern void lto_input_function_body (struct lto_file_decl_data *, tree,
> +extern void lto_input_function_body (struct lto_file_decl_data *,
> +				     struct cgraph_node *,
>  				     const char *);
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
>  					      const char *);
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c	(revision 202099)
> +++ lto-streamer-in.c	(working copy)
> @@ -1001,14 +1064,14 @@
>  }
>  
>  
> -/* Read the body from DATA for function FN_DECL and fill it in.
> +/* Read the body from DATA for function NODE and fill it in.
>     FILE_DATA are the global decls and types.  SECTION_TYPE is either
>     LTO_section_function_body or LTO_section_static_initializer.  If
>     section type is LTO_section_function_body, FN must be the decl for
>     that function.  */
>  
>  static void
> -lto_read_body (struct lto_file_decl_data *file_data, tree fn_decl,
> +lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
>  	       const char *data, enum lto_section_type section_type)
>  {
>    const struct lto_function_header *header;
> @@ -1018,6 +1081,7 @@
>    int string_offset;
>    struct lto_input_block ib_cfg;
>    struct lto_input_block ib_main;
> +  tree fn_decl = node->symbol.decl;
>  
>    header = (const struct lto_function_header *) data;
>    cfg_offset = sizeof (struct lto_function_header);
> @@ -1044,7 +1108,6 @@
>    if (section_type == LTO_section_function_body)
>      {
>        struct lto_in_decl_state *decl_state;
> -      struct cgraph_node *node = cgraph_get_node (fn_decl);
>        unsigned from;
>  
>        gcc_checking_assert (node);
> @@ -1094,14 +1157,14 @@
>  }
>  
>  
> -/* Read the body of FN_DECL using DATA.  FILE_DATA holds the global
> +/* Read the body of NODE using DATA.  FILE_DATA holds the global
>     decls and types.  */
>  
>  void
>  lto_input_function_body (struct lto_file_decl_data *file_data,
> -			 tree fn_decl, const char *data)
> +			 struct cgraph_node *node, const char *data)
>  {
> -  lto_read_body (file_data, fn_decl, data, LTO_section_function_body);
> +  lto_read_body (file_data, node, data, LTO_section_function_body);
>  }
>  
>  
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]