Merge profiles of duplicated COMDAT functions

Jan Hubicka hubicka@ucw.cz
Fri Aug 30 08:57:00 GMT 2013


Hi,
this patch makes COMDAT profiles right with LTO. (made jointly with Martin Liska)

To recap the issue:  COMDAT functions are produced many times. Each copy gets
its own set of counters and depending on inlining and linker decision, one 
or more of copies of a given COMDAT function will get executed on runtime.
After reading profile in, the profiles can be wrong when inlining/linker
decision differs at -fprofile-use stage.

This has nasty effect of optimizing important stuff for size.

Now with LTO the problem is solved, since early inlining things works magically
well.  Catch is that for large projects, we tend to explode with
-fprofile-generate -flto and we explicitely ask users to not use -flto when
generating profile.  This brings the problem back.

This patch makes profiles of multiple copies of given comdat to be merged during
LTO symtab merging.  This is done in the busy way: both functions are read into
memory, compared if CFGs match, profiles are merged, cgraph profile is updated
based on CFG profile and one of the bodies is released from memory.  The other
body will then be streamed by WPA as if the function was born during WPA time.

This is not terribly cheap by design (we load COMDATs into WPA memory), but
reading happens only when COMDAT gets merged and more than one of them has
non-0 count from profile feedback. Moreover this whole path is executed only
when -fno-lto is used for -fprofile-generate.

The compile time/memory impact seems under 1% both on GCC and firefox.  For GCC
profiledbootstrap we merge 1600 functions, mostly vector accestors and tree
checking (I tried checking enabled build). 

To make things even ugier, there is nasty special case where we already merged
function declarations, but we absolutely need to read two different bodies of
the same function.  I did so by copying the declaration of function I am going
to remove.  This provoke checking failure on labels that are in global stream
and thus have wrong context pointers in one of the bodies.  This is harmless
since we are going to throw it away, but it requires special case silencing of
the sanity check for LTO stream in.  We may want to avoid streaming in gimple
statements completely, but we need to merge statement histograms so that will
require a bit of massaging of the on-disk format to make this possible.
(histograms are currently stored into statement section that needs to be
changed). I plan to look into this incrementally.

Now for longer term, we want to make function CFGs independent of gimple body
and we want to decide on instrumentation at linktime, so we won't require user
to rebuild with -fprofile-use, just relink.  I plan to work on this, but 
not for 4.9.  Thus I hope we can go with this fix - the COMDAT profile loss
issue has proved itself to be very difficult to deal with and very common
for C++ programs.

Bootstrapped/regtested x86_64-liux, profiledbootstrapped and tested
on firefox.

If there will be no real opposition, I will go ahead with this patch during
weekend, so it is picked by periodic testers.

Honza

	* lto-symtab.c (lto_cgraph_replace_node): Merge function profiles.
	* cgraph.c (cgraph_get_body): Update call of input_function_body.
	* ipa-utils.c: Include lto-streamer.h and ipa-inline.h
	(ipa_merge_profiles): New function.
	* ipa-utils.h (ipa_merge_profiles): Declare.
	* lto-streamer-in.c (lto_input_function_body): Change to use
	cgraph_node as parameter.
	(lto_read_body): Take cgraph node as parameter.
Index: lto-symtab.c
===================================================================
--- lto-symtab.c	(revision 202099)
+++ lto-symtab.c	(working copy)
@@ -80,6 +82,7 @@
   /* Redirect incomming references.  */
   ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
 
+  ipa_merge_profiles (prevailing_node, node);
   lto_free_function_in_decl_state_for_node ((symtab_node)node);
 
   if (node->symbol.decl != prevailing_node->symbol.decl)
Index: cgraph.c
===================================================================
--- cgraph.c	(revision 202099)
+++ cgraph.c	(working copy)
@@ -3109,7 +3109,7 @@
 
   gcc_assert (DECL_STRUCT_FUNCTION (decl) == NULL);
 
-  lto_input_function_body (file_data, node->symbol.decl, data);
+  lto_input_function_body (file_data, node, data);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
 			 data, len);
Index: ipa-utils.c
===================================================================
--- ipa-utils.c	(revision 202099)
+++ ipa-utils.c	(working copy)
@@ -37,6 +37,8 @@
 #include "flags.h"
 #include "diagnostic.h"
 #include "langhooks.h"
+#include "lto-streamer.h"
+#include "ipa-inline.h"
 
 /* Debugging function for postorder and inorder code. NOTE is a string
    that is printed before the nodes are printed.  ORDER is an array of
@@ -618,3 +620,174 @@
 {
   dump_varpool_node_set (stderr, set);
 }
+
+
+/* SRC and DST are going to be merged.  Take SRC's profile and merge it into
+   DST so it is not going to be lost.  Destroy SRC's body on the way.  */
+
+void
+ipa_merge_profiles (struct cgraph_node *dst,
+		    struct cgraph_node *src)
+{
+  tree oldsrcdecl = src->symbol.decl;
+  struct function *srccfun, *dstcfun;
+  bool match = true;
+
+  if (!src->symbol.definition
+      || !dst->symbol.definition)
+    return;
+  if (src->frequency < dst->frequency)
+    src->frequency = dst->frequency;
+  if (!dst->count)
+    return;
+  if (cgraph_dump_file)
+    {
+      fprintf (cgraph_dump_file, "Merging profiles of %s/%i to %s/%i\n",
+	       xstrdup (cgraph_node_name (src)), src->symbol.order,
+	       xstrdup (cgraph_node_name (dst)), dst->symbol.order);
+    }
+  dst->count += src->count;
+
+  /* This is ugly.  We need to get both function bodies into memory.
+     If declaration is merged, we need to duplicate it to be able
+     to load body that is being replaced.  This makes symbol table
+     temporarily inconsistent.  */
+  if (src->symbol.decl == dst->symbol.decl)
+    {
+      void **slot;
+      struct lto_in_decl_state temp;
+      struct lto_in_decl_state *state;
+
+      /* We are going to move the decl, we want to remove its file decl data.
+	 and link these with the new decl. */
+      temp.fn_decl = src->symbol.decl;
+      slot = htab_find_slot (src->symbol.lto_file_data->function_decl_states,
+			     &temp, NO_INSERT);
+      state = (lto_in_decl_state *)*slot;
+      htab_clear_slot (src->symbol.lto_file_data->function_decl_states, slot);
+      gcc_assert (state);
+
+      /* Duplicate the decl and be sure it does not link into body of DST.  */
+      src->symbol.decl = copy_node (src->symbol.decl);
+      DECL_STRUCT_FUNCTION (src->symbol.decl) = NULL;
+      DECL_ARGUMENTS (src->symbol.decl) = NULL;
+      DECL_INITIAL (src->symbol.decl) = NULL;
+      DECL_RESULT (src->symbol.decl) = NULL;
+
+      /* Associate the decl state with new declaration, so LTO streamer
+ 	 can look it up.  */
+      state->fn_decl = src->symbol.decl;
+      slot = htab_find_slot (src->symbol.lto_file_data->function_decl_states,
+			     state, INSERT);
+      gcc_assert (!*slot);
+      *slot = state;
+    }
+  cgraph_get_body (src);
+  cgraph_get_body (dst);
+  srccfun = DECL_STRUCT_FUNCTION (src->symbol.decl);
+  dstcfun = DECL_STRUCT_FUNCTION (dst->symbol.decl);
+  if (n_basic_blocks_for_function (srccfun)
+      != n_basic_blocks_for_function (dstcfun))
+    {
+      if (cgraph_dump_file)
+	fprintf (cgraph_dump_file,
+		 "Giving up; number of basic block mismatch.\n");
+      match = false;
+    }
+  else if (last_basic_block_for_function (srccfun)
+	   != last_basic_block_for_function (dstcfun))
+    {
+      if (cgraph_dump_file)
+	fprintf (cgraph_dump_file,
+		 "Giving up; last block mismatch.\n");
+      match = false;
+    }
+  else 
+    {
+      basic_block srcbb, dstbb;
+
+      FOR_ALL_BB_FN (srcbb, srccfun)
+	{
+	  unsigned int i;
+
+	  dstbb = BASIC_BLOCK_FOR_FUNCTION (dstcfun, srcbb->index);
+	  if (dstbb == NULL)
+	    {
+	      if (cgraph_dump_file)
+		fprintf (cgraph_dump_file,
+			 "No matching block for bb %i.\n",
+			 srcbb->index);
+	      match = false;
+	      break;
+	    }
+	  if (EDGE_COUNT (srcbb->succs) != EDGE_COUNT (dstbb->succs))
+	    {
+	      if (cgraph_dump_file)
+		fprintf (cgraph_dump_file,
+			 "Edge count mistmatch for bb %i.\n",
+			 srcbb->index);
+	      match = false;
+	      break;
+	    }
+	  for (i = 0; i < EDGE_COUNT (srcbb->succs); i++)
+	    {
+	      edge srce = EDGE_SUCC (srcbb, i);
+	      edge dste = EDGE_SUCC (dstbb, i);
+	      if (srce->dest->index != dste->dest->index)
+		{
+		  if (cgraph_dump_file)
+		    fprintf (cgraph_dump_file,
+			     "Succ edge mistmatch for bb %i.\n",
+			     srce->dest->index);
+		  match = false;
+		  break;
+		}
+	    }
+	}
+    }
+  if (match)
+    {
+      struct cgraph_edge *e;
+      basic_block srcbb, dstbb;
+
+      /* TODO: merge also statement histograms.  */
+      FOR_ALL_BB_FN (srcbb, srccfun)
+	{
+	  unsigned int i;
+
+	  dstbb = BASIC_BLOCK_FOR_FUNCTION (dstcfun, srcbb->index);
+	  dstbb->count += srcbb->count;
+	  for (i = 0; i < EDGE_COUNT (srcbb->succs); i++)
+	    {
+	      edge srce = EDGE_SUCC (srcbb, i);
+	      edge dste = EDGE_SUCC (dstbb, i);
+	      dste->count += srce->count;
+	    }
+	}
+      push_cfun (dstcfun);
+      counts_to_freqs ();
+      compute_function_frequency ();
+      pop_cfun ();
+      for (e = dst->callees; e; e = e->next_callee)
+	{
+	  gcc_assert (!e->speculative);
+	  e->count = gimple_bb (e->call_stmt)->count;
+	  e->frequency = compute_call_stmt_bb_frequency
+			     (dst->symbol.decl,
+			      gimple_bb (e->call_stmt));
+	}
+      for (e = dst->indirect_calls; e; e = e->next_callee)
+	{
+	  gcc_assert (!e->speculative);
+	  e->count = gimple_bb (e->call_stmt)->count;
+	  e->frequency = compute_call_stmt_bb_frequency
+			     (dst->symbol.decl,
+			      gimple_bb (e->call_stmt));
+	}
+      cgraph_release_function_body (src);
+      inline_update_overall_summary (dst);
+    }
+  /* TODO: if there is no match, we can scale up.  */
+  src->symbol.decl = oldsrcdecl;
+}
+
Index: ipa-utils.h
===================================================================
--- ipa-utils.h	(revision 202099)
+++ ipa-utils.h	(working copy)
@@ -44,6 +44,8 @@
 vec<cgraph_node_ptr> ipa_get_nodes_in_cycle (struct cgraph_node *);
 int ipa_reverse_postorder (struct cgraph_node **);
 tree get_base_var (tree);
+void ipa_merge_profiles (struct cgraph_node *dst,
+			 struct cgraph_node *src);
 
 /* In ipa-devirt.c  */
 
Index: gimple-streamer-in.c
===================================================================
--- gimple-streamer-in.c	(revision 202099)
+++ gimple-streamer-in.c	(working copy)
@@ -284,7 +284,11 @@
     }
   else if (code == GIMPLE_LABEL)
     gcc_assert (emit_label_in_global_context_p (gimple_label_label (stmt))
-	        || DECL_CONTEXT (gimple_label_label (stmt)) == fn->decl);
+	        || DECL_CONTEXT (gimple_label_label (stmt)) == fn->decl
+		/* Do not sanity check before decl merging is completed.
+		   This is needed for profile merging during symtab
+		   resolution.  */
+		|| cgraph_state == CGRAPH_LTO_STREAMING);
   else if (code == GIMPLE_ASM)
     {
       unsigned i;
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 202099)
+++ lto-streamer.h	(working copy)
@@ -834,7 +834,8 @@
 /* In lto-streamer-in.c */
 extern void lto_input_cgraph (struct lto_file_decl_data *, const char *);
 extern void lto_reader_init (void);
-extern void lto_input_function_body (struct lto_file_decl_data *, tree,
+extern void lto_input_function_body (struct lto_file_decl_data *,
+				     struct cgraph_node *,
 				     const char *);
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 202099)
+++ lto-streamer-in.c	(working copy)
@@ -1001,14 +1064,14 @@
 }
 
 
-/* Read the body from DATA for function FN_DECL and fill it in.
+/* Read the body from DATA for function NODE and fill it in.
    FILE_DATA are the global decls and types.  SECTION_TYPE is either
    LTO_section_function_body or LTO_section_static_initializer.  If
    section type is LTO_section_function_body, FN must be the decl for
    that function.  */
 
 static void
-lto_read_body (struct lto_file_decl_data *file_data, tree fn_decl,
+lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
 	       const char *data, enum lto_section_type section_type)
 {
   const struct lto_function_header *header;
@@ -1018,6 +1081,7 @@
   int string_offset;
   struct lto_input_block ib_cfg;
   struct lto_input_block ib_main;
+  tree fn_decl = node->symbol.decl;
 
   header = (const struct lto_function_header *) data;
   cfg_offset = sizeof (struct lto_function_header);
@@ -1044,7 +1108,6 @@
   if (section_type == LTO_section_function_body)
     {
       struct lto_in_decl_state *decl_state;
-      struct cgraph_node *node = cgraph_get_node (fn_decl);
       unsigned from;
 
       gcc_checking_assert (node);
@@ -1094,14 +1157,14 @@
 }
 
 
-/* Read the body of FN_DECL using DATA.  FILE_DATA holds the global
+/* Read the body of NODE using DATA.  FILE_DATA holds the global
    decls and types.  */
 
 void
 lto_input_function_body (struct lto_file_decl_data *file_data,
-			 tree fn_decl, const char *data)
+			 struct cgraph_node *node, const char *data)
 {
-  lto_read_body (file_data, fn_decl, data, LTO_section_function_body);
+  lto_read_body (file_data, node, data, LTO_section_function_body);
 }
 
 



More information about the Gcc-patches mailing list