This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Use separate sections to stream non-trivial constructors


Hi,
since we both agreed that offlining constructors from the global decl stream is a good
idea, I went ahead and implemented it.  I would like to follow up with some
cleanups; for example, the sections are still tagged as function sections, but I
would like to do that incrementally.  There is quite some ugliness in the way we
handle function sections, and the patch started to snowball very quickly.

The patch conceptually copies what we do for functions and reuses most of the
infrastructure.  varpool_get_constructor is the counterpart of cgraph_get_body
(i.e. the means of reading a function body in); it is used by the output
machinery, by ipa-visibility while rewriting the constructor, and by
ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp
or ipa-devirt).
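
To illustrate the load-on-demand idea only, here is a minimal toy sketch in
plain C.  It is not GCC code: toy_var, NOT_LOADED and toy_get_constructor are
hypothetical names, with NOT_LOADED standing in for the role error_mark_node
plays in DECL_INITIAL and on_disk standing in for the per-variable section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker meaning "an initializer exists but has not been
   read in yet"; it plays the role error_mark_node plays in the patch.  */
static const int not_loaded_marker = 0;
#define NOT_LOADED (&not_loaded_marker)

struct toy_var
{
  const int *initial;   /* NULL, NOT_LOADED, or the loaded data.  */
  const int *on_disk;   /* Stand-in for the variable's own section.  */
  int loads;            /* Number of times the section was read.  */
};

/* Return the initializer of V, reading it in on first use only.  */
static const int *
toy_get_constructor (struct toy_var *v)
{
  /* Already in memory, or there is no initializer at all.  */
  if (v->initial != NOT_LOADED)
    return v->initial;

  /* The patch reports a fatal error when the section is missing;
     this sketch simply asserts.  */
  assert (v->on_disk != NULL);

  v->initial = v->on_disk;
  v->loads++;
  return v->initial;
}
```

Every consumer goes through the accessor, so a variable that nobody inspects
never has its initializer read, and one that is inspected many times is read
once.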

I kept get_symbol_initial_value as the authority that decides whether we want to
encode a given constructor or not.  The section itself for a trivial ctor is
about 25 bytes, and with the header it is probably close to double that.
Currently the heuristic is to offline only initializers that are CONSTRUCTOR
nodes and keep simple expressions inline.  We may want to tweak it.
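
As a rough model of that decision (a simplified sketch, not the code in the
patch), the placement of an initializer can be written as a small pure
function.  All toy_* names are hypothetical, encode_p stands in for the result
of lto_symtab_encoder_encode_initializer_p, and the extra conditions the real
code checks (aliases, constant-pool entries and so on) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much reduced set of initializer kinds.  */
enum toy_code { TOY_INTEGER_CST, TOY_ADDR_EXPR, TOY_CONSTRUCTOR };

/* Where the initializer ends up in this toy model.  */
enum toy_placement
{
  TOY_NONE,         /* The variable has no initializer.  */
  TOY_OMITTED,      /* Not streamed; only a marker is written.  */
  TOY_INLINE,       /* Streamed with the global decl stream.  */
  TOY_OWN_SECTION   /* Streamed into a separate section.  */
};

static enum toy_placement
toy_place_initializer (bool has_initial, bool encode_p, enum toy_code code)
{
  assert (code <= TOY_CONSTRUCTOR);

  if (!has_initial)
    return TOY_NONE;
  if (!encode_p)
    return TOY_OMITTED;
  /* A separate section costs a few dozen bytes, so only aggregate
     initializers are moved out; simple expressions stay inline.  */
  if (code == TOY_CONSTRUCTOR)
    return TOY_OWN_SECTION;
  return TOY_INLINE;
}
```

Tweaking the heuristic would then amount to changing the last test, for
example by also looking at the size of the initializer.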

The patch does not bring miraculous savings to firefox WPA, but it does bring some:

GGC memory after global stream is read: 1376898k -> 1250533k
Overall GGC allocations: 4156478 kB -> 4012462 kB
SCCs read: 11006599 of average size 1.907692 -> 9119433 of average size 2.037867
Tree bodies read in total: 20997206 -> 18584194
Size of mmap'd section decls: 299540188 bytes -> 271557265 bytes
Size of mmap'd section function_body: 5711078 bytes -> 7548680 bytes
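
The two section sizes move in opposite directions, since constructors leave
the decls section and land in function_body sections.  A small worked
calculation of the net effect, using only the figures quoted above (the
toy_* helper names are made up for this sketch):

```c
#include <assert.h>

/* Signed change from BEFORE to AFTER; negative means a saving.  */
static long long
toy_delta (long long before, long long after)
{
  return after - before;
}

/* Relative change from BEFORE to AFTER, in percent.  */
static double
toy_percent_change (long long before, long long after)
{
  assert (before > 0);
  return 100.0 * (double) (after - before) / (double) before;
}

/* Net change in mmap'd bytes over both sections quoted above.  */
static long long
toy_net_section_delta (void)
{
  /* decls shrinks by 27982923 bytes.  */
  long long decls = toy_delta (299540188LL, 271557265LL);
  /* function_body grows by 1837602 bytes.  */
  long long bodies = toy_delta (5711078LL, 7548680LL);
  return decls + bodies;   /* -26145321 bytes overall.  */
}
```

So the growth of the function_body sections eats only a small part of what
the decls section loses.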

Things would be better if ipa-visibility and ipa-devirt did not load most of
the virtual tables into memory (still better than loading each into memory 20
times on average).  I will work on that incrementally.  We load 10311 ctors into
memory at WPA time.

Note that firefox seems to feature a really huge data segment these days.
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html

Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK?

	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
	(varpool_get_constructor): New function.
	(ctor_for_folding): Use it.
	(varpool_assemble_decl): Likewise.
	* lto-streamer.h (struct output_block): Turn cgraph_node
	into symbol field.
	(lto_input_variable_constructor): Declare.
	* ipa-visibility.c (function_and_variable_visibility): Use
	varpool_get_constructor.
	* cgraph.h (varpool_get_constructor): Declare.
	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
	parameter; return error_mark_node for non-trivial constructors.
	(lto_write_tree_1, DFS_write_tree): Update use of
	get_symbol_initial_value.
	(output_function): Update initialization of symbol.
	(output_constructor): New function.
	(copy_function): Rename to ...
	(copy_function_or_variable): ... this one; handle vars too.
	(lto_output): Output variable sections.
	* lto-streamer-in.c (input_constructor): New function.
	(lto_read_body): Rename to ...
	(lto_read_body_or_constructor): ... this one; handle vars
	too.
	(lto_input_variable_constructor): New function.
	* ipa-prop.c (ipa_prop_write_jump_functions,
	ipa_prop_write_all_agg_replacement): Update.
Index: varpool.c
===================================================================
--- varpool.c	(revision 212426)
+++ varpool.c	(working copy)
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
 #include "gimple-expr.h"
 #include "flags.h"
 #include "pointer-set.h"
+#include "tree-ssa-alias.h"
+#include "gimple.h"
+#include "lto-streamer.h"
 
 const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
 				      "tls-global-dynamic", "tls-local-dynamic",
@@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
     return NULL;
 }
 
+/* When doing LTO, read NODE's constructor from disk if it is not already present.  */
+
+tree
+varpool_get_constructor (struct varpool_node *node)
+{
+  struct lto_file_decl_data *file_data;
+  const char *data, *name;
+  size_t len;
+  tree decl = node->decl;
+
+  if (DECL_INITIAL (node->decl) != error_mark_node
+      || !in_lto_p)
+    return DECL_INITIAL (node->decl);
+
+  file_data = node->lto_file_data;
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* We may have renamed the declaration, e.g., a static variable.  */
+  name = lto_get_decl_name_mapping (file_data, name);
+
+  data = lto_get_section_data (file_data, LTO_section_function_body,
+			       name, &len);
+  if (!data)
+    fatal_error ("%s: section %s is missing",
+		 file_data->file_name,
+		 name);
+
+  lto_input_variable_constructor (file_data, node, data);
+  lto_stats.num_function_bodies++;
+  lto_free_section_data (file_data, LTO_section_function_body, name,
+			 data, len);
+  lto_free_function_in_decl_state_for_node (node);
+  return DECL_INITIAL (node->decl);
+}
+
 /* Return if DECL is constant and its initial value is known (so we can do
    constant folding using DECL_INITIAL (decl)).
    Return ERROR_MARK_NODE when value is unknown.  */
@@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
   if (DECL_VIRTUAL_P (real_decl))
     {
       gcc_checking_assert (TREE_READONLY (real_decl));
+      if (DECL_INITIAL (real_decl) == error_mark_node
+	  && (node = varpool_get_node (real_decl)))
+	return varpool_get_constructor (node);
       if (DECL_INITIAL (real_decl))
 	return DECL_INITIAL (real_decl);
       else
@@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
 
      ??? Previously we behaved so for scalar variables but not for array
      accesses.  */
+  if (DECL_INITIAL (real_decl) == error_mark_node
+      && (node = varpool_get_node (real_decl)))
+    return varpool_get_constructor (node);
   return DECL_INITIAL (real_decl);
 }
 
@@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
   if (!node->in_other_partition
       && !DECL_EXTERNAL (decl))
     {
+      varpool_get_constructor (node);
       assemble_variable (decl, 0, 1, 0);
       gcc_assert (TREE_ASM_WRITTEN (decl));
       node->definition = true;
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 212426)
+++ lto-streamer.h	(working copy)
@@ -685,9 +685,9 @@ struct output_block
      far and the indexes assigned to them.  */
   hash_table<string_slot_hasher> *string_hash_table;
 
-  /* The current cgraph_node that we are currently serializing.  Null
+  /* The current symbol that we are currently serializing.  Null
      if we are serializing something else.  */
-  struct cgraph_node *cgraph_node;
+  struct symtab_node *symbol;
 
   /* These are the last file and line that were seen in the stream.
      If the current node differs from these, it needs to insert
@@ -830,6 +830,9 @@ extern void lto_reader_init (void);
 extern void lto_input_function_body (struct lto_file_decl_data *,
 				     struct cgraph_node *,
 				     const char *);
+extern void lto_input_variable_constructor (struct lto_file_decl_data *,
+					    struct varpool_node *,
+					    const char *);
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
Index: ipa-visibility.c
===================================================================
--- ipa-visibility.c	(revision 212426)
+++ ipa-visibility.c	(working copy)
@@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
 	  if (found)
 	    {
 	      struct pointer_set_t *visited_nodes = pointer_set_create ();
+
+	      varpool_get_constructor (vnode);
 	      walk_tree (&DECL_INITIAL (vnode->decl),
 			 update_vtable_references, NULL, visited_nodes);
 	      pointer_set_destroy (visited_nodes);
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 212426)
+++ cgraph.h	(working copy)
@@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
 void symtab_initialize_asm_name_hash (void);
 void symtab_prevail_in_asm_name_hash (symtab_node *node);
 void varpool_remove_initializer (varpool_node *);
+tree varpool_get_constructor (struct varpool_node *node);
 
 /* In cgraph.c */
 extern void change_decl_assembler_name (tree, tree);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 212426)
+++ lto-streamer-out.c	(working copy)
@@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
 /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
 
 static tree
-get_symbol_initial_value (struct output_block *ob, tree expr)
+get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
 {
   gcc_checking_assert (DECL_P (expr)
 		       && TREE_CODE (expr) != FUNCTION_DECL
@@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
       && !DECL_IN_CONSTANT_POOL (expr)
       && initial)
     {
-      lto_symtab_encoder_t encoder;
       varpool_node *vnode;
-
-      encoder = ob->decl_state->symtab_node_encoder;
-      vnode = varpool_get_node (expr);
-      if (!vnode
-	  || !lto_symtab_encoder_encode_initializer_p (encoder,
-						       vnode))
-	initial = error_mark_node;
+      /* Extra section needs about 30 bytes; do not produce it for simple
+	 scalar values.  */
+      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
+	  || !(vnode = varpool_get_node (expr))
+	  || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
+	initial = error_mark_node;
     }
 
   return initial;
@@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
       && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
     {
       /* Handle DECL_INITIAL for symbols.  */
-      tree initial = get_symbol_initial_value (ob, expr);
+      tree initial = get_symbol_initial_value
+			 (ob->decl_state->symtab_node_encoder, expr);
       stream_write_tree (ob, initial, ref_p);
     }
 }
@@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
 	      && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
 	    {
 	      /* Handle DECL_INITIAL for symbols.  */
-	      tree initial = get_symbol_initial_value (ob, expr);
+	      tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
+						       expr);
 	      DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
 	    }
 	}
@@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
   ob = create_output_block (LTO_section_function_body);
 
   clear_line_info (ob);
-  ob->cgraph_node = node;
+  ob->symbol = node;
 
   gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
 
@@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
   destroy_output_block (ob);
 }
 
+/* Output the constructor of variable NODE->DECL.  */
+
+static void
+output_constructor (struct varpool_node *node)
+{
+  tree var = node->decl;
+  struct output_block *ob;
+
+  ob = create_output_block (LTO_section_function_body);
+
+  clear_line_info (ob);
+  ob->symbol = node;
+
+  /* Make string 0 be a NULL string.  */
+  streamer_write_char_stream (ob->string_stream, 0);
+
+  /* Output DECL_INITIAL for the variable, which holds its
+     constructor.  */
+  stream_write_tree (ob, DECL_INITIAL (var), true);
+
+  /* Create a section to hold the pickled output of this variable.  */
+  produce_asm (ob, var);
+
+  destroy_output_block (ob);
+}
+
 
 /* Emit toplevel asms.  */
 
@@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
 }
 
 
-/* Copy the function body of NODE without deserializing. */
+/* Copy the function body or variable constructor of NODE without deserializing. */
 
 static void
-copy_function (struct cgraph_node *node)
+copy_function_or_variable (struct symtab_node *node)
 {
   tree function = node->decl;
   struct lto_file_decl_data *file_data = node->lto_file_data;
@@ -2072,7 +2190,7 @@ lto_output (void)
 	      if (gimple_has_body_p (node->decl) || !flag_wpa)
 		output_function (node);
 	      else
-		copy_function (node);
+		copy_function_or_variable (node);
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
@@ -2085,6 +2203,25 @@ lto_output (void)
 	  tree ctor = DECL_INITIAL (node->decl);
 	  if (ctor && !in_lto_p)
 	    walk_tree (&ctor, wrap_refs, NULL, NULL);
+	  if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
+	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
+	      && !node->alias)
+	    {
+#ifdef ENABLE_CHECKING
+	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
+	      bitmap_set_bit (output, DECL_UID (node->decl));
+#endif
+	      decl_state = lto_new_out_decl_state ();
+	      lto_push_out_decl_state (decl_state);
+	      if (DECL_INITIAL (node->decl) != error_mark_node
+		  || !flag_wpa)
+		output_constructor (node);
+	      else
+		copy_function_or_variable (node);
+	      gcc_assert (lto_get_out_decl_state () == decl_state);
+	      lto_pop_out_decl_state ();
+	      lto_record_function_out_decl_state (node->decl, decl_state);
+	    }
 	}
     }
 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 212426)
+++ lto-streamer-in.c	(working copy)
@@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
   pop_cfun ();
 }
 
+/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
+
+static void
+input_constructor (tree var, struct data_in *data_in,
+		   struct lto_input_block *ib)
+{
+  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
+}
+
 
 /* Read the body from DATA for function NODE and fill it in.
    FILE_DATA are the global decls and types.  SECTION_TYPE is either
@@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
    that function.  */
 
 static void
-lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
-	       const char *data, enum lto_section_type section_type)
+lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
+			      const char *data, enum lto_section_type section_type)
 {
   const struct lto_function_header *header;
   struct data_in *data_in;
@@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
   tree fn_decl = node->decl;
 
   header = (const struct lto_function_header *) data;
-  cfg_offset = sizeof (struct lto_function_header);
-  main_offset = cfg_offset + header->cfg_size;
-  string_offset = main_offset + header->main_size;
-
-  LTO_INIT_INPUT_BLOCK (ib_cfg,
-		        data + cfg_offset,
-			0,
-			header->cfg_size);
-
-  LTO_INIT_INPUT_BLOCK (ib_main,
-			data + main_offset,
-			0,
-			header->main_size);
+  if (TREE_CODE (node->decl) == FUNCTION_DECL)
+    {
+      cfg_offset = sizeof (struct lto_function_header);
+      main_offset = cfg_offset + header->cfg_size;
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_cfg,
+			    data + cfg_offset,
+			    0,
+			    header->cfg_size);
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
+  else
+    {
+      main_offset = sizeof (struct lto_function_header);
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      if (TREE_CODE (node->decl) == FUNCTION_DECL)
+	input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      else
+	input_constructor (fn_decl, data_in, &ib_main);
       /* And fixup types we streamed locally.  */
 	{
 	  struct streamer_tree_cache_d *cache = data_in->reader_cache;
@@ -1124,7 +1149,17 @@ void
 lto_input_function_body (struct lto_file_decl_data *file_data,
 			 struct cgraph_node *node, const char *data)
 {
-  lto_read_body (file_data, node, data, LTO_section_function_body);
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
+}
+
+/* Read the constructor of variable NODE using DATA.  FILE_DATA holds the global
+   decls and types.  */
+
+void
+lto_input_variable_constructor (struct lto_file_decl_data *file_data,
+				struct varpool_node *node, const char *data)
+{
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
 }
 
 
Index: ipa-prop.c
===================================================================
--- ipa-prop.c	(revision 212426)
+++ ipa-prop.c	(working copy)
@@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
 
   ob = create_output_block (LTO_section_jump_functions);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {
@@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
 
   ob = create_output_block (LTO_section_ipcp_transform);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {

