This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Use separate sections to stream non-trivial constructors


On Fri, 11 Jul 2014, Jan Hubicka wrote:

> Hi,
> since we both agreed offlining constructors from the global decl stream is a good
> idea, I went ahead and implemented it.  I would like to follow up with
> cleanups; for example the sections are still tagged as function sections, but I
> would like to do it incrementally. There is quite some ugliness in the way we
> handle function sections and the patch started to snowball very quickly.
> 
> The patch conceptually copies what we do for functions and re-uses most of
> the infrastructure. varpool_get_constructor mirrors cgraph_get_body (i.e. it is
> the means of reading the constructor in) and it is used by the output machinery,
> by ipa-visibility while rewriting the constructor, and by ctor_for_folding
> (which makes us load the ctor whenever it is needed by ipa-cp or ipa-devirt).
> 
> I kept get_symbol_initial_value as the authority to decide whether we want to
> encode a given constructor or not.  The section itself for a trivial ctor is about
> 25 bytes and with the header it is probably close to double that. Currently the
> heuristic is to offline only initializers that are CONSTRUCTOR nodes and keep
> simple expressions inline.  We may want to tweak it.
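
Read together with the lto-streamer-out.c hunks further down, the heuristic sorts each variable initializer into one of three outcomes. The following is only a toy model of that decision, under the assumption that the hunks are read correctly; the types and names (toy_var, classify, and the enums) are hypothetical and are not GCC data structures.

```c
#include <stdbool.h>

/* Shape of the initializer: a brace-enclosed CONSTRUCTOR or a simple
   expression.  Hypothetical stand-in for TREE_CODE (DECL_INITIAL (decl)).  */
enum init_kind { INIT_SCALAR, INIT_CONSTRUCTOR };

/* Where the initializer ends up in this toy model.  */
enum placement { PLACE_INLINE, PLACE_OWN_SECTION, PLACE_OMITTED };

/* Hypothetical summary of the properties the patch tests.  */
struct toy_var
{
  enum init_kind kind;
  bool has_varpool_node;     /* varpool_get_node found a node */
  bool encode_initializer;   /* lto_symtab_encoder_encode_initializer_p */
  bool is_alias;             /* node->alias */
};

static enum placement
classify (const struct toy_var *v)
{
  /* Mirrors get_symbol_initial_value: these cases stream error_mark_node
     in the global decl stream instead of the real initializer.  */
  bool replaced = (v->kind == INIT_CONSTRUCTOR
		   || !v->has_varpool_node
		   || !v->encode_initializer);

  if (!replaced)
    return PLACE_INLINE;

  /* Mirrors the new check in lto_output: only encoded, non-alias
     variables get a section of their own.  */
  if (v->has_varpool_node && v->encode_initializer && !v->is_alias)
    return PLACE_OWN_SECTION;

  return PLACE_OMITTED;
}
```

In this model `struct X a = { 1 };` lands in its own section, while `int b = 1;` stays inline in the decl stream.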

Hmm, so what about an artificial testcase with gazillions of

struct X { int i; };

struct X a0001 = { 1 };
struct X a0002 = { 2 };
....

how much does it explode LTO IL size and streaming time (compile-time
stream-out and LTRANS stream-in)?  I suppose it still helps the WPA stage.
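
A testcase of this shape is easy to generate mechanically. The sketch below is one hypothetical way to do so in C; the helper names format_decl and emit_testcase are made up for illustration, and the four-digit zero padding simply mirrors the a0001/a0002 names above.

```c
#include <stdio.h>

/* Format the I-th definition, e.g. "struct X a0002 = { 2 };", into BUF.
   Returns what snprintf returns: the length the line needs, excluding
   the terminating NUL.  Hypothetical helper, not part of GCC.  */
static int
format_decl (char *buf, size_t size, int i)
{
  return snprintf (buf, size, "struct X a%04d = { %d };\n", i, i);
}

/* Write the struct declaration followed by COUNT initialized variables
   to OUT.  Returns the number of variables written, or -1 if a line
   could not be formatted.  */
static int
emit_testcase (FILE *out, int count)
{
  char line[64];
  int i;

  fputs ("struct X { int i; };\n\n", out);
  for (i = 1; i <= count; i++)
    {
      int len = format_decl (line, sizeof line, i);

      if (len < 0 || (size_t) len >= sizeof line)
	return -1;
      fputs (line, out);
    }
  return count;
}
```

Calling emit_testcase (stdout, N) from a small driver and compiling the result with -flto, once with and once without the patch, would give one data point for the object size and stream-out time; the value of N is an arbitrary choice.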

Also what we desperately miss is putting CONST_DECLs into the symbol
table (and thus eventually moving the constant pool to symtab).  That
and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
with STRING_CST initializers (to fix PR50199).

> The patch does not bring miraculous savings to firefox WPA, but it does some:
> 
> GGC memory after global stream is read goes from 1376898k to 1250533k
> overall GGC allocations from 4156478 kB to 4012462 kB
> read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes 
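
To put the before/after figures on one scale, the arithmetic can be sketched as below. The helper names are hypothetical; the constants are copied from the statistics quoted above. The GGC figure after reading the global stream works out to a reduction of roughly 9%.

```c
#include <stdio.h>

/* Relative change from BEFORE to AFTER, in percent; negative means a
   reduction.  Hypothetical helper for illustration.  */
static double
percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}

/* Net change in bytes across the decls and function_body sections.
   The function_body section grows because the offlined constructors
   are still stored in sections tagged as function sections.  */
static long long
net_section_change (void)
{
  const long long decls_before = 299540188LL, decls_after = 271557265LL;
  const long long body_before = 5711078LL, body_after = 7548680LL;

  return (decls_after + body_after) - (decls_before + body_before);
}

/* Print the relative change of each quoted figure to OUT.  */
static void
print_summary (FILE *out)
{
  fprintf (out, "GGC after global stream: %+.1f%%\n",
	   percent_change (1376898.0, 1250533.0));
  fprintf (out, "overall GGC allocations: %+.1f%%\n",
	   percent_change (4156478.0, 4012462.0));
  fprintf (out, "tree bodies read:        %+.1f%%\n",
	   percent_change (20997206.0, 18584194.0));
  fprintf (out, "decls section size:      %+.1f%%\n",
	   percent_change (299540188.0, 271557265.0));
  fprintf (out, "net section bytes:       %+lld\n", net_section_change ());
}
```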
> 
> Things would be better if ipa-visibility and ipa-devirt did not load most of
> the virtual tables into memory (still better than loading each into memory 20
> times on average).  I will work on that incrementally. We load 10311 ctors into
> memory at WPA time.
> 
> Note that firefox seems to feature a really huge data segment these days.
> http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
> 
> Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
> in progress, OK?

The patch looks ok to me.  How about simply doing 
s/LTO_section_function_body/LTO_section_symbol_content/ instead of
adding LTO_section_variable_initializer?

Thanks,
Richard.

> 	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
> 	(varpool_get_constructor): New function.
> 	(ctor_for_folding): Use it.
> 	(varpool_assemble_decl): Likewise.
> 	* lto-streamer.h (struct output_block): Turn cgraph_node
> 	to symbol field.
> 	(lto_input_variable_constructor): Declare.
> 	* ipa-visibility.c (function_and_variable_visibility): Use
> 	varpool_get_constructor.
> 	* cgraph.h (varpool_get_constructor): Declare.
> 	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
> 	parameter; return error_mark_node for non-trivial constructors.
> 	(lto_write_tree_1, DFS_write_tree): Update use of
> 	get_symbol_initial_value.
> 	(output_function): Update initialization of symbol.
> 	(output_constructor): New function.
> 	(copy_function): Rename to ...
> 	(copy_function_or_variable): ... this one; handle vars too.
> 	(lto_output): Output variable sections.
> 	* lto-streamer-in.c (input_constructor): New function.
> 	(lto_read_body): Rename to ...
> 	(lto_read_body_or_constructor): ... this one; handle vars
> 	too.
> 	(lto_input_variable_constructor): New function.
> 	* ipa-prop.c (ipa_prop_write_jump_functions,
> 	ipa_prop_write_all_agg_replacement): Update.
> Index: varpool.c
> ===================================================================
> --- varpool.c	(revision 212426)
> +++ varpool.c	(working copy)
> @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
>  #include "gimple-expr.h"
>  #include "flags.h"
>  #include "pointer-set.h"
> +#include "tree-ssa-alias.h"
> +#include "gimple.h"
> +#include "lto-streamer.h"
>  
>  const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
>  				      "tls-global-dynamic", "tls-local-dynamic",
> @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
>      return NULL;
>  }
>  
> +/* When doing LTO, read NODE's constructor from disk if it is not already present.  */
> +
> +tree
> +varpool_get_constructor (struct varpool_node *node)
> +{
> +  struct lto_file_decl_data *file_data;
> +  const char *data, *name;
> +  size_t len;
> +  tree decl = node->decl;
> +
> +  if (DECL_INITIAL (node->decl) != error_mark_node
> +      || !in_lto_p)
> +    return DECL_INITIAL (node->decl);
> +
> +  file_data = node->lto_file_data;
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* We may have renamed the declaration, e.g., a static function.  */
> +  name = lto_get_decl_name_mapping (file_data, name);
> +
> +  data = lto_get_section_data (file_data, LTO_section_function_body,
> +			       name, &len);
> +  if (!data)
> +    fatal_error ("%s: section %s is missing",
> +		 file_data->file_name,
> +		 name);
> +
> +  lto_input_variable_constructor (file_data, node, data);
> +  lto_stats.num_function_bodies++;
> +  lto_free_section_data (file_data, LTO_section_function_body, name,
> +			 data, len);
> +  lto_free_function_in_decl_state_for_node (node);
> +  return DECL_INITIAL (node->decl);
> +}
> +
>  /* Return if DECL is constant and its initial value is known (so we can do
>     constant folding using DECL_INITIAL (decl)).
>     Return ERROR_MARK_NODE when value is unknown.  */
> @@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
>    if (DECL_VIRTUAL_P (real_decl))
>      {
>        gcc_checking_assert (TREE_READONLY (real_decl));
> +      if (DECL_INITIAL (real_decl) == error_mark_node
> +	  && (node = varpool_get_node (real_decl)))
> +	return varpool_get_constructor (node);
>        if (DECL_INITIAL (real_decl))
>  	return DECL_INITIAL (real_decl);
>        else
> @@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
>  
>       ??? Previously we behaved so for scalar variables but not for array
>       accesses.  */
> +  if (DECL_INITIAL (real_decl) == error_mark_node
> +      && (node = varpool_get_node (real_decl)))
> +    return varpool_get_constructor (node);
>    return DECL_INITIAL (real_decl);
>  }
>  
> @@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
>    if (!node->in_other_partition
>        && !DECL_EXTERNAL (decl))
>      {
> +      varpool_get_constructor (node);
>        assemble_variable (decl, 0, 1, 0);
>        gcc_assert (TREE_ASM_WRITTEN (decl));
>        node->definition = true;
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 212426)
> +++ lto-streamer.h	(working copy)
> @@ -685,9 +685,9 @@ struct output_block
>       far and the indexes assigned to them.  */
>    hash_table<string_slot_hasher> *string_hash_table;
>  
> -  /* The current cgraph_node that we are currently serializing.  Null
> +  /* The current symbol that we are currently serializing.  Null
>       if we are serializing something else.  */
> -  struct cgraph_node *cgraph_node;
> +  struct symtab_node *symbol;
>  
>    /* These are the last file and line that were seen in the stream.
>       If the current node differs from these, it needs to insert
> @@ -830,6 +830,9 @@ extern void lto_reader_init (void);
>  extern void lto_input_function_body (struct lto_file_decl_data *,
>  				     struct cgraph_node *,
>  				     const char *);
> +extern void lto_input_variable_constructor (struct lto_file_decl_data *,
> +					    struct varpool_node *,
> +					    const char *);
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
>  					      const char *);
>  extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
> Index: ipa-visibility.c
> ===================================================================
> --- ipa-visibility.c	(revision 212426)
> +++ ipa-visibility.c	(working copy)
> @@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
>  	  if (found)
>  	    {
>  	      struct pointer_set_t *visited_nodes = pointer_set_create ();
> +
> +	      varpool_get_constructor (vnode);
>  	      walk_tree (&DECL_INITIAL (vnode->decl),
>  			 update_vtable_references, NULL, visited_nodes);
>  	      pointer_set_destroy (visited_nodes);
> Index: cgraph.h
> ===================================================================
> --- cgraph.h	(revision 212426)
> +++ cgraph.h	(working copy)
> @@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
>  void symtab_initialize_asm_name_hash (void);
>  void symtab_prevail_in_asm_name_hash (symtab_node *node);
>  void varpool_remove_initializer (varpool_node *);
> +tree varpool_get_constructor (struct varpool_node *node);
>  
>  /* In cgraph.c */
>  extern void change_decl_assembler_name (tree, tree);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 212426)
> +++ lto-streamer-out.c	(working copy)
> @@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
>  /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
>  
>  static tree
> -get_symbol_initial_value (struct output_block *ob, tree expr)
> +get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
>  {
>    gcc_checking_assert (DECL_P (expr)
>  		       && TREE_CODE (expr) != FUNCTION_DECL
> @@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
>        && !DECL_IN_CONSTANT_POOL (expr)
>        && initial)
>      {
> -      lto_symtab_encoder_t encoder;
>        varpool_node *vnode;
> -
> -      encoder = ob->decl_state->symtab_node_encoder;
> -      vnode = varpool_get_node (expr);
> -      if (!vnode
> -	  || !lto_symtab_encoder_encode_initializer_p (encoder,
> -						       vnode))
> -	initial = error_mark_node;
> +      /* Extra section needs about 30 bytes; do not produce it for simple
> +	 scalar values.  */
> +      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
> +	  || !(vnode = varpool_get_node (expr))
> +	  || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
> +        initial = error_mark_node;
>      }
>  
>    return initial;
> @@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
>        && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>      {
>        /* Handle DECL_INITIAL for symbols.  */
> -      tree initial = get_symbol_initial_value (ob, expr);
> +      tree initial = get_symbol_initial_value
> +			 (ob->decl_state->symtab_node_encoder, expr);
>        stream_write_tree (ob, initial, ref_p);
>      }
>  }
> @@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
>  	      && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>  	    {
>  	      /* Handle DECL_INITIAL for symbols.  */
> -	      tree initial = get_symbol_initial_value (ob, expr);
> +	      tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
> +						       expr);
>  	      DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
>  	    }
>  	}
> @@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
>    ob = create_output_block (LTO_section_function_body);
>  
>    clear_line_info (ob);
> -  ob->cgraph_node = node;
> +  ob->symbol = node;
>  
>    gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
>  
> @@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
>    destroy_output_block (ob);
>  }
>  
> +/* Output the constructor of variable NODE->DECL.  */
> +
> +static void
> +output_constructor (struct varpool_node *node)
> +{
> +  tree var = node->decl;
> +  struct output_block *ob;
> +
> +  ob = create_output_block (LTO_section_function_body);
> +
> +  clear_line_info (ob);
> +  ob->symbol = node;
> +
> +  /* Make string 0 be a NULL string.  */
> +  streamer_write_char_stream (ob->string_stream, 0);
> +
> +  /* Output DECL_INITIAL of the variable, which holds its
> +     constructor.  */
> +  stream_write_tree (ob, DECL_INITIAL (var), true);
> +
> +  /* Create a section to hold the pickled output of this variable.  */
> +  produce_asm (ob, var);
> +
> +  destroy_output_block (ob);
> +}
> +
>  
>  /* Emit toplevel asms.  */
>  
> @@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
>  }
>  
>  
> -/* Copy the function body of NODE without deserializing. */
> +/* Copy the function body or variable constructor of NODE without deserializing. */
>  
>  static void
> -copy_function (struct cgraph_node *node)
> +copy_function_or_variable (struct symtab_node *node)
>  {
>    tree function = node->decl;
>    struct lto_file_decl_data *file_data = node->lto_file_data;
> @@ -2072,7 +2190,7 @@ lto_output (void)
>  	      if (gimple_has_body_p (node->decl) || !flag_wpa)
>  		output_function (node);
>  	      else
> -		copy_function (node);
> +		copy_function_or_variable (node);
>  	      gcc_assert (lto_get_out_decl_state () == decl_state);
>  	      lto_pop_out_decl_state ();
>  	      lto_record_function_out_decl_state (node->decl, decl_state);
> @@ -2085,6 +2203,25 @@ lto_output (void)
>  	  tree ctor = DECL_INITIAL (node->decl);
>  	  if (ctor && !in_lto_p)
>  	    walk_tree (&ctor, wrap_refs, NULL, NULL);
> +	  if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
> +	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
> +	      && !node->alias)
> +	    {
> +#ifdef ENABLE_CHECKING
> +	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
> +	      bitmap_set_bit (output, DECL_UID (node->decl));
> +#endif
> +	      decl_state = lto_new_out_decl_state ();
> +	      lto_push_out_decl_state (decl_state);
> +	      if (DECL_INITIAL (node->decl) != error_mark_node
> +		  || !flag_wpa)
> +		output_constructor (node);
> +	      else
> +		copy_function_or_variable (node);
> +	      gcc_assert (lto_get_out_decl_state () == decl_state);
> +	      lto_pop_out_decl_state ();
> +	      lto_record_function_out_decl_state (node->decl, decl_state);
> +	    }
>  	}
>      }
>  
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c	(revision 212426)
> +++ lto-streamer-in.c	(working copy)
> @@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
>    pop_cfun ();
>  }
>  
> +/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
> +
> +static void
> +input_constructor (tree var, struct data_in *data_in,
> +		   struct lto_input_block *ib)
> +{
> +  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
> +}
> +
>  
>  /* Read the body from DATA for function NODE and fill it in.
>     FILE_DATA are the global decls and types.  SECTION_TYPE is either
> @@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
>     that function.  */
>  
>  static void
> -lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
> -	       const char *data, enum lto_section_type section_type)
> +lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
> +			      const char *data, enum lto_section_type section_type)
>  {
>    const struct lto_function_header *header;
>    struct data_in *data_in;
> @@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
>    tree fn_decl = node->decl;
>  
>    header = (const struct lto_function_header *) data;
> -  cfg_offset = sizeof (struct lto_function_header);
> -  main_offset = cfg_offset + header->cfg_size;
> -  string_offset = main_offset + header->main_size;
> -
> -  LTO_INIT_INPUT_BLOCK (ib_cfg,
> -		        data + cfg_offset,
> -			0,
> -			header->cfg_size);
> -
> -  LTO_INIT_INPUT_BLOCK (ib_main,
> -			data + main_offset,
> -			0,
> -			header->main_size);
> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +    {
> +      cfg_offset = sizeof (struct lto_function_header);
> +      main_offset = cfg_offset + header->cfg_size;
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_cfg,
> +			    data + cfg_offset,
> +			    0,
> +			    header->cfg_size);
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +			    data + main_offset,
> +			    0,
> +			    header->main_size);
> +    }
> +  else
> +    {
> +      main_offset = sizeof (struct lto_function_header);
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +			    data + main_offset,
> +			    0,
> +			    header->main_size);
> +    }
>  
>    data_in = lto_data_in_create (file_data, data + string_offset,
>  			      header->string_size, vNULL);
> @@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
>  
>        /* Set up the struct function.  */
>        from = data_in->reader_cache->nodes.length ();
> -      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +        input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      else
> +        input_constructor (fn_decl, data_in, &ib_main);
>        /* And fixup types we streamed locally.  */
>  	{
>  	  struct streamer_tree_cache_d *cache = data_in->reader_cache;
> @@ -1124,7 +1149,17 @@ void
>  lto_input_function_body (struct lto_file_decl_data *file_data,
>  			 struct cgraph_node *node, const char *data)
>  {
> -  lto_read_body (file_data, node, data, LTO_section_function_body);
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
> +}
> +
> +/* Read the constructor of NODE using DATA.  FILE_DATA holds the global
> +   decls and types.  */
> +
> +void
> +lto_input_variable_constructor (struct lto_file_decl_data *file_data,
> +				struct varpool_node *node, const char *data)
> +{
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
>  }
>  
>  
> Index: ipa-prop.c
> ===================================================================
> --- ipa-prop.c	(revision 212426)
> +++ ipa-prop.c	(working copy)
> @@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
>  
>    ob = create_output_block (LTO_section_jump_functions);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
> @@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
>  
>    ob = create_output_block (LTO_section_ipcp_transform);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

