This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Use separate sections to stream non-trivial constructors
- From: Richard Biener <rguenther at suse dot de>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 11 Jul 2014 13:29:53 +0200 (CEST)
- Subject: Re: Use separate sections to stream non-trivial constructors
- References: <20140711091810 dot GC30037 at kam dot mff dot cuni dot cz>
On Fri, 11 Jul 2014, Jan Hubicka wrote:
> Hi,
> since we both agreed that offlining constructors from the global decl stream is a good
> idea, I went ahead and implemented it. I would like to follow up with
> cleanups; for example the sections are still tagged as function sections, but I
> would like to do that incrementally. There is quite some ugliness in the way we
> handle function sections and the patch started to snowball very quickly.
>
> The patch conceptually copies what we do for functions and re-uses most of the
> infrastructure. varpool_get_constructor is the counterpart of cgraph_get_body (i.e. the means of
> getting a function body in) and it is used by the output machinery, by ipa-visibility
> while rewriting the constructor, and by ctor_for_folding (which makes us
> load the ctor whenever it is needed by ipa-cp or ipa-devirt).
>
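The load-on-demand behaviour described above can be pictured with a small standalone model. This is only a sketch under stated assumptions: toy_var, toy_not_loaded and toy_get_constructor are hypothetical names, not GCC code, and a plain string stands in for the constructor tree. The sentinel plays the role that error_mark_node plays in DECL_INITIAL in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a varpool node; not a GCC type.  */
struct toy_var
{
  const char *initial;  /* NULL: no initializer; toy_not_loaded: still on disk.  */
  const char *on_disk;  /* What the variable's own section would contain.  */
  int loads;            /* How many times the section was read in.  */
};

/* Sentinel meaning "initializer exists but has not been read yet".  */
const char toy_not_loaded[] = "<not loaded>";

/* Return VAR's initializer, reading it in on first use only.  */
const char *
toy_get_constructor (struct toy_var *var)
{
  assert (var != NULL);

  /* Already in memory (or absent): nothing to read.  */
  if (var->initial != toy_not_loaded)
    return var->initial;

  /* Stands in for fetching and decoding the per-variable section.  */
  var->initial = var->on_disk;
  var->loads++;
  return var->initial;
}
```

Every consumer calls the accessor instead of looking at the field directly, so the section is read at most once and only for variables somebody actually asks about.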
> I kept get_symbol_initial_value as the authority that decides whether we want to encode
> a given constructor or not. The section itself for a trivial ctor is about 25
> bytes and with the header it is probably close to double that. Currently the heuristic
> is to offline only initializers that are a CONSTRUCTOR and keep simple expressions
> inline. We may want to tweak it.
Hmm, so what about an artificial testcase with gazillions of
struct X { int i; };
struct X a0001 = { 1 };
struct X a0002 = { 2 };
....
How much does it explode LTO IL size and streaming time (compile-out and
LTRANS in)? I suppose it still helps the WPA stage.
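A testcase of that shape is easy to generate. The following is a minimal sketch; emit_testcase is a hypothetical helper (not part of GCC or its testsuite) that writes the struct definition followed by COUNT one-field globals in the form shown above.

```c
#include <assert.h>
#include <stdio.h>

/* Write COUNT trivially initialized globals to OUT, in the shape of the
   testcase sketched above.  Return the number of variables written, or
   -1 on a write error.  Hypothetical helper, not part of GCC.  */
int
emit_testcase (FILE *out, int count)
{
  assert (out != NULL && count >= 0);

  if (fprintf (out, "struct X { int i; };\n") < 0)
    return -1;

  /* Names are zero-padded to four digits (a0001, a0002, ...); larger
     counts simply produce wider numbers.  */
  for (int i = 1; i <= count; i++)
    if (fprintf (out, "struct X a%04d = { %d };\n", i, i) < 0)
      return -1;

  return count;
}
```

Calling it on a file opened for writing with a large COUNT, then building that file with and without the patch, would give the IL size and streaming time comparison asked about here.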
Also what we desperately miss is to put CONST_DECLs into the symbol
table (and thus eventually move the constant pool to symtab). That
and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
with STRING_CST initializers (to fix PR50199).
> The patch does not bring miraculous savings to firefox WPA, but it does some:
>
> GGC memory after global stream is read goes from 1376898k to 1250533k
> overall GGC allocations from 4156478 kB to 4012462 kB
> read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes
>
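To make the before/after figures above easier to compare, here is a small sketch that computes the deltas. The numbers are copied from the quoted statistics; section_delta, percent_saved and print_summary are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes removed from a section (positive) or added to it (negative).  */
long long
section_delta (long long before, long long after)
{
  return before - after;
}

/* Relative reduction, in percent of the BEFORE value.  */
double
percent_saved (long long before, long long after)
{
  assert (before > 0);
  return 100.0 * (double) (before - after) / (double) before;
}

/* Print the deltas for the figures quoted above.  */
void
print_summary (void)
{
  long long decls = section_delta (299540188LL, 271557265LL);
  long long bodies = section_delta (5711078LL, 7548680LL);

  printf ("decls section shrinks by %lld bytes\n", decls);
  printf ("function_body section grows by %lld bytes\n", -bodies);
  printf ("net mmap'd saving: %lld bytes\n", decls + bodies);
  printf ("GGC memory after global stream: %.1f%% smaller\n",
          percent_saved (1376898LL, 1250533LL));
}
```

The decls section shrinks by 27982923 bytes while the function_body section (which now also holds the offlined constructors) grows by 1837602 bytes, so most of the removed data is not read back in at WPA time.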
> Things would be better if ipa-visibility and ipa-devirt did not load most of
> the virtual tables into memory (still better than loading each into memory 20
> times on average). I will work on that incrementally. We load 10311 ctors into
> memory at WPA time.
>
> Note that firefox seems to feature a really huge data segment these days.
> http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
>
> Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap
> in progress, OK?
The patch looks ok to me. How about simply doing
s/LTO_section_function_body/LTO_section_symbol_content/ instead of
adding LTO_section_variable_initializer?
Thanks,
Richard.
> * varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
> (varpool_get_constructor): New function.
> (ctor_for_folding): Use it.
> (varpool_assemble_decl): Likewise.
> * lto-streamer.h (struct output_block): Turn cgraph_node field
> into symbol field.
> (lto_input_variable_constructor): Declare.
> * ipa-visibility.c (function_and_variable_visibility): Use
> varpool_get_constructor.
> * cgraph.h (varpool_get_constructor): Declare.
> * lto-streamer-out.c (get_symbol_initial_value): Take encoder
> parameter; return error_mark_node for non-trivial constructors.
> (lto_write_tree_1, DFS_write_tree): Update use of
> get_symbol_initial_value.
> (output_function): Update initialization of symbol.
> (output_constructor): New function.
> (copy_function): Rename to ...
> (copy_function_or_variable): ... this one; handle vars too.
> (lto_output): Output variable sections.
> * lto-streamer-in.c (input_constructor): New function.
> (lto_read_body): Rename to ...
> (lto_read_body_or_constructor): ... this one; handle vars
> too.
> (lto_input_variable_constructor): New function.
> * ipa-prop.c (ipa_prop_write_jump_functions,
> ipa_prop_write_all_agg_replacement): Update.
> Index: varpool.c
> ===================================================================
> --- varpool.c (revision 212426)
> +++ varpool.c (working copy)
> @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
> #include "gimple-expr.h"
> #include "flags.h"
> #include "pointer-set.h"
> +#include "tree-ssa-alias.h"
> +#include "gimple.h"
> +#include "lto-streamer.h"
>
> const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
> "tls-global-dynamic", "tls-local-dynamic",
> @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
> return NULL;
> }
>
> +/* When doing LTO, read NODE's constructor from disk if it is not already present. */
> +
> +tree
> +varpool_get_constructor (struct varpool_node *node)
> +{
> + struct lto_file_decl_data *file_data;
> + const char *data, *name;
> + size_t len;
> + tree decl = node->decl;
> +
> + if (DECL_INITIAL (node->decl) != error_mark_node
> + || !in_lto_p)
> + return DECL_INITIAL (node->decl);
> +
> + file_data = node->lto_file_data;
> + name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> + /* We may have renamed the declaration, e.g., a static function. */
> + name = lto_get_decl_name_mapping (file_data, name);
> +
> + data = lto_get_section_data (file_data, LTO_section_function_body,
> + name, &len);
> + if (!data)
> + fatal_error ("%s: section %s is missing",
> + file_data->file_name,
> + name);
> +
> + lto_input_variable_constructor (file_data, node, data);
> + lto_stats.num_function_bodies++;
> + lto_free_section_data (file_data, LTO_section_function_body, name,
> + data, len);
> + lto_free_function_in_decl_state_for_node (node);
> + return DECL_INITIAL (node->decl);
> +}
> +
> /* Return if DECL is constant and its initial value is known (so we can do
> constant folding using DECL_INITIAL (decl)).
> Return ERROR_MARK_NODE when value is unknown. */
> @@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
> if (DECL_VIRTUAL_P (real_decl))
> {
> gcc_checking_assert (TREE_READONLY (real_decl));
> + if (DECL_INITIAL (real_decl) == error_mark_node
> + && (node = varpool_get_node (real_decl)))
> + return varpool_get_constructor (node);
> if (DECL_INITIAL (real_decl))
> return DECL_INITIAL (real_decl);
> else
> @@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
>
> ??? Previously we behaved so for scalar variables but not for array
> accesses. */
> + if (DECL_INITIAL (real_decl) == error_mark_node
> + && (node = varpool_get_node (real_decl)))
> + return varpool_get_constructor (node);
> return DECL_INITIAL (real_decl);
> }
>
> @@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
> if (!node->in_other_partition
> && !DECL_EXTERNAL (decl))
> {
> + varpool_get_constructor (node);
> assemble_variable (decl, 0, 1, 0);
> gcc_assert (TREE_ASM_WRITTEN (decl));
> node->definition = true;
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h (revision 212426)
> +++ lto-streamer.h (working copy)
> @@ -685,9 +685,9 @@ struct output_block
> far and the indexes assigned to them. */
> hash_table<string_slot_hasher> *string_hash_table;
>
> - /* The current cgraph_node that we are currently serializing. Null
> + /* The current symbol that we are currently serializing. Null
> if we are serializing something else. */
> - struct cgraph_node *cgraph_node;
> + struct symtab_node *symbol;
>
> /* These are the last file and line that were seen in the stream.
> If the current node differs from these, it needs to insert
> @@ -830,6 +830,9 @@ extern void lto_reader_init (void);
> extern void lto_input_function_body (struct lto_file_decl_data *,
> struct cgraph_node *,
> const char *);
> +extern void lto_input_variable_constructor (struct lto_file_decl_data *,
> + struct varpool_node *,
> + const char *);
> extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
> const char *);
> extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
> Index: ipa-visibility.c
> ===================================================================
> --- ipa-visibility.c (revision 212426)
> +++ ipa-visibility.c (working copy)
> @@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
> if (found)
> {
> struct pointer_set_t *visited_nodes = pointer_set_create ();
> +
> + varpool_get_constructor (vnode);
> walk_tree (&DECL_INITIAL (vnode->decl),
> update_vtable_references, NULL, visited_nodes);
> pointer_set_destroy (visited_nodes);
> Index: cgraph.h
> ===================================================================
> --- cgraph.h (revision 212426)
> +++ cgraph.h (working copy)
> @@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
> void symtab_initialize_asm_name_hash (void);
> void symtab_prevail_in_asm_name_hash (symtab_node *node);
> void varpool_remove_initializer (varpool_node *);
> +tree varpool_get_constructor (struct varpool_node *node);
>
> /* In cgraph.c */
> extern void change_decl_assembler_name (tree, tree);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c (revision 212426)
> +++ lto-streamer-out.c (working copy)
> @@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
> /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL. */
>
> static tree
> -get_symbol_initial_value (struct output_block *ob, tree expr)
> +get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
> {
> gcc_checking_assert (DECL_P (expr)
> && TREE_CODE (expr) != FUNCTION_DECL
> @@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
> && !DECL_IN_CONSTANT_POOL (expr)
> && initial)
> {
> - lto_symtab_encoder_t encoder;
> varpool_node *vnode;
> -
> - encoder = ob->decl_state->symtab_node_encoder;
> - vnode = varpool_get_node (expr);
> - if (!vnode
> - || !lto_symtab_encoder_encode_initializer_p (encoder,
> - vnode))
> - initial = error_mark_node;
> + /* Extra section needs about 30 bytes; do not produce it for simple
> + scalar values. */
> + if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
> + || !(vnode = varpool_get_node (expr))
> + || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
> + initial = error_mark_node;
> }
>
> return initial;
> @@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
> && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
> {
> /* Handle DECL_INITIAL for symbols. */
> - tree initial = get_symbol_initial_value (ob, expr);
> + tree initial = get_symbol_initial_value
> + (ob->decl_state->symtab_node_encoder, expr);
> stream_write_tree (ob, initial, ref_p);
> }
> }
> @@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
> && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
> {
> /* Handle DECL_INITIAL for symbols. */
> - tree initial = get_symbol_initial_value (ob, expr);
> + tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
> + expr);
> DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
> }
> }
> @@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
> ob = create_output_block (LTO_section_function_body);
>
> clear_line_info (ob);
> - ob->cgraph_node = node;
> + ob->symbol = node;
>
> gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
>
> @@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
> destroy_output_block (ob);
> }
>
> +/* Output the constructor (DECL_INITIAL) of variable NODE->DECL. */
> +
> +static void
> +output_constructor (struct varpool_node *node)
> +{
> + tree var = node->decl;
> + struct output_block *ob;
> +
> + ob = create_output_block (LTO_section_function_body);
> +
> + clear_line_info (ob);
> + ob->symbol = node;
> +
> + /* Make string 0 be a NULL string. */
> + streamer_write_char_stream (ob->string_stream, 0);
> +
> + /* Output DECL_INITIAL for the variable, which holds its
> + constructor. */
> + stream_write_tree (ob, DECL_INITIAL (var), true);
> +
> + /* Create a section to hold the pickled output of this variable. */
> + produce_asm (ob, var);
> +
> + destroy_output_block (ob);
> +}
> +
>
> /* Emit toplevel asms. */
>
> @@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
> }
>
>
> -/* Copy the function body of NODE without deserializing. */
> +/* Copy the function body or variable constructor of NODE without deserializing. */
>
> static void
> -copy_function (struct cgraph_node *node)
> +copy_function_or_variable (struct symtab_node *node)
> {
> tree function = node->decl;
> struct lto_file_decl_data *file_data = node->lto_file_data;
> @@ -2072,7 +2190,7 @@ lto_output (void)
> if (gimple_has_body_p (node->decl) || !flag_wpa)
> output_function (node);
> else
> - copy_function (node);
> + copy_function_or_variable (node);
> gcc_assert (lto_get_out_decl_state () == decl_state);
> lto_pop_out_decl_state ();
> lto_record_function_out_decl_state (node->decl, decl_state);
> @@ -2085,6 +2203,25 @@ lto_output (void)
> tree ctor = DECL_INITIAL (node->decl);
> if (ctor && !in_lto_p)
> walk_tree (&ctor, wrap_refs, NULL, NULL);
> + if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
> + && lto_symtab_encoder_encode_initializer_p (encoder, node)
> + && !node->alias)
> + {
> +#ifdef ENABLE_CHECKING
> + gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
> + bitmap_set_bit (output, DECL_UID (node->decl));
> +#endif
> + decl_state = lto_new_out_decl_state ();
> + lto_push_out_decl_state (decl_state);
> + if (DECL_INITIAL (node->decl) != error_mark_node
> + || !flag_wpa)
> + output_constructor (node);
> + else
> + copy_function_or_variable (node);
> + gcc_assert (lto_get_out_decl_state () == decl_state);
> + lto_pop_out_decl_state ();
> + lto_record_function_out_decl_state (node->decl, decl_state);
> + }
> }
> }
>
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c (revision 212426)
> +++ lto-streamer-in.c (working copy)
> @@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
> pop_cfun ();
> }
>
> +/* Read the constructor of variable VAR from DATA_IN using input block IB. */
> +
> +static void
> +input_constructor (tree var, struct data_in *data_in,
> + struct lto_input_block *ib)
> +{
> + DECL_INITIAL (var) = stream_read_tree (ib, data_in);
> +}
> +
>
> /* Read the body from DATA for function NODE and fill it in.
> FILE_DATA are the global decls and types. SECTION_TYPE is either
> @@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
> that function. */
>
> static void
> -lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
> - const char *data, enum lto_section_type section_type)
> +lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
> + const char *data, enum lto_section_type section_type)
> {
> const struct lto_function_header *header;
> struct data_in *data_in;
> @@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
> tree fn_decl = node->decl;
>
> header = (const struct lto_function_header *) data;
> - cfg_offset = sizeof (struct lto_function_header);
> - main_offset = cfg_offset + header->cfg_size;
> - string_offset = main_offset + header->main_size;
> -
> - LTO_INIT_INPUT_BLOCK (ib_cfg,
> - data + cfg_offset,
> - 0,
> - header->cfg_size);
> -
> - LTO_INIT_INPUT_BLOCK (ib_main,
> - data + main_offset,
> - 0,
> - header->main_size);
> + if (TREE_CODE (node->decl) == FUNCTION_DECL)
> + {
> + cfg_offset = sizeof (struct lto_function_header);
> + main_offset = cfg_offset + header->cfg_size;
> + string_offset = main_offset + header->main_size;
> +
> + LTO_INIT_INPUT_BLOCK (ib_cfg,
> + data + cfg_offset,
> + 0,
> + header->cfg_size);
> +
> + LTO_INIT_INPUT_BLOCK (ib_main,
> + data + main_offset,
> + 0,
> + header->main_size);
> + }
> + else
> + {
> + main_offset = sizeof (struct lto_function_header);
> + string_offset = main_offset + header->main_size;
> +
> + LTO_INIT_INPUT_BLOCK (ib_main,
> + data + main_offset,
> + 0,
> + header->main_size);
> + }
>
> data_in = lto_data_in_create (file_data, data + string_offset,
> header->string_size, vNULL);
> @@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
>
> /* Set up the struct function. */
> from = data_in->reader_cache->nodes.length ();
> - input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> + if (TREE_CODE (node->decl) == FUNCTION_DECL)
> + input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> + else
> + input_constructor (fn_decl, data_in, &ib_main);
> /* And fixup types we streamed locally. */
> {
> struct streamer_tree_cache_d *cache = data_in->reader_cache;
> @@ -1124,7 +1149,17 @@ void
> lto_input_function_body (struct lto_file_decl_data *file_data,
> struct cgraph_node *node, const char *data)
> {
> - lto_read_body (file_data, node, data, LTO_section_function_body);
> + lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
> +}
> +
> +/* Read the constructor of NODE using DATA. FILE_DATA holds the global
> + decls and types. */
> +
> +void
> +lto_input_variable_constructor (struct lto_file_decl_data *file_data,
> + struct varpool_node *node, const char *data)
> +{
> + lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
> }
>
>
> Index: ipa-prop.c
> ===================================================================
> --- ipa-prop.c (revision 212426)
> +++ ipa-prop.c (working copy)
> @@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
>
> ob = create_output_block (LTO_section_jump_functions);
> encoder = ob->decl_state->symtab_node_encoder;
> - ob->cgraph_node = NULL;
> + ob->symbol = NULL;
> for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
> lsei_next_function_in_partition (&lsei))
> {
> @@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
>
> ob = create_output_block (LTO_section_ipcp_transform);
> encoder = ob->decl_state->symtab_node_encoder;
> - ob->cgraph_node = NULL;
> + ob->symbol = NULL;
> for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
> lsei_next_function_in_partition (&lsei))
> {
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer