This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Some minor WPA effectivity issues
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 29 Apr 2010 11:53:15 +0200
- Subject: Some minor WPA effectivity issues
Hi,
WPA needed about 100MB of GGC memory to compile SPEC2000 GCC. This has
recently dropped to about 60MB since we no longer pickle all unused decls.
Still there is room for improvement.
I've added timevars for various tasks and noticed that WPA was outputting
all static variables at the end of compilation to an unused assembly file;
that is fixed now. Also, we never ran a GGC collection during the whole WPA
compilation, even though we produce a reasonable amount of garbage.
I had to push the streamer data structures into GTY in order to garbage
collect after declaration merging. Hope it is acceptable.
The allocation at WPA IPA time is as follows:
lto-streamer-in.c:2338 (lto_input_ts_constructor   787768:12.0%  1157536:16.4%   787768: 4.3%   651200:18.6%     5894
lto/lto.c:173 (lto_read_in_decl_state)                576: 0.0%        0: 0.0%   945904: 5.1%   147448: 4.2%    14889
tree.c:1178 (build_int_cst_wide)                   757808:11.5%        0: 0.0%  1034080: 5.6%   588960:16.8%      363
ipa-prop.c:2072 (ipa_read_node_info)                    0: 0.0%        0: 0.0%  1552512: 8.4%    27776: 0.8%    19765
cgraph.c:444 (cgraph_allocate_node)                     0: 0.0%        0: 0.0%  1556480: 8.4%        0: 0.0%     4864
cgraph.c:971 (cgraph_create_edge_1)                     0: 0.0%        0: 0.0%  2283840:12.4%        0: 0.0%    21960
lto-streamer-in.c:1939 (lto_materialize_tree)     4398200:66.8%        0: 0.0%  5117880:27.8%   163424: 4.7%    91147
Total                                             6588064        7067120       18423344        3504312         238564
So 4MB of trees get garbage collected (about 6.5MB overall). We are currently
leaving dangling pointers to declarations that were merged away. I will look
into this later.
18MB of memory is alive, mostly trees and cgraph edges. I think that is
reasonable for a source base of 20MB of .c files overall. We can definitely do
something to put the other data structures on a diet. Most of the cgraph edges
and nodes are actually dead, waiting in the reuse list introduced by Martin's
patch. I guess ipa-prop also should not consume 8% of memory.
The WPA stage needs 13 seconds out of 260 seconds of compilation (that is about
the time needed for -flto). Half of the time is spent executing the assembler
in the ltrans shipment. This can be avoided by simply streaming into ltrans in
our own format instead of ELF .o. So with some care we should be pretty
effective here.
I wonder why we need 50MB of temporary caches to read decls in.
garbage collection    :    0.13 ( 2%) usr    0.00 ( 0%) sys    0.11 ( 0%) wall       0 kB ( 0%) ggc
varpool construction  :    0.00 ( 0%) usr    0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
ipa lto gimple I/O    :    0.36 ( 6%) usr    0.03 ( 8%) sys    0.49 ( 0%) wall     729 kB ( 1%) ggc
ipa lto decl I/O      :    5.22 (84%) usr    0.03 ( 8%) sys    5.78 ( 2%) wall   47547 kB (76%) ggc
ipa lto decl initr I/O:    0.04 ( 1%) usr    0.00 ( 0%) sys    0.04 ( 0%) wall    2361 kB ( 4%) ggc
ipa lto cgraph I/O    :    0.03 ( 0%) usr    0.01 ( 3%) sys    0.04 ( 0%) wall    3990 kB ( 6%) ggc
ipa lto decl merge    :    0.09 ( 1%) usr    0.01 ( 3%) sys    0.10 ( 0%) wall      41 kB ( 0%) ggc
ipa lto cgraph merge  :    0.01 ( 0%) usr    0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
whopr wpa             :    0.02 ( 0%) usr    0.00 ( 0%) sys    0.06 ( 0%) wall     773 kB ( 1%) ggc
whopr wpa I/O         :    0.16 ( 3%) usr    0.26 (68%) sys    6.21 ( 2%) wall       0 kB ( 0%) ggc
whopr wpa->ltrans     :    0.00 ( 0%) usr    0.00 ( 0%) sys  250.65 (95%) wall       0 kB ( 0%) ggc
ipa pure const        :    0.03 ( 0%) usr    0.00 ( 0%) sys    0.02 ( 0%) wall       0 kB ( 0%) ggc
parser                :    0.00 ( 0%) usr    0.03 ( 8%) sys    0.03 ( 0%) wall       0 kB ( 0%) ggc
inline heuristics     :    0.15 ( 2%) usr    0.00 ( 0%) sys    0.18 ( 0%) wall    6093 kB (10%) ggc
callgraph verifier    :    0.00 ( 0%) usr    0.01 ( 3%) sys    0.04 ( 0%) wall       0 kB ( 0%) ggc
TOTAL                 :    6.25              0.38            263.78               62639 kB
Bootstrapped/regtested x86_64-linux, OK?
* gengtype.c (open_base_files): Add lto-streamer.h.
* cgraph.h (cgraph_local_info): lto_file_data is now in GGC.
* ipa-cp.c (pass_ipa_cp): GGC collect.
* toplev.c (compile_file): Do not output symbols.
* ipa-inline.c (pass_ipa_inline): Add ggc collect.
* timevar.def (TV_VARPOOL, TV_IPA_LTO_DECL_INIT_IO,
TV_IPA_LTO_DECL_MERGE, TV_IPA_LTO_CGRAPH_MERGE, TV_VAROUT): New.
* lto-section-in.c: Include ggc.h.
(lto_new_in_decl_state): Alloc in GGC.
(lto_delete_in_decl_state): Likewise.
* ipa.c (pass_ipa_function_visibility, pass_ipa_whole_program): Collect.
* lto/lto.c (lto_read_in_decl_state): Use GGC.
(lto_wpa_write_files): Announce what we are writing.
(all_file_decl_data): New.
(read_cgraph_and_symbols): Use GGC; correct timevars.
(do_whole_program_analysis): Collect.
* lto/Make-lang.in (lto.o): Fix dependency.
* Makefile.in (GTFILES): Add lto-streamer.h.
* varpool.c (varpool_analyze_pending_decls): Use TV_VARPOOL.
(varpool_assemble_pending_decls): Use TV_VAROUT.
* lto-streamer.h (lto_tree_ref_table): Annotate.
(lto_in_decl_state): Annotate.
(lto_file_decl_data): Annotate.
Index: gengtype.c
===================================================================
--- gengtype.c (revision 158853)
+++ gengtype.c (working copy)
@@ -1571,7 +1571,7 @@ open_base_files (void)
"optabs.h", "libfuncs.h", "debug.h", "ggc.h", "cgraph.h",
"tree-flow.h", "reload.h", "cpp-id-data.h", "tree-chrec.h",
"cfglayout.h", "except.h", "output.h", "gimple.h", "cfgloop.h",
- "target.h", "ipa-prop.h", NULL
+ "target.h", "ipa-prop.h", "lto-streamer.h", NULL
};
const char *const *ifp;
outf_p gtype_desc_c;
Index: cgraph.h
===================================================================
--- cgraph.h (revision 158854)
+++ cgraph.h (working copy)
@@ -87,7 +87,7 @@ struct GTY(()) cgraph_thunk_info {
struct GTY(()) cgraph_local_info {
/* File stream where this node is being written to. */
- struct lto_file_decl_data * GTY ((skip)) lto_file_data;
+ struct lto_file_decl_data * lto_file_data;
struct inline_summary inline_summary;
Index: ipa-cp.c
===================================================================
--- ipa-cp.c (revision 158854)
+++ ipa-cp.c (working copy)
@@ -1340,7 +1340,7 @@ struct ipa_opt_pass_d pass_ipa_cp =
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_dump_cgraph | TODO_dump_func |
- TODO_remove_functions /* todo_flags_finish */
+ TODO_remove_functions | TODO_ggc_collect /* todo_flags_finish */
},
ipcp_generate_summary, /* generate_summary */
ipcp_write_summary, /* write_summary */
Index: toplev.c
===================================================================
--- toplev.c (revision 158853)
+++ toplev.c (working copy)
@@ -1056,7 +1056,7 @@ compile_file (void)
what's left of the symbol table output. */
timevar_pop (TV_PARSE);
- if (flag_syntax_only)
+ if (flag_syntax_only || flag_wpa)
return;
ggc_protect_identifiers = false;
Index: ipa-inline.c
===================================================================
--- ipa-inline.c (revision 158854)
+++ ipa-inline.c (working copy)
@@ -2134,7 +2134,7 @@ struct ipa_opt_pass_d pass_ipa_inline =
0, /* properties_destroyed */
TODO_remove_functions, /* todo_flags_finish */
TODO_dump_cgraph | TODO_dump_func
- | TODO_remove_functions /* todo_flags_finish */
+ | TODO_remove_functions | TODO_ggc_collect /* todo_flags_finish */
},
inline_generate_summary, /* generate_summary */
inline_write_summary, /* write_summary */
Index: timevar.def
===================================================================
--- timevar.def (revision 158853)
+++ timevar.def (working copy)
@@ -49,10 +49,14 @@ DEFTIMEVAR (TV_PCH_CPP_RESTORE , "
DEFTIMEVAR (TV_CGRAPH , "callgraph construction")
DEFTIMEVAR (TV_CGRAPHOPT , "callgraph optimization")
+DEFTIMEVAR (TV_VARPOOL , "varpool construction")
DEFTIMEVAR (TV_IPA_CONSTANT_PROP , "ipa cp")
DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IO , "ipa lto gimple I/O")
DEFTIMEVAR (TV_IPA_LTO_DECL_IO , "ipa lto decl I/O")
+DEFTIMEVAR (TV_IPA_LTO_DECL_INIT_IO , "ipa lto decl init I/O")
DEFTIMEVAR (TV_IPA_LTO_CGRAPH_IO , "ipa lto cgraph I/O")
+DEFTIMEVAR (TV_IPA_LTO_DECL_MERGE , "ipa lto decl merge")
+DEFTIMEVAR (TV_IPA_LTO_CGRAPH_MERGE , "ipa lto cgraph merge")
DEFTIMEVAR (TV_LTO , "lto")
DEFTIMEVAR (TV_WHOPR_WPA , "whopr wpa")
DEFTIMEVAR (TV_WHOPR_WPA_IO , "whopr wpa I/O")
@@ -216,6 +220,7 @@ DEFTIMEVAR (TV_REORDER_BLOCKS , "
DEFTIMEVAR (TV_SHORTEN_BRANCH , "shorten branches")
DEFTIMEVAR (TV_REG_STACK , "reg stack")
DEFTIMEVAR (TV_FINAL , "final")
+DEFTIMEVAR (TV_VAROUT , "variable output")
DEFTIMEVAR (TV_SYMOUT , "symout")
DEFTIMEVAR (TV_VAR_TRACKING , "variable tracking")
DEFTIMEVAR (TV_TREE_IFCOMBINE , "tree if-combine")
Index: lto-section-in.c
===================================================================
--- lto-section-in.c (revision 158854)
+++ lto-section-in.c (working copy)
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
#include "output.h"
#include "lto-streamer.h"
#include "lto-compress.h"
+#include "ggc.h"
/* Section names. These must correspond to the values of
enum lto_section_type. */
@@ -433,7 +434,7 @@ lto_new_in_decl_state (void)
{
struct lto_in_decl_state *state;
- state = ((struct lto_in_decl_state *) xmalloc (sizeof (*state)));
+ state = ((struct lto_in_decl_state *) ggc_alloc (sizeof (*state)));
memset (state, 0, sizeof (*state));
return state;
}
@@ -447,8 +448,8 @@ lto_delete_in_decl_state (struct lto_in_
for (i = 0; i < LTO_N_DECL_STREAMS; i++)
if (state->streams[i].trees)
- free (state->streams[i].trees);
- free (state);
+ ggc_free (state->streams[i].trees);
+ ggc_free (state);
}
/* Hashtable helpers. lto_in_decl_states are hash by their function decls. */
Index: ipa.c
===================================================================
--- ipa.c (revision 158854)
+++ ipa.c (working copy)
@@ -537,7 +537,8 @@ struct simple_ipa_opt_pass pass_ipa_func
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
- TODO_remove_functions | TODO_dump_cgraph/* todo_flags_finish */
+ TODO_remove_functions | TODO_dump_cgraph
+ | TODO_ggc_collect /* todo_flags_finish */
}
};
@@ -592,7 +593,8 @@ struct ipa_opt_pass_d pass_ipa_whole_pro
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
- TODO_dump_cgraph | TODO_remove_functions/* todo_flags_finish */
+ TODO_remove_functions | TODO_dump_cgraph
+ | TODO_ggc_collect /* todo_flags_finish */
},
NULL, /* generate_summary */
NULL, /* write_summary */
Index: lto/lto.c
===================================================================
--- lto/lto.c (revision 158854)
+++ lto/lto.c (working copy)
@@ -170,7 +170,7 @@ lto_read_in_decl_state (struct data_in *
for (i = 0; i < LTO_N_DECL_STREAMS; i++)
{
uint32_t size = *data++;
- tree *decls = (tree *) xcalloc (size, sizeof (tree));
+ tree *decls = GGC_NEWVEC (tree, size);
for (j = 0; j < size; j++)
{
@@ -235,7 +235,7 @@ lto_read_decls (struct lto_file_decl_dat
/* Read in per-function decl states and enter them in hash table. */
decl_data->function_decl_states =
- htab_create (37, lto_hash_in_decl_state, lto_eq_in_decl_state, free);
+ htab_create_ggc (37, lto_hash_in_decl_state, lto_eq_in_decl_state, NULL);
for (i = 1; i < num_decl_states; i++)
{
@@ -376,7 +376,7 @@ lto_file_read (lto_file *file, FILE *res
resolutions = lto_resolution_read (resolution_file, file);
- file_data = XCNEW (struct lto_file_decl_data);
+ file_data = GGC_NEW (struct lto_file_decl_data);
file_data->file_name = file->filename;
file_data->section_hash_table = lto_obj_build_section_table (file);
file_data->renaming_hash_table = lto_create_renaming_table ();
@@ -936,6 +936,9 @@ lto_wpa_write_files (void)
if (!file)
fatal_error ("lto_obj_file_open() failed");
+ if (!quiet_flag)
+ fprintf (stderr, " %s", temp_filename);
+
lto_set_current_out_file (file);
ipa_write_optimization_summaries (set, vset);
@@ -1657,6 +1660,7 @@ lto_read_all_file_options (void)
lto_reissue_options ();
}
+static GTY((length ("lto_stats.num_input_files + 1"))) struct lto_file_decl_data **all_file_decl_data;
/* Read all the symbols from the input files FNAMES. NFILES is the
number of files requested in the command line. Instantiate a
@@ -1667,7 +1671,6 @@ static void
read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
{
unsigned int i, last_file_ix;
- struct lto_file_decl_data **all_file_decl_data;
FILE *resolution;
struct cgraph_node *node;
@@ -1676,7 +1679,7 @@ read_cgraph_and_symbols (unsigned nfiles
timevar_push (TV_IPA_LTO_DECL_IO);
/* Set the hooks so that all of the ipa passes can read in their data. */
- all_file_decl_data = XNEWVEC (struct lto_file_decl_data *, nfiles + 1);
+ all_file_decl_data = GGC_CNEWVEC (struct lto_file_decl_data *, nfiles + 1);
lto_set_in_hooks (all_file_decl_data, get_section_data, free_section_data);
/* Read the resolution file. */
@@ -1723,6 +1726,7 @@ read_cgraph_and_symbols (unsigned nfiles
lto_obj_file_close (current_lto_file);
current_lto_file = NULL;
+ ggc_collect ();
}
if (resolution_file_name)
@@ -1733,24 +1737,30 @@ read_cgraph_and_symbols (unsigned nfiles
/* Set the hooks so that all of the ipa passes can read in their data. */
lto_set_in_hooks (all_file_decl_data, get_section_data, free_section_data);
- /* Each pass will set the appropriate timer. */
timevar_pop (TV_IPA_LTO_DECL_IO);
if (!quiet_flag)
fprintf (stderr, "\nReading the callgraph\n");
+ timevar_push (TV_IPA_LTO_CGRAPH_IO);
/* Read the callgraph. */
input_cgraph ();
+ timevar_pop (TV_IPA_LTO_CGRAPH_IO);
if (!quiet_flag)
fprintf (stderr, "Merging declarations\n");
+ timevar_push (TV_IPA_LTO_DECL_MERGE);
/* Merge global decls. */
lto_symtab_merge_decls ();
/* Fixup all decls and types and free the type hash tables. */
lto_fixup_decls (all_file_decl_data);
free_gimple_type_tables ();
+ ggc_collect ();
+
+ timevar_pop (TV_IPA_LTO_DECL_MERGE);
+ /* Each pass will set the appropriate timer. */
if (!quiet_flag)
fprintf (stderr, "Reading summaries\n");
@@ -1762,7 +1772,9 @@ read_cgraph_and_symbols (unsigned nfiles
ipa_read_summaries ();
/* Finally merge the cgraph according to the decl merging decisions. */
+ timevar_push (TV_IPA_LTO_CGRAPH_MERGE);
lto_symtab_merge_cgraph_nodes ();
+ ggc_collect ();
if (flag_ltrans)
for (node = cgraph_nodes; node; node = node->next)
@@ -1776,8 +1788,9 @@ read_cgraph_and_symbols (unsigned nfiles
node->ipa_transforms_to_apply,
(ipa_opt_pass)&pass_ipa_inline);
}
+ timevar_pop (TV_IPA_LTO_CGRAPH_MERGE);
- timevar_push (TV_IPA_LTO_DECL_IO);
+ timevar_push (TV_IPA_LTO_DECL_INIT_IO);
/* FIXME lto. This loop needs to be changed to use the pass manager to
call the ipa passes directly. */
@@ -1791,7 +1804,9 @@ read_cgraph_and_symbols (unsigned nfiles
/* Indicate that the cgraph is built and ready. */
cgraph_function_flags_ready = true;
- timevar_pop (TV_IPA_LTO_DECL_IO);
+ timevar_pop (TV_IPA_LTO_DECL_INIT_IO);
+ ggc_free (all_file_decl_data);
+ all_file_decl_data = NULL;
}
@@ -1895,6 +1910,7 @@ do_whole_program_analysis (void)
fflush (stderr);
}
output_files = lto_wpa_write_files ();
+ ggc_collect ();
if (!quiet_flag)
fprintf (stderr, "\n");
Index: lto/Make-lang.in
===================================================================
--- lto/Make-lang.in (revision 158853)
+++ lto/Make-lang.in (working copy)
@@ -85,7 +85,7 @@ lto/lto.o: lto/lto.c $(CONFIG_H) $(SYSTE
$(CGRAPH_H) $(GGC_H) tree-ssa-operands.h $(TREE_PASS_H) \
langhooks.h vec.h $(BITMAP_H) pointer-set.h $(IPA_PROP_H) \
$(COMMON_H) $(TIMEVAR_H) $(GIMPLE_H) $(LTO_H) $(LTO_TREE_H) \
- $(LTO_TAGS_H) $(LTO_STREAMER_H)
+ $(LTO_TAGS_H) $(LTO_STREAMER_H) gt-lto-lto.h
lto/lto-elf.o: lto/lto-elf.c $(CONFIG_H) coretypes.h $(SYSTEM_H) \
toplev.h $(LTO_H) $(TM_H) $(LIBIBERTY_H) $(GGC_H) $(LTO_STREAMER_H)
lto/lto-coff.o: lto/lto-coff.c $(CONFIG_H) coretypes.h $(SYSTEM_H) \
Index: Makefile.in
===================================================================
--- Makefile.in (revision 158853)
+++ Makefile.in (working copy)
@@ -3623,6 +3623,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/inp
$(srcdir)/lto-symtab.c \
$(srcdir)/tree-ssa-alias.h \
$(srcdir)/ipa-prop.h \
+ $(srcdir)/lto-streamer.h \
@all_gtfiles@
# Compute the list of GT header files from the corresponding C sources,
Index: varpool.c
===================================================================
--- varpool.c (revision 158854)
+++ varpool.c (working copy)
@@ -399,8 +399,8 @@ bool
varpool_analyze_pending_decls (void)
{
bool changed = false;
- timevar_push (TV_CGRAPH);
+ timevar_push (TV_VARPOOL);
while (varpool_first_unanalyzed_node)
{
tree decl = varpool_first_unanalyzed_node->decl;
@@ -424,7 +424,7 @@ varpool_analyze_pending_decls (void)
record_references_in_initializer (decl, analyzed);
changed = true;
}
- timevar_pop (TV_CGRAPH);
+ timevar_pop (TV_VARPOOL);
return changed;
}
@@ -518,6 +518,7 @@ varpool_assemble_pending_decls (void)
if (errorcount || sorrycount)
return false;
+ timevar_push (TV_VAROUT);
/* EH might mark decls as needed during expansion. This should be safe since
we don't create references to new function, but it should not be used
elsewhere. */
@@ -539,6 +540,7 @@ varpool_assemble_pending_decls (void)
/* varpool_nodes_queue is now empty, clear the pointer to the last element
in the queue. */
varpool_last_needed_node = NULL;
+ timevar_pop (TV_VAROUT);
return changed;
}
Index: lto-streamer.h
===================================================================
--- lto-streamer.h (revision 158854)
+++ lto-streamer.h (working copy)
@@ -467,10 +467,10 @@ struct lto_cgraph_encoder_d
typedef struct lto_cgraph_encoder_d *lto_cgraph_encoder_t;
/* Mapping from indices to trees. */
-struct lto_tree_ref_table
+struct GTY(()) lto_tree_ref_table
{
/* Array of referenced trees . */
- tree *trees;
+ tree * GTY((length ("%h.size"))) trees;
/* Size of array. */
unsigned int size;
@@ -496,7 +496,7 @@ struct lto_tree_ref_encoder
/* Structure to hold states of input scope. */
-struct lto_in_decl_state
+struct GTY(()) lto_in_decl_state
{
/* Array of lto_in_decl_buffers to store type and decls streams. */
struct lto_tree_ref_table streams[LTO_N_DECL_STREAMS];
@@ -534,7 +534,7 @@ DEF_VEC_ALLOC_P(lto_out_decl_state_ptr,
by lto. This structure contains the tables that are needed by the
serialized functions and ipa passes to connect themselves to the
global types and decls as they are reconstituted. */
-struct lto_file_decl_data
+struct GTY(()) lto_file_decl_data
{
/* Decl state currently used. */
struct lto_in_decl_state *current_decl_state;
@@ -544,22 +544,22 @@ struct lto_file_decl_data
struct lto_in_decl_state *global_decl_state;
/* Table of cgraph nodes present in this file. */
- lto_cgraph_encoder_t cgraph_node_encoder;
+ lto_cgraph_encoder_t GTY((skip)) cgraph_node_encoder;
/* Hash table maps lto-related section names to location in file. */
- htab_t function_decl_states;
+ htab_t GTY((param_is (struct lto_in_decl_state))) function_decl_states;
/* The .o file that these offsets relate to. */
- const char *file_name;
+ const char *GTY((skip)) file_name;
/* Nonzero if this file should be recompiled with LTRANS. */
unsigned needs_ltrans_p : 1;
/* Hash table maps lto-related section names to location in file. */
- htab_t section_hash_table;
+ htab_t GTY((skip)) section_hash_table;
/* Hash new name of renamed global declaration to its original name. */
- htab_t renaming_hash_table;
+ htab_t GTY((skip)) renaming_hash_table;
};
struct lto_char_ptr_base