This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
GGC per call allocation statistics
- From: Jan Hubicka <jh at suse dot cz>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 20 Jan 2004 18:40:02 +0100
- Subject: GGC per call allocation statistics
Hi,
I wrote a simple patch to record statistics of ggc_alloc calls on a per-line
basis. The output looks like this:
source location Times Allocated Overhead
-------------------------------------------------------
../../gcc/varasm.c:2427 (build_constant_desc) 4 32 0:
../../gcc/varasm.c:374 (set_named_section_flags) 3 36 0:
genrtl.c:748 (gen_rtx_fmt_eeeee) 2 48 0:
../../gcc/tree.c:582 (make_tree_vec) 4 96 0:
../../gcc/stmt.c:2539 (expand_start_null_loop) 3 192 0:
../../gcc/c-decl.c:5117 (finish_struct) 17 136 68:
../../gcc/function.c:3325 (insns_for_mem_walk) 40 320 0:
../../gcc/cgraph.c:447 (cgraph_varpool_node) 46 552 0:
genrtl.c:726 (gen_rtx_fmt_eit) 40 640 0:
genrtl.c:589 (gen_rtx_fmt_) 57 456 228:
../../gcc/alias.c:644 (record_alias_subset) 99 1188 0:
../../gcc/tree.c:539 (build_string) 26 1192 69:
../../gcc/emit-rtl.c:4821 (start_sequence) 94 1504 0:
../../gcc/c-decl.c:392 (make_scope) 20 1280 320:
../../gcc/stmt.c:4418 (expand_start_case) 29 1856 0:
genrtl.c:356 (gen_rtx_fmt_iuuBieiee) 56 2240 0:
../../gcc/bitmap.c:145 (bitmap_element_allocate) 66 2112 264:
../../gcc/function.c:5071 (assign_parms) 4 2048 1008:
../../gcc/c-decl.c:5118 (finish_struct) 17 2912 692:
../../gcc/varasm.c:2647 (init_varasm_status) 183 4392 0:
../../gcc/stmt.c:2499 (expand_start_loop) 100 6400 0:
../../gcc/config/i386/i386.c:11716 (ix86_init_machine_status) 183 5856 732:
../../gcc/expr.c:323 (init_expr) 183 5856 732:
../../gcc/c-decl.c:1724 (pushdecl) 744 5952 2976:
../../gcc/tree.c:3042 (type_hash_add) 1408 11264 0:
../../gcc/ggc-common.c:197 (ggc_splay_alloc) 604 11248 396:
../../gcc/stmt.c:433 (init_stmt_for_function) 183 11712 732:
../../gcc/stmt.c:4687 (add_case_node) 372 11904 1488:
genrtl.c:604 (gen_rtx_fmt_w) 1697 13576 0:
../../gcc/emit-rtl.c:5152 (init_emit) 183 11712 2928:
../../gcc/function.c:4338 (assign_parms) 58 14848 1392:
genrtl.c:901 (gen_rtx_fmt_uuuu) 865 17300 0:
genrtl.c:706 (gen_rtx_fmt_s00) 1359 21744 0:
../../gcc/except.c:463 (init_eh_for_function) 183 19764 5124:
../../gcc/emit-rtl.c:366 (get_reg_attrs) 4107 32856 0:
../../gcc/cgraph.c:157 (create_edge) 1723 34460 0:
../../gcc/c-decl.c:4393 (grokdeclarator) 3250 26000 13000:
../../gcc/cgraph.c:113 (cgraph_node) 319 34452 8932:
../../gcc/optabs.c:4887 (new_optab) 75 38400 8700:
../../gcc/varasm.c:2654 (init_varasm_status) 183 46848 2196:
../../gcc/varasm.c:2651 (init_varasm_status) 183 46848 2196:
../../gcc/emit-rtl.c:847 (gen_reg_rtx) 46 46080 17640:
../../gcc/emit-rtl.c:5170 (init_emit) 183 46848 17934:
genrtl.c:572 (gen_rtx_fmt_eee) 4157 66512 0:
../../gcc/stmt.c:2413 (expand_start_cond) 1071 68544 0:
../../gcc/varray.c:154 (varray_grow) 63 62760 30384:
genrtl.c:220 (gen_rtx_fmt_E) 12118 96944 0:
../../gcc/stmt.c:3406 (expand_start_bindings_and_block) 1913 122432 0:
../../gcc/emit-rtl.c:317 (get_mem_attrs) 6465 129300 0:
../../gcc/function.c:6384 (allocate_struct_function) 183 93696 39528:
genrtl.c:654 (gen_rtx_fmt_ei) 12450 149400 0:
../../gcc/rtl.c:158 (rtvec_alloc) 16247 208760 2624:
genrtl.c:478 (gen_rtx_fmt_iuuB00is) 4821 192840 19284:
genrtl.c:688 (gen_rtx_fmt_u00) 13960 223360 0:
../../gcc/emit-rtl.c:852 (gen_reg_rtx) 46 184320 70560:
../../gcc/emit-rtl.c:5173 (init_emit) 183 187392 71736:
../../gcc/tree.c:352 (copy_node) 14211 354144 11922:
../../gcc/optabs.c:4901 (new_convert_optab) 9 294912 122004:
genrtl.c:236 (gen_rtx_fmt_e) 52415 419320 0:
genrtl.c:635 (gen_rtx_fmt_i00) 41412 662592 0:
../../gcc/tree.c:2452 (build1) 52200 1044000 0:
genrtl.c:671 (gen_rtx_fmt_e0) 88306 1059672 0:
../../gcc/tree.c:1059 (tree_cons) 60420 1208400 0:
genrtl.c:51 (gen_rtx_fmt_ue) 129362 1552344 0:
../../gcc/cselib.c:150 (new_elt_list) 240516 1924128 0:
genrtl.c:619 (gen_rtx_fmt_0) 255752 2046016 0:
../../gcc/varray.c:121 (varray_init) 899 1714608 511456:
../../gcc/rtl.c:312 (shallow_copy_rtx) 140216 2281592 114400:
genrtl.c:33 (gen_rtx_fmt_ee) 253345 3040140 0:
../../gcc/alias.c:2734 (init_alias_analysis) 828 2761728 896360:
../../gcc/cselib.c:708 (new_cselib_val) 254266 5085320 0:
../../gcc/rtl.c:180 (rtx_alloc) 241148 4860180 225580:
../../gcc/ggc-common.c:188 (ggc_calloc) 29750 5142208 187088:
../../gcc/cselib.c:167 (new_elt_loc_list) 315895 6317900 0:
../../gcc/tree.c:274 (make_node) 192189 6449496 286772:
Total 2455817 50550120 2679445
-------------------------------------------------------
As discussed earlier on IRC with Steven, the major obstacle to this idea is
that ggc_alloc is called directly from just a few places in the compiler;
most code reaches it through some other wrapper. I've added some machinery
to make it possible to make such wrapper functions transparent, but even the
output above makes it obvious that cselib is producing 40% of the ggc_alloc
calls on my testcase (combine.c compilation) for no good reason. I am already
testing a patch that makes the end of the statistics look like this:
genrtl.c:478 (gen_rtx_fmt_iuuB00is) 4731 189240 18924:
genrtl.c:688 (gen_rtx_fmt_u00) 13339 213424 0:
../../gcc/emit-rtl.c:5173 (init_emit) 183 187392 71736:
../../gcc/emit-rtl.c:852 (gen_reg_rtx) 52 198656 76048:
genrtl.c:654 (gen_rtx_fmt_ei) 24213 290556 0:
genrtl.c:236 (gen_rtx_fmt_e) 45302 362416 0:
../../gcc/tree.c:352 (copy_node) 14230 354536 11926:
../../gcc/optabs.c:4901 (new_convert_optab) 9 294912 122004:
../../gcc/tree.c:3867 (build_function_type) 3784 408672 75680:
../../gcc/tree.c:408 (build_int_2_wide) 29024 580480 0:
genrtl.c:635 (gen_rtx_fmt_i00) 43858 701728 0:
../../gcc/stringpool.c:70 (alloc_node) 8621 551744 172420:
../../gcc/tree.c:2452 (build1) 50790 1015800 0:
../../gcc/tree.c:2315 (build) 49034 1182640 2912:
../../gcc/tree.c:1059 (tree_cons) 60421 1208420 0:
genrtl.c:671 (gen_rtx_fmt_e0) 111975 1343700 0:
../../gcc/tree.c:1044 (build_tree_list) 71169 1423380 0:
genrtl.c:51 (gen_rtx_fmt_ue) 128759 1545108 0:
../../gcc/tree.c:2559 (build_decl) 17331 1871748 0:
genrtl.c:619 (gen_rtx_fmt_0) 265711 2125688 0:
../../gcc/varray.c:121 (varray_init) 855 1743792 482980:
../../gcc/rtl.c:312 (shallow_copy_rtx) 163152 2619416 110480:
genrtl.c:33 (gen_rtx_fmt_ee) 261693 3140316 0:
../../gcc/alias.c:2734 (init_alias_analysis) 828 3055616 977896:
../../gcc/ggc-common.c:188 (ggc_calloc) 26339 4802112 167492:
../../gcc/rtl.c:180 (rtx_alloc) 234303 4875800 219396:
Total 1702668 38094280 2707693
This reduces the total GGC memory allocated from 50MB to 38MB. (I will have
to take a closer look at why cselib needs that many data structures.)
I would like to hear ideas on how to make such a patch as unintrusive as
possible. At the moment I use a statistics.h header like this:
#ifndef GCC_STATISTICS
#define GCC_STATISTICS
#ifdef GATHER_STATISTICS
#define MEM_STAT_DECL , const char *_loc_name ATTRIBUTE_UNUSED, int _loc_line ATTRIBUTE_UNUSED, const char *_loc_function ATTRIBUTE_UNUSED
#define PASS_MEM_STAT , _loc_name, _loc_line, _loc_function
#define MEM_STAT_INFO , __FILE__, __LINE__, __FUNCTION__
#else
#define MEM_STAT_DECL
#define PASS_MEM_STAT
#define MEM_STAT_INFO
#endif
#endif
and then, if I want to make some function transparent, I add a #define
trick:
extern void *ggc_realloc_stat (void *, size_t MEM_STAT_DECL);
#define ggc_realloc(s,z) ggc_realloc_stat (s,z MEM_STAT_INFO)
and I pass things around in the function itself:
/* Resize a block of memory, possibly re-allocating it. */
void *
ggc_realloc_stat (void *x, size_t size MEM_STAT_DECL)
{
  void *r;
  size_t old_size;

  if (x == NULL)
    return ggc_alloc_stat (size PASS_MEM_STAT);
...
And the function disappears from my statistics in favour of its callers.
One ugly part is the missing commas before my macros; the other obstacle is
that for varargs functions (like build*) I would need macros with a
variable number of arguments, which are C99 only.
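To make the mechanism concrete, here is a minimal self-contained sketch of
the same trick outside of GCC. Everything named toy_*, site_stat,
lookup_site, make_thing and dump_sites is hypothetical and only for
illustration; it uses malloc instead of the GGC allocator and __func__
instead of __FUNCTION__, and it is not the code from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Defined here so the demo records statistics; a real build would set
   this from its configuration.  */
#define GATHER_STATISTICS 1

#ifdef GATHER_STATISTICS
#define MEM_STAT_DECL , const char *_loc_name, int _loc_line, const char *_loc_function
#define PASS_MEM_STAT , _loc_name, _loc_line, _loc_function
#define MEM_STAT_INFO , __FILE__, __LINE__, __func__
#else
#define MEM_STAT_DECL
#define PASS_MEM_STAT
#define MEM_STAT_INFO
#endif

/* Hypothetical per-call-site record (not GCC's data structure).  */
struct site_stat
{
  const char *file;
  int line;
  const char *function;
  size_t times;
  size_t allocated;
};

#define MAX_SITES 64
static struct site_stat sites[MAX_SITES];
static size_t n_sites;

/* Find the record for FILE:LINE, creating it on first use.  */
static struct site_stat *
lookup_site (const char *file, int line, const char *function)
{
  size_t i;

  for (i = 0; i < n_sites; i++)
    if (sites[i].line == line && strcmp (sites[i].file, file) == 0)
      return &sites[i];

  assert (n_sites < MAX_SITES);
  sites[n_sites].file = file;
  sites[n_sites].line = line;
  sites[n_sites].function = function;
  sites[n_sites].times = 0;
  sites[n_sites].allocated = 0;
  return &sites[n_sites++];
}

/* The allocator proper: charges the allocation to the location passed in.  */
void *
toy_alloc_stat (size_t size MEM_STAT_DECL)
{
#ifdef GATHER_STATISTICS
  struct site_stat *s = lookup_site (_loc_name, _loc_line, _loc_function);
  s->times++;
  s->allocated += size;
#endif
  return malloc (size);
}
#define toy_alloc(z) toy_alloc_stat (z MEM_STAT_INFO)

/* A transparent wrapper: it forwards its caller's location, so it never
   shows up in the statistics itself.  */
void *
toy_calloc_stat (size_t size MEM_STAT_DECL)
{
  void *p = toy_alloc_stat (size PASS_MEM_STAT);
  if (p != NULL)
    memset (p, 0, size);
  return p;
}
#define toy_calloc(z) toy_calloc_stat (z MEM_STAT_INFO)

/* An ordinary caller; allocations are attributed to this function.  */
void *
make_thing (void)
{
  return toy_calloc (16);
}

/* Print one line per recorded call site.  */
void
dump_sites (void)
{
  size_t i;

  for (i = 0; i < n_sites; i++)
    printf ("%s:%d (%s) %zu %zu\n", sites[i].file, sites[i].line,
            sites[i].function, sites[i].times, sites[i].allocated);
}
```

Calling make_thing twice and then dump_sites should report a single site
inside make_thing with two calls, with no entry for toy_calloc_stat or
toy_alloc_stat.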
I can hide these macros in a special header and provide a default
#define build build_stat
for non-C99 compilers, but I would much prefer to see a different
solution. Ideas?
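For reference, the C99 variant for a varargs function could look roughly
like the sketch below. The names toy_sum_stat, toy_sum, caller_example,
last_function and last_line are made up for illustration, and putting the
location arguments first (rather than last, as MEM_STAT_DECL does) is an
assumption forced by the trailing "..."; it is not what the patch does:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical record of the most recent caller, for demonstration.  */
static const char *last_function;
static int last_line;

/* Sum COUNT int arguments, remembering which call site asked for it.
   The location parameters come first because nothing can follow "...".  */
int
toy_sum_stat (const char *file, int line, const char *function,
              int count, ...)
{
  va_list ap;
  int i, total = 0;

  (void) file;
  assert (count >= 0);
  last_function = function;
  last_line = line;

  va_start (ap, count);
  for (i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}

/* C99 variadic macro: callers keep writing toy_sum (count, ...) and the
   location is added behind their back.  */
#define toy_sum(...) \
  toy_sum_stat (__FILE__, __LINE__, __func__, __VA_ARGS__)

int
caller_example (void)
{
  return toy_sum (3, 10, 20, 30);
}
```

This only works where __VA_ARGS__ is available, which is exactly the
portability problem described above.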
Honza