Materialize clones on demand

Jan Hubicka hubicka@ucw.cz
Fri Oct 23 19:27:48 GMT 2020


> Hi,
> 
> On Thu, Oct 22 2020, Jan Hubicka wrote:
> > Hi,
> > this patch removes the pass to materialize all clones and instead this
> > is now done on demand.  The motivation is to reduce lifetime of function
> > bodies in ltrans that should noticeably reduce memory use for highly
> > parallel compilations of large programs (like Martin does) or with
> > partitioning reduced/disabled. For cc1 with one partition the memory use
> > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > particularly accurate).
> >
> 
> Nice.

Sadly this is only true without debug info.  I collected memory usage stats
at the end of the ltrans stage, and they are as follows:

 - after streaming in global stream: 126M GGC and 41M heap
 - after streaming symbol table:     373M GGC and 92M heap
 - after streaming in summaries:     394M GGC and 92M heap
   (the only large summary seems to be the ipa-cp transformation summary)
 - then compilation starts and memory goes slowly up to 3527M at the end
   of compilation

The following timevars each account for roughly 1% or more of GGC:

Time variable                                   usr           sys          wall           GGC
 ipa inlining heuristics            :   6.99 (  0%)   4.62 (  1%)  11.17 (  1%)   241M (  1%)
 ipa lto gimple in                  :  50.04 (  3%)  29.72 (  7%)  80.22 (  4%)  3129M ( 14%)
 ipa lto decl in                    :   0.79 (  0%)   0.36 (  0%)   1.15 (  0%)   135M (  1%)
 ipa lto cgraph I/O                 :   0.95 (  0%)   0.20 (  0%)   1.15 (  0%)   269M (  1%)
 cfg cleanup                        :  25.83 (  2%)   2.52 (  1%)  28.15 (  1%)   154M (  1%)
 df reg dead/unused notes           :  24.08 (  2%)   2.09 (  1%)  26.77 (  1%)   180M (  1%)
 alias analysis                     :  16.94 (  1%)   1.05 (  0%)  17.71 (  1%)   383M (  2%)
 integration                        :  45.76 (  3%)  44.30 ( 11%)  88.99 (  5%)  2328M ( 10%)
 tree VRP                           :  41.38 (  3%)  15.67 (  4%)  57.71 (  3%)   560M (  2%)
 tree SSA rewrite                   :   6.71 (  0%)   2.17 (  1%)   8.96 (  0%)   194M (  1%)
 tree SSA incremental               :  26.99 (  2%)   8.23 (  2%)  34.42 (  2%)   144M (  1%)
 tree operand scan                  :  65.34 (  4%)  61.50 ( 15%) 127.02 (  7%)   886M (  4%)
 dominator optimization             :  41.53 (  3%)  13.56 (  3%)  55.78 (  3%)   407M (  2%)
 tree split crit edges              :   1.08 (  0%)   0.65 (  0%)   1.63 (  0%)   127M (  1%)
 tree PRE                           :  34.30 (  2%)  14.52 (  4%)  49.08 (  3%)   337M (  1%)
 tree code sinking                  :   2.92 (  0%)   0.58 (  0%)   3.51 (  0%)   122M (  1%)
 tree iv optimization               :   6.71 (  0%)   1.19 (  0%)   8.46 (  0%)   133M (  1%)
 expand                             :  45.56 (  3%)   8.24 (  2%)  55.02 (  3%)  1980M (  9%)
 forward prop                       :  11.89 (  1%)   1.39 (  0%)  12.59 (  1%)   130M (  1%)
 dead store elim2                   :  10.03 (  1%)   0.70 (  0%)  11.23 (  1%)   138M (  1%)
 loop init                          :  11.96 (  1%)   4.95 (  1%)  17.11 (  1%)   378M (  2%)
 CPROP                              :  22.63 (  2%)   2.78 (  1%)  25.19 (  1%)   359M (  2%)
 combiner                           :  41.39 (  3%)   2.57 (  1%)  43.30 (  2%)   558M (  2%)
 reload CSE regs                    :  22.38 (  2%)   1.25 (  0%)  23.06 (  1%)   186M (  1%)
 final                              :  32.33 (  2%)   4.28 (  1%)  36.75 (  2%)  1105M (  5%)
 symout                             :  49.04 (  3%)   2.23 (  1%)  52.33 (  3%)  2517M ( 11%)
 var-tracking emit                  :  33.26 (  2%)   1.02 (  0%)  34.35 (  2%)   582M (  3%)
 rest of compilation                :  38.05 (  3%)  15.61 (  4%)  52.42 (  3%)   114M (  1%)
 TOTAL                              :1486.02        408.79       1899.96        22512M

We seem to leak some hashtables:
dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc
cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc

and hashmaps:
ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap
alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc

and hashsets:
ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap

and vectors:
tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k
tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k

However, the main problem is:
cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
--------------------------------------------------------------------------------------------------------------------------------------------

Clearly some function bodies leak; I will try to figure out what.  But
the main problem is debug info.
I guess debug info for the whole of cc1plus is large, but it would be nice
if it was not kept in the garbage collector, for example :)
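
As a rough cross-check of the debug info claim, the leak figures of the
larger dwarf2out.c rows in the GGC table can be added up and compared with
the reported total.  This is only a sketch under some assumptions: the first
numeric column is taken to be the leak, "M" and "k" are read as MiB and KiB,
and the helper name to_mib is made up for illustration.  The line map
allocation (toplev.c:906) is left out on purpose, since it is location data
rather than DWARF proper.

```python
# Leak column of the larger dwarf2out.c rows, copied from the GGC table.
DWARF2OUT_LEAKS = {
    "dwarf2out.c:11727 (add_ranges_num)": "32M",
    "dwarf2out.c:10086 (new_loc_list)": "219M",
    "dwarf2out.c:5702 (new_die_raw)": "434M",
    "dwarf2out.c:1383 (new_loc_descr)": "519M",
    "dwarf2out.c:4420 (add_dwarf_attr)": "640M",
}
TOTAL_LEAK = "3689M"  # "Total" row, leak column


def to_mib(text):
    """Convert a figure such as '219M', '8027k' or '2880' to MiB.

    Assumption: 'M' means MiB, 'k' means KiB, no suffix means bytes.
    """
    text = text.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return float(text) / (1024 * 1024)


dwarf_total = sum(to_mib(v) for v in DWARF2OUT_LEAKS.values())
share = 100.0 * dwarf_total / to_mib(TOTAL_LEAK)

print(f"dwarf2out.c leak: {dwarf_total:.0f}M of {TOTAL_LEAK} ({share:.1f}%)")
```

With the numbers above, these five allocation sites alone come to about half
of the leaked GGC memory, which matches the sum of their percentages in the
table (0.9 + 6.0 + 11.8 + 14.1 + 17.4).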

Honza

