Materialize clones on demand

Richard Biener <rguenther@suse.de>
Mon Oct 26 07:41:34 GMT 2020


On Fri, 23 Oct 2020, Jan Hubicka wrote:

> > Hi,
> > 
> > On Thu, Oct 22 2020, Jan Hubicka wrote:
> > > Hi,
> > > this patch removes the pass to materialize all clones; instead this
> > > is now done on demand.  The motivation is to reduce the lifetime of
> > > function bodies in ltrans, which should noticeably reduce memory use for
> > > highly parallel compilations of large programs (like Martin does) or with
> > > partitioning reduced/disabled.  For cc1 with one partition the memory use
> > > seems to go down from 4GB to circa 1.5GB (as seen from top, so this is
> > > not particularly accurate).
> > >
> > 
> > Nice.
> 
> Sadly this is only true w/o debug info.  I collected memory usage stats
> at the end of the ltrans stage and they are as follows:
> 
>  - after streaming in global stream: 126M GGC and 41M heap
>  - after streaming symbol table:     373M GGC and 92M heap
>  - after streaming in summaries:     394M GGC and 92M heap
>    (the only large summary seems to be the ipa-cp transformation summary)
>  - then compilation starts and memory goes slowly up to 3527M at the end
>    of compilation
> 
> The following accounts for more than 1% GGC:
> 
> Time variable                                   usr           sys          wall           GGC
>  ipa inlining heuristics            :   6.99 (  0%)   4.62 (  1%)  11.17 (  1%)   241M (  1%)
>  ipa lto gimple in                  :  50.04 (  3%)  29.72 (  7%)  80.22 (  4%)  3129M ( 14%)
>  ipa lto decl in                    :   0.79 (  0%)   0.36 (  0%)   1.15 (  0%)   135M (  1%)
>  ipa lto cgraph I/O                 :   0.95 (  0%)   0.20 (  0%)   1.15 (  0%)   269M (  1%)
>  cfg cleanup                        :  25.83 (  2%)   2.52 (  1%)  28.15 (  1%)   154M (  1%)
>  df reg dead/unused notes           :  24.08 (  2%)   2.09 (  1%)  26.77 (  1%)   180M (  1%)
>  alias analysis                     :  16.94 (  1%)   1.05 (  0%)  17.71 (  1%)   383M (  2%)
>  integration                        :  45.76 (  3%)  44.30 ( 11%)  88.99 (  5%)  2328M ( 10%)
>  tree VRP                           :  41.38 (  3%)  15.67 (  4%)  57.71 (  3%)   560M (  2%)
>  tree SSA rewrite                   :   6.71 (  0%)   2.17 (  1%)   8.96 (  0%)   194M (  1%)
>  tree SSA incremental               :  26.99 (  2%)   8.23 (  2%)  34.42 (  2%)   144M (  1%)
>  tree operand scan                  :  65.34 (  4%)  61.50 ( 15%) 127.02 (  7%)   886M (  4%)
>  dominator optimization             :  41.53 (  3%)  13.56 (  3%)  55.78 (  3%)   407M (  2%)
>  tree split crit edges              :   1.08 (  0%)   0.65 (  0%)   1.63 (  0%)   127M (  1%)
>  tree PRE                           :  34.30 (  2%)  14.52 (  4%)  49.08 (  3%)   337M (  1%)
>  tree code sinking                  :   2.92 (  0%)   0.58 (  0%)   3.51 (  0%)   122M (  1%)
>  tree iv optimization               :   6.71 (  0%)   1.19 (  0%)   8.46 (  0%)   133M (  1%)
>  expand                             :  45.56 (  3%)   8.24 (  2%)  55.02 (  3%)  1980M (  9%)
>  forward prop                       :  11.89 (  1%)   1.39 (  0%)  12.59 (  1%)   130M (  1%)
>  dead store elim2                   :  10.03 (  1%)   0.70 (  0%)  11.23 (  1%)   138M (  1%)
>  loop init                          :  11.96 (  1%)   4.95 (  1%)  17.11 (  1%)   378M (  2%)
>  CPROP                              :  22.63 (  2%)   2.78 (  1%)  25.19 (  1%)   359M (  2%)
>  combiner                           :  41.39 (  3%)   2.57 (  1%)  43.30 (  2%)   558M (  2%)
>  reload CSE regs                    :  22.38 (  2%)   1.25 (  0%)  23.06 (  1%)   186M (  1%)
>  final                              :  32.33 (  2%)   4.28 (  1%)  36.75 (  2%)  1105M (  5%)
>  symout                             :  49.04 (  3%)   2.23 (  1%)  52.33 (  3%)  2517M ( 11%)
>  var-tracking emit                  :  33.26 (  2%)   1.02 (  0%)  34.35 (  2%)   582M (  3%)
>  rest of compilation                :  38.05 (  3%)  15.61 (  4%)  52.42 (  3%)   114M (  1%)
>  TOTAL                              :1486.02        408.79       1899.96        22512M
> 
> We seem to leak some hashtables:
> dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc

that one likely keeps quite some memory live...

> cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
> tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc

Hmm, so we do

  scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);

and

  scalar_evolution_info->empty ();
  scalar_evolution_info = NULL;

to reclaim.  ->empty () will IIRC still allocate at least 7 elements, which
we should then eventually reclaim during a GC walk - I guess the hashtable
statistics do not really handle GC-reclaimed portions?

If there's a friendlier way of releasing a GC-allocated hash-tab
we can switch to that.  Note that in principle the hash-table doesn't
need to be GC allocated, but it does need to be walked since it refers to
trees that might not be referenced in other ways.
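
For reference, a rough sketch of how these pieces fit together (not the
exact tree-scalar-evolution.c code; the function names, the plain GTY(())
root and the finalize guard are assumptions here):

  /* A GC-rooted table; the GTY root makes ggc_collect () walk the
     entries so the trees they refer to stay live.  */
  static GTY (()) hash_table<scev_info_hasher> *scalar_evolution_info;

  void
  scev_initialize (void)
  {
    scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);
  }

  void
  scev_finalize (void)
  {
    if (!scalar_evolution_info)
      return;
    scalar_evolution_info->empty ();
    /* Dropping the last reference lets a later ggc_collect () reclaim the
       table itself; until then the small allocation made by ->empty ()
       sticks around, which may be what the statistics pick up.  */
    scalar_evolution_info = NULL;
  }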

> and hashmaps:
> ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
> tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap

Similar to SCEV, probably mis-accounting?

> alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
> ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
> dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc
> 
> and hashsets:
> ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
> ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
> tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap
> 
> and vectors:
> tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k

Huh.  It's an auto_vec<>

> tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
> tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
> ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
> tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
> graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
> dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
> tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
> tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
> symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
> vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k

Those all look OK to me, not sure why we even think there's a leak?
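
To illustrate the auto_vec<> remark above - a hedged sketch, not the code at
the tree-ssa-structalias.c location quoted; only the auto_vec API is real,
the function and process () are made up:

  /* An auto_vec with inline storage keeps its first N elements embedded,
     spills to the heap only beyond that, and frees the heap buffer in its
     destructor - so entries in the vector statistics are transient
     allocations rather than leaks.  */
  static void
  walk_worklist (tree t)
  {
    auto_vec<tree, 8> worklist;     /* first 8 slots embedded, no heap    */
    worklist.safe_push (t);         /* heap allocation only past 8 items  */
    while (!worklist.is_empty ())
      process (worklist.pop ());
  }                                 /* dtor releases any heap buffer here */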

> However the main problem is
> cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
> cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
> varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
> emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
> dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
> emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
> tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
> gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
> tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
> dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
> cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
> tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
> stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
> stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
> tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
> cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
> cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
> tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
> tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
> stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
> dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
> tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
> tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
> function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
> hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
> dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
> tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
> dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
> dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
> dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
> toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
> --------------------------------------------------------------------------------------------------------------------------------------------
> 
> Clearly some function bodies leak - I will try to figure out which.  But
> the main problem is debug info.
> I guess debug info for the whole of cc1plus is large, but it would be nice
> if it was not in the garbage collector, for example :)

Well, we're building a DIE tree for the whole unit here so I'm not sure
what parts we can optimize.  The structures may keep quite some stuff
on the tree side live through the decl -> DIE and block -> DIE maps
and the external_die_map used for LTO streaming (but if we lazily stream
bodies we do need to keep this map ... unless we add some
start/end-stream-body hooks and do the map per function.  But then
we build the DIEs lazily as well, so the query of the map is lazy :/)
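
Purely as an illustration of the "map per function" idea (the hooks below
are hypothetical, not existing dwarf2out API, and the value type is a
stand-in):

  /* Hypothetical: scope the decl -> external DIE mapping to the body
     currently being streamed instead of keeping it for the whole unit.  */
  static hash_map<tree, unsigned> *external_die_map_for_body;

  static void
  hypothetical_begin_stream_body (void)
  {
    external_die_map_for_body = new hash_map<tree, unsigned> (13);
  }

  static void
  hypothetical_end_stream_body (void)
  {
    delete external_die_map_for_body;   /* reclaimed right after the body */
    external_die_map_for_body = NULL;
  }

As said above, though, this only works if the map is queried while the body
is in scope - with lazily built DIEs the queries come later, so the map has
to stay.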

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer

