This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Add object allocators to symbol and call summaries
On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > >
> > > > > stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k
> > > > > ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k
> > > > > hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349
> > > > > ^^^ some of the memory lands here that ought to be accounted to the
> > > > > caller of expand.
> > > >
> > > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
> > > > for each created symbol/call_summary and manually register every allocation with that descriptor.
> > >
> > > Or just pass memory stats from the caller of expand, and transitively
> > > from the caller of the summary. This will get us the line info of the
> > > get_create call, which is IMO OK.
> >
> > The issue with this approach is that you will spread a summary allocation
> > along all the ::get_create places. Which is not ideal.
>
> We get it with other allocations, too. Not ideal, but better.
> Even better solutions are welcome :)
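
For readers following along, the idea of passing the caller's location down to the innermost allocation can be sketched roughly as below. This is only an illustration, not GCC's actual mem-stats machinery: `call_site`, `bytes_by_site`, `record`, `expand_table` and `get_create_entry` are hypothetical names. It assumes a compiler that provides `__builtin_FILE` and `__builtin_LINE` (GCC and recent Clang do), which are evaluated at the call site when used as default arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical record of where an allocation was requested from.
struct call_site {
  const char *file;
  int line;
};

// Bytes accounted per "file:line" key (illustrative global, not GCC's table).
std::map<std::string, std::size_t> bytes_by_site;

// Charge BYTES to the given call site.
void record(call_site where, std::size_t bytes) {
  assert(where.file != nullptr);
  bytes_by_site[std::string(where.file) + ":" + std::to_string(where.line)] += bytes;
}

// Stand-in for the innermost allocation (the role hash table expand plays):
// it accounts to the location it was handed, not to its own source line.
void expand_table(std::size_t bytes, call_site where) {
  record(where, bytes);
}

// Stand-in for a summary's get_create: the default arguments capture the
// location of whoever calls get_create_entry and forward it inward.
void get_create_entry(std::size_t bytes,
                      const char *file = __builtin_FILE(),
                      int line = __builtin_LINE()) {
  expand_table(bytes, call_site{file, line});
}
```

With this shape, every distinct line that calls `get_create_entry` gets its own row in the report, which is exactly the "spread along all the ::get_create places" trade-off discussed above.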
> >
> > Try to take a look, or we can debug that on Thursday together?
> > Martin
>
> Found it. It turns out that ggc_prune_overhead_list is bogus. It walks
> all active allocation objects, checks whether each one was collected, and
> accounts the collection; then it throws away all allocations (including
> those not collected), so they are no longer accounted later. So we
> basically misaccount everything that survives ggc_collect.
>
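
The misaccounting described above can be reproduced in a small standalone model. This is a sketch under stated assumptions, not the GCC code: `site_usage`, `reverse_map`, `prune_buggy`, `prune_fixed` and `release` are hypothetical names, `std::unordered_map` stands in for GCC's hash_map, and the `marked` set stands in for `ggc_marked_p`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical per-allocation-site counters.
struct site_usage {
  std::size_t collected = 0;  // bytes reclaimed by the collector
  std::size_t freed = 0;      // bytes released explicitly
};

// Live pointer -> (owning site, size in bytes).
using reverse_map =
    std::unordered_map<const void *, std::pair<site_usage *, std::size_t>>;

// Old behaviour: account the unmarked objects, then forget *every* entry,
// including the objects that survived the collection.
void prune_buggy(reverse_map &live, const std::unordered_set<const void *> &marked) {
  for (auto &entry : live)
    if (!marked.count(entry.first))
      entry.second.first->collected += entry.second.second;
  live.clear();
}

// Fixed behaviour: drop only the entries that were actually collected.
void prune_fixed(reverse_map &live, const std::unordered_set<const void *> &marked) {
  for (auto it = live.begin(); it != live.end();) {
    if (!marked.count(it->first)) {
      assert(it->second.first != nullptr);
      it->second.first->collected += it->second.second;
      it = live.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
}

// Account an explicit release; returns false if the pointer is unknown.
bool release(reverse_map &live, const void *ptr) {
  auto it = live.find(ptr);
  if (it == live.end())
    return false;
  it->second.first->freed += it->second.second;
  live.erase(it);
  return true;
}
```

After `prune_buggy`, a later `release` of a surviving object finds no entry and its bytes are never accounted; after `prune_fixed`, the survivor is still tracked and the release is recorded against its site.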
> No wonder it had me hunting ghosts 8-O
>
> Also, the last memory report was sorted by garbage rather than leak for a
> reason: for normal compilation we care primarily about the garbage
> produced, because it triggers ggc collections and makes the compiler slow.
>
> BTW I like how advanced C++ gets back to lisp :)
>
> With the fix I get the following stats by the end of Firefox WPA:
>
> cfg.c:127 (alloc_block) 32M: 1.2% 12M: 2.6% 0 : 0.0% 0 : 0.0% 446k
> symtab.c:582 (create_reference) 42M: 1.6% 0 : 0.0% 65M: 1.7% 1329k: 0.4% 840k
> gimple-streamer-in.c:101 (input_gimple_stmt) 49M: 1.9% 17M: 3.5% 0 : 0.0% 375k: 0.1% 747k
> tree-ssanames.c:308 (make_ssa_name_fn) 51M: 2.0% 16M: 3.4% 0 : 0.0% 0 : 0.0% 973k
> ipa-cp.c:5157 (ipcp_store_vr_results) 51M: 2.0% 1243k: 0.2% 0 : 0.0% 9561k: 3.0% 146k
> stringpool.c:63 (alloc_node) 53M: 2.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1362k
> ipa-prop.c:3988 (duplicate) 63M: 2.4% 1115k: 0.2% 0 : 0.0% 10M: 3.2% 264k
> toplev.c:904 (realloc_for_line_map) 72M: 2.8% 0 : 0.0% 71M: 1.9% 15M: 5.1% 27
> tree-ssanames.c:83 (init_ssanames) 96M: 3.7% 121M: 24.4% 44M: 1.2% 87M: 27.8% 174k
> tree-ssa-operands.c:265 (ssa_operand_alloc) 104M: 4.0% 0 : 0.0% 39M: 1.0% 0 : 0.0% 105k
> stringpool.c:41 (stringpool_ggc_alloc) 106M: 4.1% 0 : 0.0% 0 : 0.0% 7652k: 2.4% 1362k
> lto/lto-common.c:204 (lto_read_in_decl_state) 160M: 6.2% 0 : 0.0% 105M: 2.8% 19M: 6.1% 1731k
> cgraph.c:851 (create_edge) 248M: 9.5% 0 : 0.0% 70M: 1.9% 0 : 0.0% 3141k
> cgraph.h:2712 (allocate_cgraph_symbol) 383M: 14.7% 0 : 0.0% 155M: 4.1% 0 : 0.0% 1567k
> tree-streamer-in.c:631 (streamer_alloc_tree) 718M: 27.5% 136M: 27.5% 1267M: 33.3% 64M: 20.6% 15M
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory Leak Garbage Freed Overhead Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total 2609M:100.0% 497M:100.0% 3804M:100.0% 313M:100.0% 49M
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> This looks more realistic. ssa_operands and init_ssanames show that we
> really read a lot of bodies into memory. I also wonder if we really want
> to compute virtual SSA form for them when we only want to compare them.
>
> After reading and symbol table merging I get:
>
> cgraph.h:2712 (allocate_cgraph_symbol) 148M: 7.1% 0 : 0.0% 115M: 6.7% 0 : 0.0% 767k
>
> So it seems that about half of the callgraph nodes are inline clones, so
> working on reducing clone overhead (in addition to revisiting tree
> merging once again) seems to be the most meaningful thing right now.
>
> OK if the patch passes testing?
OK.
> * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
> memory allocations away.
> * mem-stats.h (mem_alloc_description<T>::release_object_overhead):
> Do not silently ignore invalid release requests.
> Index: ggc-common.c
> ===================================================================
> --- ggc-common.c (revision 277796)
> +++ ggc-common.c (working copy)
> @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
>
> for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
> if (!ggc_marked_p ((*it).first))
> - (*it).second.first->m_collected += (*it).second.second;
> -
> - delete ggc_mem_desc.m_reverse_object_map;
> - ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
> + {
> + (*it).second.first->m_collected += (*it).second.second;
> + ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
> + }
> }
>
> /* Return memory used by heap in kb, 0 if this info is not available. */
> Index: mem-stats.h
> ===================================================================
> --- mem-stats.h (revision 277796)
> +++ mem-stats.h (working copy)
> @@ -535,11 +535,8 @@ inline void
> mem_alloc_description<T>::release_object_overhead (void *ptr)
> {
> std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
> - if (entry)
> - {
> - entry->first->release_overhead (entry->second);
> - m_reverse_object_map->remove (ptr);
> - }
> + entry->first->release_overhead (entry->second);
> + m_reverse_object_map->remove (ptr);
> }
>
> /* Unregister a memory allocation descriptor registered with