This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Re-write LTO type merging again, do tree merging
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org, Jan Hubicka <jh at suse dot de>, Michael Matz <matz at suse dot de>, Diego Novillo <dnovillo at google dot com>, andi at firstfloor dot org
- Date: Mon, 17 Jun 2013 10:12:41 +0200
- Subject: Re: [PATCH] Re-write LTO type merging again, do tree merging
- References: <alpine dot LNX dot 2 dot 00 dot 1306141240340 dot 6998 at zhemvz dot fhfr dot qr> <20130615102823 dot GD2605 at atrey dot karlin dot mff dot cuni dot cz> <alpine dot LNX dot 2 dot 00 dot 1306171005280 dot 22313 at zhemvz dot fhfr dot qr>
> > CPU: AMD64 family10, speed 2100 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000
> > samples % app name symbol name
> > 45047 11.7420 lto1 inflate_fast
>
> It might be worth changing LTO section layout to include a header
> that specifies whether a section is compressed or not so we can
> allow mixed compressed/uncompressed sections in the LTRANS files
> and avoid decompressing the function sections.
Yes, but this profile shows only decl streaming; functions do not really show
up in the profile. I guess the only way to cut this down is to use LZO, which
is faster on the decompression side, and/or to reduce the amount of data we
stream to .o files.
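
For reference, the per-section header idea quoted above could look roughly like the sketch below. This is only an illustration under stated assumptions: the one-byte tag, the names section_encoding, write_section and read_section, and the decompressor callback are all hypothetical and are not the existing LTO section layout or streamer API. The point is just that a reader can hand back a raw section without ever calling the inflate routine.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical one-byte tag at the start of every section.
// This is NOT the actual LTO section layout, only an illustration.
enum class section_encoding : std::uint8_t { raw = 0, compressed = 1 };

using bytes = std::vector<std::uint8_t>;
// Stand-in for whatever decompression routine the reader would use.
using decompressor = std::function<bytes(const std::uint8_t*, std::size_t)>;

// Prepend the encoding tag to an already encoded payload.
bytes write_section(section_encoding enc, const bytes& payload) {
  bytes out;
  out.reserve(payload.size() + 1);
  out.push_back(static_cast<std::uint8_t>(enc));
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Return the section contents; the decompressor is invoked only for
// sections tagged as compressed, so raw sections cost a plain copy.
bytes read_section(const bytes& section, const decompressor& inflate) {
  if (section.empty())
    throw std::runtime_error("section too short for header");
  const std::uint8_t* body = section.data() + 1;
  const std::size_t len = section.size() - 1;
  switch (static_cast<section_encoding>(section[0])) {
    case section_encoding::raw:
      return bytes(body, body + len);
    case section_encoding::compressed:
      return inflate(body, len);
  }
  throw std::runtime_error("unknown section encoding");
}
```

With such a tag, function sections could be written raw into the LTRANS files while decl sections stay compressed, and the reader would not need to know in advance which is which.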
>
> > 34224 8.9209 lto1 streamer_read_uhwi(lto_input_block*)
> > 24630 6.4201 lto1 compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
> > 23205 6.0487 lto1 pointer_map_insert(pointer_map_t*, void const*)
> > 20829 5.4293 lto1 unpack_value_fields(data_in*, bitpack_d*, tree_node*)
> > 13545 3.5307 lto1 ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
> > 12841 3.3472 libc-2.11.1.so memset
> > 11840 3.0862 lto1 htab_find_slot_with_hash
> > 11397 2.9708 lto1 streamer_tree_cache_insert_1(streamer_tree_cache_d*, tree_node*, unsigned int, unsigned int*, bool)
> > 11086 2.8897 lto1 lto_input_tree(lto_input_block*, data_in*)
> > 10522 2.7427 lto1 lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
> > 8853 2.3076 lto1 unify_scc(streamer_tree_cache_d*, unsigned int, unsigned int, unsigned int, unsigned int)
> > 8539 2.2258 lto1 hash_table<tree_scc_hasher, xcallocator>::find_slot_with_hash(tree_scc const*, unsigned int, insert_option)
> > 7987 2.0819 lto1 adler32
> > 7743 2.0183 lto1 streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
> >
> > Can't we free the pointer map in streamer after every SCC?
>
> You mean on read-in? We even can do without the pointer-map there at all.
>
> We can experiment with that as a followup.
I believe it was needed for one of the cleanups (to update the map), but I guess
one can easily just run the fixup on the segment of the array corresponding to the new SCC.
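
Something along these lines is what I have in mind. It is a minimal sketch, assuming the reader cache is a flat array in which each SCC read from the stream occupies one contiguous range. The node type, node_cache and fixup_scc_segment are hypothetical stand-ins, not the real tree_node or streamer_tree_cache_d code, and the fixup itself is left as a callback.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Stand-in for a tree node; the real tree_node is far richer.
struct node {
  int id;
  bool fixed_up;
};

// Hypothetical cache: nodes stored in read order, so the entries of one
// SCC form the contiguous range [first, first + len).
using node_cache = std::vector<node*>;

// Run `fixup` only over the entries of the newly read SCC. No side table
// mapping nodes to cache slots is consulted or updated.
void fixup_scc_segment(node_cache& cache, std::size_t first, std::size_t len,
                       const std::function<void(node&)>& fixup) {
  if (first > cache.size() || len > cache.size() - first)
    throw std::out_of_range("SCC segment outside cache");
  for (std::size_t i = first; i < first + len; ++i)
    fixup(*cache[i]);
}
```

The cost is then proportional to the size of the SCC just read rather than to the whole cache, and nothing needs to be kept in a pointer map between SCCs.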
> > The longest-running ltrans adds another 400 seconds.
> > combiner : 16.16 ( 4%) usr 0.08 ( 1%) sys 16.53 ( 4%) wall 205251 kB ( 6%) ggc
> > integrated RA : 47.97 (12%) usr 0.21 ( 3%) sys 48.39 (12%) wall 391655 kB (12%) ggc
> > LRA hard reg assignment : 158.64 (39%) usr 0.02 ( 0%) sys 158.74 (38%) wall 0 kB ( 0%) ggc
> > TOTAL : 404.51 8.39 414.01 3215235 kB
>
> Otherwise it looks pretty good.
Indeed. We are getting closer to the numbers I measured on the same machine in 2010, when Firefox
was half of its size today.
Thanks for all the hard work!
Honza