This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: WPA stream_out form & memory consumption
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Martin Liška <mliska at suse dot cz>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 3 Apr 2014 00:43:44 +0200
- Subject: Re: WPA stream_out form & memory consumption
- Authentication-results: sourceware.org; auth=none
- References: <53286192 dot 3030600 at suse dot cz> <20140325205021 dot GA6581 at atrey dot karlin dot mff dot cuni dot cz> <5333E6B8 dot 3000504 at suse dot cz> <5333F3D3 dot 1010009 at suse dot cz> <533C1B04 dot 40407 at suse dot cz>
>
> Hello,
> taking latest trunk gcc, I built Firefox and Chromium. Both
> projects compiled without debugging symbols and -O2 on an 8-core
> machine.
>
> Firefox:
> -flto=9, peak memory usage (in LTRANS): 11GB
>
> Chromium:
> -flto=6, peak memory usage (in parallel WPA phase ): 16.5GB
I see. The LTRANS memory use is, however, about the same later in the game.
>
> For details please see attached with graphs. The attachment contains
> also -fmem-report and -fmem-report-wpa.
> I think reduced memory footprint to ~3.5GB is a bit optimistic:
> http://gcc.gnu.org/gcc-4.9/changes.html
I will need to re-measure my setup; this is what I got last time with basically
the same configuration. It depends on parallelism: you should get a sub-4GB peak
with -flto=1, right? We should clarify this in changes.html.
>
> Is there any way we can reduce the memory footprint?
Looking at the memreport we get for ggc memory:
Chromium:
source location                               Garbage          Freed             Leak              Overhead        Times
cgraph.c:869 (cgraph_create_edge_1)           0: 0.0%          0: 0.0%           274319552: 4.8%   0: 0.0%         2637688
cgraph.c:510 (cgraph_allocate_node)           0: 0.0%          0: 0.0%           426228128: 7.5%   0: 0.0%         1299476
toplev.c:960 (realloc_for_line_map)           0: 0.0%          357908640: 3.8%   1073743896:18.8%  184: 0.0%       10
tree-streamer-in.c:621 (streamer_alloc_tree)  216054000:86.6%  7623611824:80.2%  2536849136:44.5%  57818592:36.0%  69421368
Total                                         249562346        9504578411        5700671942        160593619       97146243
Firefox:
source location                               Garbage          Freed             Leak              Overhead        Times
cgraph.c:869 (cgraph_create_edge_1)           0: 0.0%          0: 0.0%           130358176: 6.9%   0: 0.0%         1253444
cgraph.c:510 (cgraph_allocate_node)           0: 0.0%          0: 0.0%           182236800: 9.7%   0: 0.0%         555600
toplev.c:960 (realloc_for_line_map)           0: 0.0%          89503888: 5.5%    268468240:14.3%   160: 0.0%       13
tree-streamer-in.c:621 (streamer_alloc_tree)  93089976:77.5%   972848816:59.6%   639230248:33.9%   21332480:32.3%  13496198
Total                                         120076578        1632997043        1883064062        65981723        24732501
So Chromium uses quite a lot more trees and also seems to have about twice as many functions.
Next time, it would be useful to include -Q while collecting the data: it shows individual GGC runs and also
memory usage accounted per pass. That way we would know whether there are a lot more functions to start with, or just
more inlining going on.
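
To make the comparison concrete, here is a minimal Python sketch that parses the GGC rows quoted above and computes Chromium-to-Firefox ratios. The helper names (parse_report, ratio) are hypothetical, and the row layout is assumed to be exactly as pasted in this message; it is not part of GCC.

```python
import re

# Rows copied verbatim from the -fmem-report output quoted in this message.
CHROMIUM = """\
cgraph.c:869 (cgraph_create_edge_1) 0: 0.0% 0: 0.0% 274319552: 4.8% 0: 0.0% 2637688
cgraph.c:510 (cgraph_allocate_node) 0: 0.0% 0: 0.0% 426228128: 7.5% 0: 0.0% 1299476
toplev.c:960 (realloc_for_line_map) 0: 0.0% 357908640: 3.8% 1073743896:18.8% 184: 0.0% 10
tree-streamer-in.c:621 (streamer_alloc_tree) 216054000:86.6% 7623611824:80.2% 2536849136:44.5% 57818592:36.0% 69421368
"""
FIREFOX = """\
cgraph.c:869 (cgraph_create_edge_1) 0: 0.0% 0: 0.0% 130358176: 6.9% 0: 0.0% 1253444
cgraph.c:510 (cgraph_allocate_node) 0: 0.0% 0: 0.0% 182236800: 9.7% 0: 0.0% 555600
toplev.c:960 (realloc_for_line_map) 0: 0.0% 89503888: 5.5% 268468240:14.3% 160: 0.0% 13
tree-streamer-in.c:621 (streamer_alloc_tree) 93089976:77.5% 972848816:59.6% 639230248:33.9% 21332480:32.3% 13496198
"""

FIELDS = ("garbage", "freed", "leak", "overhead", "times")

# Assumed layout: "file:line (function) bytes: pct% (x4) times".
ROW = re.compile(
    r"^(?P<loc>\S+) \((?P<func>\w+)\)\s+"
    r"(?P<garbage>\d+):\s*[\d.]+%\s+"
    r"(?P<freed>\d+):\s*[\d.]+%\s+"
    r"(?P<leak>\d+):\s*[\d.]+%\s+"
    r"(?P<overhead>\d+):\s*[\d.]+%\s+"
    r"(?P<times>\d+)$"
)


def parse_report(text):
    """Return {function name: {field: int}} for every row that matches ROW."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m["func"]] = {k: int(m[k]) for k in FIELDS}
    return rows


def ratio(a, b, func, field):
    """Ratio of one field for one allocation site between two reports."""
    return a[func][field] / b[func][field]


chromium = parse_report(CHROMIUM)
firefox = parse_report(FIREFOX)
print("nodes allocated:", round(ratio(chromium, firefox, "cgraph_allocate_node", "times"), 2))
print("edges allocated:", round(ratio(chromium, firefox, "cgraph_create_edge_1", "times"), 2))
print("tree leak bytes:", round(ratio(chromium, firefox, "streamer_alloc_tree", "leak"), 2))
```

On the quoted numbers the node and edge ratios come out a little above 2, while the leaked tree memory is close to 4 times larger, which is what "quite a lot more trees" refers to.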
I have an older patch that introduces a cache to line map streaming, reducing its size quite a bit; that should save
some of the memory used by realloc_for_line_map.
I will dig it out and update it to the current tree.
I also wonder where the rest of the memory goes, since the graphs show about 10GB for Firefox.
Some of it is probably accounting of mmapped files, and also gold's memory usage.
We collect only some of the memory usage that is not in GGC. Vectors:
Chromium:
ipa-cp.c:2421 (grow_edge_clone_vectors)           17225752: 6.9%  17225752   1: 0.0%
vec.h:1393 (copy)                                 17291228: 6.9%  100465316  1499009: 3.7%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      30436272:12.2%  53192752   1460: 0.0%
passes.c:2254 (execute_one_pass)                  53853360:21.6%  83885960   1426939: 3.5%
ipa-inline-analysis.c:974 (inline_summary_alloc)  84406056:33.8%  137806000  484472: 1.2%
Total                                             249721648                  40747241
Firefox:
ipa-cp.c:2421 (grow_edge_clone_vectors)           7753312: 6.1%   7753312    1: 0.0%
ipa-inline-analysis.c:4053 (read_inline_edge_sum  8758216: 6.9%   26420804   909584: 4.9%
ipa-ref.c:54 (ipa_record_reference)               10747880: 8.4%  20943288   371083: 2.0%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      19756008:15.5%  23548272   1335: 0.0%
passes.c:2254 (execute_one_pass)                  26769688:21.0%  41942904   716378: 3.9%
ipa-inline-analysis.c:974 (inline_summary_alloc)  40110248:31.5%  62026480   284283: 1.5%
Total                                             127480444                  18430703
That looks as usual; 249MB seems acceptable.
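
As a rough cross-check of how much of the observed peak these reports explain, the sketch below adds the GGC "Leak" totals to the vector totals from the tables above. It assumes that the Leak column approximates live GGC memory at report time, that the first figure on each vector "Total" line is in bytes, and that decimal gigabytes are meant; Garbage and Freed are ignored. The names are hypothetical.

```python
# Totals copied from the reports quoted in this message (bytes).
GGC_LEAK_TOTAL = {"Chromium": 5700671942, "Firefox": 1883064062}
VEC_TOTAL = {"Chromium": 249721648, "Firefox": 127480444}

# "About 10GB" is the Firefox figure read off the graphs, per this message.
OBSERVED_FIREFOX_GB = 10.0


def accounted_gb(project):
    """GGC leak plus vector memory, in decimal GB (an approximation)."""
    return (GGC_LEAK_TOTAL[project] + VEC_TOTAL[project]) / 1e9


for project in ("Chromium", "Firefox"):
    print(f"{project}: {accounted_gb(project):.2f} GB accounted")

unaccounted = OBSERVED_FIREFOX_GB - accounted_gb("Firefox")
print(f"Firefox: roughly {unaccounted:.1f} GB not covered by these two reports")
```

Under those assumptions only about 2GB of the Firefox figure is covered by GGC and vectors, which is why the remaining consumers (mmapped files, gold, bitmaps, hash tables) are worth measuring.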
Bitmaps seem to be dominated by ipa-reference. On Chromium this pass seems to go crazy, having
about 800000MB of bitmaps. Perhaps you could try to get data with -fno-ipa-reference?
We ought to get stats on hash tables, since these probably consume quite some memory
during LTO streaming.
Could you perhaps also get -flto-report?
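
For the re-run, the flags mentioned in this thread could be collected as in the following sketch. It only assembles command lines and does not run anything; the compiler name, object files, and output name are placeholders, and the helper link_command is hypothetical. Running once with and once without -fno-ipa-reference is an assumption about how the comparison would be done.

```python
import shlex

# Reporting flags named in this thread; all are existing GCC options.
REPORT_FLAGS = ["-Q", "-fmem-report", "-fmem-report-wpa", "-flto-report"]


def link_command(compiler, objects, output, jobs, disable_ipa_reference=False):
    """Build an LTO link command line with the reporting flags enabled."""
    cmd = [compiler, "-O2", f"-flto={jobs}", *REPORT_FLAGS]
    if disable_ipa_reference:
        # Second measurement: same link with the ipa-reference pass disabled.
        cmd.append("-fno-ipa-reference")
    cmd += [*objects, "-o", output]
    return cmd


# Placeholder inputs; substitute the real link line of the project.
baseline = link_command("g++", ["a.o", "b.o"], "app", jobs=6)
no_ipa_ref = link_command("g++", ["a.o", "b.o"], "app", jobs=6,
                          disable_ipa_reference=True)
print(shlex.join(baseline))
print(shlex.join(no_ipa_ref))
```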
Honza
>
> Attachment (due to size restriction): https://drive.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit?usp=sharing
>
> Thank you,
> Martin