This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: WPA stream_out form & memory consumption


> 
> Hello,
>   using the latest trunk GCC, I built Firefox and Chromium. Both
> projects were compiled without debugging symbols, with -O2, on an
> 8-core machine.
> 
> Firefox:
> -flto=9, peak memory usage (in LTRANS): 11GB
> 
> Chromium:
> -flto=6, peak memory usage (in the parallel WPA phase): 16.5GB

I see; the LTRANS memory use, however, is about the same later in the game.
> 
> For details, please see the attached graphs. The attachment also
> contains the -fmem-report and -fmem-report-wpa output.
> I think the claimed reduction of the memory footprint to ~3.5GB is a bit optimistic:
> http://gcc.gnu.org/gcc-4.9/changes.html

I will need to re-measure my setup; that is what I got last time with basically
the same configuration.  It depends on parallelism: you should get a sub-4GB peak
with -flto=1, right? We should clarify this in changes.html.
> 
> Is there any way we can reduce the memory footprint?

Looking at the memory report, this is what we get for GGC memory:

Chromium:
source location                                     Garbage            Freed             Leak         Overhead            Times
cgraph.c:869 (cgraph_create_edge_1)                       0: 0.0%          0: 0.0%  274319552: 4.8%          0: 0.0%    2637688
cgraph.c:510 (cgraph_allocate_node)                       0: 0.0%          0: 0.0%  426228128: 7.5%          0: 0.0%    1299476
toplev.c:960 (realloc_for_line_map)                       0: 0.0%  357908640: 3.8% 1073743896:18.8%        184: 0.0%         10
tree-streamer-in.c:621 (streamer_alloc_tree)      216054000:86.6% 7623611824:80.2% 2536849136:44.5%   57818592:36.0%   69421368
Total                                             249562346       9504578411       5700671942        160593619         97146243

Firefox:
source location                                     Garbage            Freed             Leak         Overhead            Times
cgraph.c:869 (cgraph_create_edge_1)                       0: 0.0%          0: 0.0%  130358176: 6.9%          0: 0.0%    1253444
cgraph.c:510 (cgraph_allocate_node)                       0: 0.0%          0: 0.0%  182236800: 9.7%          0: 0.0%     555600
toplev.c:960 (realloc_for_line_map)                       0: 0.0%   89503888: 5.5%  268468240:14.3%        160: 0.0%         13
tree-streamer-in.c:621 (streamer_alloc_tree)       93089976:77.5%  972848816:59.6%  639230248:33.9%   21332480:32.3%   13496198
Total                                             120076578       1632997043       1883064062         65981723         24732501

So Chromium uses quite a lot more trees and also seems to have about twice as many functions.
Next time, it would be useful to include -Q while collecting the data: it shows the individual GGC runs and also
the memory usage accounted per pass.  That way we would know whether there are a lot more functions to start with, or just
more inlining going on.
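To put rough numbers on "more trees" and "twice as many functions", the Times column (allocation counts) of the two GGC reports above can be compared directly. This is a small Python sketch; the dictionary names are made up for illustration, and treating allocation counts as a proxy for the number of functions, edges and trees is an assumption (node allocations may include clones and may miss nodes recycled from a free list).

```python
# Allocation counts ("Times" column) copied from the GGC -fmem-report
# excerpts above. Hypothetical names, used only for this illustration.
CHROMIUM_TIMES = {
    "cgraph_create_edge_1": 2637688,
    "cgraph_allocate_node": 1299476,
    "streamer_alloc_tree": 69421368,
}
FIREFOX_TIMES = {
    "cgraph_create_edge_1": 1253444,
    "cgraph_allocate_node": 555600,
    "streamer_alloc_tree": 13496198,
}


def ratios(a, b):
    """Return a[site] / b[site] for every allocation site present in both."""
    return {site: a[site] / b[site] for site in a if site in b}


for site, r in ratios(CHROMIUM_TIMES, FIREFOX_TIMES).items():
    print(f"{site:22s} {r:5.2f}x")
```

With these figures Chromium shows roughly 2.3 times as many call graph node allocations, 2.1 times as many edges and 5.1 times as many streamed-in trees as Firefox, which is consistent with the reading above.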

I have an older patch that introduces a cache for line map streaming, reducing its size quite a bit; that should save
some of the memory used by realloc_for_line_map.
I will dig it out and update it to the current tree.
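One plausible way such a cache can work (an assumption; this is not the patch itself) is to remember the last location written and stream only the file, line and column fields that changed, each preceded by a "changed" flag. The Python classes below are hypothetical and model the stream as a plain list rather than bit-packed LTO output.

```python
from typing import List, Optional, Tuple, Union

Location = Tuple[str, int, int]  # (file, line, column)
Token = Union[bool, str, int]


class CachingLocationWriter:
    """Hypothetical writer: three 'changed' flags, then only the changed fields."""

    def __init__(self) -> None:
        self.prev: Optional[Location] = None
        self.stream: List[Token] = []

    def write(self, loc: Location) -> None:
        prev = self.prev if self.prev is not None else (None, None, None)
        changed = [new != old for new, old in zip(loc, prev)]
        self.stream.extend(changed)  # would be single bits in a real bitpack
        self.stream.extend(v for v, c in zip(loc, changed) if c)
        self.prev = loc


class CachingLocationReader:
    """Hypothetical reader: unchanged fields come from the cached location."""

    def __init__(self, stream: List[Token]) -> None:
        self.tokens = iter(stream)
        self.prev: Optional[Location] = None

    def read(self) -> Location:
        changed = [next(self.tokens) for _ in range(3)]
        prev = self.prev if self.prev is not None else (None, None, None)
        loc = tuple(next(self.tokens) if c else old for c, old in zip(changed, prev))
        self.prev = loc
        return loc


locs = [("a.c", 10, 3), ("a.c", 10, 7), ("a.c", 11, 7), ("b.h", 2, 1)]
writer = CachingLocationWriter()
for loc in locs:
    writer.write(loc)

# Round trip: the reader reconstructs every location from the cache plus deltas.
reader = CachingLocationReader(writer.stream)
assert [reader.read() for _ in locs] == locs
fields = [t for t in writer.stream if not isinstance(t, bool)]
print(len(fields), "fields streamed instead of", 3 * len(locs))
```

For these four locations, 8 value fields (plus one flag per field) are streamed instead of 12 full fields; the saving grows when consecutive locations share a file and line.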

I also wonder where the rest of the memory goes, since the graphs show about 10GB for Firefox.
Some of it is probably the accounting of mmapped files, and some is gold's memory usage.
For memory that is not in GGC, we collect statistics only for some allocations. Vectors:

Chromium:
ipa-cp.c:2421 (grow_edge_clone_vectors)            17225752: 6.9%   17225752               1: 0.0%           
vec.h:1393 (copy)                                  17291228: 6.9%  100465316         1499009: 3.7%           
lto-cgraph.c:141 (lto_symtab_encoder_encode)       30436272:12.2%   53192752            1460: 0.0%           
passes.c:2254 (execute_one_pass)                   53853360:21.6%   83885960         1426939: 3.5%           
ipa-inline-analysis.c:974 (inline_summary_alloc)   84406056:33.8%  137806000          484472: 1.2%         
Total                                             249721648                          40747241

Firefox:
ipa-cp.c:2421 (grow_edge_clone_vectors)             7753312: 6.1%    7753312               1: 0.0%
ipa-inline-analysis.c:4053 (read_inline_edge_sum    8758216: 6.9%   26420804          909584: 4.9%
ipa-ref.c:54 (ipa_record_reference)                10747880: 8.4%   20943288          371083: 2.0%
lto-cgraph.c:141 (lto_symtab_encoder_encode)       19756008:15.5%   23548272            1335: 0.0%
passes.c:2254 (execute_one_pass)                   26769688:21.0%   41942904          716378: 3.9%
ipa-inline-analysis.c:974 (inline_summary_alloc)   40110248:31.5%   62026480          284283: 1.5%
Total                                             127480444                          18430703

That looks as usual; 249MB seems acceptable.
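As a quick sanity check on that figure, the vector Total lines can be converted to megabytes and compared with the Leak total of the GGC report. Putting the two reports side by side like this is an assumption (they cover different allocators), and the names below are hypothetical; this is only illustrative arithmetic on the numbers quoted above.

```python
def fmt_bytes(n: int) -> str:
    """Render a byte count in decimal megabytes and binary mebibytes."""
    return f"{n / 1e6:.1f} MB ({n / 2**20:.1f} MiB)"


# Totals copied from the vector and GGC reports quoted above.
VEC_TOTAL = {"Chromium": 249721648, "Firefox": 127480444}
GGC_LEAK_TOTAL = {"Chromium": 5700671942, "Firefox": 1883064062}

for project, vec in VEC_TOTAL.items():
    share = vec / GGC_LEAK_TOTAL[project]
    print(f"{project}: vectors {fmt_bytes(vec)}, {share:.1%} of the GGC Leak total")
```

For Chromium this gives about 249.7 MB (238.2 MiB), roughly 4.4% of the 5.7GB GGC Leak total; for Firefox about 127.5 MB, roughly 6.8%.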

Bitmaps seem to be dominated by ipa-reference.  On Chromium this pass seems to go crazy, with
about 800000MB of bitmaps.  Perhaps you could try to get data with -fno-ipa-reference?

We ought to get statistics on hash tables, since these probably consume quite a lot of memory
during LTO streaming.
Could you perhaps also collect -flto-report output?

Honza
> 
> Attachment (due to size restriction): https://drive.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit?usp=sharing
> 
> Thank you,
> Martin

