This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
- From: "hubicka at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 19 Oct 2011 09:22:01 +0000
- Subject: [Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO
- Auto-submitted: auto-generated
- References: <bug-45375-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375
--- Comment #119 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-10-19 09:22:01 UTC ---
Some up-to-date performance data. WPA now peaks at 3.1GB in top (3261 virt).
Overall compile time is 4m32s real, 21m14s user.
GGC memory is GC 2248537k -> 1727826k
WPA time report:
 callgraph optimization :   1.68 ( 1%) usr   0.00 ( 0%) sys   1.70 ( 1%) wall   16008 kB (11%) ggc
 varpool construction   :   0.66 ( 0%) usr   0.02 ( 0%) sys   0.68 ( 0%) wall   55300 kB (39%) ggc
 ipa cp                 :   1.70 ( 1%) usr   0.09 ( 1%) sys   1.79 ( 1%) wall   75845 kB (53%) ggc
 ipa lto gimple out     :   9.40 ( 6%) usr   0.91 (10%) sys  10.36 ( 6%) wall       0 kB ( 0%) ggc
 ipa lto decl in        :  45.99 (29%) usr   1.66 (19%) sys  47.95 (28%) wall 3285797 kB (2315%) ggc
 ipa lto decl out       :  35.61 (22%) usr   1.65 (19%) sys  37.23 (22%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O     :   3.73 ( 2%) usr   0.22 ( 2%) sys   3.95 ( 2%) wall  621046 kB (438%) ggc
 ipa lto decl merge     :   5.75 ( 4%) usr   0.00 ( 0%) sys   5.75 ( 3%) wall     803 kB ( 1%) ggc
 ipa lto cgraph merge   :   2.79 ( 2%) usr   0.02 ( 0%) sys   2.81 ( 2%) wall   27731 kB (20%) ggc
 inline heuristics      :  31.32 (19%) usr   0.13 ( 1%) sys  31.48 (18%) wall  252282 kB (178%) ggc
 TOTAL                  : 161.21             8.82           170.40              141952 kB
(I.e. WPA is about 60% of the overall compilation time; of that, about 1/3 is
streaming in, 1/3 is streaming out, and 1/5th is the inliner.)
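The fractions quoted above can be checked against the wall-clock column of the WPA time report. The following is a minimal sketch in Python. The grouping is an assumption on my part: "ipa lto decl in" is counted as streaming in, "ipa lto decl out" plus "ipa lto gimple out" as streaming out, and "inline heuristics" as the inliner. With that grouping the two streaming shares come out a little under one third each.

```python
# Wall-clock seconds copied from the WPA time report.
wpa_wall = {
    "ipa lto decl in": 47.95,
    "ipa lto decl out": 37.23,
    "ipa lto gimple out": 10.36,
    "inline heuristics": 31.48,
}
wpa_total = 170.40           # TOTAL wall time of the WPA stage
overall_real = 4 * 60 + 32   # 4m32s real for the whole build, in seconds


def share(seconds, total):
    """Return seconds as a percentage of total."""
    return 100.0 * seconds / total


# Share of the whole build spent in WPA.
wpa_share = share(wpa_total, overall_real)

# Shares within WPA (grouping of timevars is an assumption, see above).
stream_in = share(wpa_wall["ipa lto decl in"], wpa_total)
stream_out = share(
    wpa_wall["ipa lto decl out"] + wpa_wall["ipa lto gimple out"], wpa_total
)
inliner = share(wpa_wall["inline heuristics"], wpa_total)

for name, value in [
    ("WPA share of build", wpa_share),
    ("streaming in", stream_in),
    ("streaming out", stream_out),
    ("inliner", inliner),
]:
    print(f"{name:20s} {value:5.1f}%")
```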
oprofile of streaming in:
9467 6.8109 lto1 htab_find_slot_with_hash
9036 6.5008 lto1 inflate_fast
6608 4.7540 libc-2.11.1.so memset
6256 4.5008 libc-2.11.1.so _int_malloc
6243 4.4914 lto1 pointer_map_insert
5694 4.0965 lto1 lto_input_tree
5014 3.6072 lto1 gt_ggc_mx_lang_tree_node
4522 3.2533 lto1 streamer_read_tree_bitfields
4463 3.2108 lto1 ggc_set_mark
4087 2.9403 opreport /usr/bin/opreport
3661 2.6339 lto1 ggc_internal_alloc_stat
3475 2.5000 lto1 streamer_read_uhwi
2508 1.8043 lto1 gimple_type_eq
2418 1.7396 lto1 streamer_read_tree_body
2310 1.6619 libc-2.11.1.so memcpy
2292 1.6489 lto1 streamer_tree_cache_insert_1
2255 1.6223 libc-2.11.1.so memcmp
2119 1.5245 lto1 ht_lookup_with_hash
1902 1.3684 lto1 iterative_hash_hashval_t
1885 1.3561 lto1 lto_fixup_types
1884 1.3554 libc-2.11.1.so _int_free
1872 1.3468 lto1 uniquify_nodes
1842 1.3252 lto1 htab_expand
1825 1.3130 oprofiled /usr/bin/oprofiled
1813 1.3043 lto1 adler32
1734 1.2475 lto1 htab_hash_string
1509 1.0856 libc-2.11.1.so _IO_vfscanf
1470 1.0576 libc-2.11.1.so malloc_consolidate
The pointer map and htab time is still mostly type merging, I believe.
oprofile of inliner:
8772 37.9215 lto1 edge_badness
5532 23.9149 lto1 do_estimate_growth_1
1647 7.1200 lto1 update_caller_keys
1484 6.4154 lto1 can_inline_edge_p
744 3.2163 lto1 estimate_calls_size_and_time.isra.32
509 2.2004 lto1 estimate_edge_size_and_time.constprop.65
495 2.1399 lto1 fibheap_consolidate
267 1.1542 lto1 fibheap_extr_min_node
210 0.9078 lto1 cgraph_maybe_hot_edge_p
I.e. easy to handle by taming down the amount of heap updating.
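One generic way to cut down on heap updating is to leave stale keys in the heap and re-evaluate an entry only when it reaches the top. The sketch below illustrates that idea in Python with a binary heap from heapq. It is not the GCC inliner code, which is written in C and uses a Fibonacci heap; pop_best, the badness callback, and the edge names are hypothetical. It is exact only under the assumption that badness never decreases while an entry waits in the heap; otherwise an improved entry can be found late.

```python
import heapq


def pop_best(heap, badness):
    """Pop the edge with the smallest current badness.

    heap holds (key, edge) pairs whose keys may be stale. Keys are
    refreshed lazily here instead of on every change to the graph.
    Assumption: badness(edge) never drops below the stored key of
    another entry while that edge waits in the heap.
    """
    while heap:
        key, edge = heapq.heappop(heap)
        current = badness(edge)
        # Still the best candidate: nothing else has a smaller stored key.
        if current <= key or not heap or current <= heap[0][0]:
            return edge, current
        # Key was stale and the edge got worse: requeue with the fresh key.
        heapq.heappush(heap, (current, edge))
    return None


# Hypothetical usage: three edges, one of which gets worse after queuing.
table = {"a": 1, "b": 2, "c": 3}
evaluations = []


def lookup(edge):
    evaluations.append(edge)  # record how often badness is recomputed
    return table[edge]


queue = [(value, edge) for edge, value in table.items()]
heapq.heapify(queue)

table["a"] = 5                # no heap update happens here
first = pop_best(queue, lookup)
second = pop_best(queue, lookup)
third = pop_best(queue, lookup)
```

The trade-off is that each pop may recompute a few keys, but changes that never bring an entry to the top of the heap cost nothing.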
Stream out:
33711 19.7166 lto1 lto1 varpool_node_for_asm
13947 8.1572 lto1 lto1 decl_assembler_name_equal
8873 5.1896 lto1 lto1 pointer_map_insert
8765 5.1264 lto1 lto1 linemap_lookup
6809 3.9824 lto1 lto1 lto_output_tree
4931 2.8840 lto1 lto1 inflate_fast
4718 2.7594 lto1 lto1 streamer_write_uhwi_stream
3521 2.0593 lto1 lto1 streamer_tree_cache_insert_1
3340 1.9535 lto1 lto1 splay_tree_splay
3293 1.9260 lto1 lto1 streamer_pack_tree_bitfields
3210 1.8774 libc-2.11.1.so libc-2.11.1.so memcpy
3175 1.8570 libc-2.11.1.so libc-2.11.1.so _int_malloc
The assembler name lookups will go away once the alias rewrite is finished.
Oprofile of ltrans stage:
52827 3.3333 lto1 lto1 value_member
45691 2.8830 libc-2.11.1.so libc-2.11.1.so _int_malloc
42528 2.6835 lto1 lto1 bitmap_set_bit
41934 2.6460 oprofiled oprofiled /usr/bin/oprofiled
22353 1.4104 libc-2.11.1.so libc-2.11.1.so memset
21573 1.3612 lto1 lto1 htab_find_slot_with_hash
20936 1.3210 lto1 lto1 ggc_internal_alloc_stat
19608 1.2372 lto1 lto1 record_reg_classes.constprop.10
17423 1.0994 lto1 lto1 bitmap_bit_p
17195 1.0850 lto1 lto1 for_each_rtx_1
13504 0.8521 libc-2.11.1.so libc-2.11.1.so _int_free
12343 0.7788 lto1 lto1 bitmap_clear_bit
11826 0.7462 lto1 lto1 constrain_operands
The slowest passes of ltrans are:
 garbage collection  :   1.69 ( 2%) usr   0.01 ( 0%) sys   1.72 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto gimple in   :   1.52 ( 2%) usr   0.45 ( 9%) sys   1.94 ( 2%) wall  212002 kB (11%) ggc
 ipa lto decl in     :   1.61 ( 2%) usr   0.19 ( 4%) sys   1.81 ( 2%) wall  147115 kB ( 7%) ggc
 cfg cleanup         :   1.46 ( 2%) usr   0.03 ( 1%) sys   1.60 ( 2%) wall    5376 kB ( 0%) ggc
 df live regs        :   2.26 ( 3%) usr   0.03 ( 1%) sys   2.62 ( 3%) wall       0 kB ( 0%) ggc
 tree VRP            :   2.04 ( 2%) usr   0.05 ( 1%) sys   2.34 ( 2%) wall  126142 kB ( 6%) ggc
 tree PTA            :   1.97 ( 2%) usr   0.00 ( 0%) sys   2.43 ( 3%) wall    8733 kB ( 0%) ggc
 tree PRE            :   2.98 ( 3%) usr   0.07 ( 1%) sys   3.83 ( 4%) wall   64875 kB ( 3%) ggc
 tree FRE            :   1.50 ( 2%) usr   0.01 ( 0%) sys   1.98 ( 2%) wall   33609 kB ( 2%) ggc
 expand              :   4.11 ( 5%) usr   0.11 ( 2%) sys   4.85 ( 5%) wall  138280 kB ( 7%) ggc
 CSE                 :   1.88 ( 2%) usr   0.04 ( 1%) sys   2.16 ( 2%) wall    2764 kB ( 0%) ggc
 CPROP               :   1.83 ( 2%) usr   0.04 ( 1%) sys   1.87 ( 2%) wall   21657 kB ( 1%) ggc
 integrated RA       :   6.84 ( 8%) usr   0.08 ( 2%) sys   7.30 ( 8%) wall  367479 kB (19%) ggc
 reload              :   2.47 ( 3%) usr   0.04 ( 1%) sys   2.82 ( 3%) wall    8783 kB ( 0%) ggc
 reload CSE regs     :   2.03 ( 2%) usr   0.01 ( 0%) sys   2.02 ( 2%) wall   19115 kB ( 1%) ggc
 scheduling 2        :   3.08 ( 3%) usr   0.03 ( 1%) sys   3.14 ( 3%) wall    3942 kB ( 0%) ggc
 final               :  11.46 (13%) usr   1.06 (21%) sys   3.62 ( 4%) wall   40822 kB ( 2%) ggc
 rest of compilation :   2.97 ( 3%) usr   0.87 (17%) sys   5.22 ( 5%) wall   60101 kB ( 3%) ggc
 unaccounted todo    :   1.35 ( 2%) usr   0.67 (13%) sys   2.37 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL               :  89.65             5.08            95.59             1962376 kB
Final is surprisingly slow.