This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug lto/45375] [meta-bug] Issues with building Mozilla with LTO


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45375

--- Comment #119 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-10-19 09:22:01 UTC ---
Some up-to-date performance data.  WPA now peaks at 3.1GB in top (3261 virt).
Overall compile time is 4m32s real, 21m14 user.
GGC memory is GC 2248537k -> 1727826k

WPA time report:
 callgraph optimization  :   1.68 ( 1%) usr   0.00 ( 0%) sys   1.70 ( 1%) wall   16008 kB (11%) ggc
 varpool construction    :   0.66 ( 0%) usr   0.02 ( 0%) sys   0.68 ( 0%) wall   55300 kB (39%) ggc
 ipa cp                  :   1.70 ( 1%) usr   0.09 ( 1%) sys   1.79 ( 1%) wall   75845 kB (53%) ggc
 ipa lto gimple out      :   9.40 ( 6%) usr   0.91 (10%) sys  10.36 ( 6%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  45.99 (29%) usr   1.66 (19%) sys  47.95 (28%) wall 3285797 kB (2315%) ggc
 ipa lto decl out        :  35.61 (22%) usr   1.65 (19%) sys  37.23 (22%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   3.73 ( 2%) usr   0.22 ( 2%) sys   3.95 ( 2%) wall  621046 kB (438%) ggc
 ipa lto decl merge      :   5.75 ( 4%) usr   0.00 ( 0%) sys   5.75 ( 3%) wall     803 kB ( 1%) ggc
 ipa lto cgraph merge    :   2.79 ( 2%) usr   0.02 ( 0%) sys   2.81 ( 2%) wall   27731 kB (20%) ggc
 inline heuristics       :  31.32 (19%) usr   0.13 ( 1%) sys  31.48 (18%) wall  252282 kB (178%) ggc
 TOTAL                 : 161.21             8.82           170.40              141952 kB

(i.e. 60% of overall compilation time, of which about 1/3 is streaming in,
1/3 is streaming out, and 1/5th is the inliner.)
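
The fractions quoted here can be rechecked from the wall-clock column of the WPA time report. The following is a minimal sketch of that arithmetic, not part of the original measurements. The grouping is an assumption: "ipa lto decl in" plus "ipa lto cgraph I/O" are counted as streaming in, and "ipa lto decl out" plus "ipa lto gimple out" as streaming out. The comment does not say exactly which rows were summed.

```python
# Wall-clock seconds copied from the WPA time report.
wall = {
    "ipa lto gimple out": 10.36,
    "ipa lto decl in": 47.95,
    "ipa lto decl out": 37.23,
    "ipa lto cgraph I/O": 3.95,
    "inline heuristics": 31.48,
}
wpa_total_wall = 170.40      # TOTAL row, wall column
overall_real = 4 * 60 + 32   # "4m32s real" for the whole build, in seconds

# Assumed grouping of rows into phases (not stated in the original comment).
stream_in = wall["ipa lto decl in"] + wall["ipa lto cgraph I/O"]
stream_out = wall["ipa lto decl out"] + wall["ipa lto gimple out"]
inliner = wall["inline heuristics"]

wpa_share = wpa_total_wall / overall_real
stream_in_share = stream_in / wpa_total_wall
stream_out_share = stream_out / wpa_total_wall
inliner_share = inliner / wpa_total_wall

print(f"WPA share of overall real time: {wpa_share:.1%}")
for name, value in [("stream in", stream_in_share),
                    ("stream out", stream_out_share),
                    ("inliner", inliner_share)]:
    print(f"{name:10s} share of WPA wall time: {value:.1%}")
```

Under this grouping the shares come out close to the rounded figures in the text, with streaming out somewhat under a third.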

oprofile of streaming in:
9467      6.8109  lto1                     htab_find_slot_with_hash
9036      6.5008  lto1                     inflate_fast
6608      4.7540  libc-2.11.1.so           memset
6256      4.5008  libc-2.11.1.so           _int_malloc
6243      4.4914  lto1                     pointer_map_insert
5694      4.0965  lto1                     lto_input_tree
5014      3.6072  lto1                     gt_ggc_mx_lang_tree_node
4522      3.2533  lto1                     streamer_read_tree_bitfields
4463      3.2108  lto1                     ggc_set_mark
4087      2.9403  opreport                 /usr/bin/opreport
3661      2.6339  lto1                     ggc_internal_alloc_stat
3475      2.5000  lto1                     streamer_read_uhwi
2508      1.8043  lto1                     gimple_type_eq
2418      1.7396  lto1                     streamer_read_tree_body
2310      1.6619  libc-2.11.1.so           memcpy
2292      1.6489  lto1                     streamer_tree_cache_insert_1
2255      1.6223  libc-2.11.1.so           memcmp
2119      1.5245  lto1                     ht_lookup_with_hash
1902      1.3684  lto1                     iterative_hash_hashval_t
1885      1.3561  lto1                     lto_fixup_types
1884      1.3554  libc-2.11.1.so           _int_free
1872      1.3468  lto1                     uniquify_nodes
1842      1.3252  lto1                     htab_expand
1825      1.3130  oprofiled                /usr/bin/oprofiled
1813      1.3043  lto1                     adler32
1734      1.2475  lto1                     htab_hash_string
1509      1.0856  libc-2.11.1.so           _IO_vfscanf
1470      1.0576  libc-2.11.1.so           malloc_consolidate

The pointer map and htab time is mostly type merging still, I believe.

oprofile of inliner:
8772     37.9215  lto1                     edge_badness
5532     23.9149  lto1                     do_estimate_growth_1
1647      7.1200  lto1                     update_caller_keys
1484      6.4154  lto1                     can_inline_edge_p
744       3.2163  lto1                     estimate_calls_size_and_time.isra.32
509       2.2004  lto1                     estimate_edge_size_and_time.constprop.65
495       2.1399  lto1                     fibheap_consolidate
267       1.1542  lto1                     fibheap_extr_min_node
210       0.9078  lto1                     cgraph_maybe_hot_edge_p

I.e. easy to handle by taming down the amount of heap updating.
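
One generic way to reduce heap updating is to re-key entries lazily: leave stale keys in the heap and recompute a key only when its entry reaches the top. The sketch below illustrates that idea with Python's `heapq`. It is not GCC's inliner code, and the names `pop_best_lazily` and `current_badness` are hypothetical. It assumes a key can only grow between updates, so a stale entry is never better than recorded.

```python
import heapq

def pop_best_lazily(heap, current_badness):
    """Pop the entry whose up-to-date badness is smallest.

    `heap` holds (badness, edge) tuples whose keys may be stale.
    `current_badness(edge)` recomputes the real key (hypothetical helper).
    Assumes keys only grow, so a stale key is a lower bound.
    """
    while heap:
        key, edge = heapq.heappop(heap)
        fresh = current_badness(edge)
        if fresh <= key:
            # Recorded key is still accurate: this is the true minimum.
            return fresh, edge
        # Key was stale: reinsert with the fresh value and try again.
        heapq.heappush(heap, (fresh, edge))
    return None

# Usage: edge "a" got worse after it was queued; no eager re-keying happened.
badness = {"a": 1, "b": 2, "c": 3}
heap = [(v, k) for k, v in badness.items()]
heapq.heapify(heap)
badness["a"] = 5
result = pop_best_lazily(heap, badness.get)
print(result)
```

The trade-off is that each pop may recompute a few keys, but callers no longer pay for a heap update on every change to an edge.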

Stream out:
33711    19.7166  lto1                     lto1                     varpool_node_for_asm
13947     8.1572  lto1                     lto1                     decl_assembler_name_equal
8873      5.1896  lto1                     lto1                     pointer_map_insert
8765      5.1264  lto1                     lto1                     linemap_lookup
6809      3.9824  lto1                     lto1                     lto_output_tree
4931      2.8840  lto1                     lto1                     inflate_fast
4718      2.7594  lto1                     lto1                     streamer_write_uhwi_stream
3521      2.0593  lto1                     lto1                     streamer_tree_cache_insert_1
3340      1.9535  lto1                     lto1                     splay_tree_splay
3293      1.9260  lto1                     lto1                     streamer_pack_tree_bitfields
3210      1.8774  libc-2.11.1.so           libc-2.11.1.so           memcpy
3175      1.8570  libc-2.11.1.so           libc-2.11.1.so           _int_malloc

The assembler name lookups will go away with finishing the alias rewrite.

Oprofile of ltrans stage:
52827     3.3333  lto1                     lto1                     value_member
45691     2.8830  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
42528     2.6835  lto1                     lto1                     bitmap_set_bit
41934     2.6460  oprofiled                oprofiled                /usr/bin/oprofiled
22353     1.4104  libc-2.11.1.so           libc-2.11.1.so           memset
21573     1.3612  lto1                     lto1                     htab_find_slot_with_hash
20936     1.3210  lto1                     lto1                     ggc_internal_alloc_stat
19608     1.2372  lto1                     lto1                     record_reg_classes.constprop.10
17423     1.0994  lto1                     lto1                     bitmap_bit_p
17195     1.0850  lto1                     lto1                     for_each_rtx_1
13504     0.8521  libc-2.11.1.so           libc-2.11.1.so           _int_free
12343     0.7788  lto1                     lto1                     bitmap_clear_bit
11826     0.7462  lto1                     lto1                     constrain_operands


The slowest passes of ltrans are:
 garbage collection      :   1.69 ( 2%) usr   0.01 ( 0%) sys   1.72 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.52 ( 2%) usr   0.45 ( 9%) sys   1.94 ( 2%) wall  212002 kB (11%) ggc
 ipa lto decl in         :   1.61 ( 2%) usr   0.19 ( 4%) sys   1.81 ( 2%) wall  147115 kB ( 7%) ggc
 cfg cleanup             :   1.46 ( 2%) usr   0.03 ( 1%) sys   1.60 ( 2%) wall    5376 kB ( 0%) ggc
 df live regs            :   2.26 ( 3%) usr   0.03 ( 1%) sys   2.62 ( 3%) wall       0 kB ( 0%) ggc
 tree VRP                :   2.04 ( 2%) usr   0.05 ( 1%) sys   2.34 ( 2%) wall  126142 kB ( 6%) ggc
 tree PTA                :   1.97 ( 2%) usr   0.00 ( 0%) sys   2.43 ( 3%) wall    8733 kB ( 0%) ggc
 tree PRE                :   2.98 ( 3%) usr   0.07 ( 1%) sys   3.83 ( 4%) wall   64875 kB ( 3%) ggc
 tree FRE                :   1.50 ( 2%) usr   0.01 ( 0%) sys   1.98 ( 2%) wall   33609 kB ( 2%) ggc
 expand                  :   4.11 ( 5%) usr   0.11 ( 2%) sys   4.85 ( 5%) wall  138280 kB ( 7%) ggc
 CSE                     :   1.88 ( 2%) usr   0.04 ( 1%) sys   2.16 ( 2%) wall    2764 kB ( 0%) ggc
 CPROP                   :   1.83 ( 2%) usr   0.04 ( 1%) sys   1.87 ( 2%) wall   21657 kB ( 1%) ggc
 integrated RA           :   6.84 ( 8%) usr   0.08 ( 2%) sys   7.30 ( 8%) wall  367479 kB (19%) ggc
 reload                  :   2.47 ( 3%) usr   0.04 ( 1%) sys   2.82 ( 3%) wall    8783 kB ( 0%) ggc
 reload CSE regs         :   2.03 ( 2%) usr   0.01 ( 0%) sys   2.02 ( 2%) wall   19115 kB ( 1%) ggc
 scheduling 2            :   3.08 ( 3%) usr   0.03 ( 1%) sys   3.14 ( 3%) wall    3942 kB ( 0%) ggc
 final                   :  11.46 (13%) usr   1.06 (21%) sys   3.62 ( 4%) wall   40822 kB ( 2%) ggc
 rest of compilation     :   2.97 ( 3%) usr   0.87 (17%) sys   5.22 ( 5%) wall   60101 kB ( 3%) ggc
 unaccounted todo        :   1.35 ( 2%) usr   0.67 (13%) sys   2.37 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                 :  89.65             5.08            95.59             1962376 kB

Final is surprisingly slow.

