Bug 44440 - ira_initialization and builtins construction taking too much of startup time
Summary: ira_initialization and builtins construction taking too much of startup time
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.5.0
Importance: P2 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 47344
Reported: 2010-06-07 08:55 UTC by Jan Hubicka
Modified: 2017-11-25 17:28 UTC

See Also:
Host: x86_64-linux
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-01-12 14:40:30


Attachments
callgrind.startup.bz2 (171.94 KB, application/octet-stream)
2010-06-07 13:36 UTC, Jakub Jelinek

Description Jan Hubicka 2010-06-07 08:55:41 UTC
Hi,
oprofiling the compilation of an empty file, I get:
48319    59.8126  no-vmlinux               /no-vmlinux
3057      3.7842  ld-2.11.1.so             do_lookup_x
2935      3.6331  libc-2.11.1.so           memset
2921      3.6158  ld-2.11.1.so             _dl_relocate_object
1589      1.9670  as                       /usr/bin/as
1270      1.5721  ld-2.11.1.so             _dl_lookup_symbol_x
953       1.1797  cc1                      ggc_alloc_stat
671       0.8306  libc-2.11.1.so           _int_malloc
610       0.7551  ld-2.11.1.so             strcmp
595       0.7365  cc1                      ira_init
594       0.7353  libc-2.11.1.so           strlen
493       0.6103  cc1                      add_builtin_function_common.147729
491       0.6078  cc1                      decl_attributes
483       0.5979  libc-2.11.1.so           memcpy
452       0.5595  libc-2.11.1.so           strcmp
446       0.5521  cc1                      init_reg_sets_1.190433
400       0.4951  cc1                      pop_scope

That is a lot of dynamic linking. Profiling the cc1 binary only, it is:
953       8.2525  ggc_alloc_stat
595       5.1524  ira_init
493       4.2691  add_builtin_function_common.147729
491       4.2518  decl_attributes
446       3.8621  init_reg_sets_1.190433
400       3.4638  pop_scope
387       3.3512  ix86_hard_regno_mode_ok
362       3.1347  c_write_global_declarations_1.9246.5242
357       3.0914  do_multiply.182320
328       2.8403  do_add.182279
302       2.6152  rtx_cost
293       2.5372  make_node_stat
258       2.2342  ix86_memory_move_cost.386116.7474
256       2.2168  do_divide.182325
255       2.2082  ht_lookup_with_hash
236       2.0436  ix86_rtx_costs.386572.6577
231       2.0003  bind.9267
223       1.9311  normalize.182203
212       1.8358  iterative_hash
194       1.6799  recog
176       1.5241  htab_find_with_hash
168       1.4548  tree_code_size
167       1.4461  def_builtin_1.17388.constprop.16.4002
132       1.1431  copy_node_stat
125       1.0824  is_attribute_with_length_p._part.7.371469
94        0.8140  debug_nothing_tree
90        0.7794  main
88        0.7620  build_int_cst_wide
80        0.6928  c_builtin_function

I guess especially the IRA initialization can easily be done lazily on demand, like we did for regclass some time ago? The may_move_*_costs can be computed the first time they are needed for a given mode.
Note that this is an LTO build, so ira_init gets cross-module inlining of the functions called once into it.
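
For illustration, the on-demand scheme suggested above could look roughly like the following. This is only a minimal sketch, not GCC's code: the table sizes, `compute_move_cost`, `get_move_cost` and `tables_built_count` are hypothetical names invented for the example, and the cost formula is a placeholder for the real target cost queries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes; in a real compiler these depend on the target. */
#define N_MODES   4
#define N_CLASSES 3

static int  move_cost[N_MODES][N_CLASSES][N_CLASSES];
static bool move_cost_ready[N_MODES];   /* one "initialized" flag per mode */
static int  tables_built;               /* how many per-mode tables were filled */

/* Placeholder for an expensive target cost query (assumption, not GCC's). */
static int compute_move_cost(int mode, int from, int to)
{
    return (from == to) ? 2 : 2 + mode + from + to;
}

/* Fill the whole table for one mode; called at most once per mode. */
static void init_move_cost(int mode)
{
    for (int from = 0; from < N_CLASSES; from++)
        for (int to = 0; to < N_CLASSES; to++)
            move_cost[mode][from][to] = compute_move_cost(mode, from, to);
    move_cost_ready[mode] = true;
    tables_built++;
}

/* Accessor: builds the table for MODE on first use, then just reads it. */
int get_move_cost(int mode, int from, int to)
{
    assert(mode >= 0 && mode < N_MODES);
    assert(from >= 0 && from < N_CLASSES && to >= 0 && to < N_CLASSES);
    if (!move_cost_ready[mode])
        init_move_cost(mode);
    return move_cost[mode][from][to];
}

/* Reports how many modes have been initialized so far. */
int tables_built_count(void)
{
    return tables_built;
}
```

With this shape, compiling an empty file never calls `get_move_cost`, so no table is built at startup; a mode pays the initialization cost only the first time a cost for it is requested.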

Honza
Comment 1 Jakub Jelinek 2010-06-07 13:36:14 UTC
Created attachment 20854
callgrind.startup.bz2

Callgrind dump for --enable-checking=release trunk cc1 from today on an empty file.
Comment 2 Jan Hubicka 2011-01-12 14:40:30 UTC
Compiling an empty file 100 times takes 3.8s on 4.6, while it takes 2.4s on gcc 4.3 as well as gcc 4.5.

This is a 50% regression. User time increases by 100%. So we probably do a lot more initialization than before.

2979      9.4936  ggc_internal_alloc_stat
1771      5.6439  ira_init
1539      4.9046  pop_scope
1258      4.0091  init_reg_sets_1
1036      3.3016  ht_lookup_with_hash
986       3.1422  ht_lookup
951       3.0307  decl_attributes
905       2.8841  ix86_hard_regno_mode_ok
852       2.7152  copy_node_stat
750       2.3901  rtx_cost
717       2.2850  bind
635       2.0236  ix86_rtx_costs
633       2.0173  ggc_internal_cleared_alloc_stat
632       2.0141  ix86_memory_move_cost
619       1.9727  c_write_global_declarations_1
566       1.8038  make_node_stat
553       1.7623  htab_find_with_hash
519       1.6540  recog
497       1.5839  iterative_hash
Comment 3 Richard Biener 2011-03-03 11:29:36 UTC
Can it be a side-effect of turning target macros into target hooks?
Comment 4 Jakub Jelinek 2011-03-25 19:51:55 UTC
GCC 4.6.0 is being released, adjusting target milestone.
Comment 5 Jakub Jelinek 2011-06-27 12:32:37 UTC
GCC 4.6.1 is being released.
Comment 6 Jakub Jelinek 2011-10-26 17:13:19 UTC
GCC 4.6.2 is being released.
Comment 7 Andrew Pinski 2012-01-20 03:41:27 UTC
Can you try to see if this has been improved now?
Comment 8 Jakub Jelinek 2012-03-01 14:38:04 UTC
GCC 4.6.3 is being released.
Comment 9 Jakub Jelinek 2013-04-12 15:16:38 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 10 Richard Biener 2014-06-12 13:44:43 UTC
The 4.7 branch is being closed, moving target milestone to 4.8.4.
Comment 11 Jakub Jelinek 2014-12-19 13:34:31 UTC
GCC 4.8.4 has been released.
Comment 12 Richard Biener 2015-06-23 08:17:23 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 13 Jakub Jelinek 2015-06-26 19:58:31 UTC
GCC 4.9.3 has been released.
Comment 14 Richard Biener 2016-08-03 10:46:54 UTC
GCC 4.9 branch is being closed.
Comment 15 Jan Hubicka 2017-01-19 15:08:17 UTC
Now I get (for 500 invocations):
           real       user      sys
 GCC 7:    0m9.816s   0m6.274s  0m3.546s
 GCC 6:    0m7.880s   0m4.253s  0m3.605s
 GCC 5:    0m7.655s   0m4.264s  0m3.159s
 GCC 4.6:  0m7.271s   0m4.094s  0m3.085s

We used to lazily initialize the regalloc cost tables before IRA was merged. I think this accounts for a major part of the slowdown.

Note that now, in addition to the regclass costs, we do a fair amount of software-simulated multiplies and divides:

   4.59%  cc1      cc1                [.] do_multiply
   3.11%  cc1      [kernel.kallsyms]  [k] clear_page
   3.01%  cc1      libc-2.19.so       [.] memset
   2.86%  cc1      cc1                [.] ggc_internal_alloc
   2.83%  cc1      [kernel.kallsyms]  [k] page_fault
   2.79%  cc1      [kernel.kallsyms]  [k] filemap_map_pages
   2.45%  cc1      cc1                [.] init_reg_sets_1
   2.22%  cc1      cc1                [.] do_add
   2.18%  cc1      libc-2.19.so       [.] __GI___strcmp_ssse3
   2.02%  cc1      cc1                [.] ht_lookup_with_hash
   1.85%  cc1      cc1                [.] do_divide
   1.49%  cc1      cc1                [.] hash_table<attribute_hasher, xcallocator>::find_with_hash
   1.41%  cc1      [kernel.kallsyms]  [k] unmap_single_vma
   1.37%  cc1      [kernel.kallsyms]  [k] release_pages
   1.29%  cc1      cc1                [.] decl_attributes
   1.21%  cc1      libc-2.19.so       [.] _int_malloc
   1.19%  cc1      libc-2.19.so       [.] strlen
   1.12%  cc1      cc1                [.] normalize
Comment 16 Jeffrey A. Law 2017-02-14 17:08:00 UTC
Attached to our meta bug for old slowdowns/memory consumption issues.  Regression marker removed.  AFAIK nobody is working on this.