Bug 44440 - [4.7/4.8/4.9/4.10 regression] ira_initialization and builtins construction taking too much of startup time
Status: NEW
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.5.0
Importance: P2 normal
Target Milestone: 4.7.4
Assigned To: Not yet assigned to anyone
Depends on:
Blocks: 47344
Reported: 2010-06-07 08:55 UTC by Jan Hubicka
Modified: 2013-04-12 15:16 UTC (History)

See Also:
Host: x86_64-linux
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-01-12 14:40:30


Attachments
callgrind.startup.bz2 (171.90 KB, application/octet-stream)
2010-06-07 13:36 UTC, Jakub Jelinek

Description Jan Hubicka 2010-06-07 08:55:41 UTC
Hi,
Profiling the compilation of an empty file with oprofile, I get:
48319    59.8126  no-vmlinux               /no-vmlinux
3057      3.7842  ld-2.11.1.so             do_lookup_x
2935      3.6331  libc-2.11.1.so           memset
2921      3.6158  ld-2.11.1.so             _dl_relocate_object
1589      1.9670  as                       /usr/bin/as
1270      1.5721  ld-2.11.1.so             _dl_lookup_symbol_x
953       1.1797  cc1                      ggc_alloc_stat
671       0.8306  libc-2.11.1.so           _int_malloc
610       0.7551  ld-2.11.1.so             strcmp
595       0.7365  cc1                      ira_init
594       0.7353  libc-2.11.1.so           strlen
493       0.6103  cc1                      add_builtin_function_common.147729
491       0.6078  cc1                      decl_attributes
483       0.5979  libc-2.11.1.so           memcpy
452       0.5595  libc-2.11.1.so           strcmp
446       0.5521  cc1                      init_reg_sets_1.190433
400       0.4951  cc1                      pop_scope

That is a lot of dynamic linking. Profiling the cc1 binary only, it is:
953       8.2525  ggc_alloc_stat
595       5.1524  ira_init
493       4.2691  add_builtin_function_common.147729
491       4.2518  decl_attributes
446       3.8621  init_reg_sets_1.190433
400       3.4638  pop_scope
387       3.3512  ix86_hard_regno_mode_ok
362       3.1347  c_write_global_declarations_1.9246.5242
357       3.0914  do_multiply.182320
328       2.8403  do_add.182279
302       2.6152  rtx_cost
293       2.5372  make_node_stat
258       2.2342  ix86_memory_move_cost.386116.7474
256       2.2168  do_divide.182325
255       2.2082  ht_lookup_with_hash
236       2.0436  ix86_rtx_costs.386572.6577
231       2.0003  bind.9267
223       1.9311  normalize.182203
212       1.8358  iterative_hash
194       1.6799  recog
176       1.5241  htab_find_with_hash
168       1.4548  tree_code_size
167       1.4461  def_builtin_1.17388.constprop.16.4002
132       1.1431  copy_node_stat
125       1.0824  is_attribute_with_length_p._part.7.371469
94        0.8140  debug_nothing_tree
90        0.7794  main
88        0.7620  build_int_cst_wide
80        0.6928  c_builtin_function

I guess IRA initialization in particular can easily be done lazily, on demand, like we did for regclass some time ago? The may_move_*_costs tables can be computed the first time they are needed for a given mode.
Note that this is an LTO build, so ira_init gets functions that are called only once inlined into it across modules.
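
A minimal sketch of the lazy, per-mode initialization suggested above. This is not GCC code: the table layout, the sizes N_MODES and N_CLASSES, and the functions compute_move_cost, lazy_move_cost and lazy_tables_computed are all hypothetical stand-ins chosen for illustration. The idea is only that a cost table for a mode is filled on first use instead of for every mode at startup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes; the real numbers of modes and register classes
   in GCC are target dependent and much larger.  */
enum { N_MODES = 8, N_CLASSES = 4 };

/* One move-cost table per mode, plus a flag recording whether it has
   been computed yet.  Static storage starts out zeroed.  */
typedef struct
{
  bool initialized;
  unsigned short cost[N_CLASSES][N_CLASSES];
} move_cost_table;

static move_cost_table move_cost_cache[N_MODES];
static unsigned long tables_computed;

/* Placeholder for the expensive target cost query (an assumption,
   not the real target hook).  */
static unsigned short
compute_move_cost (int mode, int from, int to)
{
  return (unsigned short) (from == to ? 2 : 2 + mode + from + to);
}

/* Fill the whole table for MODE.  Called at most once per mode.  */
static void
init_move_cost_table (int mode)
{
  move_cost_table *t = &move_cost_cache[mode];
  for (int from = 0; from < N_CLASSES; from++)
    for (int to = 0; to < N_CLASSES; to++)
      t->cost[from][to] = compute_move_cost (mode, from, to);
  t->initialized = true;
  tables_computed++;
}

/* Return the cost of moving a value of MODE between two classes,
   computing the table for MODE on first use.  */
unsigned short
lazy_move_cost (int mode, int from, int to)
{
  assert (mode >= 0 && mode < N_MODES);
  assert (from >= 0 && from < N_CLASSES && to >= 0 && to < N_CLASSES);
  if (!move_cost_cache[mode].initialized)
    init_move_cost_table (mode);
  return move_cost_cache[mode].cost[from][to];
}

/* Number of tables computed so far; useful for checking laziness.  */
unsigned long
lazy_tables_computed (void)
{
  return tables_computed;
}
```

With this shape, compiling an empty file would compute no tables at all, and a translation unit that only touches a few modes pays only for those.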

Honza
Comment 1 Jakub Jelinek 2010-06-07 13:36:14 UTC
Created attachment 20854
callgrind.startup.bz2

Callgrind dump for --enable-checking=release trunk cc1 from today on an empty file.
Comment 2 Jan Hubicka 2011-01-12 14:40:30 UTC
Compiling an empty file 100 times takes 3.8s on 4.6, while it takes 2.4s on gcc 4.3 as well as gcc 4.5.

This is a regression of more than 50% (3.8s vs. 2.4s). User time increases by 100%, so we are probably doing a lot more initialization than before.

2979      9.4936  ggc_internal_alloc_stat
1771      5.6439  ira_init
1539      4.9046  pop_scope
1258      4.0091  init_reg_sets_1
1036      3.3016  ht_lookup_with_hash
986       3.1422  ht_lookup
951       3.0307  decl_attributes
905       2.8841  ix86_hard_regno_mode_ok
852       2.7152  copy_node_stat
750       2.3901  rtx_cost
717       2.2850  bind
635       2.0236  ix86_rtx_costs
633       2.0173  ggc_internal_cleared_alloc_stat
632       2.0141  ix86_memory_move_cost
619       1.9727  c_write_global_declarations_1
566       1.8038  make_node_stat
553       1.7623  htab_find_with_hash
519       1.6540  recog
497       1.5839  iterative_hash
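
For reference, a rough sketch of how a "run the compiler N times on an empty file" measurement could be scripted. The helper names time_repeated_runs and slowdown_percent are hypothetical, the exact command line used for the numbers above is not given in this report, and this measures wall-clock time only, not user time.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Run the command in ARGV (looked up via PATH) RUNS times in a row.
   Returns the elapsed wall-clock time in seconds, or -1.0 if the
   command cannot be started or exits with a non-zero status.  */
double
time_repeated_runs (char *const argv[], int runs)
{
  struct timespec start, end;

  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    return -1.0;

  for (int i = 0; i < runs; i++)
    {
      pid_t pid;
      int status;

      if (posix_spawnp (&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1.0;
      if (waitpid (pid, &status, 0) < 0)
        return -1.0;
      if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
        return -1.0;
    }

  if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
    return -1.0;

  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Relative slowdown of AFTER over BEFORE, in percent.  */
double
slowdown_percent (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Called with an argument vector such as gcc, -c, the path of an empty source file, and an output option, and with runs set to 100, this gives a number comparable to the timings quoted above. slowdown_percent (2.4, 3.8) comes out at about 58.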
Comment 3 Richard Biener 2011-03-03 11:29:36 UTC
Can it be a side-effect of turning target macros into target hooks?
Comment 4 Jakub Jelinek 2011-03-25 19:51:55 UTC
GCC 4.6.0 is being released, adjusting target milestone.
Comment 5 Jakub Jelinek 2011-06-27 12:32:37 UTC
GCC 4.6.1 is being released.
Comment 6 Jakub Jelinek 2011-10-26 17:13:19 UTC
GCC 4.6.2 is being released.
Comment 7 Andrew Pinski 2012-01-20 03:41:27 UTC
Can you try to see if this has been improved now?
Comment 8 Jakub Jelinek 2012-03-01 14:38:04 UTC
GCC 4.6.3 is being released.
Comment 9 Jakub Jelinek 2013-04-12 15:16:38 UTC
GCC 4.6.4 has been released and the branch has been closed.