some interesting timing statistics



I have a source file which demonstrates pretty nicely how much slower
the compiler's gotten since 2.7.x.  It is 12583 lines long, with 286
functions.  On x86:

(times in seconds: elapsed - user, system)
19991115, -O2	37.78 - 36.43u, 0.62s
19991115, -Os	29.95 - 29.63u, 0.29s
2.95.2, -O2	29.18 - 28.13u, 0.32s
2.7.2.3, -O2	21.31 - 19.05u, 0.38s

That's roughly a 1.8x slowdown at -O2 between 2.7.2.3 and the current
snapshot.  -Q statistics still say cse and global are the worst
offenders.  These are from 19991115:

-O2 (seconds, % of total):
flow	1.010000 3%
local	1.020000 3%
gcse	1.280000 4%
loop	1.810000 5%
combine	1.670000 5%
sched2	2.510000 7%
jump	3.400000 10%
cse2	3.770000 11%
global	3.910000 11%
cse	4.150000 12%
parse	5.470000 15%

-Os (seconds, % of total):
loop	0.870000 3%
local	0.910000 3%
gcse	1.290000 4%
jump	1.530000 5%
parse	1.820000 6%
sched2	1.860000 6%
combine	1.880000 7%
global	3.790000 13%
cse2	4.860000 17%
cse	5.090000 18%

I'm surprised -O2 slows down the parser, of all things, compared to
-Os.  A hefty chunk of the performance loss comes from there.

gprof sez:
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  4.92      1.81     1.81   585545     0.00     0.00  bzero
  3.80      3.21     1.40   259331     0.01     0.02  cse_insn
  3.21      4.39     1.18  2604220     0.00     0.00  memset
  2.34      5.25     0.86     3505     0.25     0.26  allocate_reg_info
  2.07      6.01     0.76                             find_hash_table_entry

Hm, that's an awful lot of calls to bzero and memset...

                0.13    0.00   41776/585545      calculate_needs
                0.13    0.00   42324/585545      choose_reload_regs_init
                0.22    0.00   70071/585545      find_reloads
                0.36    0.00  116474/585545      reload_cse_simplify_operands
                0.49    0.00  159754/585545      sbitmap_zero
[15]     4.9    1.81    0.00  585545         bzero

                0.01    0.00   17769/2604220     ggc_collect
                0.02    0.00   49912/2604220     create_hash_table
                0.04    0.00   83502/2604220     calloc
                0.04    0.00   87474/2604220     sbitmap_ones
                1.07    0.00 2364825/2604220     ggc_alloc_obj
[31]     3.2    1.18    0.00 2604220         memset

The callers of ggc_alloc_obj are, unsurprisingly, gen_rtx_*,
tree_cons, etc.  Therefore, we pretty much know the fields of the
memory block returned from ggc_alloc_obj will be initialized in short
order; perhaps we could avoid clearing it for those callers?
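
One way to get there (just a sketch; ggc_alloc_raw and the wrapper
name are made up, not the existing interface) would be to split the
entry point so that callers which initialize every field can skip the
clearing:

#include <stdlib.h>
#include <string.h>

/* Hypothetical uncleared entry point; stubbed with malloc here only
   to keep the sketch self-contained.  */
static void *
ggc_alloc_raw (size_t size)
{
  return malloc (size);
}

/* The cleared variant keeps today's behavior for call sites that
   really do depend on zero-initialized memory.  */
static void *
ggc_alloc_cleared (size_t size)
{
  void *p = ggc_alloc_raw (size);
  memset (p, 0, size);
  return p;
}

gen_rtx_* and tree_cons would then use the raw variant, since they
store every field before the object escapes, and those ~2.4 million
memsets from ggc_alloc_obj would survive only at the sites that
genuinely need cleared memory.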

sbitmap_zero needs a bit closer examination...

                0.00    0.04   14367/159754      reset_opr_set_tables
                0.00    0.05   16266/159754      schedule_insns
                0.01    0.39  125333/159754      sbitmap_vector_zero
[65]     1.4    0.01    0.49  159754         sbitmap_zero

                0.00    0.03     243/3383        compute_pre_data
                0.00    0.03     243/3383        pre_edge_insert
                0.00    0.09     729/3383        compute_hash_table
                0.00    0.11     974/3383        delete_null_pointer_checks_1
                0.00    0.14    1194/3383        compute_local_properties
[90]     1.1    0.01    0.40    3383         sbitmap_vector_zero
                0.01    0.39  125333/159754      sbitmap_zero

Looking in sbitmap.c, I see we make no effort to capitalize on the
fact that a bitmap vector is allocated in one huge block:
sbitmap_vector_zero just clears each element with its own call to
sbitmap_zero.  The per-element header info does repeat through the
block, but it's identical in every element, so it would be cheap to
restore after one bulk clear.  I may try to patch this tomorrow.
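
Concretely, something like the sketch below (the struct mirrors
sbitmap.h from memory, SBITMAP_ELT_TYPE is a stand-in, and the
function name is mine; it assumes sbitmap_vector_alloc lays the
elements out back to back in that one block, with identical headers,
and that the vector is non-empty):

#include <string.h>

typedef unsigned long SBITMAP_ELT_TYPE;

typedef struct simple_bitmap_def
{
  unsigned int n_bits;         /* number of bits */
  unsigned int size;           /* size in elements */
  unsigned int bytes;          /* size of elms[] in bytes */
  SBITMAP_ELT_TYPE elms[1];    /* the bits themselves */
} *sbitmap;

/* Clear a whole vector with one big memset instead of one small
   clear per element, then rewrite the headers, which are identical
   in every element anyway.  */
void
sbitmap_vector_zero_bulk (sbitmap *bmap, int n_vecs)
{
  unsigned int n_bits = bmap[0]->n_bits;
  unsigned int size = bmap[0]->size;
  unsigned int bytes = bmap[0]->bytes;
  char *start = (char *) bmap[0];
  char *end = (char *) bmap[n_vecs - 1]->elms + bytes;
  int i;

  /* One clear over headers and bits alike...  */
  memset (start, 0, end - start);

  /* ...then restore the (shared) header values.  */
  for (i = 0; i < n_vecs; i++)
    {
      bmap[i]->n_bits = n_bits;
      bmap[i]->size = size;
      bmap[i]->bytes = bytes;
    }
}

Going by the numbers above, that turns an average of about 37
sbitmap_zero calls per sbitmap_vector_zero (125333 / 3383) into a
single memset; rewriting the headers afterwards is trivially cheap by
comparison.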

The complete profiler output is available on request; it's 252k
compressed, though.

zw

