This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
some interesting timing statistics
- To: gcc at gcc dot gnu dot org
- Subject: some interesting timing statistics
- From: Zack Weinberg <zack at bitmover dot com>
- Date: Tue, 16 Nov 1999 00:36:57 -0800
I have a source file which demonstrates pretty nicely how much slower
the compiler's gotten since 2.7.x. It is 12583 lines long, with 286
functions. On x86:
19991115, -O2 37.78 - 36.43u, 0.62s
19991115, -Os 29.95 - 29.63u, 0.29s
2.95.2, -O2 29.18 - 28.13u, 0.32s
2.7.2.3, -O2 21.31 - 19.05u, 0.38s
-Q statistics still say cse and global are the worst offenders. These
are from 19991115:
-O2
flow 1.010000 3%
local 1.020000 3%
gcse 1.280000 4%
loop 1.810000 5%
combine 1.670000 5%
sched2 2.510000 7%
jump 3.400000 10%
cse2 3.770000 11%
global 3.910000 11%
cse 4.150000 12%
parse 5.470000 15%
-Os
loop 0.870000 3%
local 0.910000 3%
gcse 1.290000 4%
jump 1.530000 5%
parse 1.820000 6%
sched2 1.860000 6%
combine 1.880000 7%
global 3.790000 13%
cse2 4.860000 17%
cse 5.090000 18%
I'm surprised -O2 slows down the parser, of all things, compared to
-Os. A hefty chunk of the performance loss comes from there.
gprof sez:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
4.92 1.81 1.81 585545 0.00 0.00 bzero
3.80 3.21 1.40 259331 0.01 0.02 cse_insn
3.21 4.39 1.18 2604220 0.00 0.00 memset
2.34 5.25 0.86 3505 0.25 0.26 allocate_reg_info
2.07 6.01 0.76 find_hash_table_entry
Hm, that's an awful lot of calls to bzero and memset...
0.13 0.00 41776/585545 calculate_needs
0.13 0.00 42324/585545 choose_reload_regs_init
0.22 0.00 70071/585545 find_reloads
0.36 0.00 116474/585545 reload_cse_simplify_operands
0.49 0.00 159754/585545 sbitmap_zero
[15] 4.9 1.81 0.00 585545 bzero
0.01 0.00 17769/2604220 ggc_collect
0.02 0.00 49912/2604220 create_hash_table
0.04 0.00 83502/2604220 calloc
0.04 0.00 87474/2604220 sbitmap_ones
1.07 0.00 2364825/2604220 ggc_alloc_obj
[31] 3.2 1.18 0.00 2604220 memset
The callers of ggc_alloc_obj are, unsurprisingly, gen_rtx_*,
tree_cons, etc. Therefore, we pretty much know the fields of the
memory block returned from ggc_alloc_obj will be initialized in short
order - perhaps we could avoid clearing it?
sbitmap_zero needs a bit closer examination...
0.00 0.04 14367/159754 reset_opr_set_tables
0.00 0.05 16266/159754 schedule_insns
0.01 0.39 125333/159754 sbitmap_vector_zero
[65] 1.4 0.01 0.49 159754 sbitmap_zero
0.00 0.03 243/3383 compute_pre_data
0.00 0.03 243/3383 pre_edge_insert
0.00 0.09 729/3383 compute_hash_table
0.00 0.11 974/3383 delete_null_pointer_checks_1
0.00 0.14 1194/3383 compute_local_properties
[90] 1.1 0.01 0.40 3383 sbitmap_vector_zero
0.01 0.39 125333/159754 sbitmap_zero
Looking in sbitmap.c, I see we make no effort to capitalize on the
fact that a bitmap vector is allocated in one huge block. The header
info does repeat, but it's all duplicated. I may try to patch this
tomorrow.
The complete profiler output is available on request; it's 252k
compressed, though.
zw