This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed


On Tue, 13 Aug 2002, David Edelsohn wrote:
> 	Here's an interesting (aka depressing) data point.  My previous
> cache miss statistics were for GCC -O2.  At -O0, GCC's cache miss
> statistics stay the same or get up to 20% *worse*.  In comparison, the
> cache statistics for IBM's compiler without optimization enabled *improve*
> up to 50% for the same reload.c and insn-recog.c input files compared to
> optimized.

Here's a data point on alpha-linux:

cc1 -quiet -O2 reload.i
issues/cycles = 0.51  issues/dcache_miss = 26.93

Without optimization:

cc1 -quiet  reload.i
issues/cycles = 0.52  issues/dcache_miss = 31.29
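
To make the two runs easier to compare, here is a minimal sketch that inverts the issues/dcache_miss figures above into misses per 1000 issues and reports the relative change. The dictionary keys and helper names are illustrative only; they are not part of cc1 or iprobe.

```python
# issues/dcache_miss as reported above for cc1 on reload.i.
# The keys are labels chosen for this sketch, not tool output.
ISSUES_PER_DCACHE_MISS = {"O2": 26.93, "unoptimized": 31.29}


def misses_per_kilo_issue(issues_per_miss):
    """Convert issues per dcache miss into dcache misses per 1000 issues."""
    return 1000.0 / issues_per_miss


def relative_change(old, new):
    """Fractional change going from old to new."""
    return (new - old) / old


rates = {label: misses_per_kilo_issue(value)
         for label, value in ISSUES_PER_DCACHE_MISS.items()}
change = relative_change(rates["O2"], rates["unoptimized"])

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} dcache misses per 1000 issues")
print(f"change from O2 to unoptimized: {change:+.1%}")
```

By this measure the unoptimized run misses somewhat less often per issue on this machine, while issues/cycles is essentially unchanged.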

This is on an ev56 with a direct-mapped cache.  To get some idea where the
misses are taking place, I experimented with iprobe's sampling mode.
Omitting results below the 1% sample threshold, I get:

function                    | issues | access | misses | i/m |  a/m
----------------------------+--------+--------+--------+-----+-----
yyparse                     |   2924 |    848 |    148 |  20 |  5.7
gt_ggc_mx_lang_tree_node    |   1336 |    612 |     74 |  18 |  8.2
verify_flow_info            |   1388 |    408 |    129 |  11 |  3.1
copy_rtx_if_shared          |   2120 |   1012 |     53 |  40 | 19.0
propagate_one_insn          |   3636 |    504 |     52 |  70 |  9.6
find_temp_slot_from_address |    728 |    232 |    126 |   6 |  1.8
ggc_mark_rtx_children_1     |   1580 |    316 |     40 |  40 |  7.9
extract_insn                |   1576 |    476 |     52 |  30 |  9.1
record_reg_classes          |   3848 |    944 |     65 |  59 | 14.5
reg_scan_mark_refs          |   1472 |    632 |     66 |  22 |  9.5
find_reloads                |   7680 |   3104 |    148 |  52 | 20.9
subst_reloads               |   4772 |   2736 |    169 |  28 | 16.1
side_effects_p              |   1344 |    564 |     43 |  31 | 13.1
for_each_rtx                |   4924 |   1464 |     75 |  66 | 19.5
ggc_alloc                   |   2424 |    728 |    111 |  22 |  6.5
ggc_set_mark                |   3392 |    976 |    107 |  32 |  9.1

(Each sample reported is 2^14 events.  i/m is issues per miss and a/m is
accesses per miss.)
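
The ratio columns can be recomputed from the sample counts, which also makes it easy to rank functions by accesses per miss. The sketch below assumes the issues, access and misses columns are sample counts, so multiplying by 2^14 gives an estimated event count. The row data is copied from the table; the variable and function names are made up for illustration.

```python
EVENTS_PER_SAMPLE = 2 ** 14  # each reported sample stands for 2^14 events

# (function, issues, accesses, misses) in samples, copied from the table.
ROWS = [
    ("yyparse", 2924, 848, 148),
    ("gt_ggc_mx_lang_tree_node", 1336, 612, 74),
    ("verify_flow_info", 1388, 408, 129),
    ("copy_rtx_if_shared", 2120, 1012, 53),
    ("propagate_one_insn", 3636, 504, 52),
    ("find_temp_slot_from_address", 728, 232, 126),
    ("ggc_mark_rtx_children_1", 1580, 316, 40),
    ("extract_insn", 1576, 476, 52),
    ("record_reg_classes", 3848, 944, 65),
    ("reg_scan_mark_refs", 1472, 632, 66),
    ("find_reloads", 7680, 3104, 148),
    ("subst_reloads", 4772, 2736, 169),
    ("side_effects_p", 1344, 564, 43),
    ("for_each_rtx", 4924, 1464, 75),
    ("ggc_alloc", 2424, 728, 111),
    ("ggc_set_mark", 3392, 976, 107),
]


def ratios(issues, accesses, misses):
    """Return (issues per miss, accesses per miss)."""
    return issues / misses, accesses / misses


# Fewest accesses per miss first, i.e. worst data-cache behavior first.
ranked = sorted(ROWS, key=lambda row: row[2] / row[3])
worst_three = [name for name, *_ in ranked[:3]]

for name, issues, accesses, misses in ranked[:3]:
    i_m, a_m = ratios(issues, accesses, misses)
    estimated_misses = misses * EVENTS_PER_SAMPLE
    print(f"{name:28s} i/m={i_m:5.1f} a/m={a_m:4.1f} "
          f"est. misses={estimated_misses}")
```

Ranked this way, find_temp_slot_from_address and verify_flow_info come out ahead of yyparse as the worst offenders per access.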

yyparse performs badly (as would any table-driven parser), but how about
verify_flow_info and find_temp_slot_from_address?  Both are reporting
awful cache behavior.

Jeff

