This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Faster compilation speed
- From: Jeff Sturm <jsturm at one-point dot com>
- To: David Edelsohn <dje at watson dot ibm dot com>
- Cc: "David S. Miller" <davem at redhat dot com>, <dan at dberlin dot org>, <austern at apple dot com>, <gcc at gcc dot gnu dot org>
- Date: Sun, 18 Aug 2002 15:57:43 -0400 (EDT)
- Subject: Re: Faster compilation speed
On Tue, 13 Aug 2002, David Edelsohn wrote:
> Here's an interesting (aka depressing) data point. My previous
> cache miss statistics were for GCC -O2. At -O0, GCC's cache miss
> statistics stay the same or get up to 20% *worse*. In comparison, the
> cache statistics for IBM's compiler without optimization enabled *improve*
> up to 50% for the same reload.c and insn-recog.c input files compared to
> optimized.
Here's a data point on alpha-linux:
cc1 -quiet -O2 reload.i
issues/cycles = 0.51 issues/dcache_miss = 26.93
Without optimization:
cc1 -quiet reload.i
issues/cycles = 0.52 issues/dcache_miss = 31.29
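As a rough cross-check of the two runs above, the sketch below converts the reported ratios into misses per 1000 issues and cycles per dcache miss. The dictionary keys and helper names are made up for illustration; only the four numbers come from the runs.

```python
# Figures copied from the two cc1 runs above; everything else is illustrative.
RUNS = {
    "-O2": {"issues_per_cycle": 0.51, "issues_per_dcache_miss": 26.93},
    "-O0": {"issues_per_cycle": 0.52, "issues_per_dcache_miss": 31.29},
}


def misses_per_1000_issues(run):
    """Invert issues/dcache_miss into a miss rate per 1000 issued instructions."""
    return 1000.0 / run["issues_per_dcache_miss"]


def cycles_per_dcache_miss(run):
    """(issues/miss) / (issues/cycle) gives the average cycles between misses."""
    return run["issues_per_dcache_miss"] / run["issues_per_cycle"]


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: {misses_per_1000_issues(run):.2f} misses per 1000 issues, "
              f"{cycles_per_dcache_miss(run):.1f} cycles per miss")
```

On these figures the issue rate is essentially unchanged between the two runs, and the unoptimized run takes somewhat fewer dcache misses per issue.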
Both runs are on an ev56 with a direct-mapped cache. To get some idea where the
misses are taking place, I experimented with iprobe's sampling mode.
Omitting results below the 1% sample threshold, I get:
function                    | issues | access | misses | i/m |  a/m
----------------------------+--------+--------+--------+-----+-----
yyparse                     |   2924 |    848 |    148 |  20 |  5.7
gt_ggc_mx_lang_tree_node    |   1336 |    612 |     74 |  18 |  8.2
verify_flow_info            |   1388 |    408 |    129 |  11 |  3.1
copy_rtx_if_shared          |   2120 |   1012 |     53 |  40 | 19.0
propagate_one_insn          |   3636 |    504 |     52 |  70 |  9.6
find_temp_slot_from_address |    728 |    232 |    126 |   6 |  1.8
ggc_mark_rtx_children_1     |   1580 |    316 |     40 |  40 |  7.9
extract_insn                |   1576 |    476 |     52 |  30 |  9.1
record_reg_classes          |   3848 |    944 |     65 |  59 | 14.5
reg_scan_mark_refs          |   1472 |    632 |     66 |  22 |  9.5
find_reloads                |   7680 |   3104 |    148 |  52 | 20.9
subst_reloads               |   4772 |   2736 |    169 |  28 | 16.1
side_effects_p              |   1344 |    564 |     43 |  31 | 13.1
for_each_rtx                |   4924 |   1464 |     75 |  66 | 19.5
ggc_alloc                   |   2424 |    728 |    111 |  22 |  6.5
ggc_set_mark                |   3392 |    976 |    107 |  32 |  9.1
(Each sample reported is 2^14 events. i/m is issues per miss and a/m is
accesses per miss.)
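The i/m and a/m columns appear to be issues/misses and access/misses. A minimal sketch, assuming that reading, recomputes a/m from the raw sample counts and lists the functions with the fewest accesses per miss. The names SAMPLES, access_per_miss and worst_by_access_per_miss are hypothetical and not part of iprobe.

```python
# (function, issues, access, misses) sample counts copied from the table.
# Each count stands for 2^14 events, but that scale cancels out of the ratios.
SAMPLES = [
    ("yyparse", 2924, 848, 148),
    ("gt_ggc_mx_lang_tree_node", 1336, 612, 74),
    ("verify_flow_info", 1388, 408, 129),
    ("copy_rtx_if_shared", 2120, 1012, 53),
    ("propagate_one_insn", 3636, 504, 52),
    ("find_temp_slot_from_address", 728, 232, 126),
    ("ggc_mark_rtx_children_1", 1580, 316, 40),
    ("extract_insn", 1576, 476, 52),
    ("record_reg_classes", 3848, 944, 65),
    ("reg_scan_mark_refs", 1472, 632, 66),
    ("find_reloads", 7680, 3104, 148),
    ("subst_reloads", 4772, 2736, 169),
    ("side_effects_p", 1344, 564, 43),
    ("for_each_rtx", 4924, 1464, 75),
    ("ggc_alloc", 2424, 728, 111),
    ("ggc_set_mark", 3392, 976, 107),
]


def access_per_miss(row):
    """Data accesses per dcache miss for one table row (lower is worse)."""
    _name, _issues, access, misses = row
    return access / misses


def worst_by_access_per_miss(rows, n=3):
    """Return the n rows with the fewest accesses per miss."""
    return sorted(rows, key=access_per_miss)[:n]


if __name__ == "__main__":
    for row in worst_by_access_per_miss(SAMPLES):
        print(f"{row[0]:28s} a/m = {access_per_miss(row):.2f}")
```

Ranking by a/m rather than by raw miss count keeps a heavily sampled function such as find_reloads from dominating the list just because it runs longer.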
yyparse performs badly (as would any table-driven parser), but how about
verify_flow_info and find_temp_slot_from_address? Both show awful cache
behavior, at only 3.1 and 1.8 accesses per miss respectively.
Jeff