This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: test patch for computed gotos
- From: Brad Lucier <lucier at math dot purdue dot edu>
- To: gcc-patches at gcc dot gnu dot org
- Cc: lucier at math dot purdue dot edu (Brad Lucier), rth at redhat dot com (Richard Henderson), feeley at iro dot umontreal dot ca
- Date: Thu, 6 Mar 2003 16:06:07 -0500 (EST)
- Subject: Re: test patch for computed gotos
> Ah, never mind. I'll try to profile reorder_blocks and see if things can
> be speeded up there.
With a slightly simpler file:
http://www.math.purdue.edu/~lucier/all2.i.gz
gcc version 3.4 20030305 gives the following times:
popov-65% /export/home/lucier/programs/gcc/objdir/gcc/cc1 -fPIC -O1 -fno-trapping-math -fomit-frame-pointer -mieee -fno-math-errno -mcpu=ev6 -fschedule-insns2 -fno-strict-aliasing -freorder-blocks all.i
<lots of function names deleted>
Execution times (seconds)
cfg construction : 1.71 ( 0%) usr 0.10 ( 3%) sys 1.81 ( 0%) wall
cfg cleanup : 7.81 ( 2%) usr 0.01 ( 0%) sys 7.82 ( 2%) wall
trivially dead code : 4.54 ( 1%) usr 0.00 ( 0%) sys 4.55 ( 1%) wall
life analysis : 17.93 ( 4%) usr 0.02 ( 0%) sys 17.95 ( 4%) wall
life info update : 6.44 ( 2%) usr 0.00 ( 0%) sys 6.44 ( 2%) wall
alias analysis : 2.53 ( 1%) usr 0.03 ( 1%) sys 2.57 ( 1%) wall
register scan : 1.38 ( 0%) usr 0.00 ( 0%) sys 1.38 ( 0%) wall
rebuild jump labels : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.63 ( 0%) wall
preprocessing : 4.27 ( 1%) usr 0.40 (10%) sys 4.69 ( 1%) wall
lexical analysis : 5.73 ( 1%) usr 0.90 (23%) sys 6.64 ( 2%) wall
parser : 12.27 ( 3%) usr 0.75 (19%) sys 13.13 ( 3%) wall
expand : 6.63 ( 2%) usr 0.12 ( 3%) sys 6.75 ( 2%) wall
varconst : 0.86 ( 0%) usr 0.02 ( 1%) sys 0.88 ( 0%) wall
integration : 1.48 ( 0%) usr 0.04 ( 1%) sys 1.52 ( 0%) wall
jump : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
CSE : 8.66 ( 2%) usr 0.02 ( 0%) sys 8.67 ( 2%) wall
loop analysis : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
branch prediction : 6.12 ( 1%) usr 0.02 ( 1%) sys 6.14 ( 1%) wall
flow analysis : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.47 ( 0%) wall
combiner : 14.78 ( 4%) usr 0.04 ( 1%) sys 14.82 ( 4%) wall
if-conversion : 2.71 ( 1%) usr 0.00 ( 0%) sys 2.71 ( 1%) wall
local alloc : 3.41 ( 1%) usr 0.01 ( 0%) sys 3.42 ( 1%) wall
global alloc : 9.04 ( 2%) usr 0.21 ( 5%) sys 9.25 ( 2%) wall
reload CSE regs : 15.84 ( 4%) usr 0.06 ( 1%) sys 15.90 ( 4%) wall
flow 2 : 1.47 ( 0%) usr 0.02 ( 0%) sys 1.49 ( 0%) wall
if-conversion 2 : 4.26 ( 1%) usr 0.00 ( 0%) sys 4.26 ( 1%) wall
rename registers : 2.96 ( 1%) usr 0.02 ( 0%) sys 2.98 ( 1%) wall
scheduling 2 : 12.26 ( 3%) usr 0.07 ( 2%) sys 12.41 ( 3%) wall
reorder blocks : 246.99 (59%) usr 0.91 (23%) sys 247.93 (59%) wall
shorten branches : 1.31 ( 0%) usr 0.00 ( 0%) sys 1.32 ( 0%) wall
final : 5.06 ( 1%) usr 0.08 ( 2%) sys 5.16 ( 1%) wall
rest of compilation : 7.22 ( 2%) usr 0.05 ( 1%) sys 7.29 ( 2%) wall
TOTAL : 417.08 3.93 421.27
The profile results are fairly simple: cached_make_edge accounts for
about 56% of the profiled time on this input, and nearly all of its
calls come from cfg_layout_duplicate_bb. Is the edge cache enabled?
Do we need to take the quadratic path through cached_make_edge each
time?
(This is on a 500MHz alphaev6.)
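To make the "quadratic path" question concrete, here is a minimal sketch in C of the difference between adding an edge with and without a cache. It is a simplified model, not GCC's code: struct block, add_edge_scan, add_edge_cached and probes_for_scan are hypothetical names, and the sketch assumes that the uncached path scans the source block's successor list for an existing edge before adding a new one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a CFG block: only a successor list. */
struct block {
    int *succ;      /* indices of successor blocks */
    size_t n_succ;  /* number of successors in use */
    size_t cap;     /* allocated capacity of succ */
};

/* Successor-list entries examined by add_edge_scan (for illustration). */
static unsigned long probes;

/* Append dst to b's successor list; returns false if allocation fails. */
static bool push_succ(struct block *b, int dst)
{
    if (b->n_succ == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 8;
        int *p = realloc(b->succ, ncap * sizeof *p);
        if (!p)
            return false;
        b->succ = p;
        b->cap = ncap;
    }
    b->succ[b->n_succ++] = dst;
    return true;
}

/* Uncached path (assumed): look for an existing edge by scanning all
   successors, then add one.  Cost is O(n_succ) per call.  */
static bool add_edge_scan(struct block *b, int dst)
{
    for (size_t i = 0; i < b->n_succ; i++) {
        probes++;
        if (b->succ[i] == dst)
            return true;  /* edge already present */
    }
    return push_succ(b, dst);
}

/* Cached path: seen[dst] records whether the edge exists, so the
   duplicate check is O(1).  seen needs one zeroed entry per block.  */
static bool add_edge_cached(struct block *b, unsigned char *seen, int dst)
{
    if (seen[dst])
        return true;
    seen[dst] = 1;
    return push_succ(b, dst);
}

/* Give one block n distinct successors through the scanning path and
   return how many list entries were examined along the way.  */
static unsigned long probes_for_scan(size_t n)
{
    struct block b = {0};
    probes = 0;
    for (size_t i = 0; i < n; i++)
        if (!add_edge_scan(&b, (int)i))
            break;
    free(b.succ);
    return probes;
}
```

In this model probes_for_scan(n) examines n(n-1)/2 entries, so a block with 1000 successors (plausible for a computed-goto dispatch block) costs 499500 probes to rebuild, while the cached path does one array lookup per edge.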
Flat profile:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 55.96    106.17   106.17  4615324     0.00     0.00  cached_make_edge
  3.21    112.25     6.08                             htab_traverse
  1.61    115.32     3.06   124722     0.00     0.00  et_forest_common_ancestor
  1.31    117.81     2.49      486     0.01     0.01  compute_alignments
  1.24    120.15     2.34    26877     0.00     0.00  bb_to_key
  1.17    122.37     2.22     3888     0.00     0.00  calc_dfs_tree_nonrec
  1.03    124.33     1.95     8172     0.00     0.00  find_unreachable_blocks
  0.97    126.16     1.83    27619     0.00     0.00  find_if_block
  0.94    127.94     1.78   807650     0.00     0.00  constrain_operands
  0.87    129.60     1.66   370361     0.00     0.00  try_forward_edges
  0.82    131.15     1.55        1     1.55   180.42  yyparse
...
-----------------------------------------------
0.01 170.94 498/498 c_expand_body_1 [7]
[8] 90.1 0.01 170.94 498 rest_of_compilation [8]
0.00 109.52 486/486 reorder_basic_blocks [10]
0.02 8.69 5347/8325 cleanup_cfg <cycle 7> [19]
...
-----------------------------------------------
0.00 109.52 486/486 rest_of_compilation [8]
[10] 57.7 0.00 109.52 486 reorder_basic_blocks [10]
1.13 104.60 484/484 connect_traces [12]
0.00 2.48 484/484 find_traces [55]
0.00 0.97 484/484 cfg_layout_finalize [109]
0.00 0.26 484/484 cfg_layout_initialize [244]
0.01 0.06 484/484 set_edge_can_fallthru_flag [453]
0.01 0.00 484/1455 mark_dfs_back_edges [683]
0.00 0.00 484/484 record_effective_endpoints [1101]
0.00 0.00 484/484 break_superblocks [1196]
0.00 0.00 1/1 get_uncond_jump_length [1560]
0.00 0.00 484/5893 hook_bool_void_false [1722]
-----------------------------------------------
0.08 0.00 3597/4615324 make_single_succ_edge [416]
0.10 0.00 4137/4615324 force_nonfallthru_and_redirect [206]
1.17 0.01 51030/4615324 make_edges [33]
2.33 0.02 101432/4615324 make_label_edge [59]
102.48 0.96 4455128/4615324 cfg_layout_duplicate_bb [14]
[11] 56.5 106.17 0.99 4615324 cached_make_edge [11]
0.99 0.00 4608732/5147236 pool_alloc [100]
-----------------------------------------------
1.13 104.60 484/484 reorder_basic_blocks [10]
[12] 55.7 1.13 104.60 484 connect_traces [12]
0.00 104.38 3273/3273 copy_bb [13]
0.00 0.22 3673/5382 copy_bb_p [227]
-----------------------------------------------
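The call-graph figures above can be turned into a rough per-call cost for cached_make_edge. The sketch below only does arithmetic on numbers copied from the profile; the macro and function names are made up for illustration, and the cycle estimate assumes the sampled self time maps directly onto the 500MHz clock, which is an approximation.

```c
/* Figures copied from the gprof output above. */
#define CME_SELF_SECONDS   106.17     /* self time of cached_make_edge */
#define CME_TOTAL_CALLS    4615324.0  /* all calls to cached_make_edge */
#define CME_CALLS_FROM_DUP 4455128.0  /* calls from cfg_layout_duplicate_bb */
#define CLOCK_HZ           500e6      /* 500MHz alphaev6 */

/* Fraction of cached_make_edge calls made by cfg_layout_duplicate_bb. */
static double fraction_from_duplicate_bb(void)
{
    return CME_CALLS_FROM_DUP / CME_TOTAL_CALLS;
}

/* Average self time of one cached_make_edge call, in seconds. */
static double seconds_per_call(void)
{
    return CME_SELF_SECONDS / CME_TOTAL_CALLS;
}

/* The same average expressed in clock cycles (rough estimate). */
static double cycles_per_call(void)
{
    return seconds_per_call() * CLOCK_HZ;
}
```

With these inputs, roughly 96.5% of the calls come from cfg_layout_duplicate_bb, and an average call costs about 23 microseconds, or on the order of 11,500 cycles. That is far more than a constant-time cache lookup would need, which is consistent with a long scan on each call.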