This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Compilation performance comparison of 3.5.0 and TreeSSA trees on MICO sources as requested in: [tree-ssa] Merge status 2004-05-03
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: Daniel Berlin <dberlin at dberlin dot org>
- Cc: Karel Gardas <kgardas at objectsecurity dot com>, Diego Novillo <dnovillo at redhat dot com>,"gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Steven Bosscher <stevenb at suse dot de>, law at redhat dot com
- Date: Wed, 05 May 2004 23:58:35 +0200
- Subject: Re: Compilation performance comparison of 3.5.0 and TreeSSA trees on MICO sources as requested in: [tree-ssa] Merge status 2004-05-03
- References: <Pine.LNX.4.43.0405052206370.13381-100000@thinkpad.gardas.net> <CC107CEE-9ED1-11D8-9B55-000A95DA505C@dberlin.org>
Daniel Berlin wrote:
> > Is it possible to not bootstrap tree-ssa, but just compile
> > it with GCC 3.4.0/3.5.0 and see whether the parser is faster? If so, how?
> Sure.
> Don't do "make bootstrap", just do "make".
> It'll compile GCC with your system compiler.
I ran that experiment to check my suspicion that tree-ssa is not
good at optimizing itself. The results do _not_ confirm this theory:
tree-ssa built by tree-ssa, compiling tramp3d-v3.cpp (columns: user, system and wall-clock seconds):
Execution times (seconds)
garbage collection : 12.90 ( 6%) usr 0.00 ( 0%) sys 13.04 ( 6%)
callgraph construction: 1.50 ( 1%) usr 0.00 ( 0%) sys 1.52 ( 1%)
callgraph optimization: 0.56 ( 0%) usr 0.06 ( 1%) sys 0.62 ( 0%)
cfg construction : 0.64 ( 0%) usr 0.02 ( 0%) sys 0.66 ( 0%)
cfg cleanup : 2.23 ( 1%) usr 0.01 ( 0%) sys 2.24 ( 1%)
trivially dead code : 2.31 ( 1%) usr 0.00 ( 0%) sys 2.31 ( 1%)
life analysis : 4.46 ( 2%) usr 0.00 ( 0%) sys 4.50 ( 2%)
life info update : 2.38 ( 1%) usr 0.01 ( 0%) sys 2.39 ( 1%)
alias analysis : 3.33 ( 2%) usr 0.00 ( 0%) sys 3.35 ( 2%)
register scan : 1.88 ( 1%) usr 0.00 ( 0%) sys 1.90 ( 1%)
rebuild jump labels : 0.56 ( 0%) usr 0.00 ( 0%) sys 0.56 ( 0%)
preprocessing : 0.70 ( 0%) usr 0.14 ( 3%) sys 0.88 ( 0%)
parser : 16.16 ( 8%) usr 1.15 (25%) sys 17.52 ( 8%)
name lookup : 4.98 ( 2%) usr 1.27 (28%) sys 6.32 ( 3%)
integration : 22.32 (11%) usr 0.15 ( 3%) sys 22.71 (10%)
tree gimplify : 4.01 ( 2%) usr 0.08 ( 2%) sys 4.19 ( 2%)
tree eh : 2.84 ( 1%) usr 0.01 ( 0%) sys 2.85 ( 1%)
tree CFG construction : 1.66 ( 1%) usr 0.05 ( 1%) sys 1.71 ( 1%)
tree CFG cleanup : 2.61 ( 1%) usr 0.00 ( 0%) sys 2.63 ( 1%)
tree PTA : 0.71 ( 0%) usr 0.00 ( 0%) sys 0.73 ( 0%)
tree alias analysis : 0.88 ( 0%) usr 0.01 ( 0%) sys 0.89 ( 0%)
tree PHI insertion : 2.50 ( 1%) usr 0.01 ( 0%) sys 2.57 ( 1%)
tree SSA rewrite : 3.19 ( 2%) usr 0.00 ( 0%) sys 3.21 ( 1%)
tree SSA other : 5.27 ( 2%) usr 0.15 ( 3%) sys 5.44 ( 2%)
tree operand scan : 2.92 ( 1%) usr 0.29 ( 6%) sys 3.32 ( 2%)
dominator optimization: 11.84 ( 6%) usr 0.18 ( 4%) sys 12.09 ( 6%)
tree SRA : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%)
tree CCP : 2.05 ( 1%) usr 0.01 ( 0%) sys 2.10 ( 1%)
tree split crit edges : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%)
tree PRE : 6.72 ( 3%) usr 0.04 ( 1%) sys 6.76 ( 3%)
tree linearize phis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%)
tree forward propagate: 1.25 ( 1%) usr 0.00 ( 0%) sys 1.29 ( 1%)
tree conservative DCE : 2.66 ( 1%) usr 0.02 ( 0%) sys 2.68 ( 1%)
tree aggressive DCE : 1.23 ( 1%) usr 0.01 ( 0%) sys 1.24 ( 1%)
tree DSE : 2.51 ( 1%) usr 0.00 ( 0%) sys 2.57 ( 1%)
tree copy headers : 2.09 ( 1%) usr 0.03 ( 1%) sys 2.16 ( 1%)
tree SSA to normal : 3.48 ( 2%) usr 0.13 ( 3%) sys 3.66 ( 2%)
tree rename SSA copies: 0.99 ( 0%) usr 0.01 ( 0%) sys 1.02 ( 0%)
dominance frontiers : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%)
control dependences : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%)
expand : 20.61 (10%) usr 0.09 ( 2%) sys 20.93 (10%)
varconst : 0.49 ( 0%) usr 0.00 ( 0%) sys 0.49 ( 0%)
jump : 1.15 ( 1%) usr 0.06 ( 1%) sys 1.21 ( 1%)
CSE : 7.82 ( 4%) usr 0.02 ( 0%) sys 7.90 ( 4%)
global CSE : 5.15 ( 2%) usr 0.01 ( 0%) sys 5.20 ( 2%)
loop analysis : 1.12 ( 1%) usr 0.00 ( 0%) sys 1.14 ( 1%)
bypass jumps : 1.05 ( 0%) usr 0.01 ( 0%) sys 1.06 ( 0%)
web : 1.30 ( 1%) usr 0.02 ( 0%) sys 1.32 ( 1%)
CSE 2 : 3.02 ( 1%) usr 0.02 ( 0%) sys 3.09 ( 1%)
branch prediction : 2.05 ( 1%) usr 0.03 ( 1%) sys 2.12 ( 1%)
flow analysis : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%)
combiner : 2.94 ( 1%) usr 0.02 ( 0%) sys 3.00 ( 1%)
if-conversion : 0.59 ( 0%) usr 0.00 ( 0%) sys 0.63 ( 0%)
regmove : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.86 ( 0%)
mode switching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%)
local alloc : 2.49 ( 1%) usr 0.02 ( 0%) sys 2.55 ( 1%)
global alloc : 5.72 ( 3%) usr 0.10 ( 2%) sys 5.88 ( 3%)
reload CSE regs : 2.49 ( 1%) usr 0.02 ( 0%) sys 2.58 ( 1%)
flow 2 : 0.69 ( 0%) usr 0.00 ( 0%) sys 0.71 ( 0%)
if-conversion 2 : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%)
peephole 2 : 0.51 ( 0%) usr 0.00 ( 0%) sys 0.51 ( 0%)
rename registers : 0.81 ( 0%) usr 0.01 ( 0%) sys 0.82 ( 0%)
scheduling 2 : 4.50 ( 2%) usr 0.06 ( 1%) sys 4.65 ( 2%)
machine dep reorg : 0.80 ( 0%) usr 0.00 ( 0%) sys 0.82 ( 0%)
reorder blocks : 0.65 ( 0%) usr 0.00 ( 0%) sys 0.65 ( 0%)
shorten branches : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.92 ( 0%)
reg stack : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%)
final : 1.53 ( 1%) usr 0.14 ( 3%) sys 1.67 ( 1%)
symout : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.04 ( 0%)
rest of compilation : 2.39 ( 1%) usr 0.02 ( 0%) sys 2.44 ( 1%)
TOTAL : 211.55 4.52 218.47
tree-ssa built by mainline, compiling the tramp3d-v3.cpp testcase:
Execution times (seconds)
garbage collection : 13.59 ( 6%) usr 0.01 ( 0%) sys 13.76 ( 6%)
callgraph construction: 1.56 ( 1%) usr 0.01 ( 0%) sys 1.58 ( 1%)
callgraph optimization: 0.60 ( 0%) usr 0.02 ( 0%) sys 0.62 ( 0%)
cfg construction : 0.58 ( 0%) usr 0.01 ( 0%) sys 0.59 ( 0%)
cfg cleanup : 2.04 ( 1%) usr 0.00 ( 0%) sys 2.06 ( 1%)
trivially dead code : 2.33 ( 1%) usr 0.01 ( 0%) sys 2.36 ( 1%)
life analysis : 4.83 ( 2%) usr 0.02 ( 0%) sys 4.91 ( 2%)
life info update : 2.48 ( 1%) usr 0.00 ( 0%) sys 2.52 ( 1%)
alias analysis : 3.09 ( 1%) usr 0.00 ( 0%) sys 3.09 ( 1%)
register scan : 2.14 ( 1%) usr 0.02 ( 0%) sys 2.16 ( 1%)
rebuild jump labels : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.62 ( 0%)
preprocessing : 0.73 ( 0%) usr 0.08 ( 2%) sys 0.82 ( 0%)
parser : 16.75 ( 8%) usr 0.85 (19%) sys 17.88 ( 8%)
name lookup : 5.32 ( 2%) usr 1.27 (29%) sys 6.92 ( 3%)
integration : 22.29 (10%) usr 0.11 ( 3%) sys 22.61 (10%)
tree gimplify : 3.64 ( 2%) usr 0.04 ( 1%) sys 3.72 ( 2%)
tree eh : 2.73 ( 1%) usr 0.01 ( 0%) sys 2.77 ( 1%)
tree CFG construction : 1.65 ( 1%) usr 0.07 ( 2%) sys 1.75 ( 1%)
tree CFG cleanup : 2.63 ( 1%) usr 0.01 ( 0%) sys 2.64 ( 1%)
tree PTA : 0.74 ( 0%) usr 0.00 ( 0%) sys 0.74 ( 0%)
tree alias analysis : 0.95 ( 0%) usr 0.02 ( 0%) sys 0.97 ( 0%)
tree PHI insertion : 2.78 ( 1%) usr 0.05 ( 1%) sys 2.89 ( 1%)
tree SSA rewrite : 2.88 ( 1%) usr 0.01 ( 0%) sys 2.93 ( 1%)
tree SSA other : 4.51 ( 2%) usr 0.16 ( 4%) sys 4.68 ( 2%)
tree operand scan : 3.07 ( 1%) usr 0.26 ( 6%) sys 3.35 ( 2%)
dominator optimization: 11.99 ( 6%) usr 0.15 ( 3%) sys 12.26 ( 6%)
tree SRA : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%)
tree CCP : 1.89 ( 1%) usr 0.02 ( 0%) sys 1.91 ( 1%)
tree split crit edges : 0.33 ( 0%) usr 0.02 ( 0%) sys 0.35 ( 0%)
tree PRE : 6.56 ( 3%) usr 0.02 ( 0%) sys 6.70 ( 3%)
tree linearize phis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%)
tree forward propagate: 1.40 ( 1%) usr 0.00 ( 0%) sys 1.40 ( 1%)
tree conservative DCE : 2.53 ( 1%) usr 0.01 ( 0%) sys 2.54 ( 1%)
tree aggressive DCE : 1.24 ( 1%) usr 0.01 ( 0%) sys 1.28 ( 1%)
tree DSE : 2.57 ( 1%) usr 0.00 ( 0%) sys 2.59 ( 1%)
tree copy headers : 2.30 ( 1%) usr 0.05 ( 1%) sys 2.36 ( 1%)
tree SSA to normal : 3.45 ( 2%) usr 0.13 ( 3%) sys 3.63 ( 2%)
tree NRV optimization : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%)
tree rename SSA copies: 1.20 ( 1%) usr 0.06 ( 1%) sys 1.29 ( 1%)
dominance frontiers : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%)
control dependences : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%)
expand : 20.89 (10%) usr 0.13 ( 3%) sys 21.48 (10%)
varconst : 0.56 ( 0%) usr 0.01 ( 0%) sys 0.57 ( 0%)
jump : 1.32 ( 1%) usr 0.09 ( 2%) sys 1.43 ( 1%)
CSE : 7.91 ( 4%) usr 0.02 ( 0%) sys 7.98 ( 4%)
global CSE : 5.22 ( 2%) usr 0.07 ( 2%) sys 5.41 ( 2%)
loop analysis : 1.25 ( 1%) usr 0.01 ( 0%) sys 1.27 ( 1%)
bypass jumps : 1.06 ( 0%) usr 0.01 ( 0%) sys 1.07 ( 0%)
web : 1.41 ( 1%) usr 0.04 ( 1%) sys 1.45 ( 1%)
CSE 2 : 3.30 ( 2%) usr 0.00 ( 0%) sys 3.40 ( 2%)
branch prediction : 2.16 ( 1%) usr 0.03 ( 1%) sys 2.26 ( 1%)
flow analysis : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%)
combiner : 2.84 ( 1%) usr 0.01 ( 0%) sys 2.88 ( 1%)
if-conversion : 0.56 ( 0%) usr 0.01 ( 0%) sys 0.58 ( 0%)
regmove : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.88 ( 0%)
local alloc : 2.55 ( 1%) usr 0.02 ( 0%) sys 2.61 ( 1%)
global alloc : 5.68 ( 3%) usr 0.10 ( 2%) sys 5.88 ( 3%)
reload CSE regs : 2.94 ( 1%) usr 0.00 ( 0%) sys 2.94 ( 1%)
flow 2 : 0.50 ( 0%) usr 0.00 ( 0%) sys 0.51 ( 0%)
if-conversion 2 : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%)
peephole 2 : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.64 ( 0%)
rename registers : 0.96 ( 0%) usr 0.04 ( 1%) sys 1.01 ( 0%)
scheduling 2 : 4.27 ( 2%) usr 0.08 ( 2%) sys 4.41 ( 2%)
machine dep reorg : 0.91 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%)
reorder blocks : 0.49 ( 0%) usr 0.01 ( 0%) sys 0.52 ( 0%)
shorten branches : 0.78 ( 0%) usr 0.01 ( 0%) sys 0.79 ( 0%)
reg stack : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%)
final : 1.60 ( 1%) usr 0.15 ( 3%) sys 1.79 ( 1%)
symout : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%)
rest of compilation : 2.20 ( 1%) usr 0.02 ( 0%) sys 2.24 ( 1%)
TOTAL : 214.24 4.39 221.61
So tree-ssa is actually better than mainline at optimizing tree-ssa itself,
at least for compiling the tramp3d-v3.cpp testcase (TOTAL user time 211.55 s
vs. 214.24 s), or at least not worse. The opposite check (compiling
mainline with mainline and with tree-ssa) still needs to be performed.
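Comparing two 70-row reports by eye is error-prone. The following is a minimal Python sketch, not part of the original post, that parses rows in the layout shown above and lists the passes whose user time differs most between the two builds. The regular expression, the `Times` alias and the function names are illustrative assumptions; only a handful of rows are copied in for the demo.

```python
import re
from typing import Dict, List, Tuple

# One row of a GCC "Execution times (seconds)" report, for example
#   "parser : 16.16 ( 8%) usr 1.15 (25%) sys 17.52 ( 8%)"
# The trailing "wall" label is optional because the archived copy lost it.
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)(?:\s*wall)?\s*$"
)

# Hypothetical alias: pass name -> (user, system, wall) seconds.
Times = Dict[str, Tuple[float, float, float]]


def parse_report(text: str) -> Times:
    """Map each pass name to (user, system, wall) seconds.

    The TOTAL line carries no percentages, so it does not match ROW
    and is skipped.
    """
    rows: Times = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = (float(m["usr"]), float(m["sys"]), float(m["wall"]))
    return rows


def biggest_user_deltas(a: Times, b: Times, n: int = 5) -> List[Tuple[str, float]]:
    """Return the n passes present in both reports whose user time differs most (b - a)."""
    deltas = [(name, b[name][0] - a[name][0]) for name in a.keys() & b.keys()]
    return sorted(deltas, key=lambda d: abs(d[1]), reverse=True)[:n]


# A few rows copied from the two reports above.
tree_ssa_built = parse_report("""\
parser : 16.16 ( 8%) usr 1.15 (25%) sys 17.52 ( 8%)
garbage collection : 12.90 ( 6%) usr 0.00 ( 0%) sys 13.04 ( 6%)
tree SSA other : 5.27 ( 2%) usr 0.15 ( 3%) sys 5.44 ( 2%)
reload CSE regs : 2.49 ( 1%) usr 0.02 ( 0%) sys 2.58 ( 1%)
TOTAL : 211.55 4.52 218.47
""")
mainline_built = parse_report("""\
parser : 16.75 ( 8%) usr 0.85 (19%) sys 17.88 ( 8%)
garbage collection : 13.59 ( 6%) usr 0.01 ( 0%) sys 13.76 ( 6%)
tree SSA other : 4.51 ( 2%) usr 0.16 ( 4%) sys 4.68 ( 2%)
reload CSE regs : 2.94 ( 1%) usr 0.00 ( 0%) sys 2.94 ( 1%)
TOTAL : 214.24 4.39 221.61
""")

# Positive delta: the mainline-built compiler spent more user time in that pass.
for name, delta in biggest_user_deltas(tree_ssa_built, mainline_built):
    print(f"{name:22s} {delta:+.2f} s")
```

Sorting by absolute difference keeps both directions visible, which matters here because the per-pass differences go both ways even though the totals favour the tree-ssa-built compiler.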
The stripped cc1plus binaries are comparable in size:
-rwxr-x--- 1 rguenth tat 4480828 May 5 23:57 gcc-obj-3.5/gcc/cc1plus*
-rwxr-x--- 1 rguenth tat 4501168 May 5 23:57 gcc-obj-ssa/gcc/cc1plus*
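To put "comparable" into numbers, here is a tiny Python calculation using only the two sizes from the listing; the variable names are illustrative.

```python
# Stripped cc1plus sizes in bytes, copied from the ls listing above.
size_obj_35 = 4480828   # gcc-obj-3.5/gcc/cc1plus
size_obj_ssa = 4501168  # gcc-obj-ssa/gcc/cc1plus

diff = size_obj_ssa - size_obj_35             # absolute difference in bytes
rel_percent = 100.0 * diff / size_obj_35      # relative to the gcc-obj-3.5 binary

print(f"gcc-obj-ssa is {diff} bytes larger ({rel_percent:.2f}%)")
# prints: gcc-obj-ssa is 20340 bytes larger (0.45%)
```

So the two binaries differ by well under 1%.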
Richard.