This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: 3.4 / 3.5 / tree-ssa comparisons
Andrew Pinski wrote:
On Apr 3, 2004, at 15:49, Richard Guenther wrote:
The automated tester at
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
completed its first 3.5 build. I had never checked 3.5 before, so I'm
surprised by the numbers it got:
bootstrap time (52min) is in between 3.4 (50min) and tree-ssa (62min),
and so are the build times for the tramp3d-v3 test(!). I expected them
to improve compared to 3.4, not to regress again already; they are now
2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa). Also, performance
of the resulting binary is better(!) for 3.5 (6.9s/it) than for
tree-ssa (7.68s/it), and of course 3.4 is slowest (8.85s/it). This
means that if we merge tree-ssa now, we'll regress in both compile time
and runtime relative to 3.5, but relative to 3.4 we'll only have a
compile-time regression, not a runtime one.
The obvious question is, why is 3.5 so much better than 3.4? And of
course, why is tree-ssa not better than 3.5 for C++ expression
template numeric code?
You could check the tree-ssa with my patch at
<http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00169.html>,
it should give both a runtime improvement and a compile time improvement.
Numbers with this patch applied: bootstrap time is 62min. Build time is
TOTAL : 151.44 3.21 154.66
before vs.
TOTAL : 155.70 3.18 158.89
after applying the patch.
Runtime is 7.73s/it compared to 7.64s/it before.
So it's not helping, but instead pessimizing slightly!?
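To put the "pessimizing slightly" in numbers, here is a minimal sketch that computes the relative change from the figures quoted above. It assumes the third column of the TOTAL lines is wall-clock seconds (as in the per-pass rows); the helper name `pct_change` is made up for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` in percent (positive = slower)."""
    return (after - before) / before * 100.0

# Figures copied from the message; the wall-clock reading of the third
# TOTAL column is an assumption.
compile_before, compile_after = 154.66, 158.89   # seconds
runtime_before, runtime_after = 7.64, 7.73       # seconds per iteration

compile_delta = pct_change(compile_before, compile_after)
runtime_delta = pct_change(runtime_before, runtime_after)

print(f"compile time: {compile_delta:+.1f}%")
print(f"runtime:      {runtime_delta:+.1f}%")
```

Both deltas come out positive, i.e. the patched build is slower to compile and produces a slightly slower binary, though a single run each does not say how much of that is noise.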
before:
tree gimplify         : 2.04 ( 1%) usr 0.02 ( 1%) sys 2.06 ( 1%) wall
tree eh               : 1.33 ( 1%) usr 0.01 ( 0%) sys 1.34 ( 1%) wall
tree CFG construction : 0.77 ( 0%) usr 0.02 ( 1%) sys 0.80 ( 1%) wall
tree CFG cleanup      : 0.96 ( 1%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall
tree PTA              : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall
tree alias analysis   : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
tree PHI insertion    : 1.70 ( 1%) usr 0.03 ( 1%) sys 1.72 ( 1%) wall
tree SSA rewrite      : 1.53 ( 1%) usr 0.00 ( 0%) sys 1.52 ( 1%) wall
tree SSA other        : 2.31 ( 1%) usr 0.16 ( 5%) sys 2.54 ( 2%) wall
tree operand scan     : 2.08 ( 1%) usr 0.25 ( 8%) sys 2.27 ( 1%) wall
dominator optimization: 6.37 ( 4%) usr 0.11 ( 3%) sys 6.49 ( 4%) wall
tree SRA              : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
tree CCP              : 0.65 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall
tree split crit edges : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
tree PRE              : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall
tree linearize phis   : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
tree forward propagate: 0.38 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall
tree conservative DCE : 1.03 ( 1%) usr 0.00 ( 0%) sys 1.04 ( 1%) wall
tree aggressive DCE   : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall
tree DSE              : 0.91 ( 1%) usr 0.01 ( 0%) sys 0.91 ( 1%) wall
tree copy headers     : 0.88 ( 1%) usr 0.01 ( 0%) sys 0.88 ( 1%) wall
tree SSA to normal    : 1.13 ( 1%) usr 0.01 ( 0%) sys 1.16 ( 1%) wall
tree rename SSA copies: 0.35 ( 0%) usr 0.01 ( 0%) sys 0.34 ( 0%) wall
dominance frontiers   : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
control dependences   : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
expand                : 9.44 ( 6%) usr 0.05 ( 2%) sys 9.47 ( 6%) wall
after:
tree gimplify         : 2.03 ( 1%) usr 0.02 ( 1%) sys 2.03 ( 1%) wall
tree eh               : 1.31 ( 1%) usr 0.01 ( 0%) sys 1.31 ( 1%) wall
tree CFG construction : 0.74 ( 0%) usr 0.02 ( 1%) sys 0.76 ( 0%) wall
tree CFG cleanup      : 0.96 ( 1%) usr 0.00 ( 0%) sys 0.96 ( 1%) wall
tree PTA              : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
tree alias analysis   : 0.39 ( 0%) usr 0.01 ( 0%) sys 0.39 ( 0%) wall
tree PHI insertion    : 1.64 ( 1%) usr 0.05 ( 2%) sys 1.71 ( 1%) wall
tree SSA rewrite      : 1.47 ( 1%) usr 0.02 ( 0%) sys 1.49 ( 1%) wall
tree SSA other        : 2.36 ( 2%) usr 0.15 ( 5%) sys 2.48 ( 2%) wall
tree operand scan     : 2.23 ( 1%) usr 0.25 ( 8%) sys 2.48 ( 2%) wall
dominator optimization: 6.44 ( 4%) usr 0.10 ( 3%) sys 6.54 ( 4%) wall
tree SRA              : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
tree CCP              : 0.59 ( 0%) usr 0.01 ( 0%) sys 0.60 ( 0%) wall
tree split crit edges : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
tree PRE              : 1.96 ( 1%) usr 0.01 ( 0%) sys 1.96 ( 1%) wall
tree linearize phis   : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
tree remove casts     : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
tree forward propagate: 0.36 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall
tree conservative DCE : 1.05 ( 1%) usr 0.01 ( 0%) sys 1.06 ( 1%) wall
tree aggressive DCE   : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.41 ( 0%) wall
tree DSE              : 0.86 ( 1%) usr 0.01 ( 0%) sys 0.87 ( 1%) wall
tree copy headers     : 0.82 ( 1%) usr 0.01 ( 0%) sys 0.83 ( 1%) wall
tree SSA to normal    : 1.04 ( 1%) usr 0.02 ( 1%) sys 1.06 ( 1%) wall
tree rename SSA copies: 0.34 ( 0%) usr 0.01 ( 0%) sys 0.35 ( 0%) wall
dominance frontiers   : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
control dependences   : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
expand                : 9.26 ( 6%) usr 0.06 ( 2%) sys 9.32 ( 6%) wall
So it's not a win here.
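Comparing the two per-pass listings by eye is tedious, so here is a rough sketch of how they could be diffed mechanically. It assumes rows shaped like the ones above (pass name, then usr/sys/wall times with percentages, in the style of GCC's -ftime-report) and tolerates "wall" being wrapped onto the next line. The function names and the small excerpts are illustrative only, not part of any GCC tooling.

```python
import re

# One timing row: "<pass> : U ( p%) usr S ( p%) sys W ( p%) wall".
# \s* before "wall" also accepts a line break at that point.
ROW = re.compile(
    r"^\s*(?P<name>[^:\n]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall",
    re.MULTILINE,
)

def parse_report(text):
    """Map each pass name to its wall-clock time in seconds."""
    return {m["name"]: float(m["wall"]) for m in ROW.finditer(text)}

def wall_deltas(before, after):
    """Per-pass wall-time change; a pass missing on one side counts as 0.0."""
    names = list(before) + [n for n in after if n not in before]
    return {n: after.get(n, 0.0) - before.get(n, 0.0) for n in names}

# Short excerpts copied from the listings in the message.
BEFORE = """
tree operand scan     : 2.08 ( 1%) usr 0.25 ( 8%) sys 2.27 ( 1%) wall
tree PRE              : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall
expand                : 9.44 ( 6%) usr 0.05 ( 2%) sys 9.47 ( 6%) wall
"""
AFTER = """
tree operand scan     : 2.23 ( 1%) usr 0.25 ( 8%) sys 2.48 ( 2%) wall
tree PRE              : 1.96 ( 1%) usr 0.01 ( 0%) sys 1.96 ( 1%) wall
tree remove casts     : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
expand                : 9.26 ( 6%) usr 0.06 ( 2%) sys 9.32 ( 6%) wall
"""

deltas = wall_deltas(parse_report(BEFORE), parse_report(AFTER))
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {delta:+.2f}s")
```

On these excerpts the tree passes mostly get slightly cheaper (PRE, expand) while operand scan and the new "tree remove casts" pass cost a bit more, which is consistent with the listed tree passes not explaining the higher TOTAL on their own.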
With the suggested -fno-gcse and --param max-cse-path-length=0 I get a
compile time of
TOTAL : 143.30 2.99 146.30
and a runtime of 7.87s/it. With just -fno-gcse I get
TOTAL : 144.75 3.08 147.83
and 7.89s/it, with just --param max-cse-path-length=0 it's
TOTAL : 150.02 3.09 153.12
and 7.77s/it.
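The three flag combinations trade compile time against runtime, which is easier to see side by side. The sketch below tabulates them relative to a baseline. Treating the patched build with default flags (158.89s, 7.73s/it) as that baseline is an assumption; the message does not say which tree these runs used.

```python
# (label, total compile wall time in s, runtime in s/iteration).
# The choice of baseline is an assumption, not stated in the message.
BASELINE = ("patched tree-ssa, default flags", 158.89, 7.73)
CONFIGS = [
    ("-fno-gcse --param max-cse-path-length=0", 146.30, 7.87),
    ("-fno-gcse", 147.83, 7.89),
    ("--param max-cse-path-length=0", 153.12, 7.77),
]

def relative(value, base):
    """Change of `value` against `base` in percent."""
    return (value - base) / base * 100.0

def tradeoffs(baseline, configs):
    """Return (label, compile-time change %, runtime change %) per config."""
    _, base_compile, base_run = baseline
    return [
        (label, relative(c, base_compile), relative(r, base_run))
        for label, c, r in configs
    ]

rows = tradeoffs(BASELINE, CONFIGS)
for label, d_compile, d_run in rows:
    print(f"{label:42s} compile {d_compile:+5.1f}%  runtime {d_run:+5.1f}%")
```

Under that assumption every combination compiles faster and runs slightly slower, with -fno-gcse accounting for most of the compile-time saving.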
But maybe I'm chasing the wrong effects without enabling leafify as
there are no nice loops to optimize then...
Richard.