This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: 3.4 / 3.5 / tree-ssa comparisons


Andrew Pinski wrote:

On Apr 3, 2004, at 15:49, Richard Guenther wrote:


The automated tester at http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
completed its first 3.5 build. I had never checked 3.5 before, so I'm surprised by the numbers it got:


Bootstrap time (52min) is in between 3.4 (50min) and tree-ssa (62min). Build times for the tramp3d-v3 test are, too(!): they are now 2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa). I expected them to improve compared to 3.4, not to regress again already.
Also, performance of the resulting binary is better(!) for 3.5 (6.9s/it) than for tree-ssa (7.68s/it), and of course 3.4 is slowest (8.85s/it). This means that merging tree-ssa now would regress both compile time and runtime relative to 3.5; relative to 3.4 there would be no runtime regression, only a compile-time regression.


The obvious question is, why is 3.5 so much better than 3.4? And of course, why is tree-ssa not better than 3.5 for C++ expression template numeric code?


You could check tree-ssa with my patch at <http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00169.html>;
it should give both a runtime improvement and a compile-time improvement.

Numbers with this patch applied: bootstrap time is 62min, and build time is TOTAL : 151.44 3.21 154.66 before vs. TOTAL : 155.70 3.18 158.89 after applying the patch. Runtime is 7.73s/it compared to 7.64s/it before. So it's not helping, but instead pessimizing slightly!?
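To put the before/after figures on a common scale, here is a minimal Python sketch that turns them into relative changes. It assumes the three TOTAL columns are usr, sys and wall (matching the per-pass lines below) and compares the wall column; the helper name `pct_change` is made up for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Figures quoted in the message; the wall column of TOTAL is an assumption.
build_wall_before, build_wall_after = 154.66, 158.89  # seconds
runtime_before, runtime_after = 7.64, 7.73            # seconds per iteration

build_delta = pct_change(build_wall_before, build_wall_after)
runtime_delta = pct_change(runtime_before, runtime_after)

print(f"build time: {build_delta:+.1f}%")
print(f"runtime:    {runtime_delta:+.1f}%")
```

Both values come out positive, that is, slower with the patch on a single measurement each, so the differences may well be within run-to-run noise.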

before:
tree gimplify : 2.04 ( 1%) usr 0.02 ( 1%) sys 2.06 ( 1%) wall
tree eh : 1.33 ( 1%) usr 0.01 ( 0%) sys 1.34 ( 1%) wall
tree CFG construction : 0.77 ( 0%) usr 0.02 ( 1%) sys 0.80 ( 1%) wall
tree CFG cleanup : 0.96 ( 1%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall
tree PTA : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall
tree alias analysis : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
tree PHI insertion : 1.70 ( 1%) usr 0.03 ( 1%) sys 1.72 ( 1%) wall
tree SSA rewrite : 1.53 ( 1%) usr 0.00 ( 0%) sys 1.52 ( 1%) wall
tree SSA other : 2.31 ( 1%) usr 0.16 ( 5%) sys 2.54 ( 2%) wall
tree operand scan : 2.08 ( 1%) usr 0.25 ( 8%) sys 2.27 ( 1%) wall
dominator optimization: 6.37 ( 4%) usr 0.11 ( 3%) sys 6.49 ( 4%) wall
tree SRA : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
tree CCP : 0.65 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall
tree split crit edges : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
tree PRE : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
tree forward propagate: 0.38 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall
tree conservative DCE : 1.03 ( 1%) usr 0.00 ( 0%) sys 1.04 ( 1%) wall
tree aggressive DCE : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall
tree DSE : 0.91 ( 1%) usr 0.01 ( 0%) sys 0.91 ( 1%) wall
tree copy headers : 0.88 ( 1%) usr 0.01 ( 0%) sys 0.88 ( 1%) wall
tree SSA to normal : 1.13 ( 1%) usr 0.01 ( 0%) sys 1.16 ( 1%) wall
tree rename SSA copies: 0.35 ( 0%) usr 0.01 ( 0%) sys 0.34 ( 0%) wall
dominance frontiers : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
control dependences : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
expand : 9.44 ( 6%) usr 0.05 ( 2%) sys 9.47 ( 6%) wall



after:
tree gimplify : 2.03 ( 1%) usr 0.02 ( 1%) sys 2.03 ( 1%) wall
tree eh : 1.31 ( 1%) usr 0.01 ( 0%) sys 1.31 ( 1%) wall
tree CFG construction : 0.74 ( 0%) usr 0.02 ( 1%) sys 0.76 ( 0%) wall
tree CFG cleanup : 0.96 ( 1%) usr 0.00 ( 0%) sys 0.96 ( 1%) wall
tree PTA : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
tree alias analysis : 0.39 ( 0%) usr 0.01 ( 0%) sys 0.39 ( 0%) wall
tree PHI insertion : 1.64 ( 1%) usr 0.05 ( 2%) sys 1.71 ( 1%) wall
tree SSA rewrite : 1.47 ( 1%) usr 0.02 ( 0%) sys 1.49 ( 1%) wall
tree SSA other : 2.36 ( 2%) usr 0.15 ( 5%) sys 2.48 ( 2%) wall
tree operand scan : 2.23 ( 1%) usr 0.25 ( 8%) sys 2.48 ( 2%) wall
dominator optimization: 6.44 ( 4%) usr 0.10 ( 3%) sys 6.54 ( 4%) wall
tree SRA : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
tree CCP : 0.59 ( 0%) usr 0.01 ( 0%) sys 0.60 ( 0%) wall
tree split crit edges : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
tree PRE : 1.96 ( 1%) usr 0.01 ( 0%) sys 1.96 ( 1%) wall
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
tree remove casts : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
tree forward propagate: 0.36 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall
tree conservative DCE : 1.05 ( 1%) usr 0.01 ( 0%) sys 1.06 ( 1%) wall
tree aggressive DCE : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.41 ( 0%) wall
tree DSE : 0.86 ( 1%) usr 0.01 ( 0%) sys 0.87 ( 1%) wall
tree copy headers : 0.82 ( 1%) usr 0.01 ( 0%) sys 0.83 ( 1%) wall
tree SSA to normal : 1.04 ( 1%) usr 0.02 ( 1%) sys 1.06 ( 1%) wall
tree rename SSA copies: 0.34 ( 0%) usr 0.01 ( 0%) sys 0.35 ( 0%) wall
dominance frontiers : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
control dependences : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
expand : 9.26 ( 6%) usr 0.06 ( 2%) sys 9.32 ( 6%) wall


So it's not a win here.
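Comparing the two listings by eye is tedious, so here is a rough Python sketch that parses lines of the shape shown above and reports the per-pass wall-time difference. The regular expression is only fitted to the lines as they appear in this message, not to any official format description, and `parse_report` / `wall_deltas` are hypothetical helper names.

```python
import re

# Matches e.g. "tree PRE : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall"
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
)

def parse_report(text):
    """Map pass name -> wall seconds for every line that matches LINE_RE."""
    passes = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            passes[m["name"]] = float(m["wall"])
    return passes

def wall_deltas(before, after):
    """Return {pass: after - before}; a pass missing on one side counts as 0."""
    names = sorted(set(before) | set(after))
    return {n: round(after.get(n, 0.0) - before.get(n, 0.0), 2) for n in names}

# A few lines copied from the two listings above.
before_text = """\
tree operand scan : 2.08 ( 1%) usr 0.25 ( 8%) sys 2.27 ( 1%) wall
tree PRE : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall
expand : 9.44 ( 6%) usr 0.05 ( 2%) sys 9.47 ( 6%) wall
"""
after_text = """\
tree operand scan : 2.23 ( 1%) usr 0.25 ( 8%) sys 2.48 ( 2%) wall
tree PRE : 1.96 ( 1%) usr 0.01 ( 0%) sys 1.96 ( 1%) wall
tree remove casts : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
expand : 9.26 ( 6%) usr 0.06 ( 2%) sys 9.32 ( 6%) wall
"""

deltas = wall_deltas(parse_report(before_text), parse_report(after_text))
for name, delta in deltas.items():
    print(f"{name:24s} {delta:+.2f}s")
```

On these sample lines, the time saved in tree PRE is offset by the new "tree remove casts" pass, and none of the listed tree passes moves by more than a few tenths of a second.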

With the suggested -fno-gcse and --param max-cse-path-length=0 I get a compile time of
TOTAL : 143.30 2.99 146.30
and runtimes of 7.87s/it. With just -fno-gcse I get
TOTAL : 144.75 3.08 147.83
and 7.89s/it, with just --param max-cse-path-length=0 it's
TOTAL : 150.02 3.09 153.12
and 7.77s/it.
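The three flag combinations can be lined up as follows. This is only a small Python sketch that ranks the numbers quoted above; it again assumes the third TOTAL column is wall-clock seconds.

```python
# (flags, total wall-clock build time in s, runtime in s/it), as quoted above.
results = [
    ("-fno-gcse --param max-cse-path-length=0", 146.30, 7.87),
    ("-fno-gcse", 147.83, 7.89),
    ("--param max-cse-path-length=0", 153.12, 7.77),
]

fastest_build = min(results, key=lambda r: r[1])
fastest_run = min(results, key=lambda r: r[2])

# Print the configurations ordered by build time.
for flags, build_s, run_s in sorted(results, key=lambda r: r[1]):
    print(f"{build_s:7.2f}s build  {run_s:.2f}s/it  {flags}")

print("fastest build:  ", fastest_build[0])
print("fastest runtime:", fastest_run[0])
```

The ranking makes the trade-off visible: the combination that compiles fastest is not the one that yields the fastest binary.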


But maybe I'm chasing the wrong effects without enabling leafify as there are no nice loops to optimize then...

Richard.

