This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Huge compile time & run time performance regression 3.3 -> HEAD
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 18 May 2003 17:45:47 +0200 (CEST)
- Subject: Huge compile time & run time performance regression 3.3 -> HEAD
Hi!
Now that 3.3 is out, I have started comparing 3.3 to HEAD with respect to
compile time performance and the performance of the resulting code. As
always, these comparisons are for a POOMA-based scientific application.
I see a roughly 91% compile time regression (673.50s -> 1284.48s) and
a 14% run time performance regression (150s -> 171s) when comparing
gcc 3.3 to HEAD.
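For anyone who wants to re-derive the percentages from the raw timings, here is a minimal sketch. The function names are made up for illustration; the only inputs are the four timings quoted above. Note that the result depends on which measurement is taken as the baseline.

```python
def increase_vs_old(old: float, new: float) -> float:
    """Slowdown as a percentage of the old (gcc 3.3) time."""
    return (new - old) / old * 100.0


def increase_vs_new(old: float, new: float) -> float:
    """The same difference as a percentage of the new (HEAD) time."""
    return (new - old) / new * 100.0


# Timings in seconds, taken from the text above.
compile_old, compile_new = 673.50, 1284.48
run_old, run_new = 150.0, 171.0

print(f"compile time vs 3.3:  +{increase_vs_old(compile_old, compile_new):.1f}%")
print(f"run time vs 3.3:      +{increase_vs_old(run_old, run_new):.1f}%")
print(f"run time vs HEAD:     +{increase_vs_new(run_old, run_new):.1f}%")
```

Measured against the 3.3 numbers, the compile time grows by about 91% and the run time by 14%; measured against the HEAD numbers, the run time difference is about 12%.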
Time reports follow. The most prominent regressions are expand (more than
6x slower), global CSE (more than 3x slower!), loop analysis and branch
prediction.
Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
-funroll-loops -fomit-frame-pointer
Richard.
gcc-3.3:
Execution times (seconds)
garbage collection : 21.82 ( 3%) usr 0.01 ( 0%) sys 22.00 ( 3%)
cfg construction : 3.32 ( 1%) usr 0.08 ( 0%) sys 4.00 ( 1%)
cfg cleanup : 58.32 ( 9%) usr 0.20 ( 1%) sys 60.00 ( 9%)
trivially dead code : 9.11 ( 1%) usr 0.01 ( 0%) sys 9.00 ( 1%)
life analysis : 14.49 ( 2%) usr 0.60 ( 4%) sys 15.00 ( 2%)
life info update : 6.95 ( 1%) usr 0.53 ( 3%) sys 8.50 ( 1%)
preprocessing : 0.52 ( 0%) usr 0.24 ( 1%) sys 1.00 ( 0%)
lexical analysis : 0.31 ( 0%) usr 0.17 ( 1%) sys 1.50 ( 0%)
parser : 13.78 ( 2%) usr 0.55 ( 3%) sys 19.00 ( 3%)
name lookup : 6.88 ( 1%) usr 0.84 ( 5%) sys 4.50 ( 1%)
expand : 29.68 ( 5%) usr 1.63 (10%) sys 36.00 ( 5%)
varconst : 0.32 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%)
integration : 27.52 ( 4%) usr 0.74 ( 5%) sys 27.50 ( 4%)
jump : 17.88 ( 3%) usr 0.26 ( 2%) sys 17.50 ( 3%)
CSE : 23.11 ( 4%) usr 0.05 ( 0%) sys 25.00 ( 4%)
global CSE : 108.29 (16%) usr 1.30 ( 8%) sys 108.00 (16%)
loop analysis : 206.77 (31%) usr 8.24 (51%) sys 213.50 (32%)
CSE 2 : 9.69 ( 1%) usr 0.01 ( 0%) sys 10.00 ( 1%)
branch prediction : 8.94 ( 1%) usr 0.04 ( 0%) sys 9.00 ( 1%)
flow analysis : 1.56 ( 0%) usr 0.02 ( 0%) sys 2.00 ( 0%)
combiner : 5.52 ( 1%) usr 0.07 ( 0%) sys 3.50 ( 1%)
if-conversion : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.50 ( 0%)
regmove : 6.41 ( 1%) usr 0.00 ( 0%) sys 6.00 ( 1%)
mode switching : 0.91 ( 0%) usr 0.00 ( 0%) sys 1.50 ( 0%)
local alloc : 24.06 ( 4%) usr 0.11 ( 1%) sys 23.00 ( 3%)
global alloc : 10.66 ( 2%) usr 0.11 ( 1%) sys 9.50 ( 1%)
reload CSE regs : 15.54 ( 2%) usr 0.05 ( 0%) sys 15.00 ( 2%)
flow 2 : 2.01 ( 0%) usr 0.01 ( 0%) sys 3.00 ( 0%)
if-conversion 2 : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
peephole 2 : 0.62 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%)
rename registers : 2.52 ( 0%) usr 0.03 ( 0%) sys 1.00 ( 0%)
scheduling 2 : 6.27 ( 1%) usr 0.14 ( 1%) sys 7.50 ( 1%)
reorder blocks : 0.21 ( 0%) usr 0.02 ( 0%) sys 0.50 ( 0%)
shorten branches : 1.76 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%)
reg stack : 0.53 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%)
final : 1.72 ( 0%) usr 0.15 ( 1%) sys 0.50 ( 0%)
rest of compilation : 7.72 ( 1%) usr 0.01 ( 0%) sys 6.50 ( 1%)
TOTAL : 656.71 16.26 673.50
gcc 3.4 (HEAD):
Execution times (seconds)
garbage collection : 25.93 ( 2%) usr 0.12 ( 0%) sys 26.04 ( 2%)
cfg construction : 4.14 ( 0%) usr 0.22 ( 0%) sys 4.06 ( 0%)
cfg cleanup : 8.45 ( 1%) usr 0.68 ( 0%) sys 9.17 ( 1%)
trivially dead code : 9.92 ( 1%) usr 0.16 ( 0%) sys 10.21 ( 1%)
life analysis : 17.45 ( 2%) usr 56.73 (31%) sys 74.21 ( 6%)
life info update : 8.87 ( 1%) usr 47.74 (26%) sys 56.72 ( 4%)
alias analysis : 18.90 ( 2%) usr 1.36 ( 1%) sys 19.57 ( 2%)
register scan : 3.63 ( 0%) usr 0.00 ( 0%) sys 3.80 ( 0%)
rebuild jump labels : 3.17 ( 0%) usr 0.00 ( 0%) sys 3.30 ( 0%)
preprocessing : 0.60 ( 0%) usr 0.24 ( 0%) sys 6.20 ( 0%)
parser : 13.69 ( 1%) usr 0.72 ( 0%) sys 14.31 ( 1%)
name lookup : 9.07 ( 1%) usr 0.72 ( 0%) sys 10.07 ( 1%)
expand : 190.76 (17%) usr 7.66 ( 4%) sys 198.53 (15%)
varconst : 0.30 ( 0%) usr 0.01 ( 0%) sys 0.31 ( 0%)
integration : 27.47 ( 3%) usr 2.63 ( 1%) sys 29.84 ( 2%)
jump : 15.55 ( 1%) usr 1.51 ( 1%) sys 17.02 ( 1%)
CSE : 20.92 ( 2%) usr 0.13 ( 0%) sys 21.00 ( 2%)
global CSE : 365.18 (33%) usr 3.40 ( 2%) sys 368.96 (29%)
loop analysis : 227.82 (21%) usr 57.81 (31%) sys 286.05 (22%)
bypass jumps : 4.69 ( 0%) usr 0.49 ( 0%) sys 5.05 ( 0%)
CSE 2 : 9.27 ( 1%) usr 0.06 ( 0%) sys 9.00 ( 1%)
branch prediction : 20.14 ( 2%) usr 1.57 ( 1%) sys 21.76 ( 2%)
flow analysis : 0.48 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%)
combiner : 6.40 ( 1%) usr 0.22 ( 0%) sys 6.81 ( 1%)
if-conversion : 0.52 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%)
regmove : 7.61 ( 1%) usr 0.01 ( 0%) sys 7.57 ( 1%)
mode switching : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%)
local alloc : 24.98 ( 2%) usr 0.27 ( 0%) sys 25.20 ( 2%)
global alloc : 12.88 ( 1%) usr 0.26 ( 0%) sys 13.24 ( 1%)
reload CSE regs : 5.70 ( 1%) usr 0.17 ( 0%) sys 5.82 ( 0%)
flow 2 : 1.18 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%)
if-conversion 2 : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%)
peephole 2 : 0.94 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%)
rename registers : 2.86 ( 0%) usr 0.07 ( 0%) sys 2.88 ( 0%)
scheduling 2 : 6.90 ( 1%) usr 0.26 ( 0%) sys 7.24 ( 1%)
reorder blocks : 0.51 ( 0%) usr 0.01 ( 0%) sys 0.64 ( 0%)
shorten branches : 1.99 ( 0%) usr 0.11 ( 0%) sys 2.17 ( 0%)
reg stack : 1.02 ( 0%) usr 0.04 ( 0%) sys 1.18 ( 0%)
final : 2.44 ( 0%) usr 0.25 ( 0%) sys 2.54 ( 0%)
rest of compilation : 10.52 ( 1%) usr 0.07 ( 0%) sys 10.56 ( 1%)
TOTAL : 1093.17 185.71 1284.48
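To compare the two reports pass by pass, something like the following could be used. It is only a sketch: the regular expression assumes the row layout shown above (name, then usr and sys times with percentages, then a third unlabeled column that is presumably wall-clock time), and `parse_report`, `rank_regressions` and the 10 second threshold are made-up helpers, not part of GCC. Only a few rows from each report are included to keep it short.

```python
import re

# One row of the time report: "<pass> : <usr> (p%) usr <sys> (p%) sys <third> (p%)".
# The TOTAL row has no percentages and is skipped because it does not match.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"
)


def parse_report(text):
    """Map each pass name to its usr time in seconds."""
    times = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            times[match.group("name")] = float(match.group("usr"))
    return times


def rank_regressions(old, new, min_delta=10.0):
    """Passes present in both reports whose usr time grew by at least
    min_delta seconds, sorted by absolute growth (largest first)."""
    rows = []
    for name in old.keys() & new.keys():
        delta = new[name] - old[name]
        if delta >= min_delta:
            ratio = new[name] / old[name] if old[name] > 0 else float("inf")
            rows.append((name, old[name], new[name], ratio))
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows


# A few rows copied from the two reports above.
REPORT_33 = """\
cfg cleanup : 58.32 ( 9%) usr 0.20 ( 1%) sys 60.00 ( 9%)
expand : 29.68 ( 5%) usr 1.63 (10%) sys 36.00 ( 5%)
global CSE : 108.29 (16%) usr 1.30 ( 8%) sys 108.00 (16%)
loop analysis : 206.77 (31%) usr 8.24 (51%) sys 213.50 (32%)
branch prediction : 8.94 ( 1%) usr 0.04 ( 0%) sys 9.00 ( 1%)
TOTAL : 656.71 16.26 673.50
"""

REPORT_HEAD = """\
cfg cleanup : 8.45 ( 1%) usr 0.68 ( 0%) sys 9.17 ( 1%)
expand : 190.76 (17%) usr 7.66 ( 4%) sys 198.53 (15%)
global CSE : 365.18 (33%) usr 3.40 ( 2%) sys 368.96 (29%)
loop analysis : 227.82 (21%) usr 57.81 (31%) sys 286.05 (22%)
branch prediction : 20.14 ( 2%) usr 1.57 ( 1%) sys 21.76 ( 2%)
TOTAL : 1093.17 185.71 1284.48
"""

ranked = rank_regressions(parse_report(REPORT_33), parse_report(REPORT_HEAD))
for name, old_s, new_s, ratio in ranked:
    print(f"{name:18s} {old_s:8.2f}s -> {new_s:8.2f}s  ({ratio:.2f}x)")
```

On these rows, global CSE and expand account for most of the added usr time, while cfg cleanup got faster and is therefore filtered out. The sketch looks at usr time only, so the jump in sys time for life analysis and life info update would need a separate pass over the sys column.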