PATCH: [gcc3.5 improvement branch] Very Simple constant propagation
Roger Sayle
roger@eyesopen.com
Tue Feb 3 04:12:00 GMT 2004
On Mon, 2 Feb 2004, Geoff Keating wrote:
> To match Caroline's results, you should use --enable-intermodule and
> -O3, not just a regular bootstrap.
>
> The concern is compile-time performance on large-to-very-large
> functions, like those created by intermodule inlining in SPEC.
Hi Geoff,
I'm still not seeing it. Unfortunately, an --enable-intermodule
bootstrap with BOOT_CFLAGS="-O3" currently fails for mainline
during stage2 on i686-pc-linux-gnu due to "symbol already defined"
errors from the assembler. However, it is possible to analyse
the behaviour of stage1/cc1 prior to this fatal failure:
Execution times (seconds)
garbage collection    :  60.92 ( 4%) usr   0.11 ( 1%) sys  61.03 ( 4%) wall
callgraph construction:   1.90 ( 0%) usr   0.03 ( 0%) sys   1.93 ( 0%) wall
callgraph optimization:   0.26 ( 0%) usr   0.03 ( 0%) sys   0.29 ( 0%) wall
cfg construction      :   3.15 ( 0%) usr   0.16 ( 1%) sys   3.28 ( 0%) wall
cfg cleanup           :  15.10 ( 1%) usr   0.20 ( 1%) sys  15.91 ( 1%) wall
CFG verifier          :  18.63 ( 1%) usr   0.05 ( 0%) sys  18.02 ( 1%) wall
trivially dead code   :   9.02 ( 1%) usr   0.01 ( 0%) sys   8.51 ( 1%) wall
life analysis         :  19.47 ( 1%) usr   0.02 ( 0%) sys  20.04 ( 1%) wall
life info update      :  15.59 ( 1%) usr   0.02 ( 0%) sys  15.45 ( 1%) wall
alias analysis        :   7.65 ( 1%) usr   0.01 ( 0%) sys   7.91 ( 1%) wall
register scan         :   6.17 ( 0%) usr   0.02 ( 0%) sys   6.25 ( 0%) wall
rebuild jump labels   :   2.28 ( 0%) usr   0.00 ( 0%) sys   2.14 ( 0%) wall
preprocessing         :   9.39 ( 1%) usr   2.41 (17%) sys  12.80 ( 1%) wall
lexical analysis      :   5.67 ( 0%) usr   3.65 (26%) sys   9.27 ( 1%) wall
parser                : 881.58 (63%) usr   2.83 (20%) sys 884.32 (62%) wall
expand                :  21.07 ( 1%) usr   0.49 ( 3%) sys  22.45 ( 2%) wall
varconst              :   0.32 ( 0%) usr   0.14 ( 1%) sys   0.46 ( 0%) wall
integration           :   7.16 ( 1%) usr   0.03 ( 0%) sys   6.52 ( 0%) wall
jump                  :   3.18 ( 0%) usr   0.18 ( 1%) sys   3.44 ( 0%) wall
CSE                   :  70.92 ( 5%) usr   0.13 ( 1%) sys  71.31 ( 5%) wall
global CSE            :  47.07 ( 3%) usr   0.86 ( 6%) sys  47.48 ( 3%) wall
loop analysis         :  21.16 ( 2%) usr   0.64 ( 5%) sys  21.35 ( 2%) wall
bypass jumps          :   6.29 ( 0%) usr   0.27 ( 2%) sys   6.42 ( 0%) wall
web                   :   8.46 ( 1%) usr   0.03 ( 0%) sys   8.37 ( 1%) wall
CSE 2                 :  39.14 ( 3%) usr   0.09 ( 1%) sys  39.32 ( 3%) wall
branch prediction     :   6.34 ( 0%) usr   0.03 ( 0%) sys   6.42 ( 0%) wall
flow analysis         :   0.48 ( 0%) usr   0.01 ( 0%) sys   0.48 ( 0%) wall
combiner              :  16.15 ( 1%) usr   0.09 ( 1%) sys  15.80 ( 1%) wall
if-conversion         :   3.77 ( 0%) usr   0.04 ( 0%) sys   3.86 ( 0%) wall
regmove               :   3.72 ( 0%) usr   0.00 ( 0%) sys   3.80 ( 0%) wall
mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
local alloc           :  12.89 ( 1%) usr   0.01 ( 0%) sys  12.84 ( 1%) wall
global alloc          :  25.78 ( 2%) usr   0.16 ( 1%) sys  25.97 ( 2%) wall
reload CSE regs       :  12.18 ( 1%) usr   0.16 ( 1%) sys  12.36 ( 1%) wall
flow 2                :   1.74 ( 0%) usr   0.04 ( 0%) sys   1.63 ( 0%) wall
if-conversion 2       :   2.03 ( 0%) usr   0.00 ( 0%) sys   1.78 ( 0%) wall
peephole 2            :   2.16 ( 0%) usr   0.05 ( 0%) sys   2.08 ( 0%) wall
rename registers      :  10.05 ( 1%) usr   0.15 ( 1%) sys  10.57 ( 1%) wall
scheduling 2          :  14.46 ( 1%) usr   0.11 ( 1%) sys  14.57 ( 1%) wall
reorder blocks        :   1.79 ( 0%) usr   0.01 ( 0%) sys   2.09 ( 0%) wall
shorten branches      :   2.71 ( 0%) usr   0.13 ( 1%) sys   2.52 ( 0%) wall
reg stack             :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
final                 :   3.76 ( 0%) usr   0.37 ( 3%) sys   4.53 ( 0%) wall
symout                :   0.23 ( 0%) usr   0.16 ( 1%) sys   0.38 ( 0%) wall
rest of compilation   :   5.37 ( 0%) usr   0.14 ( 1%) sys   6.17 ( 0%) wall
TOTAL                 :1407.84            14.20          1422.95
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --disable-checking to disable checks.
Notice that even with my patch applied, the combiner accounts for only
about 1% of the compilation time (16.15 of 1407.84 seconds of user time),
yet Caroline reports that 176.gcc somehow takes twice as long to compile.
Perhaps this is target-dependent, with rs6000.md generating huge and
inefficient insn-recog finite state machines, such that recognizing
instructions in combine and GCSE takes an outrageously long time?
Another possibility lies in the PRs filed against IMO, which cause GCC's
builtins to become disabled between compilation units because their
tree decls are overwritten in duplicate_decls. Perhaps if memcpy, memset
and others become disabled, you'd see a catastrophic slow-down.
Could you, or someone else, please reconfirm Caroline's timings and
observations? They just don't make any sense at all.
Roger