PATCH: [gcc3.5 improvement branch] Very Simple constant propagation

Tue Feb 3 04:12:00 GMT 2004

On Mon, 2 Feb 2004, Geoff Keating wrote:
> To match Caroline's results, you should use --enable-intermodule and
> -O3, not just a regular bootstrap.
>
> The concern is compile-time performance on large-to-very-large
> functions, like those created by intermodule inlining in SPEC.

Hi Geoff,

I'm still not seeing it.  Unfortunately, an --enable-intermodule
bootstrap with BOOT_CFLAGS="-O3" currently fails for mainline
during stage2 on i686-pc-linux-gnu due to "symbol already defined"
errors from the assembler.  However, it is possible to analyse
the behaviour of stage1/cc1 prior to this fatal failure:

Execution times (seconds)
 garbage collection    :  60.92 ( 4%) usr   0.11 ( 1%) sys  61.03 ( 4%) wall
 callgraph construction:   1.90 ( 0%) usr   0.03 ( 0%) sys   1.93 ( 0%) wall
 callgraph optimization:   0.26 ( 0%) usr   0.03 ( 0%) sys   0.29 ( 0%) wall
 cfg construction      :   3.15 ( 0%) usr   0.16 ( 1%) sys   3.28 ( 0%) wall
 cfg cleanup           :  15.10 ( 1%) usr   0.20 ( 1%) sys  15.91 ( 1%) wall
 CFG verifier          :  18.63 ( 1%) usr   0.05 ( 0%) sys  18.02 ( 1%) wall
 trivially dead code   :   9.02 ( 1%) usr   0.01 ( 0%) sys   8.51 ( 1%) wall
 life analysis         :  19.47 ( 1%) usr   0.02 ( 0%) sys  20.04 ( 1%) wall
 life info update      :  15.59 ( 1%) usr   0.02 ( 0%) sys  15.45 ( 1%) wall
 alias analysis        :   7.65 ( 1%) usr   0.01 ( 0%) sys   7.91 ( 1%) wall
 register scan         :   6.17 ( 0%) usr   0.02 ( 0%) sys   6.25 ( 0%) wall
 rebuild jump labels   :   2.28 ( 0%) usr   0.00 ( 0%) sys   2.14 ( 0%) wall
 preprocessing         :   9.39 ( 1%) usr   2.41 (17%) sys  12.80 ( 1%) wall
 lexical analysis      :   5.67 ( 0%) usr   3.65 (26%) sys   9.27 ( 1%) wall
 parser                : 881.58 (63%) usr   2.83 (20%) sys 884.32 (62%) wall
 expand                :  21.07 ( 1%) usr   0.49 ( 3%) sys  22.45 ( 2%) wall
 varconst              :   0.32 ( 0%) usr   0.14 ( 1%) sys   0.46 ( 0%) wall
 integration           :   7.16 ( 1%) usr   0.03 ( 0%) sys   6.52 ( 0%) wall
 jump                  :   3.18 ( 0%) usr   0.18 ( 1%) sys   3.44 ( 0%) wall
 CSE                   :  70.92 ( 5%) usr   0.13 ( 1%) sys  71.31 ( 5%) wall
 global CSE            :  47.07 ( 3%) usr   0.86 ( 6%) sys  47.48 ( 3%) wall
 loop analysis         :  21.16 ( 2%) usr   0.64 ( 5%) sys  21.35 ( 2%) wall
 bypass jumps          :   6.29 ( 0%) usr   0.27 ( 2%) sys   6.42 ( 0%) wall
 web                   :   8.46 ( 1%) usr   0.03 ( 0%) sys   8.37 ( 1%) wall
 CSE 2                 :  39.14 ( 3%) usr   0.09 ( 1%) sys  39.32 ( 3%) wall
 branch prediction     :   6.34 ( 0%) usr   0.03 ( 0%) sys   6.42 ( 0%) wall
 flow analysis         :   0.48 ( 0%) usr   0.01 ( 0%) sys   0.48 ( 0%) wall
 combiner              :  16.15 ( 1%) usr   0.09 ( 1%) sys  15.80 ( 1%) wall
 if-conversion         :   3.77 ( 0%) usr   0.04 ( 0%) sys   3.86 ( 0%) wall
 regmove               :   3.72 ( 0%) usr   0.00 ( 0%) sys   3.80 ( 0%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :  12.89 ( 1%) usr   0.01 ( 0%) sys  12.84 ( 1%) wall
 global alloc          :  25.78 ( 2%) usr   0.16 ( 1%) sys  25.97 ( 2%) wall
 reload CSE regs       :  12.18 ( 1%) usr   0.16 ( 1%) sys  12.36 ( 1%) wall
 flow 2                :   1.74 ( 0%) usr   0.04 ( 0%) sys   1.63 ( 0%) wall
 if-conversion 2       :   2.03 ( 0%) usr   0.00 ( 0%) sys   1.78 ( 0%) wall
 peephole 2            :   2.16 ( 0%) usr   0.05 ( 0%) sys   2.08 ( 0%) wall
 rename registers      :  10.05 ( 1%) usr   0.15 ( 1%) sys  10.57 ( 1%) wall
 scheduling 2          :  14.46 ( 1%) usr   0.11 ( 1%) sys  14.57 ( 1%) wall
 reorder blocks        :   1.79 ( 0%) usr   0.01 ( 0%) sys   2.09 ( 0%) wall
 shorten branches      :   2.71 ( 0%) usr   0.13 ( 1%) sys   2.52 ( 0%) wall
 reg stack             :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 final                 :   3.76 ( 0%) usr   0.37 ( 3%) sys   4.53 ( 0%) wall
 symout                :   0.23 ( 0%) usr   0.16 ( 1%) sys   0.38 ( 0%) wall
 rest of compilation   :   5.37 ( 0%) usr   0.14 ( 1%) sys   6.17 ( 0%) wall
 TOTAL                 :1407.84            14.20          1422.95
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --disable-checking to disable checks.

Notice that even with my patch applied, the combiner accounts for only 1%
of the compilation time, yet Caroline reports that 176.gcc somehow takes
twice as long to compile.
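
For anyone wanting to cross-check the percentages, the per-pass shares can
be recomputed from the report text itself.  What follows is only a rough
sketch in Python, not part of GCC: the names parse_time_report and
user_share are hypothetical, and the regular expressions assume the exact
column layout of the report quoted above.

```python
import re

# Assumed layout: " name : usr ( N%) usr  sys ( N%) sys  wall ( N%) wall"
PASS_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)
# The TOTAL line carries no percentages, so it gets its own pattern.
TOTAL_RE = re.compile(r"^\s*TOTAL\s*:\s*(?P<usr>\d+\.\d+)")


def parse_time_report(text):
    """Return ({pass name: user seconds}, total user seconds or None)."""
    passes = {}
    total = None
    for line in text.splitlines():
        m = TOTAL_RE.match(line)
        if m:
            total = float(m.group("usr"))
            continue
        m = PASS_RE.match(line)
        if m:
            passes[m.group("name")] = float(m.group("usr"))
    return passes, total


def user_share(passes, total, name):
    """Percentage of total user time spent in the named pass."""
    return 100.0 * passes[name] / total


# A few lines copied from the report above, for illustration.
SAMPLE = """\
 parser                : 881.58 (63%) usr   2.83 (20%) sys 884.32 (62%) wall
 CSE                   :  70.92 ( 5%) usr   0.13 ( 1%) sys  71.31 ( 5%) wall
 combiner              :  16.15 ( 1%) usr   0.09 ( 1%) sys  15.80 ( 1%) wall
 TOTAL                 :1407.84            14.20          1422.95
"""

if __name__ == "__main__":
    passes, total = parse_time_report(SAMPLE)
    # List passes from most to least expensive in user time.
    for name in sorted(passes, key=passes.get, reverse=True):
        share = user_share(passes, total, name)
        print(f"{name:<12} {passes[name]:8.2f}s {share:5.1f}%")
```

Feeding it the full report instead of SAMPLE should give the same ranking
as the table, which makes it easy to compare two runs pass by pass.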

Perhaps this is target-dependent, with rs6000.md generating huge and
inefficient insn-recog finite state machines, such that recognizing
instructions in combine and GCSE takes an outrageously long time?

Another possibility is the set of PRs filed against IMO, which cause
GCC's builtins to become disabled between compilation units because
their tree decls are overwritten in duplicate_decls.  Perhaps if memcpy,
memset and others become disabled you'd see a catastrophic slow-down.

Could you, or someone else, please reconfirm Caroline's timings and
observations?  As reported, they don't make sense to me.

Roger
--