Compilation performance comparison of GCC 3.4.2 and GCC 4.0.0 (041024) on MICO sources

Karel Gardas kgardas@objectsecurity.com
Tue Oct 26 23:29:00 GMT 2004


On Tue, 26 Oct 2004, Daniel Berlin wrote:

> >> What patch are we talking about here?  I've been travelling and I'm out
> >> of touch a bit more than usual.
> >
> > Nothing now, it was just reverted.
> > Go back to sleep :)
>
> Whoops, i meant to send that privately and finger frobbed it.
> Diego knows i'm joking of course.
> --Dan
>

I would like to use Daniel's email to reply with some interesting
information. Although everybody here expects that reverting the patch
solves this issue, I'm afraid it is not solved yet. I hope Daniel is
talking about:

2004-10-25  Kenneth Zadeck <zadeck@naturalbridge.com>
        * gcc/Makefile.in: removed ggc for cgraphunit.
        * gcc/cgraph.c.dump_cgraph_node: removed static var analysis.
        * gcc/cgraph.h: removed static var analysis data structures and calls.
        * gcc/cgraphunit.c: cgraph_mark_local_and_external_functions:changed name to
                cgraph_mark_local_functions
        (print_order,convert_UIDs_in_bitmap,new_static_vars_info,
        cgraph_reset_static_var_maps,get_global_static_vars_info,
        get_global_statics_not_read,get_global_statics_not_written,searchc,
        cgraph_reduced_inorder,has_proper_scope_for_analysis,check_rhs_var,
        check_lhs_var,get_asm_expr_operands,process_call_for_static_vars,
        scan_for_static_refs,cgraph_characterize_statics_local,
        clear_static_vars_maps,cgraph_propagate_bits,cgraph_characterize_statics):
        removed.
        (cgraph_optimize,init_cgraph): removed calls to static vars analysis
        * gcc/tree-dfa.c find_referenced_vars: removed call to static vars
        analysis
        * gcc/tree-flow.h static_vars_info: removed
        * gcc/tree-ssa-operands.c (add_call_clobber_ops,add_call_read_ops):
        removed calls to static vars analysis.
        get_call_expr_operands: removed callee variable.


If so, then I have some information that may interest you. With the new
gcc 4.0.0 20041026, compilation time drops to 67 sec from the 69 sec of the
two-day-old trunk (20041024), compared with 49 sec for 3.4.2. So there is
still a large regression here, about 18 sec or roughly 37% (this is the -O0
case). The -ftime-report and -fmem-report output for all three compilers is
below. The ir.cc file preprocessed by 4.0.0 is attached to PR#13776. I would
also like to point out that while 3.4.2 uses about 99MB RAM at most, 4.0.0
goes up to 230-240MB on my 512 MB RAM machine, so this is a memory
consumption regression as well.
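
The size of the slowdown can be checked with a little arithmetic. The following is only a minimal sketch: the helper name `percent_change` is made up for illustration, and the inputs are the cc1plus user-time totals copied from the TOTAL rows of the three reports in this message.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# cc1plus user seconds, taken from the TOTAL rows of the -ftime-report output
user_seconds = {
    "3.4.2": 49.14,
    "4.0.0 20041024": 69.15,
    "4.0.0 20041026": 67.21,
}

baseline = user_seconds["3.4.2"]
for name, secs in user_seconds.items():
    change = percent_change(baseline, secs)
    print(f"{name}: {secs:.2f}s ({change:+.1f}% vs 3.4.2)")
```

On these totals the 20041026 snapshot comes out about 37% slower than 3.4.2, and the 20041024 snapshot about 41% slower, so the revert recovers only a small part of the regression.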

Thanks,
Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com


GCC 4.0.0 20041026:

$ c++  -ftime-report -fmem-report -I../include  -time -O0 -Wall   -DPIC -fPIC  -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           1504k        799k         35k
16          2244k       2028k         35k
32            17M       6581k        209k
64           868k        849k       8680
128          732k        731k       6588
256         4196k       4193k         32k
512         1096k       1093k       8768
1024        2368k       2366k         18k
2048         436k        434k       3488
4096         168k        164k       1344
8192          48k         48k        192
16384        224k        224k        448
32768         96k         96k         96
65536        128k        128k         64
131072        256k        256k         64
262144        256k        256k         32
1048576       1024k       1024k         32
52          9812k       5566k         95k
116           41M         28M        373k
24            19M       9045k        250k
36          2372k       1140k         25k
12          8404k       5743k        147k
40          3532k       3516k         37k
Total        117M         73M       1291k

String pool
entries         76252
identifiers     76252 (100.00%)
slots           131072
bytes           1362k (84k overhead)
table size      512k
coll/search     0.8555
ins/search      0.1407
avg. entry      18.30 bytes (+/- 20.49)
longest entry   571

??? tree nodes created

(No per-node statistics)
Type hash: size 32749, 18744 elements, 1.196057 collisions
no search statistics

Execution times (seconds)
 garbage collection    :   6.71 (10%) usr   0.06 ( 2%) sys   7.02 (10%) wall
 callgraph construction:   3.17 ( 5%) usr   0.41 (12%) sys   3.71 ( 5%) wall
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 cfg cleanup           :   0.24 ( 0%) usr   0.01 ( 0%) sys   0.24 ( 0%) wall
 trivially dead code   :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
 life analysis         :   1.82 ( 3%) usr   0.00 ( 0%) sys   1.87 ( 3%) wall
 life info update      :   0.71 ( 1%) usr   0.01 ( 0%) sys   0.78 ( 1%) wall
 register scan         :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall
 rebuild jump labels   :   0.35 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall
 preprocessing         :   0.40 ( 1%) usr   0.25 ( 7%) sys   0.98 ( 1%) wall
 parser                :  13.15 (20%) usr   1.10 (31%) sys  14.74 (20%) wall
 name lookup           :   1.97 ( 3%) usr   0.91 (26%) sys   2.99 ( 4%) wall
 tree gimplify         :   3.03 ( 5%) usr   0.12 ( 3%) sys   3.14 ( 4%) wall
 tree eh               :   0.89 ( 1%) usr   0.00 ( 0%) sys   0.87 ( 1%) wall
 tree CFG construction :   1.94 ( 3%) usr   0.02 ( 1%) sys   2.01 ( 3%) wall
 tree CFG cleanup      :   0.90 ( 1%) usr   0.01 ( 0%) sys   0.93 ( 1%) wall
 expand                :   7.97 (12%) usr   0.09 ( 3%) sys   8.37 (11%) wall
 varconst              :   0.82 ( 1%) usr   0.03 ( 1%) sys   0.82 ( 1%) wall
 jump                  :   0.33 ( 0%) usr   0.01 ( 0%) sys   0.54 ( 1%) wall
 flow analysis         :   0.65 ( 1%) usr   0.02 ( 1%) sys   0.46 ( 1%) wall
 mode switching        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   5.63 ( 8%) usr   0.09 ( 3%) sys   5.65 ( 8%) wall
 global alloc          :   8.24 (12%) usr   0.14 ( 4%) sys   9.26 (13%) wall
 flow 2                :   0.84 ( 1%) usr   0.01 ( 0%) sys   0.83 ( 1%) wall
 shorten branches      :   1.73 ( 3%) usr   0.01 ( 0%) sys   1.93 ( 3%) wall
 reg stack             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 final                 :   3.42 ( 5%) usr   0.13 ( 4%) sys   3.60 ( 5%) wall
 symout                :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.15 ( 0%) wall
 rest of compilation   :   1.50 ( 2%) usr   0.04 ( 1%) sys   1.46 ( 2%) wall
 TOTAL                 :  67.21             3.50            73.38
# cc1plus 67.22 3.54
# as 5.73 0.27
$


GCC 4.0.0 20041024:

$ c++  -ftime-report -fmem-report -I../include  -time -O0 -Wall   -DPIC -fPIC  -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           1572k        845k         36k
16          2740k       2494k         42k
32            17M       6520k        209k
64           872k        836k       8720
128          720k        716k       6480
256         4108k       4105k         32k
512         1068k       1067k       8544
1024        2324k       2322k         18k
2048         432k        430k       3456
4096         168k        164k       1344
8192          48k         48k        192
16384        224k        224k        448
32768         96k         96k         96
65536        128k        128k         64
131072        256k        256k         64
262144        256k        256k         32
1048576       1024k       1024k         32
52         10080k       5725k         98k
116           41M         28M        373k
24            19M       9005k        249k
36          2460k       1250k         26k
12          8636k       5651k        151k
40          3468k       3450k         37k
Total        117M         73M       1304k

String pool
entries         76246
identifiers     76246 (100.00%)
slots           131072
bytes           1362k (84k overhead)
table size      512k
coll/search     0.8489
ins/search      0.1407
avg. entry      18.30 bytes (+/- 20.49)
longest entry   571

??? tree nodes created

(No per-node statistics)
Type hash: size 32749, 18731 elements, 1.163446 collisions
no search statistics

Execution times (seconds)
 garbage collection    :   6.73 (10%) usr   0.07 ( 2%) sys   6.98 ( 9%) wall
 callgraph construction:   3.16 ( 5%) usr   0.42 (12%) sys   3.68 ( 5%) wall
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 cfg cleanup           :   0.16 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall
 trivially dead code   :   0.39 ( 1%) usr   0.03 ( 1%) sys   0.19 ( 0%) wall
 life analysis         :   1.68 ( 2%) usr   0.00 ( 0%) sys   2.03 ( 3%) wall
 life info update      :   0.74 ( 1%) usr   0.00 ( 0%) sys   0.72 ( 1%) wall
 register scan         :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 1%) wall
 rebuild jump labels   :   0.28 ( 0%) usr   0.02 ( 1%) sys   0.30 ( 0%) wall
 preprocessing         :   0.43 ( 1%) usr   0.14 ( 4%) sys   1.26 ( 2%) wall
 parser                :  12.98 (19%) usr   1.32 (36%) sys  14.79 (19%) wall
 name lookup           :   2.27 ( 3%) usr   0.74 (20%) sys   3.20 ( 4%) wall
 tree gimplify         :   3.07 ( 4%) usr   0.12 ( 3%) sys   3.05 ( 4%) wall
 tree eh               :   0.88 ( 1%) usr   0.01 ( 0%) sys   0.95 ( 1%) wall
 tree CFG construction :   2.05 ( 3%) usr   0.04 ( 1%) sys   2.21 ( 3%) wall
 tree CFG cleanup      :   0.90 ( 1%) usr   0.01 ( 0%) sys   0.89 ( 1%) wall
 expand                :   8.18 (12%) usr   0.09 ( 2%) sys   8.72 (11%) wall
 varconst              :   2.09 ( 3%) usr   0.22 ( 6%) sys   2.46 ( 3%) wall
 jump                  :   0.57 ( 1%) usr   0.01 ( 0%) sys   0.56 ( 1%) wall
 flow analysis         :   0.54 ( 1%) usr   0.02 ( 1%) sys   0.56 ( 1%) wall
 local alloc           :   5.78 ( 8%) usr   0.06 ( 2%) sys   6.07 ( 8%) wall
 global alloc          :   8.12 (12%) usr   0.13 ( 4%) sys   8.52 (11%) wall
 flow 2                :   0.79 ( 1%) usr   0.00 ( 0%) sys   0.77 ( 1%) wall
 machine dep reorg     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   1.56 ( 2%) usr   0.01 ( 0%) sys   1.85 ( 2%) wall
 reg stack             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   3.70 ( 5%) usr   0.07 ( 2%) sys   3.66 ( 5%) wall
 symout                :   0.12 ( 0%) usr   0.03 ( 1%) sys   0.13 ( 0%) wall
 rest of compilation   :   1.53 ( 2%) usr   0.04 ( 1%) sys   1.45 ( 2%) wall
 TOTAL                 :  69.15             3.63            75.91
# cc1plus 69.16 3.66
# as 5.80 0.27
$


GCC 3.4.2:

$ c++  -ftime-report -fmem-report -I../include  -time -O0 -Wall   -DPIC -fPIC  -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8            480k        160k         10k
16          1288k        878k         18k
32          2100k       1065k         22k
64          3688k       2720k         32k
128           16k       3584         128
256         1296k        631k       9072
512         3488k       3066k         23k
1024        1000k        783k       7000
2048        8192        6144          56
4096        8192        8192          56
8192          40k         40k        140
16384         16k         16k         28
32768        352k        352k        308
131072        640k        640k        140
108           25M         23M        201k
20            30M         11M        391k
24          4172k       2341k         48k
12          3212k        348k         53k
40          4340k       2320k         42k
Total         80M         50M        863k

String pool
entries         28046
identifiers     28046 (100.00%)
slots           65536
bytes           856k (37k overhead)
table size      256k
coll/search     0.9733
ins/search      0.0577
avg. entry      31.28 bytes (+/- 30.84)
longest entry   571

??? tree nodes created

(No per-node statistics)
Type hash: size 32749, 19967 elements, 2.130366 collisions
no search statistics

Execution times (seconds)
 garbage collection    :   6.26 (13%) usr   0.00 ( 0%) sys   6.45 (12%) wall
 cfg construction      :   0.68 ( 1%) usr   0.01 ( 0%) sys   0.68 ( 1%) wall
 cfg cleanup           :   0.26 ( 1%) usr   0.01 ( 0%) sys   0.23 ( 0%) wall
 trivially dead code   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 life analysis         :   1.45 ( 3%) usr   0.00 ( 0%) sys   1.28 ( 2%) wall
 life info update      :   0.62 ( 1%) usr   0.00 ( 0%) sys   0.54 ( 1%) wall
 register scan         :   0.42 ( 1%) usr   0.02 ( 1%) sys   0.38 ( 1%) wall
 rebuild jump labels   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall
 preprocessing         :   0.53 ( 1%) usr   0.29 ( 8%) sys   1.16 ( 2%) wall
 parser                :  15.12 (31%) usr   0.91 (27%) sys  16.47 (30%) wall
 name lookup           :   5.04 (10%) usr   1.82 (53%) sys   7.31 (13%) wall
 expand                :   3.94 ( 8%) usr   0.02 ( 1%) sys   3.99 ( 7%) wall
 varconst              :   0.55 ( 1%) usr   0.04 ( 1%) sys   0.63 ( 1%) wall
 integration           :   0.40 ( 1%) usr   0.01 ( 0%) sys   0.42 ( 1%) wall
 jump                  :   0.33 ( 1%) usr   0.04 ( 1%) sys   0.41 ( 1%) wall
 flow analysis         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 scheduling            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   2.42 ( 5%) usr   0.03 ( 1%) sys   2.49 ( 5%) wall
 global alloc          :   4.35 ( 9%) usr   0.03 ( 1%) sys   4.88 ( 9%) wall
 flow 2                :   0.63 ( 1%) usr   0.00 ( 0%) sys   0.62 ( 1%) wall
 shorten branches      :   0.87 ( 2%) usr   0.01 ( 0%) sys   1.09 ( 2%) wall
 reg stack             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 final                 :   2.74 ( 6%) usr   0.10 ( 3%) sys   2.64 ( 5%) wall
 symout                :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.15 ( 0%) wall
 rest of compilation   :   1.97 ( 4%) usr   0.05 ( 1%) sys   2.13 ( 4%) wall
 TOTAL                 :  49.14             3.42            54.70
# cc1plus 49.14 3.43
# as 4.44 0.21
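
To see which phases account for the difference, the per-phase rows of two reports can be compared mechanically. This is a rough sketch, not part of GCC: the names `parse_time_report` and `biggest_regressions` are hypothetical, the regular expression assumes the row layout shown in the reports above, and only a few rows are embedded here to keep the example short.

```python
import re

# Matches per-phase rows such as
# " expand                :   7.97 (12%) usr   0.09 ( 3%) sys   8.37 (11%) wall"
# The TOTAL row has no "(NN%) usr" part and is deliberately skipped.
ROW = re.compile(
    r"^\s*(?P<phase>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)


def parse_time_report(text):
    """Return {phase name: user seconds} for each per-phase row."""
    phases = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            phases[match.group("phase")] = float(match.group("usr"))
    return phases


def biggest_regressions(old, new, top=3):
    """Phases ordered by increase in user seconds; a missing phase counts as 0."""
    names = set(old) | set(new)
    deltas = {p: new.get(p, 0.0) - old.get(p, 0.0) for p in names}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


# A few rows copied from the 3.4.2 and 4.0.0 20041026 reports above
GCC_342 = """
 parser                :  15.12 (31%) usr   0.91 (27%) sys  16.47 (30%) wall
 expand                :   3.94 ( 8%) usr   0.02 ( 1%) sys   3.99 ( 7%) wall
 local alloc           :   2.42 ( 5%) usr   0.03 ( 1%) sys   2.49 ( 5%) wall
 global alloc          :   4.35 ( 9%) usr   0.03 ( 1%) sys   4.88 ( 9%) wall
 TOTAL                 :  49.14             3.42            54.70
"""
GCC_400 = """
 parser                :  13.15 (20%) usr   1.10 (31%) sys  14.74 (20%) wall
 expand                :   7.97 (12%) usr   0.09 ( 3%) sys   8.37 (11%) wall
 local alloc           :   5.63 ( 8%) usr   0.09 ( 3%) sys   5.65 ( 8%) wall
 global alloc          :   8.24 (12%) usr   0.14 ( 4%) sys   9.26 (13%) wall
 TOTAL                 :  67.21             3.50            73.38
"""

old = parse_time_report(GCC_342)
new = parse_time_report(GCC_400)
for phase, delta in biggest_regressions(old, new):
    print(f"{phase}: {delta:+.2f}s")
```

On these rows the largest increases are in expand, global alloc, and local alloc, while the parser is actually faster in 4.0.0. Feeding in the complete reports would also list the phases that exist only in 4.0.0, such as callgraph construction and tree gimplify.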



