This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Compilation performance comparison of GCC 3.4.2 and GCC 4.0.0(041024) on MICO sources
- From: Karel Gardas <kgardas at objectsecurity dot com>
- To: Diego Novillo <dnovillo at redhat dot com>, Jan Hubicka <hubicka at ucw dot cz>, Steven Bosscher <stevenb at suse dot de>, GCC Mailing List <gcc at gcc dot gnu dot org>, Daniel Berlin <dberlin at dberlin dot org>
- Date: Wed, 27 Oct 2004 00:36:03 +0200 (CEST)
- Subject: Re: Compilation performance comparison of GCC 3.4.2 and GCC 4.0.0(041024) on MICO sources
On Tue, 26 Oct 2004, Daniel Berlin wrote:
> >> What patch are we talking about here? I've been travelling and I'm out
> >> of touch a bit more than usual.
> >
> > Nothing now, it was just reverted.
> > Go back to sleep :)
>
> Whoops, i meant to send that privately and finger frobbed it.
> Diego knows i'm joking of course.
> --Dan
>
I would just like to use Daniel's email to reply with some interesting
information. Although everybody here expects that reverting the patch
solves this issue, I'm afraid it is not solved yet. I hope Daniel is
talking about:
2004-10-25  Kenneth Zadeck  <zadeck@naturalbridge.com>

    * gcc/Makefile.in: removed ggc for cgraphunit.
    * gcc/cgraph.c.dump_cgraph_node: removed static var analysis.
    * gcc/cgraph.h: removed static var analysis data structures and calls.
    * gcc/cgraphunit.c: cgraph_mark_local_and_external_functions:changed name to
    cgraph_mark_local_functions
    (print_order,convert_UIDs_in_bitmap,new_static_vars_info,
    cgraph_reset_static_var_maps,get_global_static_vars_info,
    get_global_statics_not_read,get_global_statics_not_written,searchc,
    cgraph_reduced_inorder,has_proper_scope_for_analysis,check_rhs_var,
    check_lhs_var,get_asm_expr_operands,process_call_for_static_vars,
    scan_for_static_refs,cgraph_characterize_statics_local,
    clear_static_vars_maps,cgraph_propagate_bits,cgraph_characterize_statics):
    removed.
    (cgraph_optimize,init_cgraph): removed calls to static vars analysis
    * gcc/tree-dfa.c find_referenced_vars: removed call to static vars
    analysis
    * gcc/tree-flow.h static_vars_info: removed
    * gcc/tree-ssa-operands.c (add_call_clobber_ops,add_call_read_ops):
    removed calls to static vars analysis.
    get_call_expr_operands: removed callee variable.

If so, then I have some possibly interesting information for you. With the
new gcc 4.0.0 20041026, compilation time is 67 sec, down from the 69 sec
of the two-day-old trunk (20041024), compared with 49 sec for 3.4.2. So
there is still a huge regression here (I'm talking about the -O0 case).
The -ftime-report and -fmem-report outputs for all three compilers are
below. The ir.cc file preprocessed by 4.0.0 is attached to PR#13776. I would
also like to point out that while 3.4.2 uses about 99MB of RAM at most,
4.0.0 goes up to 230-240MB on my 512 MB RAM machine, so this is a memory
consumption regression as well.
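
To put the figures above in relative terms, here is a minimal sketch of the arithmetic. The function name and the dictionary are hypothetical, introduced only for illustration; the timings are the rounded seconds quoted in this message.

```python
def slowdown_percent(baseline: float, candidate: float) -> float:
    """Return how much slower `candidate` is than `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Rounded compile times in seconds, as quoted in the message (illustrative container).
timings = {
    "3.4.2": 49.0,
    "4.0.0 20041024": 69.0,
    "4.0.0 20041026": 67.0,
}

baseline = timings["3.4.2"]
for name, secs in timings.items():
    pct = slowdown_percent(baseline, secs)
    print(f"{name}: {secs:.0f}s ({pct:+.1f}% vs 3.4.2)")
```

By this measure the 20041026 snapshot is still roughly 37% slower than 3.4.2, and the 20041024 snapshot roughly 41% slower.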
Thanks,
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
GCC 4.0.0 20041026:
$ c++ -ftime-report -fmem-report -I../include -time -O0 -Wall -DPIC -fPIC -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size Allocated Used Overhead
8 1504k 799k 35k
16 2244k 2028k 35k
32 17M 6581k 209k
64 868k 849k 8680
128 732k 731k 6588
256 4196k 4193k 32k
512 1096k 1093k 8768
1024 2368k 2366k 18k
2048 436k 434k 3488
4096 168k 164k 1344
8192 48k 48k 192
16384 224k 224k 448
32768 96k 96k 96
65536 128k 128k 64
131072 256k 256k 64
262144 256k 256k 32
1048576 1024k 1024k 32
52 9812k 5566k 95k
116 41M 28M 373k
24 19M 9045k 250k
36 2372k 1140k 25k
12 8404k 5743k 147k
40 3532k 3516k 37k
Total 117M 73M 1291k
String pool
entries 76252
identifiers 76252 (100.00%)
slots 131072
bytes 1362k (84k overhead)
table size 512k
coll/search 0.8555
ins/search 0.1407
avg. entry 18.30 bytes (+/- 20.49)
longest entry 571
??? tree nodes created
(No per-node statistics)
Type hash: size 32749, 18744 elements, 1.196057 collisions
no search statistics
Execution times (seconds)
garbage collection : 6.71 (10%) usr 0.06 ( 2%) sys 7.02 (10%) wall
callgraph construction: 3.17 ( 5%) usr 0.41 (12%) sys 3.71 ( 5%) wall
callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
cfg cleanup : 0.24 ( 0%) usr 0.01 ( 0%) sys 0.24 ( 0%) wall
trivially dead code : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
life analysis : 1.82 ( 3%) usr 0.00 ( 0%) sys 1.87 ( 3%) wall
life info update : 0.71 ( 1%) usr 0.01 ( 0%) sys 0.78 ( 1%) wall
register scan : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.34 ( 0%) wall
rebuild jump labels : 0.35 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall
preprocessing : 0.40 ( 1%) usr 0.25 ( 7%) sys 0.98 ( 1%) wall
parser : 13.15 (20%) usr 1.10 (31%) sys 14.74 (20%) wall
name lookup : 1.97 ( 3%) usr 0.91 (26%) sys 2.99 ( 4%) wall
tree gimplify : 3.03 ( 5%) usr 0.12 ( 3%) sys 3.14 ( 4%) wall
tree eh : 0.89 ( 1%) usr 0.00 ( 0%) sys 0.87 ( 1%) wall
tree CFG construction : 1.94 ( 3%) usr 0.02 ( 1%) sys 2.01 ( 3%) wall
tree CFG cleanup : 0.90 ( 1%) usr 0.01 ( 0%) sys 0.93 ( 1%) wall
expand : 7.97 (12%) usr 0.09 ( 3%) sys 8.37 (11%) wall
varconst : 0.82 ( 1%) usr 0.03 ( 1%) sys 0.82 ( 1%) wall
jump : 0.33 ( 0%) usr 0.01 ( 0%) sys 0.54 ( 1%) wall
flow analysis : 0.65 ( 1%) usr 0.02 ( 1%) sys 0.46 ( 1%) wall
mode switching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
local alloc : 5.63 ( 8%) usr 0.09 ( 3%) sys 5.65 ( 8%) wall
global alloc : 8.24 (12%) usr 0.14 ( 4%) sys 9.26 (13%) wall
flow 2 : 0.84 ( 1%) usr 0.01 ( 0%) sys 0.83 ( 1%) wall
shorten branches : 1.73 ( 3%) usr 0.01 ( 0%) sys 1.93 ( 3%) wall
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
final : 3.42 ( 5%) usr 0.13 ( 4%) sys 3.60 ( 5%) wall
symout : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.15 ( 0%) wall
rest of compilation : 1.50 ( 2%) usr 0.04 ( 1%) sys 1.46 ( 2%) wall
TOTAL : 67.21 3.50 73.38
# cc1plus 67.22 3.54
# as 5.73 0.27
$
GCC 4.0.0 20041024:
$ c++ -ftime-report -fmem-report -I../include -time -O0 -Wall -DPIC -fPIC -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size Allocated Used Overhead
8 1572k 845k 36k
16 2740k 2494k 42k
32 17M 6520k 209k
64 872k 836k 8720
128 720k 716k 6480
256 4108k 4105k 32k
512 1068k 1067k 8544
1024 2324k 2322k 18k
2048 432k 430k 3456
4096 168k 164k 1344
8192 48k 48k 192
16384 224k 224k 448
32768 96k 96k 96
65536 128k 128k 64
131072 256k 256k 64
262144 256k 256k 32
1048576 1024k 1024k 32
52 10080k 5725k 98k
116 41M 28M 373k
24 19M 9005k 249k
36 2460k 1250k 26k
12 8636k 5651k 151k
40 3468k 3450k 37k
Total 117M 73M 1304k
String pool
entries 76246
identifiers 76246 (100.00%)
slots 131072
bytes 1362k (84k overhead)
table size 512k
coll/search 0.8489
ins/search 0.1407
avg. entry 18.30 bytes (+/- 20.49)
longest entry 571
??? tree nodes created
(No per-node statistics)
Type hash: size 32749, 18731 elements, 1.163446 collisions
no search statistics
Execution times (seconds)
garbage collection : 6.73 (10%) usr 0.07 ( 2%) sys 6.98 ( 9%) wall
callgraph construction: 3.16 ( 5%) usr 0.42 (12%) sys 3.68 ( 5%) wall
callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
cfg cleanup : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall
trivially dead code : 0.39 ( 1%) usr 0.03 ( 1%) sys 0.19 ( 0%) wall
life analysis : 1.68 ( 2%) usr 0.00 ( 0%) sys 2.03 ( 3%) wall
life info update : 0.74 ( 1%) usr 0.00 ( 0%) sys 0.72 ( 1%) wall
register scan : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.43 ( 1%) wall
rebuild jump labels : 0.28 ( 0%) usr 0.02 ( 1%) sys 0.30 ( 0%) wall
preprocessing : 0.43 ( 1%) usr 0.14 ( 4%) sys 1.26 ( 2%) wall
parser : 12.98 (19%) usr 1.32 (36%) sys 14.79 (19%) wall
name lookup : 2.27 ( 3%) usr 0.74 (20%) sys 3.20 ( 4%) wall
tree gimplify : 3.07 ( 4%) usr 0.12 ( 3%) sys 3.05 ( 4%) wall
tree eh : 0.88 ( 1%) usr 0.01 ( 0%) sys 0.95 ( 1%) wall
tree CFG construction : 2.05 ( 3%) usr 0.04 ( 1%) sys 2.21 ( 3%) wall
tree CFG cleanup : 0.90 ( 1%) usr 0.01 ( 0%) sys 0.89 ( 1%) wall
expand : 8.18 (12%) usr 0.09 ( 2%) sys 8.72 (11%) wall
varconst : 2.09 ( 3%) usr 0.22 ( 6%) sys 2.46 ( 3%) wall
jump : 0.57 ( 1%) usr 0.01 ( 0%) sys 0.56 ( 1%) wall
flow analysis : 0.54 ( 1%) usr 0.02 ( 1%) sys 0.56 ( 1%) wall
local alloc : 5.78 ( 8%) usr 0.06 ( 2%) sys 6.07 ( 8%) wall
global alloc : 8.12 (12%) usr 0.13 ( 4%) sys 8.52 (11%) wall
flow 2 : 0.79 ( 1%) usr 0.00 ( 0%) sys 0.77 ( 1%) wall
machine dep reorg : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
shorten branches : 1.56 ( 2%) usr 0.01 ( 0%) sys 1.85 ( 2%) wall
reg stack : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
final : 3.70 ( 5%) usr 0.07 ( 2%) sys 3.66 ( 5%) wall
symout : 0.12 ( 0%) usr 0.03 ( 1%) sys 0.13 ( 0%) wall
rest of compilation : 1.53 ( 2%) usr 0.04 ( 1%) sys 1.45 ( 2%) wall
TOTAL : 69.15 3.63 75.91
# cc1plus 69.16 3.66
# as 5.80 0.27
$
GCC 3.4.2:
$ c++ -ftime-report -fmem-report -I../include -time -O0 -Wall -DPIC -fPIC -c ir.cc -o ir.pic.o
Memory still allocated at the end of the compilation process
Size Allocated Used Overhead
8 480k 160k 10k
16 1288k 878k 18k
32 2100k 1065k 22k
64 3688k 2720k 32k
128 16k 3584 128
256 1296k 631k 9072
512 3488k 3066k 23k
1024 1000k 783k 7000
2048 8192 6144 56
4096 8192 8192 56
8192 40k 40k 140
16384 16k 16k 28
32768 352k 352k 308
131072 640k 640k 140
108 25M 23M 201k
20 30M 11M 391k
24 4172k 2341k 48k
12 3212k 348k 53k
40 4340k 2320k 42k
Total 80M 50M 863k
String pool
entries 28046
identifiers 28046 (100.00%)
slots 65536
bytes 856k (37k overhead)
table size 256k
coll/search 0.9733
ins/search 0.0577
avg. entry 31.28 bytes (+/- 30.84)
longest entry 571
??? tree nodes created
(No per-node statistics)
Type hash: size 32749, 19967 elements, 2.130366 collisions
no search statistics
Execution times (seconds)
garbage collection : 6.26 (13%) usr 0.00 ( 0%) sys 6.45 (12%) wall
cfg construction : 0.68 ( 1%) usr 0.01 ( 0%) sys 0.68 ( 1%) wall
cfg cleanup : 0.26 ( 1%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall
trivially dead code : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
life analysis : 1.45 ( 3%) usr 0.00 ( 0%) sys 1.28 ( 2%) wall
life info update : 0.62 ( 1%) usr 0.00 ( 0%) sys 0.54 ( 1%) wall
register scan : 0.42 ( 1%) usr 0.02 ( 1%) sys 0.38 ( 1%) wall
rebuild jump labels : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
preprocessing : 0.53 ( 1%) usr 0.29 ( 8%) sys 1.16 ( 2%) wall
parser : 15.12 (31%) usr 0.91 (27%) sys 16.47 (30%) wall
name lookup : 5.04 (10%) usr 1.82 (53%) sys 7.31 (13%) wall
expand : 3.94 ( 8%) usr 0.02 ( 1%) sys 3.99 ( 7%) wall
varconst : 0.55 ( 1%) usr 0.04 ( 1%) sys 0.63 ( 1%) wall
integration : 0.40 ( 1%) usr 0.01 ( 0%) sys 0.42 ( 1%) wall
jump : 0.33 ( 1%) usr 0.04 ( 1%) sys 0.41 ( 1%) wall
flow analysis : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
mode switching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
scheduling : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
local alloc : 2.42 ( 5%) usr 0.03 ( 1%) sys 2.49 ( 5%) wall
global alloc : 4.35 ( 9%) usr 0.03 ( 1%) sys 4.88 ( 9%) wall
flow 2 : 0.63 ( 1%) usr 0.00 ( 0%) sys 0.62 ( 1%) wall
shorten branches : 0.87 ( 2%) usr 0.01 ( 0%) sys 1.09 ( 2%) wall
reg stack : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
final : 2.74 ( 6%) usr 0.10 ( 3%) sys 2.64 ( 5%) wall
symout : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.15 ( 0%) wall
rest of compilation : 1.97 ( 4%) usr 0.05 ( 1%) sys 2.13 ( 4%) wall
TOTAL : 49.14 3.42 54.70
# cc1plus 49.14 3.43
# as 4.44 0.21
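
To see which phases account for the difference, the "Execution times" sections above can be compared mechanically. The following is only a sketch, assuming the per-phase line layout shown in these reports (`phase : N.NN (P%) usr ...`); the names `PHASE_RE`, `parse_usr_times` and `biggest_increases` are hypothetical helpers, not part of GCC, and the sample strings are a few lines copied from the 3.4.2 and 4.0.0 20041026 reports.

```python
import re

# Matches e.g. "expand : 7.97 (12%) usr ..." and captures the phase name and usr time.
# The TOTAL line has no percentages and is deliberately not matched.
PHASE_RE = re.compile(
    r"^\s*(?P<phase>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)


def parse_usr_times(report: str) -> dict:
    """Map each phase name in a time report to its user time in seconds."""
    times = {}
    for line in report.splitlines():
        match = PHASE_RE.match(line)
        if match:
            times[match.group("phase")] = float(match.group("usr"))
    return times


def biggest_increases(old: dict, new: dict, top: int = 3) -> list:
    """Return the `top` phases present in both reports with the largest usr-time growth."""
    deltas = [(new[p] - old[p], p) for p in old.keys() & new.keys()]
    deltas.sort(reverse=True)
    return [(phase, round(delta, 2)) for delta, phase in deltas[:top]]


gcc_342 = """
parser : 15.12 (31%) usr 0.91 (27%) sys 16.47 (30%) wall
expand : 3.94 ( 8%) usr 0.02 ( 1%) sys 3.99 ( 7%) wall
local alloc : 2.42 ( 5%) usr 0.03 ( 1%) sys 2.49 ( 5%) wall
global alloc : 4.35 ( 9%) usr 0.03 ( 1%) sys 4.88 ( 9%) wall
"""

gcc_400 = """
parser : 13.15 (20%) usr 1.10 (31%) sys 14.74 (20%) wall
expand : 7.97 (12%) usr 0.09 ( 3%) sys 8.37 (11%) wall
local alloc : 5.63 ( 8%) usr 0.09 ( 3%) sys 5.65 ( 8%) wall
global alloc : 8.24 (12%) usr 0.14 ( 4%) sys 9.26 (13%) wall
"""

for phase, delta in biggest_increases(parse_usr_times(gcc_342), parse_usr_times(gcc_400)):
    print(f"{phase}: +{delta:.2f}s usr")
```

On these sample lines, expand, global alloc and local alloc show the largest growth, while the parser is actually faster in 4.0.0. Phases that exist in only one compiler (for example "tree gimplify") are skipped by this sketch and would need to be inspected separately.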