This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [tree-ssa] Removal of gotos from cfg based ir
> On Fri, 2003-11-14 at 10:20, Jan Hubicka wrote:
> > > On Thu, 2003-11-13 at 19:37, Zdenek Dvorak wrote:
> > > > Hello,
>
> > I do have the RTL expansion pass that uses the CFG (to preserve the profile
> > mainly). In the longer run, I would like to enter RTL already in cfglayout
> > mode, so we avoid the need to invent/kill the gotos on the way.
> >
>
> That means you will have to build a CFG in order to generate code...
> even with no optimization on, or you need expanders that can deal with
> either version of the IL.
Hi,
I went ahead and did some testing with my CFG based expansion.
I modified tree-optimize.c to do:

    /* Invoke the SSA tree optimizer.  */
    if (optimize >= 1 && !flag_disable_tree_ssa)
      optimize_function_tree (fndecl, &chain);
    else
      {
        /* Build the flowgraph.  */
        init_flow ();
        build_tree_cfg (&chain);
        cleanup_tree_cfg ();
        cfg_remove_useless_stmts ();
      }

and then continue along the common path, expanding from the CFG.
I tested it by compiling the preprocessed files of the C frontend. The timings without the patch are:
  Size  Allocated    Used  Overhead
     8       912k    704k       20k
    16       780k    242k       11k
    32      1212k    508k       13k
    64      3192k   2547k       28k
   128       8192    5376        64
   256      1968k   1913k       13k
   512      1328k   1297k      9296
  1024      2548k   2516k       17k
  2048       8192    4096        56
 16384        64k     64k       112
 32768       320k    320k       280
 65536        64k     64k        28
131072       128k    128k        28
   116        26M     24M      211k
    24        10M   4467k      128k
    12      2488k    561k       41k
    40      1676k    976k       16k
 Total        53M     40M      511k
String pool
entries 29592
identifiers 29592 (100.00%)
slots 65536
bytes 337k (32k overhead)
table size 256k
coll/search 0.4203
ins/search 0.0395
avg. entry 11.67 bytes (+/- 7.05)
longest entry 62
??? tree nodes created
(No per-node statistics)
Type hash: size 32749, 15320 elements, 2.008041 collisions
Execution times (seconds)
garbage collection : 1.61 ( 7%) usr 0.01 ( 0%) sys 1.63 ( 6%) wall
cfg construction : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
cfg cleanup : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
trivially dead code : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
life analysis : 0.61 ( 3%) usr 0.00 ( 0%) sys 0.57 ( 2%) wall
life info update : 0.25 ( 1%) usr 0.00 ( 0%) sys 0.24 ( 1%) wall
register scan : 0.21 ( 1%) usr 0.01 ( 0%) sys 0.20 ( 1%) wall
rebuild jump labels : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
preprocessing : 2.02 ( 8%) usr 0.69 (20%) sys 2.80 (10%) wall
lexical analysis : 1.99 ( 8%) usr 1.57 (46%) sys 3.37 (12%) wall
parser : 7.76 (32%) usr 0.86 (25%) sys 9.26 (33%) wall
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
tree gimplify : 1.29 ( 5%) usr 0.01 ( 0%) sys 1.22 ( 4%) wall
tree eh : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
expand : 1.15 ( 5%) usr 0.04 ( 1%) sys 1.01 ( 4%) wall
varconst : 0.05 ( 0%) usr 0.02 ( 1%) sys 0.12 ( 0%) wall
jump : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.20 ( 1%) wall
flow analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
mode switching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
local alloc : 0.97 ( 4%) usr 0.00 ( 0%) sys 0.97 ( 3%) wall
global alloc : 3.11 (13%) usr 0.02 ( 1%) sys 3.19 (11%) wall
flow 2 : 0.23 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 1%) wall
machine dep reorg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
shorten branches : 0.41 ( 2%) usr 0.00 ( 0%) sys 0.37 ( 1%) wall
final : 0.69 ( 3%) usr 0.02 ( 1%) sys 0.79 ( 3%) wall
symout : 0.04 ( 0%) usr 0.11 ( 3%) sys 0.09 ( 0%) wall
rest of compilation : 0.99 ( 4%) usr 0.00 ( 0%) sys 1.06 ( 4%) wall
TOTAL : 24.23 3.41 28.26
And with the patch:
  Size  Allocated    Used  Overhead
     8       920k    711k       20k
    16       736k    260k       10k
    32      1284k    505k       13k
    64      3160k   2541k       27k
   128       8192    5376        64
   256      1956k   1913k       13k
   512      1328k   1297k      9296
  1024      2536k   2516k       17k
  2048       8192    4096        56
 16384        64k     64k       112
 32768       320k    320k       280
 65536        64k     64k        28
131072       128k    128k        28
   116        26M     24M      213k
    24        10M   4467k      126k
    12      2552k    617k       42k
    40      1764k   1013k       17k
 Total        53M     40M      512k
String pool
entries 29592
identifiers 29592 (100.00%)
slots 65536
bytes 337k (32k overhead)
table size 256k
coll/search 0.4095
ins/search 0.0399
avg. entry 11.67 bytes (+/- 7.05)
longest entry 62
??? tree nodes created
(No per-node statistics)
Type hash: size 32749, 15320 elements, 2.165665 collisions
Execution times (seconds)
garbage collection : 1.61 ( 7%) usr 0.01 ( 0%) sys 1.62 ( 6%) wall
cfg construction : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
cfg cleanup : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
trivially dead code : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
life analysis : 0.60 ( 2%) usr 0.01 ( 0%) sys 0.62 ( 2%) wall
life info update : 0.27 ( 1%) usr 0.00 ( 0%) sys 0.32 ( 1%) wall
register scan : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
rebuild jump labels : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
preprocessing : 1.92 ( 8%) usr 0.67 (20%) sys 2.90 (10%) wall
lexical analysis : 2.06 ( 9%) usr 1.62 (48%) sys 3.45 (12%) wall
parser : 7.87 (33%) usr 0.82 (24%) sys 9.39 (33%) wall
integration : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall
tree gimplify : 1.18 ( 5%) usr 0.01 ( 0%) sys 1.17 ( 4%) wall
tree eh : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall
tree CFG construction : 0.13 ( 1%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
tree CFG cleanup : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
expand : 1.25 ( 5%) usr 0.01 ( 0%) sys 1.26 ( 4%) wall
varconst : 0.08 ( 0%) usr 0.03 ( 1%) sys 0.07 ( 0%) wall
jump : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
flow analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
mode switching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
scheduling : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
local alloc : 0.89 ( 4%) usr 0.00 ( 0%) sys 0.82 ( 3%) wall
global alloc : 3.01 (12%) usr 0.02 ( 1%) sys 3.13 (11%) wall
flow 2 : 0.26 ( 1%) usr 0.00 ( 0%) sys 0.25 ( 1%) wall
shorten branches : 0.37 ( 2%) usr 0.00 ( 0%) sys 0.42 ( 1%) wall
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
final : 0.88 ( 4%) usr 0.02 ( 1%) sys 0.80 ( 3%) wall
symout : 0.11 ( 0%) usr 0.05 ( 1%) sys 0.14 ( 0%) wall
rest of compilation : 0.86 ( 4%) usr 0.01 ( 0%) sys 0.84 ( 3%) wall
TOTAL : 24.16 3.41 28.41
As you can see, the results are even (the difference is lost in the noise). The 0.19
seconds of tree CFG construction and cleanup is paid back by the savings in RTL garbage
we produce. (I wonder why one run of tree cfg_cleanup takes more time than 3
runs of RTL cfg_cleanup.)
Note that this is a very preliminary result and I expect it to get significantly
better. At the moment I do not use the CFG produced at RTL, as it gets
invalidated by the early RTL manipulation passes, so the extra time spent
updating it for RTL is lost (you may expect 0.10 secs of CFG cleanups to go
away).
It is also important that instead of 130000 lines of assembly we produce 121000
lines (7% fewer). I would also expect that a pass doing trivially dead code removal on
trees (without requiring SSA form) would make a lot of sense. We may then
end up avoiding the need to redo it at the RTL level (RTL expansion does not introduce
that many dead insns by itself), saving one cfgcleanup invocation too.
And finally you can remove a lot of code from the expanders. So in the end, I
would expect the compiler to be noticeably faster when building the
CFG early, even in non-optimizing compilation (and of course we get the
profile and can kill the RTL CFG builder in the foreseeable future too).
Both runs allocate 40MB of memory (with my patch we get slightly more), but
with Zdenek's changes I would expect the CFG based run to use less memory by
avoiding the need for gotos (I guess we lose memory here by lowering the function body
and adding GOTOs to both arms of conditionals).
Honza