This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [tree-ssa] Removal of gotos from cfg based ir


> On Fri, 2003-11-14 at 10:20, Jan Hubicka wrote:
> > > On Thu, 2003-11-13 at 19:37, Zdenek Dvorak wrote:
> > > > Hello,
> 
> > I do have the RTL expansion pass that use CFG (to preserve profile
> > mainly).  In longer run, I would like to enter RTL already in cfglayout
> > mode, so we avoid need to invent/kill the gotos on the way.
> > 
> 
> That means you will have to build a CFG in order to generate code...
> even with no optimization on, or you need expanders that can deal with
> either version of the IL.

Hi,
I went ahead and did some testing with my CFG based expansion.
I modified tree-optimize.c to do:
  /* Invoke the SSA tree optimizer.  */
  if (optimize >= 1 && !flag_disable_tree_ssa)
    optimize_function_tree (fndecl, &chain);
  else
    {
      /* Build the flowgraph.  */
      init_flow ();

      build_tree_cfg (&chain);
      cleanup_tree_cfg ();
      cfg_remove_useless_stmts ();
    }
and then continue along the common path, expanding from the CFG.
I tested it by compiling the preprocessed files of the C frontend.  The timings without the patch are:

Size   Allocated        Used    Overhead
8            912k        704k         20k
16           780k        242k         11k
32          1212k        508k         13k
64          3192k       2547k         28k
128         8192        5376          64 
256         1968k       1913k         13k
512         1328k       1297k       9296 
1024        2548k       2516k         17k
2048        8192        4096          56 
16384         64k         64k        112 
32768        320k        320k        280 
65536         64k         64k         28 
131072        128k        128k         28 
116           26M         24M        211k
24            10M       4467k        128k
12          2488k        561k         41k
40          1676k        976k         16k
Total         53M         40M        511k

String pool
entries		29592
identifiers	29592 (100.00%)
slots		65536
bytes		337k (32k overhead)
table size	256k
coll/search	0.4203
ins/search	0.0395
avg. entry	11.67 bytes (+/- 7.05)
longest entry	62

??? tree nodes created

(No per-node statistics)
Type hash: size 32749, 15320 elements, 2.008041 collisions

Execution times (seconds)
 garbage collection    :   1.61 ( 7%) usr   0.01 ( 0%) sys   1.63 ( 6%) wall
 cfg construction      :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 cfg cleanup           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 trivially dead code   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 life analysis         :   0.61 ( 3%) usr   0.00 ( 0%) sys   0.57 ( 2%) wall
 life info update      :   0.25 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall
 register scan         :   0.21 ( 1%) usr   0.01 ( 0%) sys   0.20 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 preprocessing         :   2.02 ( 8%) usr   0.69 (20%) sys   2.80 (10%) wall
 lexical analysis      :   1.99 ( 8%) usr   1.57 (46%) sys   3.37 (12%) wall
 parser                :   7.76 (32%) usr   0.86 (25%) sys   9.26 (33%) wall
 integration           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree gimplify         :   1.29 ( 5%) usr   0.01 ( 0%) sys   1.22 ( 4%) wall
 tree eh               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 expand                :   1.15 ( 5%) usr   0.04 ( 1%) sys   1.01 ( 4%) wall
 varconst              :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.12 ( 0%) wall
 jump                  :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 local alloc           :   0.97 ( 4%) usr   0.00 ( 0%) sys   0.97 ( 3%) wall
 global alloc          :   3.11 (13%) usr   0.02 ( 1%) sys   3.19 (11%) wall
 flow 2                :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 1%) wall
 machine dep reorg     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 shorten branches      :   0.41 ( 2%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall
 final                 :   0.69 ( 3%) usr   0.02 ( 1%) sys   0.79 ( 3%) wall
 symout                :   0.04 ( 0%) usr   0.11 ( 3%) sys   0.09 ( 0%) wall
 rest of compilation   :   0.99 ( 4%) usr   0.00 ( 0%) sys   1.06 ( 4%) wall
 TOTAL                 :  24.23             3.41            28.26

And with the patch:
Size   Allocated        Used    Overhead
8            920k        711k         20k
16           736k        260k         10k
32          1284k        505k         13k
64          3160k       2541k         27k
128         8192        5376          64 
256         1956k       1913k         13k
512         1328k       1297k       9296 
1024        2536k       2516k         17k
2048        8192        4096          56 
16384         64k         64k        112 
32768        320k        320k        280 
65536         64k         64k         28 
131072        128k        128k         28 
116           26M         24M        213k
24            10M       4467k        126k
12          2552k        617k         42k
40          1764k       1013k         17k
Total         53M         40M        512k

String pool
entries		29592
identifiers	29592 (100.00%)
slots		65536
bytes		337k (32k overhead)
table size	256k
coll/search	0.4095
ins/search	0.0399
avg. entry	11.67 bytes (+/- 7.05)
longest entry	62

??? tree nodes created

(No per-node statistics)
Type hash: size 32749, 15320 elements, 2.165665 collisions

Execution times (seconds)
 garbage collection    :   1.61 ( 7%) usr   0.01 ( 0%) sys   1.62 ( 6%) wall
 cfg construction      :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 cfg cleanup           :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 trivially dead code   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 life analysis         :   0.60 ( 2%) usr   0.01 ( 0%) sys   0.62 ( 2%) wall
 life info update      :   0.27 ( 1%) usr   0.00 ( 0%) sys   0.32 ( 1%) wall
 register scan         :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 rebuild jump labels   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 preprocessing         :   1.92 ( 8%) usr   0.67 (20%) sys   2.90 (10%) wall
 lexical analysis      :   2.06 ( 9%) usr   1.62 (48%) sys   3.45 (12%) wall
 parser                :   7.87 (33%) usr   0.82 (24%) sys   9.39 (33%) wall
 integration           :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall
 tree gimplify         :   1.18 ( 5%) usr   0.01 ( 0%) sys   1.17 ( 4%) wall
 tree eh               :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall
 tree CFG construction :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 tree CFG cleanup      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 expand                :   1.25 ( 5%) usr   0.01 ( 0%) sys   1.26 ( 4%) wall
 varconst              :   0.08 ( 0%) usr   0.03 ( 1%) sys   0.07 ( 0%) wall
 jump                  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 mode switching        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 scheduling            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.89 ( 4%) usr   0.00 ( 0%) sys   0.82 ( 3%) wall
 global alloc          :   3.01 (12%) usr   0.02 ( 1%) sys   3.13 (11%) wall
 flow 2                :   0.26 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall
 shorten branches      :   0.37 ( 2%) usr   0.00 ( 0%) sys   0.42 ( 1%) wall
 reg stack             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   0.88 ( 4%) usr   0.02 ( 1%) sys   0.80 ( 3%) wall
 symout                :   0.11 ( 0%) usr   0.05 ( 1%) sys   0.14 ( 0%) wall
 rest of compilation   :   0.86 ( 4%) usr   0.01 ( 0%) sys   0.84 ( 3%) wall
 TOTAL                 :  24.16             3.41            28.41

As you can see, the results are even (the difference is lost in the noise).  The
0.19 seconds of tree CFG construction and cleanup are paid back by the reduction
in the RTL garbage we produce.  (I wonder why one run of tree cfg_cleanup takes
more time than 3 runs of RTL cfg_cleanup.)
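
As a quick check of the arithmetic, here is a minimal C sketch that adds up the
usr times of the two tree CFG passes and compares the TOTAL usr rows of the two
runs.  The struct and function names are made up for illustration; only the
numbers are taken from the tables above.

```c
#include <assert.h>
#include <stddef.h>

/* One row of an "Execution times" table, usr seconds only.
   (Hypothetical type, for illustration.)  */
struct pass_time
{
  const char *name;
  double usr;
};

/* The two passes that appear only in the patched run.  */
static const struct pass_time tree_cfg_rows[] = {
  { "tree CFG construction", 0.13 },
  { "tree CFG cleanup", 0.06 },
};

/* Sum the usr column over N rows.  */
double
sum_usr (const struct pass_time *rows, size_t n)
{
  double total = 0.0;
  assert (rows != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    total += rows[i].usr;
  return total;
}

/* Extra usr time spent building and cleaning the tree CFG.  */
double
tree_cfg_cost (void)
{
  return sum_usr (tree_cfg_rows,
                  sizeof tree_cfg_rows / sizeof tree_cfg_rows[0]);
}

/* Net change in TOTAL usr time: patched run minus unpatched run.  */
double
total_usr_delta (void)
{
  return 24.16 - 24.23;
}
```

tree_cfg_cost () comes out at about 0.19 seconds and total_usr_delta () at
about -0.07 seconds, so the new passes cost no more than what is saved
elsewhere, within the noise of a single run.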

Note that this is a very preliminary result and I expect it to get significantly
better.  At the moment I do not use the CFG produced at RTL, as it gets
invalidated by early RTL manipulation passes, so the extra time spent
updating it for RTL is lost (you may expect 0.10 secs of CFG cleanups to go
away).

It is also important that instead of 130000 lines of assembly we produce 121000
lines (7% fewer).  I would also expect that a pass doing trivially dead code
removal on trees (without requiring SSA form) would make a lot of sense.  We may
then end up avoiding the need to redo it at the RTL level (RTL expansion does not
introduce that many dead insns by itself), saving one cfgcleanup invocation too.
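
For reference, the 7% figure can be reproduced with a small helper.  This is
only a sketch; percent_reduction is a hypothetical name, and the line counts
are the rounded ones quoted above.

```c
#include <assert.h>

/* Relative reduction from BEFORE to AFTER, in percent.  */
double
percent_reduction (long before, long after)
{
  assert (before > 0);
  return 100.0 * (double) (before - after) / (double) before;
}
```

percent_reduction (130000, 121000) gives roughly 6.9, which rounds to the 7%
quoted above.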

And finally, you can remove a lot of code from the expanders.  So in the end, I
would expect the compiler to be noticeably faster when building the
CFG early, even in non-optimizing compilation.  (And of course we get the
profile and can kill the RTL CFG builder in the foreseeable future too.)

Both runs allocate 40MB of memory (with my patch we get slightly more), but
with Zdenek's changes I would expect the CFG based run to use less memory by
avoiding the need for gotos.  (I guess we lose memory here by lowering the
function body and adding GOTOs to both arms of conditionals.)
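
To illustrate what "GOTOs in both arms" means, here is a hand-written C sketch,
not a compiler dump.  The function and label names are made up, and the real
lowered trees will differ in detail; the point is only that the lowered form
carries extra goto and label statements that the structured form does not need.

```c
/* Structured form, roughly as written in the source.  */
int
abs_structured (int x)
{
  int r;
  if (x < 0)
    r = -x;
  else
    r = x;
  return r;
}

/* Hand-lowered form: both arms of the conditional are explicit gotos,
   and each arm needs another goto to reach the join point.  */
int
abs_lowered (int x)
{
  int r;
  if (x < 0)
    goto then_arm;
  else
    goto else_arm;
then_arm:
  r = -x;
  goto join;
else_arm:
  r = x;
  goto join;
join:
  return r;
}
```

Both functions compute the same result.  If the control flow is carried by CFG
edges instead, statements like the four gotos in abs_lowered presumably do not
have to be allocated at all, which is where I would expect the memory saving to
come from.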

Honza

