[patch] Speed up phi node insertion

Thu Aug 17 09:59:00 GMT 2006

> Zdenek Dvorak wrote on 07/30/06 06:53:
> 
> > 	PR rtl-optimization/28071
> > 	* basic-block.h (bb_dom_dfs_in, bb_dom_dfs_out): Declare.
> > 	* dominance.c (bb_dom_dfs_in, bb_dom_dfs_out): New functions.
> > 	* tree-into-ssa.c (struct dom_dfsnum): New.
> > 	(cmp_dfsnum, find_dfsnum_interval, prune_unused_phi_nodes): New
> > 	functions.
> > 	(insert_phi_nodes_for): Use prune_unused_phi_nodes instead of
> > 	compute_global_livein.
> > 	(prepare_block_for_update, prepare_use_sites_for): Mark the uses
> > 	in phi nodes in the correct blocks.
> > 
> OK.  Nice catch, thanks.

Nice indeed:

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 146456k
    Peak memory use before GGC: 95412k
    Peak memory use after GGC: 58507k
    Maximum of released memory in single GGC run: 45493k
    Garbage: 163295k
    Leak: 7142k
    Overhead: 29023k
    GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 428348k -> 430308k
    Peak memory use before GGC: 201177k
    Peak memory use after GGC: 196173k
    Maximum of released memory in single GGC run: 100203k
    Garbage: 279198k
    Leak: 47195k
    Overhead: 31459k
    GGC runs: 105

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 350296k -> 350424k
    Peak memory use before GGC: 208293k
    Peak memory use after GGC: 196536k
    Maximum of released memory in single GGC run: 101565k
    Garbage: 394891k
    Leak: 47778k
    Overhead: 49054k
    GGC runs: 111

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
  Overall memory allocated via mmap and sbrk decreased from 781364k to 535696k, overall -45.86%
    Overall memory needed: 781364k -> 535696k
    Peak memory use before GGC: 314602k
    Peak memory use after GGC: 292946k
    Maximum of released memory in single GGC run: 163430k
    Garbage: 494953k
    Leak: 65110k
    Overhead: 60330k
    GGC runs: 100

I wonder what to do about PR tree-optimization/27865.  The memory
consumption seems to be more or less under control: ICC needs 200MB to
compile it, but that is a 32-bit binary, so a peak of 530MB is not bad.
It is also better than any older version I tested, except for 2.95,
which peaks at about 120MB by not inlining but needs an unreasonable
amount of compilation time.  Compilation time is not _that_ bad either.

The only remaining problems are the scheduler's quadratic compilation
time at -O2 (not too serious), the PRE memory explosion at -O3, and the
PRE compilation time at -O2.  Daniel, do you plan to do something about
it in the 4.2 timeframe (the patch you sent me worked well for -O2)?
Otherwise I guess we can retarget the bug to 4.3 and stop it from
holding up stage3...

Honza
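
P.S. For readers wondering what the bb_dom_dfs_in/bb_dom_dfs_out numbers
in the quoted ChangeLog are good for: numbering the dominator tree in DFS
order lets a dominance query be answered by comparing two intervals.
Below is a minimal sketch of that general idea only.  It is not the code
from the patch; struct dom_tree, number_dom_tree and dominates are
hypothetical names made up for this illustration, and the toy tree is
stored as a plain parent array.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16  /* hypothetical limit, only for this sketch */

/* Toy dominator tree: parent[i] is the immediate dominator of block i,
   and the root has parent -1.  dfs_in/dfs_out are filled in by
   number_dom_tree.  */
struct dom_tree
{
  int n;
  int parent[MAX_BLOCKS];
  int dfs_in[MAX_BLOCKS];
  int dfs_out[MAX_BLOCKS];
};

/* Assign entry and exit numbers to BLOCK and its whole subtree.  */
static void
number_subtree (struct dom_tree *t, int block, int *clock)
{
  assert (block >= 0 && block < t->n);
  t->dfs_in[block] = (*clock)++;
  for (int child = 0; child < t->n; child++)
    if (t->parent[child] == block)
      number_subtree (t, child, clock);
  t->dfs_out[block] = (*clock)++;
}

/* Number the whole tree starting from ROOT.  */
static void
number_dom_tree (struct dom_tree *t, int root)
{
  int clock = 0;
  number_subtree (t, root, &clock);
}

/* A dominates B exactly when B's [dfs_in, dfs_out] interval is nested
   inside A's.  Every block dominates itself.  */
static bool
dominates (const struct dom_tree *t, int a, int b)
{
  return t->dfs_in[a] <= t->dfs_in[b] && t->dfs_out[b] <= t->dfs_out[a];
}
```

Once the numbering is done, each query is two integer comparisons, and
sorting blocks by dfs_in keeps every subtree contiguous, which is what
makes interval searches over a set of blocks cheap.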