Pre-inline optimization
Richard Guenther
richard.guenther@gmail.com
Fri Jan 5 10:25:00 GMT 2007
On 1/5/07, Jan Hubicka <jh@suse.cz> wrote:
> Hi,
> this patch enables optimization before inlining. As discussed with Diego and
> Danny, at least in the short term we don't want to have aliasing information built
> (because it is expensive to hold it for the whole program), and thus we use the simple
> trick of modifying tree-ssa-operands to mark all loads and stores as volatile
> statements so the optimizers work only on SSA registers. In addition,
> tree-tailcall tests call clobbering, which is not computed yet, and thus I
> added a TREE_ADDRESSABLE check, which is how the code used to work before the recent
> rewrites.
>
> I've benchmarked the patch on SPECint and there are just small off-noise
> differences. Compilation time seems unchanged, which matches my benchmarks on C++
> (i.e. I got 562->561s for SPECint build time and 171->172s for SPECfp build time;
> similarly, I got a one-second saving on bootstrap time with a checking-disabled
> compiler. The situation is worse with checking enabled, since the verify_ssa expense
> shows).
>
> Performance is generally unchanged too, but this is expected: I have not updated
> the computation of inlining parameters, so optimizations are not taken into account at
> all for inlining decisions. Only possible callgraph simplification by
> cleanup_cfg might affect the inliner. On the IPA branch I measured little change on
> SPEC -O3 with early inlining too; however, with intermodule the benefits were
> already measurable: 7-15% code size savings and 1.8-2.5% speedups. This is
> because our inliner already seems good enough for simple C programs. I hope
> this will show up in practice for C++ even before LTO arrives, since just about any
> clue helps here. Off-noise results seem to be a 2.4% speedup on EON, a 1% speedup on
> MGRID and a 0.4% slowdown on ART.
>
> Compile-time peak memory savings seem to be somewhere between 1-4%
> depending on the testcase: the early optimization is done at SSA build time,
> so we don't need to store the whole unit in unoptimized SSA form, which is costly.
>
> For tramp3d I got a 26% compilation time speedup, but it might be caused by
> the slight swapping the non-early-optimizing compiler does on my setup.
> The binary looks almost identical.
>
> However, this is just the entry point to pre-inline optimization. What I would like
> to do progressively next is:
>
> 1) Make inliner size estimates computed after optimizing and retune the inliner
> 2) Experiment with more passes (FRE should be nice, perhaps VRP or DOM
> or some of the loop optimizations)
> I know that FRE breaks with the current setup; I am not sure yet whether
> it is a bug in its handling of the volatile flag or whether it depends on virtual
> operands in some tricky way. But that is the reason why I didn't remove the PROP_alias
> requirement from all the passes, just from the ones I tested.
I wonder how you came to the current set of passes and their ordering. I
also wonder if we can adjust some of the initial scalar cleanup passes now.
You have
+ NEXT_PASS (pass_rename_ssa_copies);
+ NEXT_PASS (pass_ccp);
+ NEXT_PASS (pass_forwprop);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_merge_phi);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_tail_recursion);
and in initial scalar cleanups we have
/* Initial scalar cleanups. */
NEXT_PASS (pass_ccp);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_forwprop);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_vrp);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_dominator);
I wonder why you need to add two copyprop passes? I would also
suggest changing the ccp and copy_prop passes in the initial
scalar cleanup section to the store_ccp and store_copy_prop
variants, moving the forwprop pass there to after the
first dominator pass, and removing the first invocation of dce.
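As an untested sketch of that suggestion, the initial scalar cleanup
section would then read roughly as follows. The pass_store_ccp and
pass_store_copy_prop names are an assumption here, simply following the
pass_* naming pattern of the existing entries:

```c
/* Initial scalar cleanups (sketch of the suggested re-ordering only). */
NEXT_PASS (pass_store_ccp);        /* was pass_ccp; name assumed */
NEXT_PASS (pass_fre);
                                   /* first pass_dce removed */
NEXT_PASS (pass_store_copy_prop);  /* was pass_copy_prop; name assumed */
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_vrp);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_dominator);
NEXT_PASS (pass_forwprop);         /* moved to after the first dominator pass */
```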
What kind of propagation does the inliner now do? I remember
something about const and copy propagation?
Do we need the second tail recursion pass?
I know it's always easy to add passes and very hard to argue that we can
remove/re-order some - but the point at which we add some should be
the easiest time to argue for removing/re-ordering some others ;)
Also maybe PR23346 is fixed by your patch?
Thanks,
Richard.