Pre-inline optimization

Fri Jan 5 10:25:00 GMT 2007

On 1/5/07, Jan Hubicka <jh@suse.cz> wrote:
> Hi,
> this patch enables optimization before inlining.  As discussed with Diego and
> Danny, at least in the short term we don't want to have aliasing information
> built (because it is expensive to hold it for the whole program), and thus we
> do a simple trick of modifying tree-ssa-operands to mark all loads and stores
> as volatile statements so the optimizers work only on SSA registers.  In
> addition, tree-tailcall tests call-clobbering information that is not computed
> yet, and thus I added a TREE_ADDRESSABLE check, which is how the code used to
> work before the recent rewrites.
>
> I've benchmarked the patch on SPECint and there are just small off-noise
> differences.  Compilation time seems unchanged, which matches my benchmarks on
> C++ (i.e. I got 562->561s for SPECint build time and 171->172s for SPECfp build
> time; similarly I got a one-second saving on bootstrap time with a
> checking-disabled compiler.  The situation is worse with checking enabled,
> since the verify_ssa expense shows).
>
> Performance is generally unchanged too, but this is expected: I have not
> updated the computation of inlining parameters, so optimizations are not taken
> into account at all for inlining decisions.  Only possible callgraph
> simplification by cleanup_cfg might affect the inliner.  On the IPA branch I
> measured little change on SPEC -O3 with early inlining too; however, with
> intermodule the benefits were already measurable: 7-15% code size savings and
> 1.8-2.5% speedups.  This is because our inliner seems already good enough for
> simple C programs.  I hope this will show up in practice for C++ even before
> LTO arrives, since about any clue helps here.  The off-noise results seem to be
> a 2.4% speedup on EON, a 1% speedup on MGRID and a 0.4% slowdown on ART.
>
> Compile-time peak memory savings seem to be somewhere between 1-4%
> depending on the testcase: the early optimization is done at SSA build time,
> so we don't need to store the whole unit in unoptimized SSA form, which is
> costly.
>
> For tramp3d I got a 26% compilation time speedup, but it might be caused by
> the slight swapping that the non-early-optimizing compiler does on my setup.
> The binary looks almost identical.
>
> However, this is just the entry point to pre-inline optimization.  What I would
> like to do progressively next is:
>
>  1) Make the inliner size estimates computed after optimizing, and retune the
>     inliner
>  2) Experiment with more passes (FRE should be nice, perhaps VRP or DOM
>     or some of the loop optimizations).
>     I know that FRE breaks with the current setup; I am not sure yet whether
>     it is a bug in its handling of the volatile flag or whether it depends on
>     virtual operands in some tricky way.  But that is the reason why I didn't
>     remove the PROP_alias requirement from all the passes, just from the ones
>     I tested.

I wonder how you came to the current set of passes and their ordering.  I
also wonder if we can adjust some of the initial scalar cleanup passes now.
You have

+   NEXT_PASS (pass_rename_ssa_copies);
+   NEXT_PASS (pass_ccp);
+   NEXT_PASS (pass_forwprop);
+   NEXT_PASS (pass_copy_prop);
+   NEXT_PASS (pass_merge_phi);
+   NEXT_PASS (pass_copy_prop);
+   NEXT_PASS (pass_dce);
+   NEXT_PASS (pass_tail_recursion);

and in initial scalar cleanups we have

  /* Initial scalar cleanups.  */
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_fre);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_vrp);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_dominator);

I wonder why you need to add two copyprop passes?  I would also
suggest changing the ccp and copy_prop passes in the initial
scalar cleanup section to the store_ccp and store_copy_prop
variants, moving the forwprop pass there to after the
first dominator pass, and removing the first invocation of dce.

What kind of propagation does the inliner now do?  I remember
something about const and copy propagation?

Do we need the second tail recursion pass?

I know it's always easy to add passes and very hard to argue we can
remove/re-order some - but the point at which we add some should be
the easiest time to argue for removing/re-ordering some others ;)

Also maybe PR23346 is fixed by your patch?

Thanks,
Richard.