[PATCH] Add capability to run several iterations of early optimizations
Richard Guenther
richard.guenther@gmail.com
Fri Oct 28 11:12:00 GMT 2011
On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> Richard,
>
> Just as Matt posted his findings about the effect of iterating early optimizations, I've got the new patch ready. This patch is essentially a complete rewrite and addresses the comments you made.
>
> On 18/10/2011, at 9:56 PM, Richard Guenther wrote:
>
>>>>
>>>> If we'd want to iterate early optimizations we'd want to do it by iterating
>>>> an IPA pass so that we benefit from more precise size estimates
>>>> when trying to inline a function the second time.
>>>
>>> Could you elaborate on this a bit? Early optimizations are gimple passes, so I'm missing your point here.
>>
>> pass_early_local_passes is an IPA pass, you want to iterate
>> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better
>> inlining. Thus you need to split pass_early_local_passes into pieces
>> so you can iterate one of the IPA pieces.
>
> Early_local_passes are now split into _main, _iter and _late parts. To avoid changing the default case, _late part is merged into _main when no iterative optimizations are requested.
>
>>
>>>> Also statically
>>>> scheduling the passes will mess up dump files and you have no
>>>> chance of say, noticing that nothing changed for function f and its
>>>> callees in iteration N and thus you can skip processing them in
>>>> iteration N + 1.
>>>
>>> Yes, these are the shortcomings. The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one. The analysis to avoid unnecessary iterations is a more complex problem.
>
> To avoid changing the dump file names the patch appends "_iter" suffix to the dumps of iterative passes.
>
>>
>> Sure. I analyzed early passes by manually duplicating them and
>> tested that they do nothing for tramp3d, which they pretty much all did
>> at some point.
>>
>>>>
>>>> So, at least you should split the pass_early_local_passes IPA pass
>>>> into three, you'd iterate over the 2nd (definitely not over pass_split_functions
>>>> though), the third would be pass_profile and pass_split_functions only.
>>>> And you'd iterate from the place the 2nd IPA pass is executed, not
>>>> by scheduling them N times.
>>>
>>> OK, I will look into this.
>
> Done.
>
>>>
>>>>
>>>> Then you'd have to analyze the compile-time impact of the IPA
>>>> splitting on its own when not iterating.
>
> I decided to avoid this and keep the pass pipeline effectively the same when not running iterative optimizations. This is achieved by scheduling pass_early_optimizations_late in different places in the pipeline depending on whether iterative optimizations are enabled or not.
>
> The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with 3 iterations enabled by default. The only failures are 5 scan-dump tests that are due to more functions being inlined than expected. With iterative optimizations disabled there is no change.
>
> I've kicked off SPEC2000/SPEC2006 benchmark runs to see the performance effect of the patch, and those will be posted in the same Google Docs spreadsheet in several days.
>
> OK for trunk?
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index f056d3d..4738b28 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
tree_lowering_passes (fndecl);
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
bitmap_obstack_release (NULL);
pop_cfun ();
current_function_decl = NULL;
@@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
gimple_register_cfg_hooks ();
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
I think these should only execute the lowering pieces of early local passes,
let me see if that's properly split ...
@@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
/* When not optimizing, be sure we run early local passes anyway
to expand OMP. */
|| !optimize)
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
similar.
About all this -suffix stuff, I'd like to have the iterations simply re-use
the existing dump-files, thus statically sub-divide pass_early_local_passes
like
NEXT_PASS (pass_early_local_lowering_passes);
{
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_init_datastructures);
NEXT_PASS (pass_expand_omp);
NEXT_PASS (pass_referenced_vars);
NEXT_PASS (pass_build_ssa);
NEXT_PASS (pass_lower_vector);
NEXT_PASS (pass_early_warn_uninitialized);
}
/* The following you may iterate. */
NEXT_PASS (pass_early_local_optimizations);
{
NEXT_PASS (pass_rebuild_cgraph_edges);
NEXT_PASS (pass_inline_parameters);
NEXT_PASS (pass_early_inline);
NEXT_PASS (pass_all_early_optimizations);
{
struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
NEXT_PASS (pass_remove_cgraph_callee_edges);
NEXT_PASS (pass_rename_ssa_copies);
NEXT_PASS (pass_ccp);
NEXT_PASS (pass_forwprop);
/* pass_build_ealias is a dummy pass that ensures that we
execute TODO_rebuild_alias at this point. Re-building
alias information also rewrites no longer addressed
locals into SSA form if possible. */
NEXT_PASS (pass_build_ealias);
NEXT_PASS (pass_sra_early);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_cd_dce);
NEXT_PASS (pass_early_ipa_sra);
NEXT_PASS (pass_tail_recursion);
NEXT_PASS (pass_convert_switch);
NEXT_PASS (pass_cleanup_eh);
NEXT_PASS (pass_profile);
NEXT_PASS (pass_local_pure_const);
}
}
NEXT_PASS (pass_late_early_local_optimizations)
{
/* Split functions creates parts that are not run through
early optimizations again. It is thus a good idea to do this
late. */
NEXT_PASS (pass_split_functions);
NEXT_PASS (pass_release_ssa_names);
NEXT_PASS (pass_rebuild_cgraph_edges);
NEXT_PASS (pass_inline_parameters);
}
+++ b/gcc/toplev.c
@@ -1228,7 +1228,6 @@ general_init (const char *argv0)
/* This must be done after global_init_params but before argument
processing. */
init_ggc_heuristics();
- init_optimization_passes ();
statistics_early_init ();
finish_params ();
}
@@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
save_decoded_options, save_decoded_options_count,
UNKNOWN_LOCATION, global_dc);
+ init_optimization_passes ();
+
handle_common_deferred_options ();
init_local_tick ();
any reason for this?
+ /* Don't recurse or wander on to the next pass when running
+ execute_ipa_pass_list below. */
+ current->execute = NULL;
+ current->next = NULL;
that looks awkward ;) Shouldn't instead
+ execute_ipa_pass_list (current);
not do what it does? Thus, split that into two pieces, one that we
can iterate w/o the fiddling with current->?
I like this variant a lot better than the last one - still it lacks any
analysis-based justification for iteration (see my reply to Matt on
what I discussed with Honza). Thus, I don't think we want to
merge this in its current form or in this stage1.
I'd also like to know _which_ early passes are worth iterating
(and if iterating is only worthwhile if the iterated early inlining inlined
something). I guess that goes back to the point that iterating
should be driven by the early inliner and/or the fact whether
in the previous iteration new direct calls were discovered.
Richard.
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>