[PATCH] Add capability to run several iterations of early optimizations
Richard Guenther
richard.guenther@gmail.com
Fri Oct 28 11:12:00 GMT 2011
On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> Richard,
>
> Just as Matt posted his findings about the effect of iterating early optimizations, I've got the new patch ready. This patch is essentially a complete rewrite and addresses the comments you made.
>
> On 18/10/2011, at 9:56 PM, Richard Guenther wrote:
>
>>>>
>>>> If we'd want to iterate early optimizations we'd want to do it by iterating
>>>> an IPA pass so that we benefit from more precise size estimates
>>>> when trying to inline a function the second time.
>>>
>>> Could you elaborate on this a bit? Early optimizations are gimple passes, so I'm missing your point here.
>>
>> pass_early_local_passes is an IPA pass, you want to iterate
>> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better
>> inlining. Thus you need to split pass_early_local_passes into pieces
>> so you can iterate one of the IPA pieces.
>
> Early_local_passes are now split into _main, _iter and _late parts. To avoid changing the default case, _late part is merged into _main when no iterative optimizations are requested.
>
>>
>>>> Also statically
>>>> scheduling the passes will mess up dump files and you have no
>>>> chance of say, noticing that nothing changed for function f and its
>>>> callees in iteration N and thus you can skip processing them in
>>>> iteration N + 1.
>>>
>>> Yes, these are the shortcomings. The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one. The analysis to avoid unnecessary iterations is a more complex problem.
>
> To avoid changing the dump file names the patch appends "_iter" suffix to the dumps of iterative passes.
>
>>
>> Sure. I analyzed early passes by manually duplicating them and
>> tested that they do nothing for tramp3d, which they pretty much all did
>> at some point.
>>
>>>>
>>>> So, at least you should split the pass_early_local_passes IPA pass
>>>> into three, you'd iterate over the 2nd (definitely not over pass_split_functions
>>>> though), the third would be pass_profile and pass_split_functions only.
>>>> And you'd iterate from the place the 2nd IPA pass is executed, not
>>>> by scheduling them N times.
>>>
>>> OK, I will look into this.
>
> Done.
>
>>>
>>>>
>>>> Then you'd have to analyze the compile-time impact of the IPA
>>>> splitting on its own when not iterating.
>
> I decided to avoid this and keep the pass pipeline effectively the same when not running iterative optimizations. This is achieved by scheduling pass_early_optimizations_late in different places in the pipeline depending on whether iterative optimizations are enabled or not.
>
> The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with 3 iterations enabled by default. The only failures are 5 scan-dump tests that are due to more functions being inlined than expected. With iterative optimizations disabled there is no change.
>
> I've kicked off SPEC2000/SPEC2006 benchmark runs to see the performance effect of the patch, and those will be posted in the same Google Docs spreadsheet in several days.
>
> OK for trunk?
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index f056d3d..4738b28 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
tree_lowering_passes (fndecl);
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
bitmap_obstack_release (NULL);
pop_cfun ();
current_function_decl = NULL;
@@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
gimple_register_cfg_hooks ();
bitmap_obstack_initialize (NULL);
if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
I think these should only execute the lowering pieces of early local passes,
let me see if that's properly split ...
@@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
/* When not optimizing, be sure we run early local passes anyway
to expand OMP. */
|| !optimize)
- execute_pass_list (pass_early_local_passes.pass.sub);
+ execute_early_local_passes_for_current_function ();
similar.
About all this -suffix stuff, I'd like to have the iterations simply re-use
the existing dump-files, thus statically sub-divide pass_early_local_passes
like
NEXT_PASS (pass_early_local_lowering_passes);
{
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_init_datastructures);
NEXT_PASS (pass_expand_omp);
NEXT_PASS (pass_referenced_vars);
NEXT_PASS (pass_build_ssa);
NEXT_PASS (pass_lower_vector);
NEXT_PASS (pass_early_warn_uninitialized);
}
/* The following you may iterate. */
NEXT_PASS (pass_early_local_optimizations);
{
NEXT_PASS (pass_rebuild_cgraph_edges);
NEXT_PASS (pass_inline_parameters);
NEXT_PASS (pass_early_inline);
NEXT_PASS (pass_all_early_optimizations);
{
struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
NEXT_PASS (pass_remove_cgraph_callee_edges);
NEXT_PASS (pass_rename_ssa_copies);
NEXT_PASS (pass_ccp);
NEXT_PASS (pass_forwprop);
/* pass_build_ealias is a dummy pass that ensures that we
execute TODO_rebuild_alias at this point. Re-building
alias information also rewrites no longer addressed
locals into SSA form if possible. */
NEXT_PASS (pass_build_ealias);
NEXT_PASS (pass_sra_early);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_cd_dce);
NEXT_PASS (pass_early_ipa_sra);
NEXT_PASS (pass_tail_recursion);
NEXT_PASS (pass_convert_switch);
NEXT_PASS (pass_cleanup_eh);
NEXT_PASS (pass_profile);
NEXT_PASS (pass_local_pure_const);
}
}
NEXT_PASS (pass_late_early_local_optimizations)
{
/* Split functions creates parts that are not run through
early optimizations again. It is thus a good idea to do this
late. */
NEXT_PASS (pass_split_functions);
NEXT_PASS (pass_release_ssa_names);
NEXT_PASS (pass_rebuild_cgraph_edges);
NEXT_PASS (pass_inline_parameters);
}
+++ b/gcc/toplev.c
@@ -1228,7 +1228,6 @@ general_init (const char *argv0)
/* This must be done after global_init_params but before argument
processing. */
init_ggc_heuristics();
- init_optimization_passes ();
statistics_early_init ();
finish_params ();
}
@@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
save_decoded_options, save_decoded_options_count,
UNKNOWN_LOCATION, global_dc);
+ init_optimization_passes ();
+
handle_common_deferred_options ();
init_local_tick ();
any reason for this?
+ /* Don't recurse or wander on to the next pass when running
+ execute_ipa_pass_list below. */
+ current->execute = NULL;
+ current->next = NULL;
that looks awkward ;) Shouldn't instead
+ execute_ipa_pass_list (current);
not do what it does? Thus, split that into two pieces, one that we
can iterate w/o the fiddling with current->?
I like this variant a lot better than the last one - still it lacks any
analysis-based justification for iteration (see my reply to Matt on
what I discussed with Honza). Thus, I don't think we want to
merge this in its current form or in this stage1.
I'd also like to know _which_ early passes are worth iterating
(and if iterating is only worthwhile if the iterated early inlining inlined
something). I guess that goes back to the point that iterating
should be driven by the early inliner and/or the fact whether
in the previous iteration new direct calls were discovered.
Richard.
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>