[GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use
Teresa Johnson
tejohnson@google.com
Sun Oct 19 00:10:00 GMT 2014
On Sat, Oct 18, 2014 at 4:26 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> The difference in instrumentation runtime is huge -- as topn profiler
>> >> is pretty expensive to run.
>> >>
>> >> With FDO, it is probably better to make early inlining more aggressive
>> >> in order to get more context sensitive profiling.
>> >
>> > I agree with that, I just would like to understand where increasing the iterations
>> > helps and if we can handle it without iterating (because Richi originally requested to
>> > drop the iteration for correctness issues)
>> > Do you have some examples?
>>
>> We can do FDO experiment by shutting down einline. (Note that
>> increasing iteration to 2 did not actually improve performance with
>> our benchmarks).
>
> I would be more interested in a case where increasing iteration to 2 actually
> improves train run performance. (einline was originally invented to make
> profiling usable on tramp3d ;)
> It seems to me that the cases handled by iteration are rather rare, so I am
> surprised you get an important benefit from these. Perhaps we miss something
> obvious here.
The specific case was actually a call to upper_bound in
bits/stl_algo.h with a specialized compare function. In the more
recent versions of upper_bound, the call to the comparator was
outlined into __upper_bound. With only one iteration of early
inlining, we were inlining __upper_bound into upper_bound and into the
caller. But the indirect call to the comparator was not promoted until
the fre2 pass, so it didn't get early inlined. With 2 iterations of
early inlining, enough optimization is apparently done between
iterations to propagate the actual target and promote the indirect
call after we inline __upper_bound and upper_bound, so that the
comparator call is inlined in the second iteration.
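
For illustration, here is a minimal sketch of the kind of source pattern
described above. It is not the actual code from our benchmark: the
comparator less_by_tens and the caller count_not_above are hypothetical
names, and the comparator is passed as a plain function pointer so that
the call inside the library helper starts out as an indirect call.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical specialized comparator. std::upper_bound calls it as
// comp(value, element) through a function pointer, so inside the
// outlined library helper this is an indirect call until the actual
// target has been propagated to the call site.
static bool less_by_tens(int value, int element)
{
  return value / 10 < element / 10;
}

// Hypothetical caller. The comparator only becomes a candidate for
// early inlining once upper_bound and its helper have been inlined
// here and the indirect call has been turned into a direct one.
std::size_t count_not_above(const std::vector<int> &sorted, int value)
{
  // Index of the first element whose tens bucket exceeds that of value.
  return static_cast<std::size_t>(
      std::upper_bound(sorted.begin(), sorted.end(), value, less_by_tens)
      - sorted.begin());
}
```

Whether the comparator ends up inlined for a given build depends on the
compiler version and options, so the sketch shows only the shape of the
call chain, not a guaranteed reproducer.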
Thanks,
Teresa
>
> Honza
>>
>> David
>>
>> > Honza
>> >>
>> >> David
>> >>
>> >> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> >> Increasing the number of early inliner iterations from 1 to 2 enables more
>> >> >> indirect calls to be promoted/inlined before instrumentation. This in turn
>> >> >> reduces the instrumentation overhead, particularly for more expensive indirect
>> >> >> call topn profiling.
>> >> >
>> >> > How much difference do you get here? One possibility would be also to run specialized
>> >> > ipa-cp before profile instrumentation.
>> >> >
>> >> > Honza
>> >> >>
>> >> >> Passes internal testing and regression tests. Ok for google/4_9?
>> >> >>
>> >> >> 2014-10-18 Teresa Johnson <tejohnson@google.com>
>> >> >>
>> >> >> Google ref b/17934523
>> >> >> * opts.c (finish_options): Increase max-early-inliner-iterations to 2
>> >> >> for profile-gen and profile-use builds.
>> >> >>
>> >> >> Index: opts.c
>> >> >> ===================================================================
>> >> >> --- opts.c (revision 216286)
>> >> >> +++ opts.c (working copy)
>> >> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>> >> >> opts->x_param_values, opts_set->x_param_values);
>> >> >> }
>> >> >>
>> >> >> + if (opts->x_profile_arc_flag
>> >> >> + || opts->x_flag_branch_probabilities)
>> >> >> + {
>> >> >> + maybe_set_param_value
>> >> >> + (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>> >> >> + opts->x_param_values, opts_set->x_param_values);
>> >> >> + }
>> >> >> +
>> >> >> if (!(opts->x_flag_auto_profile
>> >> >> || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>> >> >> {
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413