[GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

Sun Oct 19 00:10:00 GMT 2014

On Sat, Oct 18, 2014 at 4:26 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> The difference in instrumentation runtime is huge -- as the topn
>> >> profiler is pretty expensive to run.
>> >>
>> >> With FDO, it is probably better to make early inlining more aggressive
>> >> in order to get more context sensitive profiling.
>> >
>> > I agree with that, I just would like to understand where increasing the iterations
>> > helps and if we can handle it without iterating (because Richi originally requested to
>> > drop the iteration for correctness issues).
>> > Do you have some examples?
>>
>> We can do an FDO experiment by shutting down einline. (Note that
>> increasing the iteration count to 2 did not actually improve
>> performance with our benchmarks.)
>
> I would be more interested in a case where increasing iterations to 2 actually
> improves train run performance. (einline was originally invented to make
> profiling usable on tramp3d ;)
> It seems to me that the cases handled by iteration are rather rare, so I am
> surprised you get an important benefit from these. Perhaps we are missing
> something obvious here.

The specific case was actually a call to upper_bound in
bits/stl_algo.h with a specialized compare function. In the more
recent versions of upper_bound, the call to the comparator was
outlined into __upper_bound. With only one iteration of early
inlining, we were inlining __upper_bound into upper_bound and into the
caller. But the indirect call to the comparator was not promoted until
the fre2 pass, so it didn't get early inlined. With 2 iterations of
early inlining, enough optimization is apparently done between
iterations to propagate the actual target and promote the indirect
call once __upper_bound and upper_bound have been inlined, so the
comparator is inlined in the second iteration.
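
For reference, a minimal sketch of the kind of source that hits this
pattern. The names Range, start_after and first_range_after are
hypothetical and only for illustration; this is not the code from the
internal bug. The comparator is passed as a function pointer, so inside
the library helper the call to it is indirect until inlining and
propagation expose the target:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type, for illustration only.
struct Range { int start; int len; };

// Specialized comparator with the (value, element) argument order that
// std::upper_bound uses when it calls the comparator.
static bool start_after(int key, const Range &r) { return key < r.start; }

// Returns the index of the first range whose start is greater than key.
std::size_t first_range_after(const std::vector<Range> &ranges, int key)
{
  // upper_bound requires the input to be ordered with respect to the comparator.
  assert(std::is_sorted(ranges.begin(), ranges.end(),
                        [](const Range &a, const Range &b)
                        { return a.start < b.start; }));

  // The comparator is passed by address. Inside the library the call
  // through this pointer is indirect, and can only be promoted to a
  // direct call (and then inlined) after the library helper has been
  // inlined here and the pointer value has been propagated.
  auto it = std::upper_bound(ranges.begin(), ranges.end(), key, &start_after);
  return static_cast<std::size_t>(it - ranges.begin());
}
```

Whether the comparator call is actually promoted and inlined in a given
build depends on the libstdc++ version and the pass pipeline, so treat
this only as the shape of the problem. The iteration count can be set
explicitly with --param max-early-inliner-iterations=N when comparing
the two behaviors.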

Thanks,
Teresa

>
> Honza
>>
>> David
>>
>> > Honza
>> >>
>> >> David
>> >>
>> >> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> >> Increasing the number of early inliner iterations from 1 to 2 enables more
>> >> >> indirect calls to be promoted/inlined before instrumentation. This in turn
>> >> >> reduces the instrumentation overhead, particularly for more expensive indirect
>> >> >> call topn profiling.
>> >> >
>> >> > How much difference do you get here? One possibility would also be to run
>> >> > a specialized ipa-cp before profile instrumentation.
>> >> >
>> >> > Honza
>> >> >>
>> >> >> Passes internal testing and regression tests. Ok for google/4_9?
>> >> >>
>> >> >> 2014-10-18  Teresa Johnson  <tejohnson@google.com>
>> >> >>
>> >> >>         Google ref b/17934523
>> >> >>         * opts.c (finish_options): Increase max-early-inliner-iterations to 2
>> >> >>         for profile-gen and profile-use builds.
>> >> >>
>> >> >> Index: opts.c
>> >> >> ===================================================================
>> >> >> --- opts.c      (revision 216286)
>> >> >> +++ opts.c      (working copy)
>> >> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>> >> >>          opts->x_param_values, opts_set->x_param_values);
>> >> >>      }
>> >> >>
>> >> >> +  if (opts->x_profile_arc_flag
>> >> >> +      || opts->x_flag_branch_probabilities)
>> >> >> +    {
>> >> >> +      maybe_set_param_value
>> >> >> +       (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>> >> >> +        opts->x_param_values, opts_set->x_param_values);
>> >> >> +    }
>> >> >> +
>> >> >>    if (!(opts->x_flag_auto_profile
>> >> >>          || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>> >> >>      {
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413