This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] AutoFDO patch for trunk


The updated patch attached. Will commit the patch in 2~3 hours if no
objection is received.

Thanks,
Dehao

On Sun, Oct 19, 2014 at 2:58 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> +/* Member functions for string_table.  */
>> >> +
>> >> +string_table *
>> >> +string_table::create ()
>> >
>> > Why this is not a constructor?
>>
>> We use static initializer because it's not suggested to put too much
>> logic in constructor.
>
> Why not? :)

You're right. Updated to use constructor instead.

>> >> +    }
>> >
>> > The two hunks probably can be unified.
>> > Why get_index_by_decl does not already use dwarf name and why do you need to consider both?
>>
>> get_index_by_decl is actually a wrapper of get_index. It tolerates
>> error when assembler name is not emitted in the debug binary (mostly
>> for functions that are fully inlined). In this case, we will first try
>> to find the index of the assembler name, if not found, then bfd name.
>> If still not found, then we try to find name from its abstract
>> location.
>
> Yep, I am just somewhat concerned that you need to try different variations of the name.
> I would expect the assembler name (minus the random seed mangling) to be sufficient.

In theory, assembler name should be good enough. But in practice, we
still see that for some functions, assembler name is not available.
This function purely exists to tolerate faults like this. And should
be simplified once debug info becomes perfect.

>>
>> >
>> >> +  if (DECL_ABSTRACT_ORIGIN (decl))
>> >> +    return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN (decl));
>> >
>> > What really happens for ipa-cp function clones and for split functions?
>>
>> Their name suffix will be stripped before matching names.
>
> I see and profile merged, that seems resonable.
>> > I am not sure it is a win to have the things represented as VPT histograms when you shoot
>> > for quite special purpose problem.  But lets deal with these incrementally, too.
>> > (in general I do not see why you try to share so much with the gcov based VTP - it seems
>> > easier to do the speculation by hand)
>>
>> Basically there are 2 problem:
>>
>> * before annotation. I agree this is a special purpose problem
>> * after annotation. This should be the same problem as VPT.
>
> Yep, I think adding VPT histograms to get devirtualization happen inter-procedurally
> with LTO is a good idea.
> You will need (incrementally I guess) to translate your indexes into the profile-id used
> or you will need to initialize profile-ids accordingly, so the IPA pass is able to identify
> the target cross-module.

Sure, let's do this incrementally. But for the symbol look up, we
actually use func_name instead of profile-id. So I guess we don't have
the problem you mentioned here? (except that function name my
collide).

>> >> +{
>> >> +  if (gcov_open (auto_profile_file, 1) == 0)
>> >> +    error ("Cannot open profile file %s.", auto_profile_file);
>> >> +
>> >> +  if (gcov_read_unsigned () != GCOV_DATA_MAGIC)
>> >> +    error ("AutoFDO profile magic number does not mathch.");
>> >> +
>> >> +  /* Skip the version number.  */
>> >> +  gcov_read_unsigned ();
>> >> +
>> >> +  /* Skip the empty integer.  */
>> >> +  gcov_read_unsigned ();
>> >
>> > Perhaps we can just check the values?
>>
>> What do you mean? Check the value against 0?
>
> Yes, if you have version in there it seems to make sense to check it ;)

Done

> I guess it is there so you can add extra stuff into the file format that will make
> it incompatible with current one, so probably we should reject all versions greater
> than 0. (especially because the tool is off-tree)
>> >
>> > Does this have chance to work inter-module?
>>
>> Yes, it works fine with LIPO.
>
> Important (for me) is to make it work with LTO, because LIPO is apparently not landing to 5.0.

I think we only need to handle function name colliding case. This
could be simply be resolved by coupling function name with file name.

>>
>> Yes, this could be done.
>>
>> After a second thought, I think we actually need to expose einliner
>> interface to autofdo (instead of doing vpt in early inliner). This is
>> because we need to mark icall as "promoted", so that later vpt passes
>> will not try to promote it again. Currently in AutoFDO, we use a
>> self-contained way (use a "set" to mark all promoted stmts). If we do
>> vpt in einline, then we will need some flag attached to gimple to
>> indicate whether it's already promoted.
>
> If we manage to remove the iteration loop that repeativly applies the inline plan, you only
> need way to mark edge as promoted.  I guess you can just test the speculative flag on the edge
> to skip ones you dealt with earlier.
>
> Lets do that incrementally though.

Agree, let's do it incrementally.

>> +
>> +@item -fauto-profile
>> +@itemx -fauto-profile=@var{path}
>> +@opindex fauto-profile
>> +Enable sampling based feedback directed optimizations, and optimizations
>> +generally profitable only with profile feedback available.
>> +
>> +The following options are enabled: @code{-fbranch-probabilities}, @code{-fvpt},
>> +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, @code{-ftree-vectorize},
>> +@code{ftree-loop-distribute-patterns}
>
> I wonder about loop peeling - we do not enale it by default because we do not believe
> our static branch predictor can well identify loops that have low expected trip count
> (except ones that can be fully inlined).
> I am not sure auto-FDO is much better here - how reliable the info about number of iteratoins
> is?

With discriminator, the loop iter estimate is quite accurate. so
loop-peeling should be turned on.

>
> But this is again something that can be handled separately.
>
> Documentation seems out of date though - it seems to enable also
> -finline, -fipa-cp and cloning, predictive commoning, unswitching, gcse-after-reload,
>
> You do not want to enable reorder-functions because autoFDO does not include the time profiler.
> Also I think speculative devirtualization should not be disabled.

Why reorder function should be disabled? In google branch reorder
function actually buys us a lot. We use both prefix based reordering
and call graph based reordering.

>> +
>> +If @var{path} is specified, GCC looks at the @var{path} to find
>> +the profile feedback data files.
>> +
>> +In order to collect AutoFDO profile, you need to have:
>> +
>> +1. A linux system with linux perf support
>> +2. An Intel processor with last branch record (LBR) support
>
> Is it really a requirement or will it only make profiles more accurate?

It's optional, updated the doc.

>> +
>> +To collect the profile, first use linux perf to collect raw profile
>> +(see @uref{https://perf.wiki.kernel.org/}).
>> +
>> +E.g.
>> +perf record -e br_inst_retired:near_taken -b -o perf.data -- your_program
>
> You need some @ stuff to mark the example.

Done.
>> +
>> +Then use create_gcov tool, which takes raw profile and unstripped binary to
>> +generate AutoFDO profile that can be used by GCC.
>> +(see @uref{https://github.com/google/autofdo}).
>> +
>> +E.g.
>> +create_gcov --binary=your_program.unstripped --profile=perf.data --gcov=profile.afdo
>
> Same here.

Done.
>
> I see the AutoFDO tool is under apache licence.  I wonder if we can not share it in tree
> like we do for some other dependencies (asan) for easier use.

I'm not an expert on license. But it'd be great if someone can
volunteer to move the tool into GCC.

>
>> Index: gcc/opts.c
>> ===================================================================
>> --- gcc/opts.c        (revision 215826)
>> +++ gcc/opts.c        (working copy)
>> @@ -1279,6 +1279,57 @@ print_specific_help (unsigned int include_flags,
>>                      opts->x_help_columns, opts, lang_mask);
>>  }
>>
>> +/* Enable FDO-related flags.  */
>> +
>> +static void
>> +enable_fdo_optimizations (struct gcc_options *opts,
>> +                       struct gcc_options *opts_set,
>> +                       int value)
>> +{
>> +  if (!opts_set->x_flag_branch_probabilities)
>> +    opts->x_flag_branch_probabilities = value;
>> +  if (!opts_set->x_flag_profile_values)
>> +    opts->x_flag_profile_values = value;
>> +  if (!opts_set->x_flag_unroll_loops)
>> +    opts->x_flag_unroll_loops = value;
>> +  if (!opts_set->x_flag_peel_loops)
>> +    opts->x_flag_peel_loops = value;
>> +  if (!opts_set->x_flag_tracer)
>> +    opts->x_flag_tracer = value;
>> +  if (!opts_set->x_flag_value_profile_transformations)
>> +    opts->x_flag_value_profile_transformations = value;
>> +  if (!opts_set->x_flag_inline_functions)
>> +    opts->x_flag_inline_functions = value;
>> +  if (!opts_set->x_flag_ipa_cp)
>> +    opts->x_flag_ipa_cp = value;
>> +  if (!opts_set->x_flag_ipa_cp_clone
>> +      && value && opts->x_flag_ipa_cp)
>> +    opts->x_flag_ipa_cp_clone = value;
>> +  if (!opts_set->x_flag_predictive_commoning)
>> +    opts->x_flag_predictive_commoning = value;
>> +  if (!opts_set->x_flag_unswitch_loops)
>> +    opts->x_flag_unswitch_loops = value;
>> +  if (!opts_set->x_flag_gcse_after_reload)
>> +    opts->x_flag_gcse_after_reload = value;
>> +  if (!opts_set->x_flag_tree_loop_vectorize
>> +      && !opts_set->x_flag_tree_vectorize)
>> +    opts->x_flag_tree_loop_vectorize = value;
>> +  if (!opts_set->x_flag_tree_slp_vectorize
>> +      && !opts_set->x_flag_tree_vectorize)
>> +    opts->x_flag_tree_slp_vectorize = value;
>> +  if (!opts_set->x_flag_vect_cost_model)
>> +    opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC;
>> +  if (!opts_set->x_flag_tree_loop_distribute_patterns)
>> +    opts->x_flag_tree_loop_distribute_patterns = value;
>> +  if (!opts_set->x_flag_profile_reorder_functions)
>> +    opts->x_flag_profile_reorder_functions = value;
>> +  /* Indirect call profiling should do all useful transformations
>> +     speculative devirtualization does.  */
>> +  if (!opts_set->x_flag_devirtualize_speculatively
>> +      && opts->x_flag_value_profile_transformations)
>> +    opts->x_flag_devirtualize_speculatively = false;
> Please move x_flag_devirtualize_speculatively and x_flag_profile_reorder_functions
> to -fprofile-use only.

Done.

>> +}
>> +
>>  /* Handle target- and language-independent options.  Return zero to
>>     generate an "unknown option" message.  Only options that need
>>     extra handling need to be listed here; if you simply want
>> @@ -1746,50 +1797,23 @@ common_handle_option (struct gcc_options *opts,
>>        value = true;
>>        /* No break here - do -fprofile-use processing. */
>>      case OPT_fprofile_use:
>> -      if (!opts_set->x_flag_branch_probabilities)
>> -     opts->x_flag_branch_probabilities = value;
>> -      if (!opts_set->x_flag_profile_values)
>> -     opts->x_flag_profile_values = value;
>> -      if (!opts_set->x_flag_unroll_loops)
>> -     opts->x_flag_unroll_loops = value;
>> -      if (!opts_set->x_flag_peel_loops)
>> -     opts->x_flag_peel_loops = value;
>> -      if (!opts_set->x_flag_tracer)
>> -     opts->x_flag_tracer = value;
>> -      if (!opts_set->x_flag_value_profile_transformations)
>> -     opts->x_flag_value_profile_transformations = value;
>> -      if (!opts_set->x_flag_inline_functions)
>> -     opts->x_flag_inline_functions = value;
>> -      if (!opts_set->x_flag_ipa_cp)
>> -     opts->x_flag_ipa_cp = value;
>> -      if (!opts_set->x_flag_ipa_cp_clone
>> -       && value && opts->x_flag_ipa_cp)
>> -     opts->x_flag_ipa_cp_clone = value;
>> -      if (!opts_set->x_flag_predictive_commoning)
>> -     opts->x_flag_predictive_commoning = value;
>> -      if (!opts_set->x_flag_unswitch_loops)
>> -     opts->x_flag_unswitch_loops = value;
>> -      if (!opts_set->x_flag_gcse_after_reload)
>> -     opts->x_flag_gcse_after_reload = value;
>> -      if (!opts_set->x_flag_tree_loop_vectorize
>> -          && !opts_set->x_flag_tree_vectorize)
>> -     opts->x_flag_tree_loop_vectorize = value;
>> -      if (!opts_set->x_flag_tree_slp_vectorize
>> -          && !opts_set->x_flag_tree_vectorize)
>> -     opts->x_flag_tree_slp_vectorize = value;
>> -      if (!opts_set->x_flag_vect_cost_model)
>> -     opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC;
>> -      if (!opts_set->x_flag_tree_loop_distribute_patterns)
>> -     opts->x_flag_tree_loop_distribute_patterns = value;
>> -      if (!opts_set->x_flag_profile_reorder_functions)
>> -     opts->x_flag_profile_reorder_functions = value;
>> -      /* Indirect call profiling should do all useful transformations
>> -      speculative devirtualization does.  */
>> -      if (!opts_set->x_flag_devirtualize_speculatively
>> -       && opts->x_flag_value_profile_transformations)
>> -     opts->x_flag_devirtualize_speculatively = false;
>> +      enable_fdo_optimizations (opts, opts_set, value);
>>        break;
>>
>> +    case OPT_fauto_profile_:
>> +      opts->x_auto_profile_file = xstrdup (arg);
>> +      opts->x_flag_auto_profile = true;
>> +      value = true;
>> +      /* No break here - do -fauto-profile processing. */
>> +    case OPT_fauto_profile:
>> +      enable_fdo_optimizations (opts, opts_set, value);
>> +      if (!opts_set->x_flag_profile_correction)
>> +     opts->x_flag_profile_correction = value;
>> +      maybe_set_param_value (
>> +     PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>> +     opts->x_param_values, opts_set->x_param_values);
>> +      break;
>> +
>>      case OPT_fprofile_generate_:
>>        opts->x_profile_data_prefix = xstrdup (arg);
>>        value = true;
>
> Patch is OK with these changes. Please also update news.html and perhaps even
> news on the main page? It is quite nice user visible change.

Where is news.html?

Thanks,
Dehao
>
> Honza

Attachment: patch.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]