This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [google] AutoFDO implementation


On Sat, Oct 6, 2012 at 10:55 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi,
>>
>> This patch implements the fine-grained AutoFDO optimizations for GCC.
>> It uses Linux perf to collect sample profiles, and uses debug info to
>> represent the profile. In GCC, it uses the profile to annotate the CFG to
>> drive FDO. This can bring 50% to 110% of the speedup derived by
>> traditional instrumentation-based FDO. (The average is between 70% and 80%
>> for many CPU-intensive applications.) Compared with traditional FDO,
>> AutoFDO does not require instrumentation. It only needs an
>> optimized binary with debug info to collect the profile.
>>
>> This patch has passed bootstrap and gcc regression tests as well as
>> tested with crosstool. Okay for google branches?
>>
>> If people upstream find this feature interesting, I'll spend some
>> time porting this to trunk and try to open-source the tool that
>> generates the profile data file.
>
> I think it is a useful feature, yes (and it was on my TODO list for quite some
> time). Unlike edge profiles, these profiles should also be more independent of
> source code/configuration changes.

Thanks for your feedback and interest. Yes, in AutoFDO the coupling
between the profiling build and the FDO build is much looser.

>
> Just a few quick questions from a first glance over the patch...
>>
>> Dehao
>>
>> The patch can also be viewed from:
>>
>> http://codereview.appspot.com/6567079
>>
>> gcc/ChangeLog.google-4_7:
>> 2012-09-28  Dehao Chen  <dehao@dehao.com>
>>
>> * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile.
>> (rebuild_cgraph_edges): Likewise.
>> * cgraph.c (cgraph_clone_node): Likewise.
>> (clone_function_name): Likewise.
>> * cgraph.h (cgraph_node): New field.
>> * tree-pass.h (pass_ipa_auto_profile): New pass.
>> * cfghooks.c (make_forwarder_block): Handle AutoFDO profile.
>> * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
>> * toplev.c (compile_file): Likewise.
>> (process_options): Likewise.
>> * debug.h (auto_profile_debug_hooks): New.
>> * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO
>> profile.
>> (cgraph_copy_node_for_versioning): Likewise.
>> * regs.h (REG_FREQ_FROM_BB): Likewise.
>> * gcov-io.h (GCOV_TAG_AFDO_FILE_NAMES): New.
>> (GCOV_TAG_AFDO_FUNCTION): New.
>> (GCOV_TAG_AFDO_MODULE_GROUPING): New.
>> * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile.
>> * ipa-inline.c (edge_hot_enough_p): Likewise.
>> (edge_badness): Likewise.
>> (inline_small_functions): Likewise.
>> * dwarf2out.c (auto_profile_debug_hooks): New.
>> * opts.c (common_handle_option): Handle AutoFDO profile.
>> * timevar.def (TV_IPA_AUTOFDO): New.
>> * predict.c (compute_function_frequency): Handle AutoFDO profile.
>> (rebuild_frequencies): Handle AutoFDO profile.
>> * auto-profile.c (struct gcov_callsite_pos): New.
>> (struct gcov_callsite): New.
>> (struct gcov_stack): New.
>> (struct gcov_function): New.
>> (struct afdo_bfd_name): New.
>> (struct afdo_module): New.
>> (afdo_get_filename): New.
>> (afdo_get_original_name_size): New.
>> (afdo_get_bfd_name): New.
>> (afdo_read_bfd_names): New.
>> (afdo_stack_hash): New.
>> (afdo_stack_eq): New.
>> (afdo_function_hash): New.
>> (afdo_function_eq): New.
>> (afdo_bfd_name_hash): New.
>> (afdo_bfd_name_eq): New.
>> (afdo_bfd_name_del): New.
>> (afdo_module_hash): New.
>> (afdo_module_eq): New.
>> (afdo_module_num_strings): New.
>> (afdo_add_module): New.
>> (read_aux_modules): New.
>> (get_inline_stack_size_by_stmt): New.
>> (get_inline_stack_size_by_edge): New.
>> (get_function_name_from_block): New.
>> (get_inline_stack_by_stmt): New.
>> (get_inline_stack_by_edge): New.
>> (afdo_get_function_count): New.
>> (afdo_set_current_function_count): New.
>> (afdo_add_bfd_name_mapping): New.
>> (afdo_add_copy_scale): New.
>> (get_stack_count): New.
>> (get_stmt_count): New.
>> (afdo_get_callsite_count): New.
>> (afdo_get_bb_count): New.
>> (afdo_annotate_cfg): New.
>> (read_profile): New.
>> (process_auto_profile): New.
>> (init_auto_profile): New.
>> (end_auto_profile): New.
>> (afdo_find_equiv_class): New.
>> (afdo_propagate_single_edge): New.
>> (afdo_propagate_multi_edge): New.
>> (afdo_propagate_circuit): New.
>> (afdo_propagate): New.
>> (afdo_calculate_branch_prob): New.
>> (auto_profile): New.
>> (gate_auto_profile_ipa): New.
>> (struct simple_ipa_opt_pass): New.
>> * auto-profile.h (init_auto_profile): New.
>> (end_auto_profile): New.
>> (process_auto_profile): New.
>> (afdo_set_current_function_count): New.
>> (afdo_add_bfd_name_mapping): New.
>> (afdo_add_copy_scale): New.
>> (afdo_calculate_branch_prob): New.
>> (afdo_get_callsite_count): New.
>> (afdo_get_bb_count): New.
>> * profile.c (compute_branch_probabilities): Handle AutoFDO profile.
>> (branch_prob): Likewise.
>> * loop-unroll.c (decide_unroll_runtime_iterations): Likewise.
>> * coverage.c (coverage_init): Likewise.
>> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise.
>> * common.opt (fauto-profile): New.
>> * tree-inline.c (copy_bb): Handle AutoFDO profile.
>> (copy_edges_for_bb): Likewise.
>> (copy_cfg_body): Likewise.
>> * tree-profile.c (direct_call_profiling): Likewise.
>> (gate_tree_profile_ipa): Likewise.
>> * basic-block.h (EDGE_ANNOTATED): New field.
>> (BB_ANNOTATED): New field.
>> * tree-cfg.c (gimple_merge_blocks): Handle AutoFDO profile.
>> * passes.c (init_optimization_passes): Handle AutoFDO profile.
>
>> Index: gcc/cgraphbuild.c
>> ===================================================================
>> --- gcc/cgraphbuild.c (revision 191813)
>> +++ gcc/cgraphbuild.c (working copy)
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "except.h"
>>  #include "l-ipo.h"
>>  #include "ipa-inline.h"
>> +#include "auto-profile.h"
>>
>>  /* Context of record_reference.  */
>>  struct record_reference_ctx
>> @@ -497,6 +498,9 @@ build_cgraph_edges (void)
>>    tree decl;
>>    unsigned ix;
>>
>> +  if (flag_auto_profile)
>> +    afdo_set_current_function_count ();
>> +
>>    /* Create the callgraph edges and record the nodes referenced by the function.
>>       body.  */
>>    FOR_EACH_BB (bb)
>> @@ -607,8 +611,9 @@ rebuild_cgraph_edges (void)
>>    cgraph_node_remove_callees (node);
>>    ipa_remove_all_references (&node->ref_list);
>>
>> -  node->count = ENTRY_BLOCK_PTR->count;
>> -  node->max_bb_count = 0;
>> +  if (!flag_auto_profile)
>> +    node->count = ENTRY_BLOCK_PTR->count;
>> +  node->max_bb_count = node->count;
>
> We could probably read the profile at the same time we read edge profiles,
> avoiding the need to maintain it across cgraph builds/rebuilds?
>> @@ -2268,6 +2276,9 @@ clone_function_name (tree decl, const char *suffix
>>    prefix[len] = '_';
>>  #endif
>>    ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>> +  if (flag_auto_profile)
>> +    afdo_add_bfd_name_mapping (xstrdup (tmp_name),
>> +                            xstrdup (lang_hooks.dwarf_name (decl, 0)));
>
> You probably want to unify this with lto_record_renamed_decl.

Got it, I'll look into that.

>> Index: gcc/cfghooks.c
>> ===================================================================
>> --- gcc/cfghooks.c    (revision 191813)
>> +++ gcc/cfghooks.c    (working copy)
>> @@ -775,6 +775,19 @@ make_forwarder_block (basic_block bb, bool (*redir
>>          }
>>      }
>>
>> +  if (flag_auto_profile)
>> +    {
>> +      dummy->frequency = 0;
>> +      dummy->count = 0;
>> +      for (ei = ei_start (dummy->preds); (e = ei_safe_edge (ei)); ei_next (&ei))
>> +     {
>> +       dummy->frequency += EDGE_FREQUENCY (e);
>> +       dummy->count += e->count;
>> +     }
>> +      if (dummy->frequency > REG_BR_PROB_BASE)
>> +     dummy->frequency = REG_BR_PROB_BASE;
>> +    }
>> +
>
> I do not see why the profiles are different here?

I wrote this a while ago. I'll need to look into it in more
detail, and I'll add a comment to this code.
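
For readers following the thread, the hunk above can be read as the
standalone sketch below: reset the block's frequency and count, sum them
back up from the incoming edges, and cap the frequency. This is only an
illustration under stated assumptions. The toy_* types and TOY_FREQ_CAP are
hypothetical stand-ins, not GCC's basic_block, edge, or REG_BR_PROB_BASE,
and the sketch does not explain why the AutoFDO path needs this step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, standing in for the REG_BR_PROB_BASE clamp in the hunk.  */
#define TOY_FREQ_CAP 10000

/* Hypothetical stand-ins for an incoming edge and a basic block.  */
struct toy_in_edge
{
  int frequency;   /* plays the role of EDGE_FREQUENCY (e) */
  int64_t count;   /* plays the role of e->count */
};

struct toy_block
{
  int frequency;
  int64_t count;
};

/* Recompute BB's frequency and count as the sum over its N_PREDS incoming
   edges, then cap the frequency at TOY_FREQ_CAP.  */
static void
toy_recompute_from_preds (struct toy_block *bb,
                          const struct toy_in_edge *preds, size_t n_preds)
{
  assert (bb != NULL);
  assert (preds != NULL || n_preds == 0);

  bb->frequency = 0;
  bb->count = 0;
  for (size_t i = 0; i < n_preds; i++)
    {
      bb->frequency += preds[i].frequency;
      bb->count += preds[i].count;
    }
  if (bb->frequency > TOY_FREQ_CAP)
    bb->frequency = TOY_FREQ_CAP;
}
```

Note that only the frequency is capped; the count is left as the plain sum
of the incoming edge counts, as in the hunk.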

>> @@ -478,6 +480,9 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>>  {
>>    if (cgraph_maybe_hot_edge_p (edge))
>>      return true;
>> +  if (flag_auto_profile
>> +      && maybe_hot_count_p (afdo_get_callsite_count (edge)))
>> +    return true;
>
> Why are the edge counts and afdo counts not the same?

Because for callsites that were inlined in the profiling binary, the
callsite count may not be available. So we have an
afdo_get_callsite_count function to handle this case specially.
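
As a rough sketch of the shape of that check (not the patch itself): the
edge is treated as hot if either the count on the edge or the separately
looked-up callsite count is hot. Everything named toy_* below is
hypothetical, including the threshold value; how afdo_get_callsite_count
actually derives its count is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a call-graph edge; not GCC's struct cgraph_edge.  */
struct toy_edge
{
  int64_t edge_count;      /* count carried on the edge itself */
  int64_t callsite_count;  /* count looked up from the sample profile */
};

/* Made-up threshold, for illustration only.  */
static const int64_t toy_hot_threshold = 1000;

/* Stand-in for flag_auto_profile.  */
static bool toy_flag_auto_profile = true;

static bool
toy_hot_count_p (int64_t count)
{
  return count >= toy_hot_threshold;
}

/* Mirror the structure of the edge_hot_enough_p hunk: first try the edge
   count, then, only when AutoFDO is enabled, try the callsite count.  */
static bool
toy_edge_hot_enough_p (const struct toy_edge *e)
{
  assert (e != NULL);

  if (toy_hot_count_p (e->edge_count))
    return true;
  if (toy_flag_auto_profile && toy_hot_count_p (e->callsite_count))
    return true;
  return false;
}
```

With this shape, an edge whose own count is missing (zero) can still be
considered for inlining when the sample profile shows a hot callsite.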

I've modified the patch a little to add more comments and change
some logic. I'll also create a GCC wiki page to describe the design
choices and the differences from FDO. I will send the new patch and the
link to the wiki soon.

Thanks,
Dehao

>
> Honza

