[google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
Sriraman Tallam
tmsriram@google.com
Wed Jun 8 16:41:00 GMT 2011
On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li <davidxl@google.com> wrote:
> ok for google/main.
Thanks, the patch is now committed.
>
> David
>
> On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> +davidxl
>>
>> On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Patch Description:
>>> =================
>>>
>>> I am working on a project to do global function layout in the linker where the linker reads the callgraph edge profile information, generated by FDO, and uses that to find a ordering of functions that will place functions calling each other frequently closer, like the Pettis-Hansen code ordering algorithm described in the paper "Profile-guided Code Poisitioning" in PLDI 1990.
>>>
>>> This patch adds a flag that allows the callgraph edge profile information to be stored .note sections called ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM to only output callgraph edges greater than a specified threshold. Once this is available, the linker can read these sections and generate a global callgraph which can be used to determine a global function ordering.
>>>
>>> I am adding plugin support in the gold linker to allow linker plugins to be able to read the contents of sections and also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here : http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout, like the Pettis-Hansen algorithm, of the final binary.
>>>
>>> Example: The new .note.callgraph.text sections looks like this for a function foo that calls bar 100 times and zap 50 times:
>>> ****************************
>>> .section .note.callgraph.text._Z3foov,"",@progbits
>>> .string "Function _Z3foov"
>>> .string "_Z3barv"
>>> .string "100"
>>> .string "_Z3zapv"
>>> .string "50"
>>> ***************************
>>>
>>> For now, this is for google/main. I will re-submit for review to trunk along with data layout.
>>>
>>> Google ref 41940
>>>
>>> 2011-06-07 Sriraman Tallam <tmsriram@google.com>
>>>
>>> * doc/invoke.texi: document option -fcallgraph-profiles-sections.
>>> * final.c (dump_cgraph_profiles): New function.
>>> (rest_of_handle_final): Create new section '.note.callgraph.text'
>>> with compiler flag -fcallgraph-profiles-sections
>>> * common.opt: New option -fcallgraph-profiles-sections.
>>> * params.def (DEFPARAM): New param
>>> PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
>>>
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi (revision 174789)
>>> +++ doc/invoke.texi (working copy)
>>> @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
>>> -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
>>> -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
>>> -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
>>> --fcheck-data-deps -fclone-hot-version-paths @gol
>>> +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
>>> -fcombine-stack-adjustments -fconserve-stack @gol
>>> -fcompare-elim -fcprop-registers -fcrossjumping @gol
>>> -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
>>> @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
>>> @opindex fripa-verbose
>>> Enable printing of verbose information about dynamic inter-procedural optimizations.
>>> This is used in conjunction with the @option{-fripa}.
>>> +
>>> +@item -fcallgraph-profiles-sections
>>> +@opindex fcallgraph-profiles-sections
>>> +Emit call graph edge profile counts in .note.callgraph.text sections. This is
>>> +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
>>> +section is created for each function. This section lists every callee and the
>>> +number of times it is called. The params variable
>>> +"note-cgraph-section-edge-threshold" can be used to only list edges above a
>>> +certain threshold.
>>> @end table
>>>
>>> The following options control compiler behavior regarding floating
>>> Index: final.c
>>> ===================================================================
>>> --- final.c (revision 174789)
>>> +++ final.c (working copy)
>>> @@ -4321,13 +4321,37 @@ debug_free_queue (void)
>>> symbol_queue_size = 0;
>>> }
>>> }
>>> -
>>> +
>>> +/* List the call graph profiled edges whise value is greater than
>>> + PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
>>> + ".note.callgraph.text" section. */
>>> +static void
>>> +dump_cgraph_profiles (void)
>>> +{
>>> + struct cgraph_node *node = cgraph_node (current_function_decl);
>>> + struct cgraph_edge *e;
>>> + struct cgraph_node *callee;
>>> +
>>> + for (e = node->callees; e != NULL; e = e->next_callee)
>>> + {
>>> + if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
>>> + continue;
>>> + callee = e->callee;
>>> + fprintf (asm_out_file, "\t.string \"%s\"\n",
>>> + IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
>>> + fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
>>> + e->count);
>>> + }
>>> +}
>>> +
>>> /* Turn the RTL into assembly. */
>>> static unsigned int
>>> rest_of_handle_final (void)
>>> {
>>> rtx x;
>>> const char *fnname;
>>> + char *profile_fnname;
>>> + unsigned int flags;
>>>
>>> /* Get the function's name, as described by its RTL. This may be
>>> different from the DECL_NAME name used in the source file. */
>>> @@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
>>> targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
>>> decl_fini_priority_lookup
>>> (current_function_decl));
>>> +
>>> + /* With -fcgraph-section, add ".note.callgraph.text" section for storing
>>> + profiling information. */
>>> + if (flag_callgraph_profiles_sections
>>> + && flag_profile_use
>>> + && cgraph_node (current_function_decl) != NULL)
>>> + {
>>> + flags = SECTION_DEBUG;
>>> + asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
>>> + switch_to_section (get_section (profile_fnname, flags, NULL));
>>> + fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
>>> + dump_cgraph_profiles ();
>>> + free (profile_fnname);
>>> + }
>>> +
>>> return 0;
>>> }
>>>
>>> Index: common.opt
>>> ===================================================================
>>> --- common.opt (revision 174789)
>>> +++ common.opt (working copy)
>>> @@ -907,6 +907,10 @@ fcaller-saves
>>> Common Report Var(flag_caller_saves) Optimization
>>> Save registers around function calls
>>>
>>> +fcallgraph-profiles-sections
>>> +Common Report Var(flag_callgraph_profiles_sections) Init(0)
>>> +Generate .note.callgraph.text sections listing callees and edge counts.
>>> +
>>> fcheck-data-deps
>>> Common Report Var(flag_check_data_deps)
>>> Compare the results of several data dependence analyzers.
>>> Index: params.def
>>> ===================================================================
>>> --- params.def (revision 174789)
>>> +++ params.def (working copy)
>>> @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
>>> "maximum length of the call graph path to be cloned "
>>> "while doing multiversioning",
>>> 2, 0, 5)
>>> +
>>> +/* Only output those call graph edges in .note.callgraph.text sections
>>> + whose count is greater than this value. */
>>> +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
>>> + "note-cgraph-section-edge-threshold",
>>> + "minimum call graph edge count for inclusion in "
>>> + ".note.callgraph.text section",
>>> + 0, 0, 0)
>>> +
>>> /*
>>> Local variables:
>>> mode:c
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/4591045
>>>
>>
>
More information about the Gcc-patches
mailing list