This is the mail archive of the
mailing list for the GCC project.
Re: Bring function profiles to callgraph to make them WHOPR ready
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Sun, 11 Jul 2010 18:14:10 -0700
- Subject: Re: Bring function profiles to callgraph to make them WHOPR ready
- References: <20100426131018.GB9094@kam.mff.cuni.cz> <email@example.com>
On Mon, Apr 26, 2010 at 3:36 PM, H.J. Lu <firstname.lastname@example.org> wrote:
> On Mon, Apr 26, 2010 at 6:10 AM, Jan Hubicka <email@example.com> wrote:
>> this patch moves cfun->function_frequency into cgraph_node->frequency. This is
>> necessary for WHOPR to use it, and it is where it really belongs anyway, since
>> frequencies are not the same across all clones.
>> The patch raises a need for current_cgraph_node, similar to
>> cfun/crtl/current_function_decl, which I will propose in an incremental patch
>> (I intend to clean up the function switching API anyway).
>> The patch also adds a new function frequency called EXECUTED_ONCE. Currently it is
>> set for main(), for functions marked noreturn, and for static
>> constructors/destructors. Such functions are optimized for size everywhere
>> except for code inside loops. So the patch has a minor effect on the code size of
>> programs per se.
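
The rule described above (size optimization for executed-once functions, except inside loops) can be sketched as follows. This is only an illustration: the sketch_* names, the simplified node structure, and the loop_depth parameter are hypothetical and are not the identifiers or logic used in the patch itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a per-node frequency class.
   The names are illustrative assumptions, not GCC's own.  */
enum sketch_frequency
{
  SKETCH_UNLIKELY_EXECUTED,
  SKETCH_EXECUTED_ONCE,
  SKETCH_NORMAL,
  SKETCH_HOT
};

/* Hypothetical callgraph node: the frequency lives on the node, so each
   clone of a function can carry its own value.  */
struct sketch_node
{
  const char *name;
  enum sketch_frequency frequency;
};

/* Return true when code in NODE at loop nesting depth LOOP_DEPTH
   (0 means outside any loop) should be optimized for size.  */
static bool
sketch_optimize_for_size_p (const struct sketch_node *node, int loop_depth)
{
  switch (node->frequency)
    {
    case SKETCH_UNLIKELY_EXECUTED:
      /* Cold functions are optimized for size everywhere.  */
      return true;
    case SKETCH_EXECUTED_ONCE:
      /* Executed-once functions are optimized for size except in loops.  */
      return loop_depth == 0;
    default:
      return false;
    }
}
```

Under these assumptions, a node for main() marked SKETCH_EXECUTED_ONCE is treated as size-optimized at loop depth 0 but not inside a loop.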
>> On pretty-ipa I have an ipa-profile pass propagating this knowledge across the
>> callgraph, which helps to shave a couple of percent off the resulting binaries.
>> This unfortunately affects mostly simple programs, where this is not that
>> important, but at -flto (-fwhopr) and -fwhole-program we have a chance to
>> propagate into a more significant portion of the program. On SPEC GCC we mark a couple
>> hundred functions this way; the resulting code size savings are not that important
>> anyway, usually just slightly over 1% on SPEC. But I guess it is still worth the very
>> simple and cheap pass.
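
One plausible way to propagate the executed-once property across a callgraph is a simple fixpoint iteration, sketched below. The propagation rule is an assumption made for illustration (a function with only visible callers becomes executed-once when every call to it sits outside a loop in an executed-once caller); it is not taken from the ipa-profile pass, and all sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_freq { SKETCH_FREQ_NORMAL, SKETCH_FREQ_EXECUTED_ONCE };

/* Hypothetical function record.  LOCAL means every caller is visible,
   as one would expect with -fwhole-program or for static functions.  */
struct sketch_fn
{
  enum sketch_freq freq;
  bool local;
};

/* Hypothetical call edge between two entries of the function array.  */
struct sketch_call
{
  size_t caller;
  size_t callee;
  bool in_loop;   /* The call site is inside a loop in the caller.  */
};

/* Iterate until nothing changes.  A local function is promoted to
   executed-once when it has at least one call and every call to it is
   outside a loop in a caller already known to be executed once.  */
static void
sketch_propagate (struct sketch_fn *fns, size_t n_fns,
                  const struct sketch_call *calls, size_t n_calls)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t f = 0; f < n_fns; f++)
        {
          if (fns[f].freq == SKETCH_FREQ_EXECUTED_ONCE || !fns[f].local)
            continue;

          bool has_caller = false;
          bool all_once = true;
          for (size_t c = 0; c < n_calls; c++)
            {
              if (calls[c].callee != f)
                continue;
              has_caller = true;
              if (calls[c].in_loop
                  || fns[calls[c].caller].freq != SKETCH_FREQ_EXECUTED_ONCE)
                all_once = false;
            }

          if (has_caller && all_once)
            {
              fns[f].freq = SKETCH_FREQ_EXECUTED_ONCE;
              changed = true;
            }
        }
    }
}
```

The iteration terminates because a function is only ever promoted, never demoted. A recursive function is never promoted under this rule, since its self-call comes from a caller that is not yet executed-once.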
>> The main advantage of this code is that it can actually prove the coldness of
>> instructions instead of just guessing. Currently we guess based on a fixed
>> threshold, which sometimes makes us misjudge code as unlikely when it is
>> not. With some improvements (i.e. marking basic blocks that have no path to
>> exit as executed once and propagating this to calls) we can have a bit better
>> It is possible to do more guesswork per the Wu/Larus paper (i.e. promote function-local
>> profile estimates to the global level), as is done by some compilers (Open64), but
>> I am afraid of increasing the probability of making a mistake and misjudging hot
>> code to be cold.
>> A simple improvement that also crossed my mind while implementing this is that at
>> -O0 we could probably default to size optimization defaults. While mostly we
>> do not care, it still affects some code expansion and might lead to faster
>> compile times?
>> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>>         * cgraph.c (cgraph_create_node): Set node frequency to normal.
>>         (cgraph_clone_node): Copy function frequency.
>>         * cgraph.h (node_frequency): New enum.
>>         (struct cgraph_node): Add.
>>         * final.c (rest_of_clean_state): Update.
>>         * lto-cgraph.c (lto_output_node): Output node frequency.
>>         (input_overwrite_node): Input node frequency.
>>         * tree-ssa-loop-ivopts.c (computation_cost): Update.
>>         * lto-streamer-out.c (output_function): Do not output function frequency.
>>         * predict.c (maybe_hot_frequency_p): Update and handle functions executed once.
>>         (cgraph_maybe_hot_edge_p): Likewise; use cgraph frequency instead of
>>         attribute lookup.
>>         (probably_never_executed_bb_p, optimize_function_for_size_p): Update.
>>         (compute_function_frequency): Set noreturn functions to be executed once.
>>         (choose_function_section): Update.
>>         * lto-streamer-in.c (input_function): Do not input function frequency.
>>         * function.c (allocate_struct_function): Do not initialize function frequency.
>>         * function.h (function_frequency): Remove.
>>         (struct function): Remove function frequency.
>>         * ipa-profile.c (CGRAPH_NODE_FREQUENCY): Remove.
>>         (try_update): Update.
>>         * tree-inline.c (initialize_cfun): Do not update function frequency.
>>         * passes.c (pass_init_dump_file): Update.
>>         * i386.c (ix86_compute_frame_layout): Update.
>>         (ix86_pad_returns): Update.
> This caused:
This patch fixes:
on trunk. Is this the real fix or does it just make it latent?