This is the mail archive of the
mailing list for the GCC project.
Re: Bring function profiles to callgraph to make them WHOPR ready
- From: Richard Guenther <richard dot guenther at gmail dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org
- Date: Mon, 12 Jul 2010 09:55:51 +0200
- Subject: Re: Bring function profiles to callgraph to make them WHOPR ready
- References: <20100426131018.GB9094@kam.mff.cuni.cz> <email@example.com> <AANLkTilRzspq9B8BUIi65hhCNGORQbm0kP_RJUVW5huo@mail.gmail.com>
On Mon, Jul 12, 2010 at 3:14 AM, H.J. Lu <firstname.lastname@example.org> wrote:
> On Mon, Apr 26, 2010 at 3:36 PM, H.J. Lu <email@example.com> wrote:
>> On Mon, Apr 26, 2010 at 6:10 AM, Jan Hubicka <firstname.lastname@example.org> wrote:
>>> This patch moves cfun->function_frequency into cgraph_node->frequency.  This is
>>> necessary for WHOPR to use it and it is where it really belongs anyway since
>>> frequencies are not the same across all clones.
>>> The patch raises a need for current_cgraph_node, similar to
>>> cfun/crtl/current_function_decl, which I will propose in an incremental patch
>>> (I intend to clean up the function switching API anyway).
>>> The patch also adds a new function frequency called EXECUTED_ONCE.  Currently it is
>>> set for main(), for functions marked noreturn and for static
>>> constructors/destructors.  Such functions are optimized for size everywhere
>>> except for code inside loops.  So the patch has only a minor effect on the code
>>> size of programs per se.
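
[Illustrative sketch, not part of the patch.] The EXECUTED_ONCE rule described
above (optimize for size everywhere except inside loops) can be pictured
roughly as follows. All names here are hypothetical stand-ins rather than the
identifiers the patch adds to cgraph.h or predict.c, and the loop test is
simplified to a plain nesting depth.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the node frequency enum described in the
   patch; the enumerator names are assumptions made for illustration.  */
enum sketch_node_frequency
{
  SKETCH_FREQ_UNLIKELY_EXECUTED,
  SKETCH_FREQ_EXECUTED_ONCE,
  SKETCH_FREQ_NORMAL,
  SKETCH_FREQ_HOT
};

/* Return true if code at LOOP_DEPTH (0 means outside any loop) in a
   function with frequency FREQ should be optimized for size.  */
bool
sketch_optimize_for_size_p (enum sketch_node_frequency freq, int loop_depth)
{
  switch (freq)
    {
    case SKETCH_FREQ_UNLIKELY_EXECUTED:
      /* Cold functions are size-optimized throughout.  */
      return true;
    case SKETCH_FREQ_EXECUTED_ONCE:
      /* Straight-line code runs once; loop bodies may still be hot.  */
      return loop_depth == 0;
    default:
      return false;
    }
}
```

Under these assumptions, the body of main() outside any loop would be treated
as size-optimized, while a loop nested inside it would not.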
>>> On pretty-ipa I have an ipa-profile pass propagating this knowledge across
>>> the callgraph, which helps to shave a couple of percent off the resulting binaries.
>>> This unfortunately affects mostly simple programs where this is not that
>>> important, but at -flto (-fwhopr) and -fwhole-program we have a chance to
>>> propagate into a more significant portion of the program.  On SPEC GCC we mark
>>> a couple hundred functions this way; the resulting code size savings are not
>>> that important anyway, usually just slightly over 1% on SPEC.  But I still
>>> think it is worth the very simple and cheap pass.
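
[Illustrative sketch, not part of the patch.] One plausible way to picture the
propagation described above is a fixed-point walk over the call graph. The
rule used here is a deliberately conservative assumption, not necessarily what
the ipa-profile pass does: a local function becomes executed-once only if it
has exactly one call site, that call site is outside any loop, and its caller
is already executed-once. All type and function names are hypothetical.

```c
#include <stdbool.h>

enum sketch_freq { SKETCH_NORMAL, SKETCH_EXECUTED_ONCE };

/* One call site: CALLER calls CALLEE, possibly from inside a loop.  */
struct sketch_edge
{
  int caller;
  int callee;
  bool in_loop;
};

/* Propagate SKETCH_EXECUTED_ONCE through the graph until nothing
   changes.  LOCAL[n] says whether every caller of node n is visible.
   Returns the number of nodes newly marked.  */
int
sketch_propagate_executed_once (enum sketch_freq *freq, const bool *local,
                                int n_nodes,
                                const struct sketch_edge *edges, int n_edges)
{
  int marked = 0;
  bool changed = true;

  while (changed)
    {
      changed = false;
      for (int n = 0; n < n_nodes; n++)
        {
          if (freq[n] == SKETCH_EXECUTED_ONCE || !local[n])
            continue;

          int n_callers = 0;
          bool ok = true;
          for (int e = 0; e < n_edges; e++)
            {
              if (edges[e].callee != n)
                continue;
              n_callers++;
              if (edges[e].in_loop
                  || freq[edges[e].caller] != SKETCH_EXECUTED_ONCE)
                ok = false;
            }

          /* Assumed rule: exactly one qualifying call site.  */
          if (ok && n_callers == 1)
            {
              freq[n] = SKETCH_EXECUTED_ONCE;
              marked++;
              changed = true;
            }
        }
    }
  return marked;
}
```

With main() seeded as executed-once, a local init function called once from
main(), and a local helper called once from that init function, both end up
marked. A function called from a loop in main(), or one that is not local,
keeps its normal frequency.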
>>> The main advantage of this code is that it can actually prove the coldness of
>>> instructions instead of just guessing.  Currently we guess based on a fixed
>>> threshold, which sometimes makes us misjudge code as unlikely when it is
>>> not.  With some improvements (i.e. marking basic blocks that have no path to
>>> exit as executed once and propagating this to calls) we can do a bit better.
>>> It is possible to do more guesswork per the Wu/Larus paper (i.e. promote
>>> function-local profile estimates to the global level), as is done by some
>>> compilers (Open64), but I am afraid of increasing the probability of making a
>>> mistake and misjudging hot code to be cold.
>>> A simple improvement that also crossed my mind while implementing this is that at
>>> -O0 we could probably default to the size optimization defaults.  While mostly we
>>> do not care, it still affects some code expansion and might lead to faster
>>> compile times?
>>> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>>>         * cgraph.c (cgraph_create_node): Set node frequency to normal.
>>>         (cgraph_clone_node): Copy function frequency.
>>>         * cgraph.h (node_frequency): New enum
>>>         (struct cgraph_node): Add.
>>>         * final.c (rest_of_clean_state): Update.
>>>         * lto-cgraph.c (lto_output_node): Output node frequency.
>>>         (input_overwrite_node): Input node frequency.
>>>         * tree-ssa-loop-ivopts.c (computation_cost): Update.
>>>         * lto-streamer-out.c (output_function): Do not output function frequency.
>>>         * predict.c (maybe_hot_frequency_p): Update and handle functions executed once.
>>>         (cgraph_maybe_hot_edge_p): Likewise; use cgraph frequency instead of
>>>         attribute lookup.
>>>         (probably_never_executed_bb_p, optimize_function_for_size_p): Update.
>>>         (compute_function_frequency): Set noreturn functions to be executed once.
>>>         (choose_function_section): Update.
>>>         * lto-streamer-in.c (input_function): Do not input function frequency.
>>>         * function.c (allocate_struct_function): Do not initialize function frequency.
>>>         * function.h (function_frequency): Remove.
>>>         (struct function): Remove function frequency.
>>>         * ipa-profile.c (CGRAPH_NODE_FREQUENCY): Remove.
>>>         (try_update): Update.
>>>         * tree-inline.c (initialize_cfun): Do not update function frequency.
>>>         * passes.c (pass_init_dump_file): Update.
>>>         * i386.c (ix86_compute_frame_layout): Update.
>>>         (ix86_pad_returns): Update.
>> This caused:
> This patch fixes:
> on trunk. Is this the real fix or does it just make it latent?
It makes it latent.