This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: cgraph node profile update in cgraph_rebuild_references causes a performance issue
- From: Xinliang David Li <davidxl at google dot com>
- To: Wei Mi <wmi at google dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, GCC Development <gcc at gcc dot gnu dot org>, Cong Hou <congh at google dot com>, Rong Xu <xur at google dot com>, Dehao Chen <dehao at google dot com>
- Date: Thu, 30 Oct 2014 21:37:03 -0700
- Subject: Re: cgraph node profile update in cgraph_rebuild_references causes a performance issue
- Authentication-results: sourceware.org; auth=none
- References: <CA+4CFy5Mh8HOZfviL1bcdZ6gY=eQsjdRp2TF+0WUuN-bf1MSRA at mail dot gmail dot com>
Something seems wrong:
in tree_function_version:
initialize_cfun (new_decl, old_decl,
old_entry_block->count);
>From the above we can see new_decl's entry BB's count will be the same
as old_decl (no scaling).
In copy_bb, new BB's profile count will also be the same as old BBs?
Since new_decl's entry count is wrong, updating the cgraph_node's
count in cgraph_rebuild_reference will be wrong:
node->count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; // cfun
corresponds to new_decl
David
On Thu, Oct 30, 2014 at 8:49 PM, Wei Mi <wmi@google.com> wrote:
> We have a program like this:
>
> A() { // hot func
> ...
> }
>
> B() {
> A(); // very hot
> if (i) {
> A(); // very cold
> }
> }
>
> Both callsites of A will be inlined into B. In gcc func
> save_inline_function_body in inline_transform stage, A's first clone
> will be choosen and turned into a real func. For our case, the clone
> node choosen corresponds to the cold callsite of A.
> cgraph_rebuild_references in tree_function_versioning will reset the
> cgraph node count of the choosen clone to the entry bb count of func A
> (A is hot). So the cgraph node count of the choosen clone becomes hot
> while its inline edge count is still cold. It breaks the assumption
> described here:
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:
> for inline node, bb->count == edge->count == edge->callee->count
>
> For the patch committed in the thread above (it is listed below),
> cg_edge->callee->count is used for profile update to its inline
> instance, which leads to a hot BB in func B which is actually very
> cold. The wrong profile information causes performance regression in
> one of our internal benchmarks.
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c (revision 210535)
> +++ gcc/tree-inline.c (working copy)
> @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
> function in any way before this point, as this CALL_EXPR may be
> a self-referential call; if we're calling ourselves, we need to
> duplicate our body before altering anything. */
> - copy_body (id, bb->count,
> + copy_body (id, cg_edge->callee->count,
> GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
> bb, return_block, NULL);
>
> I don't understand why we need to reset the cgraph node count of the
> first clone to entry bb count in cgraph_rebuild_references, could you
> shed some light on it? The above patch may just uncover the problem?
>
> Thanks,
> Wei.