This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: cgraph node profile update in cgraph_rebuild_references causes a performance issue


Something seems wrong:

in tree_function_version:

 initialize_cfun (new_decl, old_decl,
                        old_entry_block->count);

>From the above we can see new_decl's entry BB's count will be the same
as old_decl (no scaling).
In copy_bb, new BB's profile count will also be the same as old BBs?

Since new_decl's entry count is wrong, updating the cgraph_node's
count in cgraph_rebuild_reference will be wrong:


  node->count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; // cfun
corresponds to new_decl

David




On Thu, Oct 30, 2014 at 8:49 PM, Wei Mi <wmi@google.com> wrote:
> We have a program like this:
>
> A() {    // hot func
>   ...
> }
>
> B() {
>   A();    // very hot
>   if (i) {
>     A();  // very cold
>   }
> }
>
> Both callsites of A will be inlined into B. In gcc func
> save_inline_function_body in inline_transform stage, A's first clone
> will be choosen and turned into a real func. For our case, the clone
> node choosen corresponds to the cold callsite of A.
> cgraph_rebuild_references in tree_function_versioning will reset the
> cgraph node count of the choosen clone to the entry bb count of func A
> (A is hot). So the cgraph node count of the choosen clone becomes hot
> while its inline edge count is still cold. It breaks the assumption
> described here:
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:
> for inline node, bb->count == edge->count == edge->callee->count
>
> For the patch committed in the thread above (it is listed below),
> cg_edge->callee->count is used for profile update to its inline
> instance, which leads to a hot BB in func B which is actually very
> cold. The wrong profile information causes performance regression in
> one of our internal benchmarks.
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c (revision 210535)
> +++ gcc/tree-inline.c (working copy)
> @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
>       function in any way before this point, as this CALL_EXPR may be
>       a self-referential call; if we're calling ourselves, we need to
>       duplicate our body before altering anything.  */
> -  copy_body (id, bb->count,
> +  copy_body (id, cg_edge->callee->count,
>         GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
>       bb, return_block, NULL);
>
> I don't understand why we need to reset the cgraph node count of the
> first clone to entry bb count in cgraph_rebuild_references, could you
> shed some light on it? The above patch may just uncover the problem?
>
> Thanks,
> Wei.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]