[PATCH] Handle more COMDAT profiling issues

Teresa Johnson tejohnson@google.com
Wed Feb 12 01:36:00 GMT 2014


On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
> Why is call graph needed to determine whether to drop the profile?

Because we detect this situation by looking for cases where the call
edge counts greatly exceed the callee node count.

>
> If that is needed, it might be possible to leverage the ipa_profile
> pass as it will walk through all function nodes to do profile
> annotation. With this you can make decision to drop callee profile in
> caller's context.

There are 2 ipa profiling passes, which are somewhat confusingly named
(to me at least. =). This is being done during the first.

The first is pass_ipa_tree_profile in tree-profile.c, but is a
SIMPLE_IPA_PASS and has the name "profile" in the dump. The second is
pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the
name "profile_estimate" in the dump. I assume you are suggesting to
move this into the latter? But I'm not clear on what benefit that
gives - the functions are not being traversed in order, so there is
still the issue of needing to rebuild the cgraph after dropping
profiles, which might be best done earlier as I have in the patch.

Teresa

>
> David
>
> On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Is it better to add some logic in counts_to_freq to determine if the
>>> profile count needs to be dropped completely to force profile
>>> estimation?
>>
>> This is the problem I was mentioning below where we call
>> counts_to_freqs before we have the cgraph and can tell that we will
>> need to drop the profile. When we were only dropping the profile for
>> functions with all 0 counts, we simply avoided doing the
>> counts_to_freqs when the counts were all 0, which works since the 0
>> counts don't leave things in an inconsistent state (counts vs
>> estimated frequencies).
>>
>> Teresa
>>
>>>
>>> David
>>>
>>> On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> This patch attempts to address the lost profile issue for COMDATs in
>>>> more circumstances, exposed by function splitting.
>>>>
>>>> My earlier patch handled the case where the comdat had 0 counts since
>>>> the linker kept the copy in a different module. In that case we
>>>> prevent the guessed frequencies on 0-count functions from being
>>>> dropped by counts_to_freq, and then later mark any reached via
>>>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>>>> comdat is inlined the call count is propagated to the callee blocks
>>>> using the guessed probabilities.
>>>>
>>>> However, in this case, there was a comdat that had a very small
>>>> non-zero count, that was being inlined to a much hotter callsite. This
>>>> could happen when there was a copy that was ipa-inlined
>>>> in the profile gen compile, so the copy in that module gets some
>>>> non-zero counts from the ipa inlined instance, but when the out of
>>>> line copy was eliminated by the linker (selected from a different
>>>> module). In this case the inliner was scaling the bb counts up quite a
>>>> lot when inlining. The problem is that you most likely can't trust
>>>> that the 0 count bbs in such a case are really not executed by the
>>>> callsite it is being into, since the counts are very small and
>>>> correspond to a different callsite. In some internal C++ applications
>>>> I am seeing that we execute out of the split cold portion of COMDATs
>>>> for this reason.
>>>>
>>>> This problem is more complicated to address than the 0-count instance,
>>>> because we need the cgraph to determine which functions to drop the
>>>> profile on, and at that point the estimated frequencies have already
>>>> been overwritten by counts_to_freqs. To handle this broader case, I
>>>> have changed the drop_profile routine to simply re-estimate the
>>>> probabilities/frequencies (and translate these into counts scaled from
>>>> the incoming call edge counts). This unfortunately necessitates
>>>> rebuilding the cgraph, to propagate the new synthesized counts and
>>>> avoid checking failures downstream. But it will only be rebuilt if we
>>>> dropped any profiles. With this solution, some of the older approach
>>>> can be removed (e.g. propagating counts using the guessed
>>>> probabilities during inlining).
>>>>
>>>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>>>> Also tested on
>>>> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2014-02-10  Teresa Johnson  <tejohnson@google.com>
>>>>
>>>>         * graphite.c (graphite_finalize): Pass new parameter.
>>>>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>>>>         * predict.c (tree_estimate_probability): New parameter.
>>>>         (tree_estimate_probability_worker): Renamed from
>>>>         tree_estimate_probability_driver.
>>>>         (tree_reestimate_probability): New function.
>>>>         (tree_estimate_probability_driver): Invoke
>>>>         tree_estimate_probability_worker.
>>>>         (freqs_to_counts): Move from tree-inline.c.
>>>>         (drop_profile): Re-estimated profiles when dropping counts.
>>>>         (handle_missing_profiles): Drop for some non-zero functions as well.
>>>>         (counts_to_freqs): Remove code obviated by reestimation.
>>>>         * predict.h (handle_missing_profiles): Update declartion.
>>>>         (tree_estimate_probability): Ditto.
>>>>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>>>>         (copy_cfg_body): Remove code obviated by reestimation.
>>>>         * tree-profile.c (gimple_gen_ior_profiler):
>>>>         (rebuild_cgraph): Code extracted from tree_profiling to rebuild cgraph.
>>>>         (tree_profiling): Invoke rebuild_cgraph as needed.
>>>>
>>>> Index: graphite.c
>>>> ===================================================================
>>>> --- graphite.c  (revision 207436)
>>>> +++ graphite.c  (working copy)
>>>> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>>>>        cleanup_tree_cfg ();
>>>>        profile_status_for_fn (cfun) = PROFILE_ABSENT;
>>>>        release_recorded_exits ();
>>>> -      tree_estimate_probability ();
>>>> +      tree_estimate_probability (false);
>>>>      }
>>>>
>>>>    cloog_state_free (cloog_state);
>>>> Index: params.def
>>>> ===================================================================
>>>> --- params.def  (revision 207436)
>>>> +++ params.def  (working copy)
>>>> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>>>>           "Maximal estimated outcome of branch considered predictable",
>>>>           2, 0, 50)
>>>>
>>>> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
>>>> +         "min-caller-reestimate-ratio",
>>>> +         "Minimum caller-to-callee node count ratio to force
>>>> reestimated branch "
>>>> +          "probabilities in callee (where 0 means only when callee
>>>> count is 0)",
>>>> +         10, 0, 0)
>>>> +
>>>>  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>>>>           "inline-min-speedup",
>>>>           "The minimal estimated speedup allowing inliner to ignore
>>>> inline-insns-single and inline-isnsns-auto",
>>>> Index: predict.c
>>>> ===================================================================
>>>> --- predict.c   (revision 207436)
>>>> +++ predict.c   (working copy)
>>>> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>>>>
>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>     This function can be called from the loop optimizers to recompute
>>>> -   the profile information.  */
>>>> +   the profile information.  When REDO is true then we are forcing
>>>> +   re-estimation of the probabilities because the profile was deemed
>>>> +   insufficient.  */
>>>>
>>>>  void
>>>> -tree_estimate_probability (void)
>>>> +tree_estimate_probability (bool redo)
>>>>  {
>>>>    basic_block bb;
>>>>
>>>> @@ -2390,7 +2392,8 @@ void
>>>>    connect_infinite_loops_to_exit ();
>>>>    /* We use loop_niter_by_eval, which requires that the loops have
>>>>       preheaders.  */
>>>> -  create_preheaders (CP_SIMPLE_PREHEADERS);
>>>> +  if (!redo)
>>>> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>    calculate_dominance_info (CDI_POST_DOMINATORS);
>>>>
>>>>    bb_predictions = pointer_map_create ();
>>>> @@ -2412,16 +2415,16 @@ void
>>>>    pointer_map_destroy (bb_predictions);
>>>>    bb_predictions = NULL;
>>>>
>>>> -  estimate_bb_frequencies (false);
>>>> +  estimate_bb_frequencies (redo);
>>>>    free_dominance_info (CDI_POST_DOMINATORS);
>>>>    remove_fake_exit_edges ();
>>>>  }
>>>>
>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>> -   This is the driver function for PASS_PROFILE.  */
>>>> +   When REDO is true, we are forcing reestimation of the probabilities.  */
>>>>
>>>> -static unsigned int
>>>> -tree_estimate_probability_driver (void)
>>>> +static void
>>>> +tree_estimate_probability_worker (bool redo)
>>>>  {
>>>>    unsigned nb_loops;
>>>>
>>>> @@ -2435,7 +2438,7 @@ void
>>>>    if (nb_loops > 1)
>>>>      scev_initialize ();
>>>>
>>>> -  tree_estimate_probability ();
>>>> +  tree_estimate_probability (redo);
>>>>
>>>>    if (nb_loops > 1)
>>>>      scev_finalize ();
>>>> @@ -2445,6 +2448,34 @@ void
>>>>      gimple_dump_cfg (dump_file, dump_flags);
>>>>    if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>>>>      profile_status_for_fn (cfun) = PROFILE_GUESSED;
>>>> +}
>>>> +
>>>> +/* Force re-estimation of the probabilities, because the profile was
>>>> +   deemed insufficient.  */
>>>> +
>>>> +static void
>>>> +tree_reestimate_probability (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  edge e;
>>>> +  edge_iterator ei;
>>>> +
>>>> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>> +    {
>>>> +      bb->count = 0;
>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> +        e->count = 0;
>>>> +    }
>>>> +  tree_estimate_probability_worker (true);
>>>> +}
>>>> +
>>>> +/* Estimate probabilities.
>>>> +   This is the driver function for PASS_PROFILE.  */
>>>> +static unsigned int
>>>> +tree_estimate_probability_driver (void)
>>>> +{
>>>> +  tree_estimate_probability_worker (false);
>>>>    return 0;
>>>>  }
>>>>  ^L
>>>> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>>>>    BITMAP_FREE (tovisit);
>>>>  }
>>>>
>>>> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>> +   with each bb's frequency. Used when NODE has an entry count that
>>>> +   is much lower than the caller edges reaching it.  See the comments
>>>> +   for handle_missing_profiles() for when this can happen for COMDATs.  */
>>>> +
>>>> +void
>>>> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>> +{
>>>> +  basic_block bb;
>>>> +  edge_iterator ei;
>>>> +  edge e;
>>>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>> +
>>>> +  FOR_ALL_BB_FN(bb, fn)
>>>> +    {
>>>> +      bb->count = apply_scale (count,
>>>> +                               GCOV_COMPUTE_SCALE (bb->frequency,
>>>> BB_FREQ_MAX));
>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> +        e->count = apply_probability (e->src->count, e->probability);
>>>> +    }
>>>> +}
>>>> +
>>>>  /* Drop the profile for NODE to guessed, and update its frequency based on
>>>>     whether it is expected to be hot given the CALL_COUNT.  */
>>>>
>>>> @@ -2772,6 +2825,9 @@ static void
>>>>  drop_profile (struct cgraph_node *node, gcov_type call_count)
>>>>  {
>>>>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>> +
>>>> +  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
>>>> +    return;
>>>>    /* In the case where this was called by another function with a
>>>>       dropped profile, call_count will be 0. Since there are no
>>>>       non-zero call counts to this function, we don't know for sure
>>>> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>
>>>>    if (dump_file)
>>>>      fprintf (dump_file,
>>>> -             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>>>> +             "Dropping %ld profile for %s/%i. %s based on calls.\n",
>>>> +             node->count,
>>>>               node->name (), node->order,
>>>>               hot ? "Function is hot" : "Function is normal");
>>>>    /* We only expect to miss profiles for functions that are reached
>>>> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>                   node->name (), node->order);
>>>>      }
>>>>
>>>> +  /* Re-estimate the probabilities for function and use the estimated
>>>> +     frequencies to compute the counts.  */
>>>> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>> +  tree_reestimate_probability ();
>>>> +  freqs_to_counts (node, call_count);
>>>> +  if (dump_file)
>>>> +    {
>>>> +      fprintf (dump_file, "After re-estimating probabilies and counts\n");
>>>> +      gimple_dump_cfg (dump_file,
>>>> dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
>>>> +    }
>>>> +  pop_cfun ();
>>>> +
>>>>    profile_status_for_fn (fn)
>>>>        = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>>>    node->frequency
>>>> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>  /* In the case of COMDAT routines, multiple object files will contain the same
>>>>     function and the linker will select one for the binary. In that case
>>>>     all the other copies from the profile instrument binary will be missing
>>>> -   profile counts. Look for cases where this happened, due to non-zero
>>>> +   profile counts. This can confuse downstream optimizations such as
>>>> +   function splitting.
>>>> +
>>>> +   Look for cases where this happened, due to non-zero
>>>>     call counts going to 0-count functions, and drop the profile to guessed
>>>>     so that we can use the estimated probabilities and avoid optimizing only
>>>> -   for size.
>>>> +   for size. In the case where the COMDAT was inlined in some locations
>>>> +   within the file and not others, the profile count will be non-zero due
>>>> +   to the inlined instances, but may still be significantly smaller than the
>>>> +   call edges for the non-inlined instances. Detect that case when requested
>>>> +   and reestimate probabilities, since the counts will not necessarily reflect
>>>> +   the behavior along the more frequent call paths.
>>>>
>>>>     The other case where the profile may be missing is when the routine
>>>>     is not going to be emitted to the object file, e.g. for "extern template"
>>>>     class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>>>>     all other cases of non-zero calls to 0-count functions.  */
>>>>
>>>> -void
>>>> +bool
>>>>  handle_missing_profiles (void)
>>>>  {
>>>>    struct cgraph_node *node;
>>>>    int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>>>    vec<struct cgraph_node *> worklist;
>>>>    worklist.create (64);
>>>> +  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>>>> +  bool changed = false;
>>>>
>>>> -  /* See if 0 count function has non-0 count callers.  In this case we
>>>> -     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>>> +  /* See if 0 or low count function has higher count caller edges.  In this
>>>> +     case we lost some profile.  Drop its function profile to
>>>> +     PROFILE_GUESSED.  */
>>>>    FOR_EACH_DEFINED_FUNCTION (node)
>>>>      {
>>>>        struct cgraph_edge *e;
>>>> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
>>>>        gcov_type max_tp_first_run = 0;
>>>>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>
>>>> -      if (node->count)
>>>> -        continue;
>>>>        for (e = node->callers; e; e = e->next_caller)
>>>>        {
>>>>          call_count += e->count;
>>>> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void)
>>>>           max_tp_first_run = e->caller->tp_first_run;
>>>>        }
>>>>
>>>> +      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
>>>> +         profiles for 0-count functions called by non-zero call edges.  */
>>>> +      if ((!min_reest_ratio && node->count > 0)
>>>> +          || (min_reest_ratio && node->count * min_reest_ratio > call_count))
>>>> +        continue;
>>>> +
>>>>        /* If time profile is missing, let assign the maximum that comes from
>>>>          caller functions.  */
>>>>        if (!node->tp_first_run && max_tp_first_run)
>>>> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
>>>>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>>>>          {
>>>>            drop_profile (node, call_count);
>>>> +          changed = true;
>>>>            worklist.safe_push (node);
>>>>          }
>>>>      }
>>>>
>>>> -  /* Propagate the profile dropping to other 0-count COMDATs that are
>>>> +  /* Propagate the profile dropping to other low-count COMDATs that are
>>>>       potentially called by COMDATs we already dropped the profile on.  */
>>>>    while (worklist.length () > 0)
>>>>      {
>>>> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void)
>>>>            struct cgraph_node *callee = e->callee;
>>>>            struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>>>>
>>>> -          if (callee->count > 0)
>>>> +          /* When min_reest_ratio is non-zero, if we get here we dropped
>>>> +             a caller's profile since it was significantly smaller than its
>>>> +             call edge. Drop the profile on any callees whose node count is
>>>> +             now exceeded by the new caller node count.  */
>>>> +          if ((!min_reest_ratio && callee->count > 0)
>>>> +              || (min_reest_ratio && callee->count >= node->count))
>>>>              continue;
>>>> +
>>>> +          gcov_type call_count = 0;
>>>> +          if (min_reest_ratio > 0)
>>>> +            {
>>>> +              struct cgraph_edge *e2;
>>>> +              for (e2 = node->callers; e2; e2 = e2->next_caller)
>>>> +                call_count += e2->count;
>>>> +            }
>>>> +
>>>>            if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>>>>                && profile_status_for_fn (fn) == PROFILE_READ)
>>>>              {
>>>> -              drop_profile (node, 0);
>>>> +              drop_profile (node, call_count);
>>>> +              changed = true;
>>>>                worklist.safe_push (callee);
>>>>              }
>>>>          }
>>>>      }
>>>>    worklist.release ();
>>>> +  return changed;
>>>>  }
>>>>
>>>>  /* Convert counts measured by profile driven feedback to frequencies.
>>>> @@ -2900,12 +3001,6 @@ counts_to_freqs (void)
>>>>    gcov_type count_max, true_count_max = 0;
>>>>    basic_block bb;
>>>>
>>>> -  /* Don't overwrite the estimated frequencies when the profile for
>>>> -     the function is missing.  We may drop this function PROFILE_GUESSED
>>>> -     later in drop_profile ().  */
>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
>>>> -    return 0;
>>>> -
>>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>>>>      true_count_max = MAX (bb->count, true_count_max);
>>>>
>>>> Index: predict.h
>>>> ===================================================================
>>>> --- predict.h   (revision 207436)
>>>> +++ predict.h   (working copy)
>>>> @@ -47,11 +47,11 @@ enum prediction
>>>>
>>>>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>>>>  extern int counts_to_freqs (void);
>>>> -extern void handle_missing_profiles (void);
>>>> +extern bool handle_missing_profiles (void);
>>>>  extern void estimate_bb_frequencies (bool);
>>>>  extern const char *predictor_name (enum br_predictor);
>>>>  extern tree build_predict_expr (enum br_predictor, enum prediction);
>>>> -extern void tree_estimate_probability (void);
>>>> +extern void tree_estimate_probability (bool);
>>>>  extern void compute_function_frequency (void);
>>>>  extern void rebuild_frequencies (void);
>>>>
>>>> Index: tree-inline.c
>>>> ===================================================================
>>>> --- tree-inline.c       (revision 207436)
>>>> +++ tree-inline.c       (working copy)
>>>> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
>>>>      }
>>>>  }
>>>>
>>>> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>> -   with each bb's frequency. Used when NODE has a 0-weight entry
>>>> -   but we are about to inline it into a non-zero count call bb.
>>>> -   See the comments for handle_missing_profiles() in predict.c for
>>>> -   when this can happen for COMDATs.  */
>>>> -
>>>> -void
>>>> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>> -{
>>>> -  basic_block bb;
>>>> -  edge_iterator ei;
>>>> -  edge e;
>>>> -  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>> -
>>>> -  FOR_ALL_BB_FN(bb, fn)
>>>> -    {
>>>> -      bb->count = apply_scale (count,
>>>> -                               GCOV_COMPUTE_SCALE (bb->frequency,
>>>> BB_FREQ_MAX));
>>>> -      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> -        e->count = apply_probability (e->src->count, e->probability);
>>>> -    }
>>>> -}
>>>> -
>>>>  /* Make a copy of the body of FN so that it can be inserted inline in
>>>>     another function.  Walks FN via CFG, returns new fndecl.  */
>>>>
>>>> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>    int incoming_frequency = 0;
>>>>    gcov_type incoming_count = 0;
>>>>
>>>> -  /* This can happen for COMDAT routines that end up with 0 counts
>>>> -     despite being called (see the comments for handle_missing_profiles()
>>>> -     in predict.c as to why). Apply counts to the blocks in the callee
>>>> -     before inlining, using the guessed edge frequencies, so that we don't
>>>> -     end up with a 0-count inline body which can confuse downstream
>>>> -     optimizations such as function splitting.  */
>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
>>>> -    {
>>>> -      /* Apply the larger of the call bb count and the total incoming
>>>> -         call edge count to the callee.  */
>>>> -      gcov_type in_count = 0;
>>>> -      struct cgraph_edge *in_edge;
>>>> -      for (in_edge = id->src_node->callers; in_edge;
>>>> -           in_edge = in_edge->next_caller)
>>>> -        in_count += in_edge->count;
>>>> -      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
>>>> -    }
>>>> -
>>>>    if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>>>>      count_scale
>>>>          = GCOV_COMPUTE_SCALE (count,
>>>> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>    else
>>>>      count_scale = REG_BR_PROB_BASE;
>>>>
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    fprintf (dump_file,
>>>> +             "Scaling entry count %ld to %ld with scale %ld while inlining "
>>>> +             "%s into %s\n",
>>>> +             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
>>>> +             id->src_node->name (), id->dst_node->name ());
>>>> +
>>>>    /* Register specific tree functions.  */
>>>>    gimple_register_cfg_hooks ();
>>>>
>>>> Index: tree-profile.c
>>>> ===================================================================
>>>> --- tree-profile.c      (revision 207436)
>>>> +++ tree-profile.c      (working copy)
>>>> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
>>>>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>>>  }
>>>>
>>>> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges.  */
>>>> +
>>>> +static void
>>>> +rebuild_cgraph (bool update_calls)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +
>>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>>> +    {
>>>> +      basic_block bb;
>>>> +
>>>> +      if (!gimple_has_body_p (node->decl)
>>>> +         || !(!node->clone_of
>>>> +         || node->decl != node->clone_of->decl))
>>>> +       continue;
>>>> +
>>>> +      /* Don't profile functions produced for builtin stuff.  */
>>>> +      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>> +       continue;
>>>> +
>>>> +      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>> +
>>>> +      if (update_calls)
>>>> +        {
>>>> +          FOR_EACH_BB_FN (bb, cfun)
>>>> +            {
>>>> +              gimple_stmt_iterator gsi;
>>>> +              for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +                {
>>>> +                  gimple stmt = gsi_stmt (gsi);
>>>> +                  if (is_gimple_call (stmt))
>>>> +                    update_stmt (stmt);
>>>> +                }
>>>> +            }
>>>> +
>>>> +          /* re-merge split blocks.  */
>>>> +          cleanup_tree_cfg ();
>>>> +          update_ssa (TODO_update_ssa);
>>>> +        }
>>>> +
>>>> +      rebuild_cgraph_edges ();
>>>> +
>>>> +      pop_cfun ();
>>>> +    }
>>>> +}
>>>> +
>>>>  /* Profile all functions in the callgraph.  */
>>>>
>>>>  static unsigned int
>>>> @@ -622,43 +668,14 @@ tree_profiling (void)
>>>>      }
>>>>
>>>>    /* Update call statements and rebuild the cgraph.  */
>>>> -  FOR_EACH_DEFINED_FUNCTION (node)
>>>> -    {
>>>> -      basic_block bb;
>>>> +  rebuild_cgraph (true);
>>>>
>>>> -      if (!gimple_has_body_p (node->decl)
>>>> -         || !(!node->clone_of
>>>> -         || node->decl != node->clone_of->decl))
>>>> -       continue;
>>>> +  /* If the profiles were dropped on any functions, unfortunately we
>>>> +     need to rebuild the cgraph to propagate the new reestimated counts
>>>> +     and avoid sanity failures due to inconsistencies.  */
>>>> +  if (handle_missing_profiles ())
>>>> +    rebuild_cgraph (false);
>>>>
>>>> -      /* Don't profile functions produced for builtin stuff.  */
>>>> -      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>> -       continue;
>>>> -
>>>> -      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>> -
>>>> -      FOR_EACH_BB_FN (bb, cfun)
>>>> -       {
>>>> -         gimple_stmt_iterator gsi;
>>>> -         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> -           {
>>>> -             gimple stmt = gsi_stmt (gsi);
>>>> -             if (is_gimple_call (stmt))
>>>> -               update_stmt (stmt);
>>>> -           }
>>>> -       }
>>>> -
>>>> -      /* re-merge split blocks.  */
>>>> -      cleanup_tree_cfg ();
>>>> -      update_ssa (TODO_update_ssa);
>>>> -
>>>> -      rebuild_cgraph_edges ();
>>>> -
>>>> -      pop_cfun ();
>>>> -    }
>>>> -
>>>> -  handle_missing_profiles ();
>>>> -
>>>>    del_node_map ();
>>>>    return 0;
>>>>  }
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



More information about the Gcc-patches mailing list