[PATCH] Handle more COMDAT profiling issues


This patch attempts to address the lost-profile issue for COMDATs in
more circumstances, as exposed by function splitting.

My earlier patch handled the case where the comdat had 0 counts because
the linker kept the copy in a different module. In that case we prevent
the guessed frequencies on 0-count functions from being dropped by
counts_to_freqs, and later mark any functions reached via non-zero
callgraph edges as guessed. Finally, when one such 0-count comdat is
inlined, the call count is propagated to the callee blocks using the
guessed probabilities.

However, in this case there was a comdat with a very small non-zero
count that was being inlined into a much hotter callsite. This can
happen when a copy of the comdat was ipa-inlined in the profile-gen
compile: the copy in that module gets some non-zero counts from the
ipa-inlined instance, but the out-of-line copy was eliminated by the
linker (which selected the copy from a different module), so the hot
out-of-line calls left no counts behind. The inliner was then scaling
the bb counts up quite a lot when inlining. The problem is that we
most likely can't trust that the 0-count bbs in such a case are really
not executed by the callsite the function is being inlined into, since
the counts are very small and correspond to a different callsite. In
some internal C++ applications I am seeing that we execute out of the
split cold portion of COMDATs for this reason.
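
To make the scenario concrete, here is a minimal hypothetical sketch
(the file and function names are invented for illustration):

    // foo.h: foo() is inline, so each TU that uses it emits a COMDAT copy.
    inline int foo (int x) { return x > 0 ? x : -x; }

    // a.cc: a cold caller. In the profile-gen compile foo() is
    // ipa-inlined here, so the profile this module records for foo()
    // reflects only the few executions of the inlined instance.
    int cold_path (int x) { return foo (x); }

    // b.cc: a hot caller that reaches foo() through the out-of-line
    // copy. The linker kept the copy (and its counters) from a
    // different object file, so these hot executions never show up in
    // the counts that a.cc's module sees for foo() at profile-use time.
    int hot_path (void)
    {
      int sum = 0;
      for (int i = 0; i < 1000000; i++)
        sum += foo (i);
      return sum;
    }

At profile-use time, foo()'s small counts from the inlined instance get
scaled up to the hot callsite, and its 0-count blocks look cold even
though the hot caller may well execute them.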

This problem is more complicated to address than the 0-count instance,
because we need the cgraph to determine which functions to drop the
profile on, and at that point the estimated frequencies have already
been overwritten by counts_to_freqs. To handle this broader case, I
have changed the drop_profile routine to simply re-estimate the
probabilities/frequencies (and translate these into counts scaled from
the incoming call edge counts). This unfortunately necessitates
rebuilding the cgraph, to propagate the new synthesized counts and
avoid checking failures downstream. But it will only be rebuilt if we
dropped any profiles. With this solution, some of the older approach
can be removed (e.g. propagating counts using the guessed
probabilities during inlining).
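
Roughly, the new check in handle_missing_profiles amounts to the sketch
below (ignoring the existing unlikely-count test, which still applies);
the names come from the patch and the numbers are a made-up
illustration:

    /* Sum the profile counts on all call edges into NODE.  */
    struct cgraph_edge *e;
    gcov_type call_count = 0;
    for (e = node->callers; e; e = e->next_caller)
      call_count += e->count;

    /* When min_reest_ratio is 0 only 0-count nodes are dropped, as
       before.  Otherwise, with the default
       --param min-caller-reestimate-ratio=10, a node count of 5
       against a call-edge total of 100 gives 5 * 10 = 50 <= 100, so
       the small profile is deemed untrustworthy: drop it, re-estimate
       the branch probabilities, and synthesize counts scaled from
       call_count.  */
    if (min_reest_ratio && node->count * min_reest_ratio <= call_count)
      drop_profile (node, call_count);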

Patch is below. Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested on a profile-use build of SPEC CPU2006. Ok for trunk when
stage 1 reopens?

Thanks,
Teresa

2014-02-10  Teresa Johnson  <tejohnson@google.com>

        * graphite.c (graphite_finalize): Pass new parameter.
        * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
        * predict.c (tree_estimate_probability): New parameter.
        (tree_estimate_probability_worker): Renamed from
        tree_estimate_probability_driver.
        (tree_reestimate_probability): New function.
        (tree_estimate_probability_driver): Invoke
        tree_estimate_probability_worker.
        (freqs_to_counts): Move from tree-inline.c.
        (drop_profile): Re-estimate profiles when dropping counts.
        (handle_missing_profiles): Drop profiles for some non-zero count
        functions as well.
        (counts_to_freqs): Remove code obviated by reestimation.
        * predict.h (handle_missing_profiles): Update declaration.
        (tree_estimate_probability): Ditto.
        * tree-inline.c (freqs_to_counts): Move to predict.c.
        (copy_cfg_body): Remove code obviated by reestimation.
        * tree-profile.c (rebuild_cgraph): New function, with code extracted
        from tree_profiling to rebuild the cgraph.
        (tree_profiling): Invoke rebuild_cgraph as needed.

Index: graphite.c
===================================================================
--- graphite.c  (revision 207436)
+++ graphite.c  (working copy)
@@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
       cleanup_tree_cfg ();
       profile_status_for_fn (cfun) = PROFILE_ABSENT;
       release_recorded_exits ();
-      tree_estimate_probability ();
+      tree_estimate_probability (false);
     }

   cloog_state_free (cloog_state);
Index: params.def
===================================================================
--- params.def  (revision 207436)
+++ params.def  (working copy)
@@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
          "Maximal estimated outcome of branch considered predictable",
          2, 0, 50)

+DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
+         "min-caller-reestimate-ratio",
+         "Minimum caller-to-callee node count ratio to force reestimated "
+         "branch probabilities in callee (where 0 means only when callee "
+         "count is 0)",
+         10, 0, 0)
+
 DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
          "inline-min-speedup",
          "The minimal estimated speedup allowing inliner to ignore
inline-insns-single and inline-isnsns-auto",
Index: predict.c
===================================================================
--- predict.c   (revision 207436)
+++ predict.c   (working copy)
@@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)

 /* Predict branch probabilities and estimate profile of the tree CFG.
    This function can be called from the loop optimizers to recompute
-   the profile information.  */
+   the profile information.  When REDO is true then we are forcing
+   re-estimation of the probabilities because the profile was deemed
+   insufficient.  */

 void
-tree_estimate_probability (void)
+tree_estimate_probability (bool redo)
 {
   basic_block bb;

@@ -2390,7 +2392,8 @@ void
   connect_infinite_loops_to_exit ();
   /* We use loop_niter_by_eval, which requires that the loops have
      preheaders.  */
-  create_preheaders (CP_SIMPLE_PREHEADERS);
+  if (!redo)
+    create_preheaders (CP_SIMPLE_PREHEADERS);
   calculate_dominance_info (CDI_POST_DOMINATORS);

   bb_predictions = pointer_map_create ();
@@ -2412,16 +2415,16 @@ void
   pointer_map_destroy (bb_predictions);
   bb_predictions = NULL;

-  estimate_bb_frequencies (false);
+  estimate_bb_frequencies (redo);
   free_dominance_info (CDI_POST_DOMINATORS);
   remove_fake_exit_edges ();
 }

 /* Predict branch probabilities and estimate profile of the tree CFG.
-   This is the driver function for PASS_PROFILE.  */
+   When REDO is true, we are forcing reestimation of the probabilities.  */

-static unsigned int
-tree_estimate_probability_driver (void)
+static void
+tree_estimate_probability_worker (bool redo)
 {
   unsigned nb_loops;

@@ -2435,7 +2438,7 @@ void
   if (nb_loops > 1)
     scev_initialize ();

-  tree_estimate_probability ();
+  tree_estimate_probability (redo);

   if (nb_loops > 1)
     scev_finalize ();
@@ -2445,6 +2448,34 @@ void
     gimple_dump_cfg (dump_file, dump_flags);
   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
     profile_status_for_fn (cfun) = PROFILE_GUESSED;
+}
+
+/* Force re-estimation of the probabilities, because the profile was
+   deemed insufficient.  */
+
+static void
+tree_reestimate_probability (void)
+{
+  basic_block bb;
+  edge e;
+  edge_iterator ei;
+
+  /* Need to reset the counts to ensure probabilities are recomputed.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      bb->count = 0;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        e->count = 0;
+    }
+  tree_estimate_probability_worker (true);
+}
+
+/* Estimate probabilities.
+   This is the driver function for PASS_PROFILE.  */
+static unsigned int
+tree_estimate_probability_driver (void)
+{
+  tree_estimate_probability_worker (false);
   return 0;
 }
 ^L
@@ -2765,6 +2796,28 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Convert estimated frequencies into counts for NODE, scaling COUNT
+   with each bb's frequency. Used when NODE has an entry count that
+   is much lower than the caller edges reaching it.  See the comments
+   for handle_missing_profiles() for when this can happen for COMDATs.  */
+
+void
+freqs_to_counts (struct cgraph_node *node, gcov_type count)
+{
+  basic_block bb;
+  edge_iterator ei;
+  edge e;
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  FOR_ALL_BB_FN(bb, fn)
+    {
+      bb->count = apply_scale (count,
+                               GCOV_COMPUTE_SCALE (bb->frequency,
BB_FREQ_MAX));
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        e->count = apply_probability (e->src->count, e->probability);
+    }
+}
+
 /* Drop the profile for NODE to guessed, and update its frequency based on
    whether it is expected to be hot given the CALL_COUNT.  */

@@ -2772,6 +2825,9 @@ static void
 drop_profile (struct cgraph_node *node, gcov_type call_count)
 {
   struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
+    return;
   /* In the case where this was called by another function with a
      dropped profile, call_count will be 0. Since there are no
      non-zero call counts to this function, we don't know for sure
@@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type

   if (dump_file)
     fprintf (dump_file,
-             "Dropping 0 profile for %s/%i. %s based on calls.\n",
+             "Dropping %ld profile for %s/%i. %s based on calls.\n",
+             node->count,
              node->name (), node->order,
              hot ? "Function is hot" : "Function is normal");
   /* We only expect to miss profiles for functions that are reached
@@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
                  node->name (), node->order);
     }

+  /* Re-estimate the probabilities for function and use the estimated
+     frequencies to compute the counts.  */
+  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
+  tree_reestimate_probability ();
+  freqs_to_counts (node, call_count);
+  if (dump_file)
+    {
+      fprintf (dump_file, "After re-estimating probabilies and counts\n");
+      gimple_dump_cfg (dump_file,
+                       dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
+    }
+  pop_cfun ();
+
   profile_status_for_fn (fn)
       = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
   node->frequency
@@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
 /* In the case of COMDAT routines, multiple object files will contain the same
    function and the linker will select one for the binary. In that case
    all the other copies from the profile instrument binary will be missing
-   profile counts. Look for cases where this happened, due to non-zero
+   profile counts. This can confuse downstream optimizations such as
+   function splitting.
+
+   Look for cases where this happened, due to non-zero
    call counts going to 0-count functions, and drop the profile to guessed
    so that we can use the estimated probabilities and avoid optimizing only
-   for size.
+   for size. In the case where the COMDAT was inlined in some locations
+   within the file and not others, the profile count will be non-zero due
+   to the inlined instances, but may still be significantly smaller than the
+   call edges for the non-inlined instances. Detect that case when requested
+   and reestimate probabilities, since the counts will not necessarily reflect
+   the behavior along the more frequent call paths.

    The other case where the profile may be missing is when the routine
    is not going to be emitted to the object file, e.g. for "extern template"
    class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
    all other cases of non-zero calls to 0-count functions.  */

-void
+bool
 handle_missing_profiles (void)
 {
   struct cgraph_node *node;
   int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
   vec<struct cgraph_node *> worklist;
   worklist.create (64);
+  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
+  bool changed = false;

-  /* See if 0 count function has non-0 count callers.  In this case we
-     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  /* See if 0 or low count function has higher count caller edges.  In this
+     case we lost some profile.  Drop its function profile to
+     PROFILE_GUESSED.  */
   FOR_EACH_DEFINED_FUNCTION (node)
     {
       struct cgraph_edge *e;
@@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
       gcov_type max_tp_first_run = 0;
       struct function *fn = DECL_STRUCT_FUNCTION (node->decl);

-      if (node->count)
-        continue;
       for (e = node->callers; e; e = e->next_caller)
       {
         call_count += e->count;
@@ -2852,6 +2930,12 @@ handle_missing_profiles (void)
          max_tp_first_run = e->caller->tp_first_run;
       }

+      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
+         profiles for 0-count functions called by non-zero call edges.  */
+      if ((!min_reest_ratio && node->count > 0)
+          || (min_reest_ratio && node->count * min_reest_ratio > call_count))
+        continue;
+
       /* If time profile is missing, let assign the maximum that comes from
         caller functions.  */
       if (!node->tp_first_run && max_tp_first_run)
@@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
           && (call_count * unlikely_count_fraction >= profile_info->runs))
         {
           drop_profile (node, call_count);
+          changed = true;
           worklist.safe_push (node);
         }
     }

-  /* Propagate the profile dropping to other 0-count COMDATs that are
+  /* Propagate the profile dropping to other low-count COMDATs that are
      potentially called by COMDATs we already dropped the profile on.  */
   while (worklist.length () > 0)
     {
@@ -2878,17 +2963,33 @@ handle_missing_profiles (void)
           struct cgraph_node *callee = e->callee;
           struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);

-          if (callee->count > 0)
+          /* When min_reest_ratio is non-zero, if we get here we dropped
+             a caller's profile since it was significantly smaller than its
+             call edge. Drop the profile on any callees whose node count is
+             now exceeded by the new caller node count.  */
+          if ((!min_reest_ratio && callee->count > 0)
+              || (min_reest_ratio && callee->count >= node->count))
             continue;
+
+          gcov_type call_count = 0;
+          if (min_reest_ratio > 0)
+            {
+              struct cgraph_edge *e2;
+              for (e2 = node->callers; e2; e2 = e2->next_caller)
+                call_count += e2->count;
+            }
+
           if (DECL_COMDAT (callee->decl) && fn && fn->cfg
               && profile_status_for_fn (fn) == PROFILE_READ)
             {
-              drop_profile (node, 0);
+              drop_profile (node, call_count);
+              changed = true;
               worklist.safe_push (callee);
             }
         }
     }
   worklist.release ();
+  return changed;
 }

 /* Convert counts measured by profile driven feedback to frequencies.
@@ -2900,12 +3001,6 @@ counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

-  /* Don't overwrite the estimated frequencies when the profile for
-     the function is missing.  We may drop this function PROFILE_GUESSED
-     later in drop_profile ().  */
-  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
-    return 0;
-
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 207436)
+++ predict.h   (working copy)
@@ -47,11 +47,11 @@ enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
-extern void handle_missing_profiles (void);
+extern bool handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
-extern void tree_estimate_probability (void);
+extern void tree_estimate_probability (bool);
 extern void compute_function_frequency (void);
 extern void rebuild_frequencies (void);

Index: tree-inline.c
===================================================================
--- tree-inline.c       (revision 207436)
+++ tree-inline.c       (working copy)
@@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
     }
 }

-/* Convert estimated frequencies into counts for NODE, scaling COUNT
-   with each bb's frequency. Used when NODE has a 0-weight entry
-   but we are about to inline it into a non-zero count call bb.
-   See the comments for handle_missing_profiles() in predict.c for
-   when this can happen for COMDATs.  */
-
-void
-freqs_to_counts (struct cgraph_node *node, gcov_type count)
-{
-  basic_block bb;
-  edge_iterator ei;
-  edge e;
-  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
-
-  FOR_ALL_BB_FN(bb, fn)
-    {
-      bb->count = apply_scale (count,
-                               GCOV_COMPUTE_SCALE (bb->frequency,
BB_FREQ_MAX));
-      FOR_EACH_EDGE (e, ei, bb->succs)
-        e->count = apply_probability (e->src->count, e->probability);
-    }
-}
-
 /* Make a copy of the body of FN so that it can be inserted inline in
    another function.  Walks FN via CFG, returns new fndecl.  */

@@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
   int incoming_frequency = 0;
   gcov_type incoming_count = 0;

-  /* This can happen for COMDAT routines that end up with 0 counts
-     despite being called (see the comments for handle_missing_profiles()
-     in predict.c as to why). Apply counts to the blocks in the callee
-     before inlining, using the guessed edge frequencies, so that we don't
-     end up with a 0-count inline body which can confuse downstream
-     optimizations such as function splitting.  */
-  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
-    {
-      /* Apply the larger of the call bb count and the total incoming
-         call edge count to the callee.  */
-      gcov_type in_count = 0;
-      struct cgraph_edge *in_edge;
-      for (in_edge = id->src_node->callers; in_edge;
-           in_edge = in_edge->next_caller)
-        in_count += in_edge->count;
-      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
-    }
-
   if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
     count_scale
         = GCOV_COMPUTE_SCALE (count,
@@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
   else
     count_scale = REG_BR_PROB_BASE;

+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file,
+             "Scaling entry count %ld to %ld with scale %ld while inlining "
+             "%s into %s\n",
+             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
+             id->src_node->name (), id->dst_node->name ());
+
   /* Register specific tree functions.  */
   gimple_register_cfg_hooks ();

Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 207436)
+++ tree-profile.c      (working copy)
@@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }

+/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges.  */
+
+static void
+rebuild_cgraph (bool update_calls)
+{
+  struct cgraph_node *node;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      basic_block bb;
+
+      if (!gimple_has_body_p (node->decl)
+         || !(!node->clone_of
+         || node->decl != node->clone_of->decl))
+       continue;
+
+      /* Don't profile functions produced for builtin stuff.  */
+      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
+       continue;
+
+      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
+
+      if (update_calls)
+        {
+          FOR_EACH_BB_FN (bb, cfun)
+            {
+              gimple_stmt_iterator gsi;
+              for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+                {
+                  gimple stmt = gsi_stmt (gsi);
+                  if (is_gimple_call (stmt))
+                    update_stmt (stmt);
+                }
+            }
+
+          /* re-merge split blocks.  */
+          cleanup_tree_cfg ();
+          update_ssa (TODO_update_ssa);
+        }
+
+      rebuild_cgraph_edges ();
+
+      pop_cfun ();
+    }
+}
+
 /* Profile all functions in the callgraph.  */

 static unsigned int
@@ -622,43 +668,14 @@ tree_profiling (void)
     }

   /* Update call statements and rebuild the cgraph.  */
-  FOR_EACH_DEFINED_FUNCTION (node)
-    {
-      basic_block bb;
+  rebuild_cgraph (true);

-      if (!gimple_has_body_p (node->decl)
-         || !(!node->clone_of
-         || node->decl != node->clone_of->decl))
-       continue;
+  /* If the profiles were dropped on any functions, unfortunately we
+     need to rebuild the cgraph to propagate the new reestimated counts
+     and avoid sanity failures due to inconsistencies.  */
+  if (handle_missing_profiles ())
+    rebuild_cgraph (false);

-      /* Don't profile functions produced for builtin stuff.  */
-      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
-       continue;
-
-      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-
-      FOR_EACH_BB_FN (bb, cfun)
-       {
-         gimple_stmt_iterator gsi;
-         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-           {
-             gimple stmt = gsi_stmt (gsi);
-             if (is_gimple_call (stmt))
-               update_stmt (stmt);
-           }
-       }
-
-      /* re-merge split blocks.  */
-      cleanup_tree_cfg ();
-      update_ssa (TODO_update_ssa);
-
-      rebuild_cgraph_edges ();
-
-      pop_cfun ();
-    }
-
-  handle_missing_profiles ();
-
   del_node_map ();
   return 0;
 }



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

