Re: [PATCH] Handle more COMDAT profiling issues
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Teresa Johnson <tejohnson at google dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Jan Hubicka <hubicka at ucw dot cz>, David Li <davidxl at google dot com>
- Date: Mon, 26 May 2014 07:43:23 +0200
- Subject: Re: [PATCH] Handle more COMDAT profiling issues
- References: <CAAe5K+UKOY2UHLpTRpZsW8+T8a9=SnUrDhOJSoqt5N97weXD5g at mail dot gmail dot com>
> This patch attempts to address the lost profile issue for COMDATs in
> more circumstances, exposed by function splitting.
> My earlier patch handled the case where the comdat had 0 counts since
> the linker kept the copy in a different module. In that case we
> prevent the guessed frequencies on 0-count functions from being
> dropped by counts_to_freq, and then later mark any reached via
> non-zero callgraph edges as guessed. Finally, when one such 0-count
> comdat is inlined the call count is propagated to the callee blocks
> using the guessed probabilities.
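For reference, the earlier 0-count logic condenses to roughly the following
(a sketch of my reading, not the actual code; drop_profile and the cgraph
fields are real, the body is abbreviated):

  /* Sum the counts of all incoming call edges; a 0-count node that is
     reached over non-zero edges lost its profile to the linker's COMDAT
     selection, so drop it and fall back to guessed frequencies.  */
  gcov_type call_count = 0;
  struct cgraph_edge *e;
  for (e = node->callers; e; e = e->next_caller)
    call_count += e->count;
  if (node->count == 0 && call_count > 0)
    drop_profile (node, call_count);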
> However, in this case, there was a comdat that had a very small
> non-zero count and was being inlined into a much hotter callsite. This
> can happen when a copy was ipa-inlined in the profile-gen compile:
> the copy in that module gets some non-zero counts from the ipa-inlined
> instance, but the out-of-line copy was eliminated by the linker (which
> selected the copy from a different module). In this case the inliner
> was scaling the bb counts up quite a lot when inlining. The problem is
> that you most likely can't trust that the 0-count bbs in such a case
> are really not executed by the callsite it is being inlined into,
> since the counts are very small and correspond to a different
> callsite. In some internal C++ applications I am seeing that we
> execute out of the split cold portion of COMDATs for this reason.
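To make the scaling hazard concrete with made-up numbers (the macro is the
one copy_cfg_body really uses; the counts are invented):

  gcov_type callee_entry_count = 2;   /* tiny count from the ipa-inlined
					 copy in this module */
  gcov_type callsite_count = 20000;   /* the much hotter callsite */
  /* The inliner scales every callee bb count by ~10000x here...  */
  gcov_type count_scale
    = GCOV_COMPUTE_SCALE (callsite_count, callee_entry_count);
  /* ...but 0 * 10000 is still 0, so blocks the two training calls
     happened to skip stay "cold" and get split out, even though this
     hot callsite may well execute them.  */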
With all the problems we have with COMDATs, it still seems to me that perhaps
the best approach would be to merge them at runtime, as we discussed some time
ago, and incrementally handle the problem of how to get call-stack-sensitive
profiles.
> This problem is more complicated to address than the 0-count instance,
> because we need the cgraph to determine which functions to drop the
> profile on, and at that point the estimated frequencies have already
> been overwritten by counts_to_freqs. To handle this broader case, I
> have changed the drop_profile routine to simply re-estimate the
> probabilities/frequencies (and translate these into counts scaled from
> the incoming call edge counts). This unfortunately necessitates
> rebuilding the cgraph, to propagate the new synthesized counts and
> avoid checking failures downstream. But it will only be rebuilt if we
> dropped any profiles. With this solution, some of the older approach
> can be removed (e.g. propagating counts using the guessed
> probabilities during inlining).
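If I condense the new flow, it seems to have roughly this shape (a sketch of
my reading only; the real patch surely has more diagnostics and detail):

  static bool
  drop_profile (struct cgraph_node *node, gcov_type call_count)
  {
    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
    if (!fn)
      return false;
    /* Throw the mistrusted counts away and re-run the static
       branch predictors from scratch.  */
    push_cfun (fn);
    tree_reestimate_probability ();
    pop_cfun ();
    /* Turn the fresh frequencies back into counts, scaled so the
       entry count matches the incoming call edge counts.  */
    freqs_to_counts (node, call_count);
    /* The caller must rebuild the cgraph afterwards so the
       synthesized counts propagate to the callgraph edges.  */
    return true;
  }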
> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
> Also tested on
> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
2014-02-12 Teresa Johnson <firstname.lastname@example.org>
* graphite.c (graphite_finalize): Pass new parameter.
* params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
* predict.c (tree_estimate_probability): New parameter.
(tree_estimate_probability_worker): Renamed from tree_estimate_probability.
(tree_reestimate_probability): New function.
(freqs_to_counts): Move here from tree-inline.c.
(drop_profile): Re-estimate profiles when dropping counts.
(handle_missing_profiles): Also drop profiles for some non-zero-count functions,
get counts from bbs to support invocation before cgraph rebuild.
(counts_to_freqs): Remove code obviated by reestimation.
* predict.h (tree_estimate_probability): Update declaration.
* tree-inline.c (freqs_to_counts): Move to predict.c.
(copy_cfg_body): Remove code obviated by reestimation.
* tree-profile.c (tree_profiling): Invoke handle_missing_profiles
before cgraph rebuild.
--- params.def (revision 207436)
+++ params.def (working copy)
@@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
"Maximal estimated outcome of branch considered predictable",
2, 0, 50)
+ "Minimum caller-to-callee node count ratio to force reestimated branch "
+ "probabilities in callee (where 0 means only when callee count is 0)",
+ 10, 0, 0)
OK, so if the callee count is 10% of the call site count, do we do the recomputation?
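Spelling my reading out as code (a sketch; the actual condition in the patch
may of course differ):

  if (node->count == 0
      || (PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO)
	  && call_count > (node->count
			   * PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO))))
    drop_profile (node, call_count);

That is, with the default of 10, a callee whose own count is at most 10% of
the summed callsite counts would get its profile re-estimated.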
@@ -2390,7 +2392,8 @@ void
/* We use loop_niter_by_eval, which requires that the loops have
   preheaders.  */
- create_preheaders (CP_SIMPLE_PREHEADERS);
+ if (!redo)
+ create_preheaders (CP_SIMPLE_PREHEADERS);
So you disable preheader construction because this all happens at inlining
time, when we do not want to change BBs because the callgraph is already
built, right?
Are you sure we reliably have preheaders built at IPA time for all functions?
(It would make sense to do so, but I think loop analysis is done rather
randomly by IPA analysis passes, such as the inliner.)
Otherwise the SCEV infrastructure will explode, if I remember correctly.
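If preheaders cannot be relied on at this point, an alternative to the redo
flag might be to make the guard self-checking (assuming the loop-state
predicate is usable here):

  if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS))
    create_preheaders (CP_SIMPLE_PREHEADERS);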
+/* Force re-estimation of the probabilities, because the profile was
+   deemed insufficient.  */
+  basic_block bb;
+  edge e;
+  edge_iterator ei;
+
+  /* Need to reset the counts to ensure probabilities are recomputed.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      bb->count = 0;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	e->count = 0;
+    }
+  tree_estimate_probability_worker (true);
You also want to remove the recorded loop count estimates, since they are also
derived from the profile we chose not to trust.
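Something along these lines, I mean (a sketch; any_estimate is the flag the
profile-driven bounds set, and free_numbers_of_iterations_estimates would be
the heavier-weight alternative):

  struct loop *loop;
  FOR_EACH_LOOP (loop, 0)
    loop->any_estimate = false;   /* forget the profile-based bound */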
@@ -2842,48 +2927,77 @@ handle_missing_profiles (void)
gcov_type max_tp_first_run = 0;
struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
- if (node->count)
for (e = node->callers; e; e = e->next_caller)
- call_count += e->count;
+ call_count += cgraph_function_with_gimple_body_p (e->caller)
+ ? gimple_bb (e->call_stmt)->count : 0;
I am quite confused by this change - why don't you want to trust the edge
counts? The other parts of the patch seem quite reasonable, but I don't really
follow here.
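For what it is worth, my guess from the ChangeLog entry ("get counts from bbs
to support invocation before cgraph rebuild") is that the point is staleness
rather than distrust; a sketch of the distinction, using the existing fields:

  /* The cgraph edge caches the count captured when the edge was built;
     after drop_profile rewrites a caller's body, only the bb holding
     the call statement is up to date until the cgraph is rebuilt.  */
  gcov_type cached = e->count;                        /* possibly stale */
  gcov_type fresh  = gimple_bb (e->call_stmt)->count; /* re-estimated  */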