Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
- From: Jan Hubicka <hubicka@ucw.cz>
- To: Teresa Johnson <tejohnson@google.com>
- Cc: Jan Hubicka <hubicka@ucw.cz>, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, Steven Bosscher <stevenb.gcc@gmail.com>, Jeff Law <law@redhat.com>, "marxin.liska" <marxin.liska@gmail.com>, Sriraman Tallam <tmsriram@google.com>
- Date: Wed, 21 Aug 2013 17:25:27 +0200
- Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
> >
> > Because an offline COMDAT function will be produced for every COMDAT used, I think
> > it is bad to produce any COMDAT (or any function reachable via calls with non-0
> > count) that has an empty profile (either because it got lost by COMDAT merging
> > or because of a read mismatch).
>
> The approach this patch takes is to simply treat those functions the
> same as we would if we didn't feed back profile data in the first
> place, by using the frequencies. This is sufficient except when one is
> inlined, which is why I have the special handling in the inliner
> itself.
Yes, my original plan was to have a per-function profile_status that
specifies whether the profile is read, guessed or absent, and to make the
function-local decisions sanely for each setting.
Here we read the function and set its profile to READ (with all counts being 0).
We should drop it to GUESSED when we see that there are non-0 count edges
calling the function in question, and probably we should also check whether it is
obviously hot (i.e. reachable by a hot call) and promote its function profile to
HOT then, to make the code placement less bad...
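As a minimal sketch of that per-function decision (simplified stand-in types,
not GCC's real cgraph_node/cgraph_edge structures; hot_edge_p and its threshold
below are placeholders for cgraph_maybe_hot_edge_p), illustrating the
READ -> GUESSED demotion plus the hotness promotion:

/* Illustration only; not GCC's API.  */

enum profile_status_d { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };
enum node_frequency { NODE_FREQUENCY_NORMAL, NODE_FREQUENCY_HOT };

struct call_edge { long long count; struct call_edge *next_caller; };
struct function_node
{
  long long count;                 /* entry count read from the profile  */
  enum profile_status_d status;
  enum node_frequency frequency;
  struct call_edge *callers;
};

static int
hot_edge_p (struct call_edge *e)
{
  return e->count > 1000;          /* placeholder hotness threshold  */
}

/* Demote a READ-but-all-zero profile to GUESSED when the function is
   actually called with non-0 counts, and mark it hot if some caller
   edge is hot.  */
static void
sanitize_zero_count_profile (struct function_node *f)
{
  struct call_edge *e;
  int called = 0, hot = 0;

  if (f->count || f->status != PROFILE_READ)
    return;
  for (e = f->callers; e; e = e->next_caller)
    {
      if (e->count)
        called = 1;
      if (hot_edge_p (e))
        hot = 1;
    }
  if (called || hot)
    {
      f->status = PROFILE_GUESSED;   /* fall back to the estimated profile  */
      f->frequency = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
    }
}

The actual patch below does the same thing on the real call graph.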
> >
> > Since new direct calls can be discovered later, inline may want to do that
> > again each time it inlines non-0 count call of COMDAT with 0 count...
>
> How about an approach like this:
> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the
> profiles at profile read time for functions with 0 counts.
I see, here we are out of sync.
We always used to go with estimated frequencies for functions with 0 counts,
but it seems that this code broke when prediction was moved before profiling.
(We should also keep the edge probabilities from predict.c in that case.)
The estimated profile is already there before the feedback profile is read in,
so we just need to avoid overwriting it. Does the following work for you?
Index: tree-profile.c
===================================================================
--- tree-profile.c (revision 201838)
+++ tree-profile.c (working copy)
@@ -604,6 +604,34 @@
       pop_cfun ();
     }

+  /* See if a 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      bool called = false;
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        {
+          if (e->count)
+            called = true;
+          if (cgraph_maybe_hot_edge_p (e))
+            break;
+        }
+      if ((e || called)
+          && profile_status_for_function
+              (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+        {
+          if (dump_file)
+            fprintf (dump_file, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+                     cgraph_node_name (node), node->symbol.order,
+                     e ? "function is hot" : "function is normal");
+          profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+            = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+          node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+        }
+    }
   del_node_map();
   return 0;
Index: predict.c
===================================================================
--- predict.c (revision 201838)
+++ predict.c (working copy)
@@ -2715,6 +2715,9 @@
   gcov_type count_max, true_count_max = 0;
   basic_block bb;
+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);
> - At inline time, invoke some kind of freqs_to_counts routine for any
> 0-count routine that is reached by non-zero call edges. It would take
We should not need that since frequencies ought to be there.
> the sum of all incoming call edge counts and synthesize counts for the
> bbs using the guessed profile frequencies applied earlier by
> init_and_estimate_bb_frequencies. Then the inliner can do its normal
> bb count scaling.
Yes, I guess we should go this way. Still we will need to watch out for overly
large values; recursive inlining can probably easily produce quite a bit of
nonsense here.
We will also need to solve the problem that in this case the cgraph edges will
have a 0 profile. We probably want to play the same game there and just do the
scaling for the edge counts, since IPA passes probably do not want to care
about partial profiles.
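For reference, a rough sketch of what such a freqs_to_counts step could look
like. Everything below is a simplified stand-in (FREQ_BASE, COUNT_CAP and
struct bb are not GCC names), assuming each bb->frequency is expressed on a
0..FREQ_BASE scale relative to the function entry:

#include <limits.h>

#define FREQ_BASE 10000                     /* entry-block frequency scale   */
#define COUNT_CAP (LLONG_MAX / FREQ_BASE)   /* avoid overflow when scaling   */

struct bb { long long count; int frequency; struct bb *next; };

/* Synthesize counts for the basic blocks of a 0-count function from the sum
   of its incoming call-edge counts and the previously estimated frequencies.  */
static void
freqs_to_counts (struct bb *blocks, long long total_incoming_count)
{
  struct bb *b;

  /* Cap the total so that e.g. recursive inlining cannot blow the synthesized
     counts up to meaningless values.  */
  if (total_incoming_count > COUNT_CAP)
    total_incoming_count = COUNT_CAP;

  for (b = blocks; b; b = b->next)
    b->count = total_incoming_count * b->frequency / FREQ_BASE;
}

After this, the inliner's usual count scaling can proceed as if the counts had
been read from the profile.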
>
> Does that seem like a reasonable approach?
>
> There is one other fix in this patch:
> - The clone_inlined_nodes/update_noncloned_frequencies changes below
> are handling a different case: 0-count call edge in this module, with
> non-zero callee node count due to calls from other modules. It will
> allow update_noncloned_frequencies to scale down the edge counts in
> callee's cloned call tree. This was a fix I made for the
> callgraph-based linker plugin function reordering, and not splitting
> (since it is using both the node and edge weights to make ordering
> decisions). Here's a description of the issue when I was debugging it:
Yes, it seems reasonable. I did not really care about comdats at the time
I was writing this function...
> >> Index: ipa-inline-transform.c
> >> ===================================================================
> >> --- ipa-inline-transform.c (revision 201644)
> >> +++ ipa-inline-transform.c (working copy)
> >> @@ -51,7 +51,7 @@ int nfunctions_inlined;
> >>
> >>  static void
> >>  update_noncloned_frequencies (struct cgraph_node *node,
> >> -                              int freq_scale)
> >> +                              gcov_type count_scale, int freq_scale)
> >>  {
> >>    struct cgraph_edge *e;
> >>
> >> @@ -60,14 +60,16 @@ update_noncloned_frequencies (struct cgraph_node *
> >>      freq_scale = 1;
> >>    for (e = node->callees; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >>        if (!e->inline_failed)
> >> -        update_noncloned_frequencies (e->callee, freq_scale);
> >> +        update_noncloned_frequencies (e->callee, count_scale, freq_scale);
> >>      }
> >>    for (e = node->indirect_calls; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >> @@ -169,7 +171,13 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
> >>              }
> >>            duplicate = false;
> >>            e->callee->symbol.externally_visible = false;
> >> -          update_noncloned_frequencies (e->callee, e->frequency);
> >> +          // In the case of a COMDAT, the callee's count may be from other
> >> +          // modules, and we need to scale it for the current module's calls
> >> +          // (e.g. e->count may be 0 despite e->callee->count > 0).
> >> +          gcov_type count_scale = REG_BR_PROB_BASE;
> >> +          if (e->callee->count > e->count)
> >> +            count_scale = GCOV_COMPUTE_SCALE (e->count, e->callee->count);
> >> +          update_noncloned_frequencies (e->callee, count_scale, e->frequency);
Please fix the comment to the usual style and go ahead with this change.
Thanks,
Honza
> >>          }
> >>        else
> >>          {
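To make the effect of the scaling concrete, a small worked example with
made-up numbers (assuming GCOV_COMPUTE_SCALE and apply_scale are the usual
REG_BR_PROB_BASE-based helpers, i.e. roughly num * REG_BR_PROB_BASE / den and
count * scale / REG_BR_PROB_BASE, with REG_BR_PROB_BASE = 10000): if
e->callee->count is 4000 accumulated across all modules but this module's call
edge has e->count == 1000, then count_scale = GCOV_COMPUTE_SCALE (1000, 4000)
= 2500, and an edge with count 2000 inside the callee is rescaled to
apply_scale (2000, 2500) = 500, i.e. the share of the callee's profile that
this module's call actually represents. In the COMDAT case described above
(e->count == 0 while e->callee->count > 0), count_scale is 0 and the cloned
callee's edge counts drop to 0 for this module.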