
Re: FDO usability patch -- correct insane profile data


> Hi, please review the attached patch.
> 
> Ok when bootstrap and test finish?
> 
> Thanks,
> 
> David
> 
> 
> 
> 2011-04-07  Xinliang David Li  <davidxl@google.com>
> 
> 	* ipa-cp.c (ipcp_update_profiling): Correct
> 	negative scale factor due to insane profile data.

> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 171917)
> +++ ipa-cp.c	(working copy)
> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
>  	  scale = ipcp_get_node_scale (orig_node);
>  	  node->count = orig_node->count * scale / REG_BR_PROB_BASE;
>  	  scale_complement = REG_BR_PROB_BASE - scale;
> +
> +          /* A negative scale complement can result from insane profile data
> +             in which the total incoming edge count in this module is
> +             larger than the callee's entry count.  Insane profile data
> +             usually gets generated for the following reasons:
> +
> +             1) In multithreaded programs, when profile data is dumped
> +             to gcda files in gcov_exit, some other threads are still running.
> +             The profile counters are dumped in bottom-up call-graph order,
> +             so the caller's BB counters may still be updated after the
> +             callee's counter data has already been saved to disk.
> +
> +             2) Comdat functions: a comdat function's profile counters are
> +             not allocated in a comdat section.  When a comdat callee gets
> +             inlined at some call sites after instrumentation, and the
> +             remaining calls to it resolve to a comdat copy in another
> +             module, the profile counters for this function are split.
> +             The sum of incoming edge counts in this module can then be
> +             larger than the callee instance's entry count.  */
> +
> +          if (scale_complement < 0 && flag_profile_correction)

Please don't use flag_profile_correction in code like this.  Even with !flag_profile_correction
the profile is sane only at the time it is read; the compiler invalidates it for various
non-trivial reasons during code transformations.  Think of inlining
int t (int a)
{
  if (a < 0)
    return 1;
  else
    return 2;
}
that is called once with constant -1 and once with constant 0.  The profile says
that the branch has probability 1/2, but in one inlined copy it actually has probability 0
and in the other probability 1.  We simply inline with probability 1/2 and end up with
an insane profile in the caller.
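
For concreteness, a hypothetical caller matching that description (the function name
and counts are made up, each call site is assumed to run exactly once in training):

/* Hypothetical caller; assume each call site executes exactly once
   during the training run.  */
int caller (void)
{
  int r = 0;
  r += t (-1);  /* after inlining, the a < 0 branch is always taken here */
  r += t (0);   /* after inlining, the a < 0 branch is never taken here */
  return r;
}

The profile for t records the branch as taken once out of two executions, so both
inlined copies get probability 1/2, and the caller's block counts no longer add up.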

I would suggest, for brevity, also dropping the comment on why the profile can be insane.
We have many places like this; even if it might look a bit distracting at first, we could
probably document this somewhere in the internal documentation.  I can't think of a better
place to put it.

However, the test seems a bit symptomatic to me.  It always matches when scale > REG_BR_PROB_BASE,
and such a scale is already nonsensical (i.e. the clone should not be executed more often than
its original).  The scale is computed in:

/* Compute the proper scale for NODE.  It is the ratio between the number of
   direct calls (represented on the incoming cgraph_edges) and sum of all
   invocations of NODE (represented as count in cgraph_node).

   FIXME: This code is wrong.  Since the callers can be also clones and
   the clones are not scaled yet, the sums gets unrealistically high.
   To properly compute the counts, we would need to do propagation across
   callgraph (as external call to A might imply call to non-cloned B
   if A's clone calls cloned B).  */
static void
ipcp_compute_node_scale (struct cgraph_node *node)
{ 
  gcov_type sum;
  struct cgraph_edge *cs;

  sum = 0;
  /* Compute sum of all counts of callers. */
  for (cs = node->callers; cs != NULL; cs = cs->next_caller)
    sum += cs->count;
  /* Work around the unrealistically high sum problem.  We just don't want
     the non-cloned body to have negative or very low frequency.  Since
     majority of execution time will be spent in clones anyway, this should
     give good enough profile.  */
  if (sum > node->count * 9 / 10)
    sum = node->count * 9 / 10;
  if (node->count == 0)
    ipcp_set_node_scale (node, 0);
  else
    ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
}

We already cap sum at 9/10 of node->count, and the scale is then computed as
sum * REG_BR_PROB_BASE / node->count, so the scale should not exceed
REG_BR_PROB_BASE * 9 / 10.
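
For what it's worth, a quick standalone check of that arithmetic (a sketch only:
REG_BR_PROB_BASE is assumed to be 10000 and the counts are made up):

#include <stdio.h>

#define REG_BR_PROB_BASE 10000   /* assumed value for this sketch */

int main (void)
{
  long long count = 1000;     /* stands in for node->count */
  long long sum = 5000;       /* deliberately insane sum of incoming edge counts */

  /* The cap from ipcp_compute_node_scale.  */
  if (sum > count * 9 / 10)
    sum = count * 9 / 10;     /* sum becomes 900 */

  long long scale = sum * REG_BR_PROB_BASE / count;   /* 900 * 10000 / 1000 = 9000 */
  long long complement = REG_BR_PROB_BASE - scale;    /* 10000 - 9000 = 1000 */

  printf ("scale = %lld, complement = %lld\n", scale, complement);
  return 0;
}

With the cap in place the complement stays at least REG_BR_PROB_BASE / 10, which is why
the negative case looks unreachable from this computation alone.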

How can it possibly end up being greater than REG_BR_PROB_BASE at the time it is used?

Honza

