[PATCH] Relax invalidation of TOP N counters in PGO.

Tue Jan 21 15:34:00 GMT 2020

> Ok, after personal discussion with Honza that I had,
> we should be more conservative and prune only
> a run-time TOP N counters before we merge them.
> The patch does that. Can you please Honza test me the patch of Firefox?
> 
> Thanks,
> Martin

For a record, we ran tests of Firefox. It does not fix the problem with
the particular counter I referred to in the PR, but it does improve the
stats Martin collected:

> 
> $ ./stats.py before
> == Stats for before ==
> stats for indirect_call:
>   total: 160441
>   invalid: 930
>   tracked values:
>     0 values:   134841 times (84.04%)
>     1 values:    20085 times (12.52%)
>     2 values:     3098 times (1.93%)
>     3 values:      956 times (0.60%)
>     4 values:      531 times (0.33%)
> 
> stats for topn:
>   total: 167061
>   invalid: 999
>   tracked values:
>     0 values:   141082 times (84.45%)
>     1 values:    20280 times (12.14%)
>     2 values:     3174 times (1.90%)
>     3 values:      974 times (0.58%)
>     4 values:      552 times (0.33%)
> 
> $ ./stats.py after
> == Stats for after ==
> stats for indirect_call:
>   total: 160441
>   invalid: 309
>   tracked values:
>     0 values:   134945 times (84.11%)
>     1 values:    21129 times (13.17%)
>     2 values:     2950 times (1.84%)
>     3 values:      837 times (0.52%)
>     4 values:      271 times (0.17%)
> 
> stats for topn:
>   total: 167061
>   invalid: 326
>   tracked values:
>     0 values:   141207 times (84.52%)
>     1 values:    21356 times (12.78%)
>     2 values:     3033 times (1.82%)
>     3 values:      857 times (0.51%)
>     4 values:      282 times (0.17%)
> 
> $ == Stats for hack ==
> stats for indirect_call:
>   total: 160441
>   invalid: 0
>   tracked values:
>     0 values:   134852 times (84.05%)
>     1 values:    20089 times (12.52%)
>     2 values:     3090 times (1.93%)
>     3 values:      954 times (0.59%)
>     4 values:     1456 times (0.91%)
> 
> stats for topn:
>   total: 167061
>   invalid: 0
>   tracked values:
>     0 values:   141093 times (84.46%)
>     1 values:    20285 times (12.14%)
>     2 values:     3164 times (1.89%)
>     3 values:      972 times (0.58%)
>     4 values:     1547 times (0.93%)
> 

Here 
- before is current mainline,
- after is with patch applied
- hack is w/o reproducible profile merging (i.e. what GCC 9 and ealrier
  does)

So the patch reduces number of discarded values to 1/3 which is clear
improvement.  Incrementally I think we could consider command line
option specifying degree of reproducibility we expect from profile (i.e.
no reproducibility, reproducibility for single threaded train run,
reproducibility for threaded train run).

If distro build takes seriously FDO and reproducibility it will
eventually run into the limitation for single-threaded programs and we
will need to do reproducible runs for multithread apps which means
putting indirect call profiles into thread local storage I think, and
giving up on time profiling.

I think non-reproducible merging could be implemneted easier atop of
current infrastructure by simply adding
HIST_TYPE_INDIR_CALL_REPRODUCIBLE, HIST_TYPE_INDR_CALL_NONRERPODUCIBGL
and HIST_TYPE_TOPN_VALUE_REPRODUCIBLE,
HIST_TYPE_TOPN_VALUE_NONREPRODUCIBLE and use appropriate merging
function (i.e. one which invalidates full counter versus one whic does
not)

I wonder if we want to teach one of our autotesters to verify that
profiledbootstrap is reproducible? I guess we could-re-use our exisitng
bootstrap comparsion, we just need to produce two sets of train data and
then do the comparsion.
> Martin

> From 7fe1e6a59139ae00cefd1f5edf082d428952203e Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Wed, 15 Jan 2020 15:39:55 +0100
> Subject: [PATCH] Smart relaxation of TOP N counter.
> 
> ---
>  gcc/profile.c             |  9 +++++-
>  libgcc/libgcov-driver.c   | 58 ++++++++++++++++++++++++++++++++++++++-
>  libgcc/libgcov-profiler.c | 27 ++++++++----------
>  3 files changed, 77 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/profile.c b/gcc/profile.c
> index 6a2de21c3bd..1a6caf75a61 100644
> --- a/gcc/profile.c
> +++ b/gcc/profile.c
> @@ -863,7 +863,14 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
>  
>        if (hist->type == HIST_TYPE_TOPN_VALUES
>  	  || hist->type == HIST_TYPE_INDIR_CALL)
> -	sort_hist_values (hist);
> +	{
> +	  /* Each count value is multiplied by GCOV_TOPN_VALUES.  */
> +	  if (hist->hvalue.counters[2] != -1)
> +	    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
> +	      hist->hvalue.counters[2 * i + 2] /= GCOV_TOPN_VALUES;

I would use RDIV for consistency with rest of profile code scaling.

> +
> +	  sort_hist_values (hist);
> +	}
>  
>        /* Time profiler counter is not related to any statement,
>           so that we have to read the counter and set the value to
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index 1c07761b7f1..b71b0e3edac 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -213,6 +213,60 @@ static struct gcov_fn_buffer *fn_buffer;
>  /* Including system dependent components. */
>  #include "libgcov-driver-system.c"
>  
> +/* Prune TOP N value COUNTERS.  */
Please add comment explaining the reason for pruning.
> +
> +static void
> +prune_topn_counter (gcov_type *counters, gcov_type all)
> +{
> +  if (counters[1] == -1)
> +    return;
> +
> +  // TODO
> +#if 0
> +  fprintf (stderr, "prune_topn_counter for all=%d\n", all);
> +  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
> +    fprintf (stderr, "%d:%d:%d\n", i, counters[2 * i], counters[2 * i + 1]);
> +#endif
Probably no TODO and #if 0 here.
> +
> +  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
> +    {
> +      if (counters[2 * i + 1] < all)
> +	{
> +	  counters[2 * i] = 0;
> +	  counters[2 * i + 1] = 0;
> +	}
> +    }
> +}
> +
> +/* Prune counters so that they are ready to store or merge.  */
> +
> +static void
> +prune_counters (struct gcov_info *gi)
> +{
> +  for (unsigned i = 0; i < gi->n_functions; i++)
> +    {
> +      const struct gcov_fn_info *gfi = gi->functions[i];
> +      const struct gcov_ctr_info *ci = gfi->ctrs;
veritcal whitespace per GNU coding standard here.
> +      for (unsigned j = 0; j < GCOV_COUNTERS; j++)
> +	{
> +	  if (gi->merge[j] == NULL)
> +	    continue;
> +
> +	  if (gi->merge[j] == __gcov_merge_topn)
> +	    {
> +	      gcc_assert (!(ci->num % GCOV_TOPN_VALUES_COUNTERS));
> +	      for (unsigned k = 0; k < (ci->num / GCOV_TOPN_VALUES_COUNTERS); k++)
> +		{
> +		  gcov_type *counters
> +		    = ci->values + (k * GCOV_TOPN_VALUES_COUNTERS);
> +		  prune_topn_counter (counters + 1, *counters);
> +		}
> +	    }
> +	  ci++;
> +	}
> +    }
> +}
> +
>  /* This function merges counters in GI_PTR to an existing gcda file.
>     Return 0 on success.
>     Return -1 on error. In this case, caller will goto read_fatal.  */
> @@ -429,9 +483,11 @@ dump_one_gcov (struct gcov_info *gi_ptr, struct gcov_filename *gf,
>    struct gcov_summary summary = {};
>    int error;
>    gcov_unsigned_t tag;
> -
>    fn_buffer = 0;
>  
> +  /* Prune current counters before we merge them.  */
> +  prune_counters (gi_ptr);
> +
>    error = gcov_exit_open_gcda_file (gi_ptr, gf);
>    if (error == -1)
>      return;
> diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
> index 9417904d462..39e3c523250 100644
> --- a/libgcc/libgcov-profiler.c
> +++ b/libgcc/libgcov-profiler.c
> @@ -119,37 +119,34 @@ __gcov_topn_values_profiler_body (gcov_type *counters, gcov_type value,
>  
>    ++counters;
>  
> -  /* We have GCOV_TOPN_VALUES as we can keep multiple values
> -     next to each other.  */
> -  unsigned sindex = 0;
> -
>    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
>      {
>        if (value == counters[2 * i])
>  	{
>  	  if (use_atomic)
> -	    __atomic_fetch_add (&counters[2 * i + 1], 1, __ATOMIC_RELAXED);
> +	    __atomic_fetch_add (&counters[2 * i + 1], GCOV_TOPN_VALUES, __ATOMIC_RELAXED);
>  	  else
> -	    counters[2 * i + 1]++;
> +	    counters[2 * i + 1] += GCOV_TOPN_VALUES;
>  	  return;
>  	}
>        else if (counters[2 * i + 1] == 0)
Since you are decrementing without guarding for 0, you want to test <= 0
here.  

OK with these changes, but I think we want to re-test Firefox with this
fixed.  Getting negative value in counter probably renders it dead for a
while?

I think you want
 1) look for counter matching value with non-negative count.
    If matched, increment the corresponding counter and be happy
    At the same time remember if there is a counter with count <= 0
 2) if counter was full but count with <= 0 is found, set it to 1 and
    update value it is tracking
 3) if there was no empty slot decrement all counts.

I think this way it should be reasonably well behaving for concurent
updates too.

Honza

>  	{
>  	  /* We found an empty slot.  */
>  	  counters[2 * i] = value;
> -	  counters[2 * i + 1] = 1;
> +	  counters[2 * i + 1] = GCOV_TOPN_VALUES;
>  	  return;
>  	}
> -
> -      if (counters[2 * i + 1] < counters[2 * sindex + 1])
> -	sindex = i;
>      }
>  
> -  /* We haven't found an empty slot, then decrement the smallest.  */
> -  if (use_atomic)
> -    __atomic_fetch_sub (&counters[2 * sindex + 1], 1, __ATOMIC_RELAXED);
> -  else
> -    counters[2 * sindex + 1]--;
> +  /* We haven't found an empty slot, then decrement all
> +     counter values by one.  */
> +  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
> +    {
> +      if (use_atomic)
> +	__atomic_fetch_sub (&counters[2 * i + 1], 1, __ATOMIC_RELAXED);
> +      else
> +	counters[2 * i + 1]--;
> +    }
>  }
>  
>  #ifdef L_gcov_topn_values_profiler
> -- 
> 2.24.1
>