This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [Patch] Handle profile counts truncated to 0 in coldness test

From: Jan Hubicka <hubicka at ucw dot cz>
To: Teresa Johnson <tejohnson at google dot com>
Cc: Jan Hubicka <hubicka at ucw dot cz>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, David Li <davidxl at google dot com>
Date: Sun, 6 Oct 2013 15:43:31 +0200
Subject: Re: [Patch] Handle profile counts truncated to 0 in coldness test
Authentication-results: sourceware.org; auth=none
References: <CAAe5K+UJOuP3JGpGuRk1b9ZdeWbhx5tEc6A8ajX3t5OgirrhAQ at mail dot gmail dot com>

> 2013-10-03  Teresa Johnson  <tejohnson@google.com>
> 
>         * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.

I would not mention HOT in the parameter name.  Blocks are hot/normal/unlikely
and we HOT_BB_FREQUENCY_FRACTION.

So perhaps UNLIKELY_BB_COUNT_FRACTION with an explanation that it is relative
to number of train runs?
> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
> +        "min-hot-run-ratio",
> +         "The minimum ratio of profile runs to basic block execution count "
> +         "for the block to be considered hot",
Perhaps as:
> +         "The maximum ratio of profile runs to basic block execution count "
> +         "for the block to be considered unlikely",

> +        4, 0, 10000)

lower bound should be 1 or we divide by 0.
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203152)
> +++ predict.c   (working copy)
> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>    gcc_checking_assert (fun);
>    if (profile_status_for_function (fun) == PROFILE_READ)
>      {
> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
> +      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)

This way if count is 1, runs needs to be n_run_ratio * 2
So perhaps if (count * min_run_ratio >= profile_info->runs) instead of division.
> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +      if (ENTRY_BLOCK_PTR->count)
>         {
> -         return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -                       ENTRY_BLOCK_PTR->frequency)
> -                 < REG_BR_PROB_BASE / 4);
> +          gcov_type scaled_count
> +              = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
> +          gcov_type computed_count;
> +          /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
> +             be large enough to do the division first without losing much
> +             precision.  */
> +          if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)

I do not like the undefined behaviour triggered before the check (sure unsigned
arithmetic would fix that, but it seems all strange).  What about guarding the
whole code.

if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
For large counts the roudoff problems in first test should not be problem.
> +            {
> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> +                                     ENTRY_BLOCK_PTR->frequency);
> +              computed_count *= frequency * min_run_ratio;
> +            }
> +          else
> +            computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
> +          if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)

So at nonoverflow path you compute

   ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency) * min_run_ratio / runs > 0.5

I think you want

   ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs

If entry_bb_frequency is 0, you can just return true iff frequency is > min_run_ratio.

I think safe way to compute this is

if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
  scaled_count = ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency;
else
  scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * min_run_ratio

In first path, we have at most 10000*10000*10000*10000 that still fits in 64bit
value.

In the second path the base of fixed point arithmetic is large enough so the
division should not matter. We never get over entry_count * 10000 that is safe.
The first part of statement however just compute value that should match count
so count > 10000 unless profile got misupated and the low-count roundoff errors should
not matter. So perhaps the whole code can execute only if
if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
    && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
and we do not need any conditionals for scaled_count

Does such code work with ratio changed to 16?  I think the double multiplication in your
patch may have caused problems.

I will check the second part of patch separately.
Thanks!
Honza

Follow-Ups:
- Re: [Patch] Handle profile counts truncated to 0 in coldness test
  - From: Teresa Johnson

References:
- [Patch] Handle profile counts truncated to 0 in coldness test
  - From: Teresa Johnson

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]