This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch] Handle profile counts truncated to 0 in coldness test
- From: Teresa Johnson <tejohnson at google dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, David Li <davidxl at google dot com>
- Date: Mon, 7 Oct 2013 22:45:04 -0700
- Subject: Re: [Patch] Handle profile counts truncated to 0 in coldness test
- Authentication-results: sourceware.org; auth=none
- References: <CAAe5K+UJOuP3JGpGuRk1b9ZdeWbhx5tEc6A8ajX3t5OgirrhAQ at mail dot gmail dot com> <20131006134331 dot GA5429 at kam dot mff dot cuni dot cz>
On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 2013-10-03 Teresa Johnson <tejohnson@google.com>
>>
>> * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>
> I would not mention HOT in the parameter name. Blocks are hot/normal/unlikely
> and we HOT_BB_FREQUENCY_FRACTION.
>
> So perhaps UNLIKELY_BB_COUNT_FRACTION with an explanation that it is relative
> to number of train runs?
Ok.
>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>> + "min-hot-run-ratio",
>> + "The minimum ratio of profile runs to basic block execution count "
>> + "for the block to be considered hot",
> Perhaps as:
>> + "The maximum ratio of profile runs to basic block execution count "
>> + "for the block to be considered unlikely",
>
>> + 4, 0, 10000)
>
> lower bound should be 1 or we divide by 0.
Yep, will fix.
>> Index: predict.c
>> ===================================================================
>> --- predict.c (revision 203152)
>> +++ predict.c (working copy)
>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>> gcc_checking_assert (fun);
>> if (profile_status_for_function (fun) == PROFILE_READ)
>> {
>> - if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>> + int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>> + if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>
> This way if count is 1, runs needs to be n_run_ratio * 2
> So perhaps if (count * min_run_ratio >= profile_info->runs) instead of division.
Ok.
>> - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
>> + if (ENTRY_BLOCK_PTR->count)
>> {
>> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>> - ENTRY_BLOCK_PTR->frequency)
>> - < REG_BR_PROB_BASE / 4);
>> + gcov_type scaled_count
>> + = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
>> + gcov_type computed_count;
>> + /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
>> + be large enough to do the division first without losing much
>> + precision. */
>> + if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)
>
> I do not like the undefined behaviour triggered before the check (sure unsigned
> arithmetic would fix that, but it seems all strange). What about guarding the
> whole code.
>
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
> For large counts the roudoff problems in first test should not be problem.
Sure, that sounds reasonable.
>> + {
>> + computed_count = RDIV (ENTRY_BLOCK_PTR->count,
>> + ENTRY_BLOCK_PTR->frequency);
>> + computed_count *= frequency * min_run_ratio;
>> + }
>> + else
>> + computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
>> + if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
>
> So at nonoverflow path you compute
>
> ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency) * min_run_ratio / runs > 0.5
Yes, there is an extra min_run_ratio factor in there that I need to remove!
>
> I think you want
>
> ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs
Right, will change that too.
>
> If entry_bb_frequency is 0, you can just return true iff frequency is > min_run_ratio.
We already return false conservatively if entry_bb_frequency is 0. Do
we really want to change this - I'm not sure it makes sense to compare
the frequency to the run ratio directly.
>
> I think safe way to compute this is
>
> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> scaled_count = ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency;
> else
> scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * min_run_ratio
>
> In first path, we have at most 10000*10000*10000*10000 that still fits in 64bit
> value.
>
> In the second path the base of fixed point arithmetic is large enough so the
> division should not matter. We never get over entry_count * 10000 that is safe.
> The first part of statement however just compute value that should match count
> so count > 10000 unless profile got misupated and the low-count roundoff errors should
> not matter. So perhaps the whole code can execute only if
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
> && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> and we do not need any conditionals for scaled_count
>
> Does such code work with ratio changed to 16? I think the double multiplication in your
> patch may have caused problems.
Let me fix this up and will get back.
Thanks,
Teresa
>
> I will check the second part of patch separately.
> Thanks!
> Honza
--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413