[Google] Refine hot caller heuristic
Easwaran Raman
eraman@google.com
Thu Aug 29 18:41:00 GMT 2013
On Wed, Aug 21, 2013 at 6:47 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> +/* Knob to control hot-caller heuristic. 0 means it is turned off, 1 means
>> + it is always applied, and 2 means it is applied only if the footprint is
>> + smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD. */
>> DEFPARAM (PARAM_INLINE_HOT_CALLER,
>> "inline-hot-caller",
>> "Consider cold callsites for inlining if caller contains hot code",
>> + 2, 0, 2)
>> +
>> +/* The maximum code size estimate under which hot caller heuristic is
>> + applied. */
>> +DEFPARAM(PARAM_HOT_CALLER_CODESIZE_THRESHOLD,
>> + "hot-caller-codesize-threshold",
>> + "Maximum profile-based code size footprint estimate for "
>> + "hot caller heuristic ",
>> + 10000, 0, 0)
>
> Out of curiosity, how sensitive is performance to the value of this
> parameter? I.e. is there a clear cutoff for the codes that benefit
> from disabling this inlining vs those that benefit from enabling it?
>
> Also, have you tried spec2006? I remember that the codesize of the gcc
> benchmark was above the larger 15000 threshold I use for tuning down
> unrolling/peeling, and I needed to refine my heuristics to identify
> profitable loops to unroll/peel even in the case of large codesize.
> I'm not sure if there are more benchmarks that will be above the
> smaller 10K threshold.
The cutoff works well for internal benchmarks, and there is a wide
range in which the value can be set. I ran spec2006 and the gcc
benchmark does well. However, there are two C++ benchmarks (which fall
below the cutoff) that degrade if I exclude PASS_THRU. I'll
investigate that later.
>
>> +
>> +DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE,
>> + "inline-useful-cold-callee",
>> + "Consider cold callsites for inlining if caller contains hot code",
>> 1, 0, 1)
>
> The description of this param is wrong (it is the same as the
> description of PARAM_INLINE_HOT_CALLER). It should probably be
> something like
> "Only consider cold callsites for inlining if analysis finds
> optimization opportunities"
Thanks. Fixed.
>
>>
>> /* Limit of iterations of early inliner. This basically bounds number of
>> Index: gcc/ipa-inline.c
>> ===================================================================
>> --- gcc/ipa-inline.c (revision 201768)
>> +++ gcc/ipa-inline.c (working copy)
>> @@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e)
>> return false;
>> }
>>
>> +/* Returns true if callee of edge E is considered useful to inline
>> + even if it is cold. A callee is considered useful if there is at
>> + least one argument of pointer type that is not a pass-through. */
>
> Can you expand this comment a bit to add why such arguments indicate
> useful inlining?
Added comments.
- Easwaran
>
> Thanks,
> Teresa
>
>> +
>> +static inline bool
>> +useful_cold_callee (struct cgraph_edge *e)
>> +{
>> + gimple call = e->call_stmt;
>> + int n, arg_num = gimple_call_num_args (call);
>> + struct ipa_edge_args *args = IPA_EDGE_REF (e);
>> +
>> + for (n = 0; n < arg_num; n++)
>> + {
>> + tree arg = gimple_call_arg (call, n);
>> + if (POINTER_TYPE_P (TREE_TYPE (arg)))
>> + {
>> + struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n);
>> + if (jfunc->type != IPA_JF_PASS_THROUGH)
>> + return true;
>> + }
>> + }
>> + return false;
>> +}
>> +
>> +/* Returns true if hot caller heuristic should be used. */
>> +
>> +static inline bool
>> +enable_hot_caller_heuristic (void)
>> +{
>> +
>> + gcov_working_set_t *ws = NULL;
>> + int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD);
>> + int num_counters = 0;
>> + int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER);
>> +
>> + if (param_inline_hot_caller == 0)
>> + return false;
>> + else if (param_inline_hot_caller == 1)
>> + return true;
>> +
>> + ws = find_working_set(PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
>> + if (!ws)
>> + return false;
>> + num_counters = ws->num_counters;
>> + return num_counters <= size_threshold;
>> +
>> +}
>> /* Returns true if an edge or its caller are hot enough to
>> be considered for inlining. */
>>
>> static bool
>> edge_hot_enough_p (struct cgraph_edge *edge)
>> {
>> + static bool use_hot_caller_heuristic = enable_hot_caller_heuristic ();
>> if (cgraph_maybe_hot_edge_p (edge))
>> return true;
>>
>> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>> if (flag_auto_profile && edge->callee->count == 0
>> && edge->callee->max_bb_count > 0)
>> return false;
>> - if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
>> - && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
>> - return true;
>> + if (use_hot_caller_heuristic)
>> + {
>> + struct cgraph_node *where = edge->caller;
>> + if (maybe_hot_count_p (NULL, where->max_bb_count))
>> + {
>> + if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
>> + return useful_cold_callee (edge);
>> + else
>> + return true;
>> + }
>> + }
>> return false;
>> }
>
> On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <eraman@google.com> wrote:
>> The current hot caller heuristic simply promotes edges whose caller is
>> hot. This patch does the following:
>> * Turn it off for applications with large footprint since the size
>> increase hurts them
>> * Be more selective by considering arguments to callee when the
>> heuristic is enabled.
>>
>> This performs well on internal benchmarks. Ok for google/4_8 branch if
>> all tests pass?
>>
>> - Easwaran
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
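
For readers following the thread, the decision logic in the quoted patch
can be sketched as a small standalone model. This is only an illustration
under stated assumptions, not the GCC code: every type and function name
below (model_config, model_edge, model_heuristic_enabled, and so on) is
hypothetical, the working-set lookup is reduced to two plain inputs, and
the auto-profile check in edge_hot_enough_p is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three params in the patch. */
struct model_config {
  int mode;                   /* 0 = off, 1 = always, 2 = only if footprint is small */
  int codesize_threshold;     /* models hot-caller-codesize-threshold */
  bool require_useful_callee; /* models inline-useful-cold-callee */
};

/* Simplified view of a jump function: only pass-through vs. anything else. */
enum model_jump_kind { MODEL_JF_OTHER, MODEL_JF_PASS_THROUGH };

struct model_arg {
  bool is_pointer;
  enum model_jump_kind jf;
};

struct model_edge {
  bool edge_hot;    /* the call edge itself is hot */
  bool caller_hot;  /* the caller contains hot code */
  const struct model_arg *args;
  size_t num_args;
};

/* Mirrors the shape of enable_hot_caller_heuristic: modes 0 and 1 decide
   immediately; mode 2 compares the working-set size with the threshold.
   have_working_set and num_counters are assumed inputs that replace the
   real profile lookup. */
static bool
model_heuristic_enabled (const struct model_config *cfg,
                         bool have_working_set, int num_counters)
{
  if (cfg->mode == 0)
    return false;
  if (cfg->mode == 1)
    return true;
  if (!have_working_set)
    return false;
  return num_counters <= cfg->codesize_threshold;
}

/* Mirrors useful_cold_callee: true if at least one pointer argument
   is not a pass-through. */
static bool
model_useful_cold_callee (const struct model_edge *e)
{
  for (size_t i = 0; i < e->num_args; i++)
    if (e->args[i].is_pointer && e->args[i].jf != MODEL_JF_PASS_THROUGH)
      return true;
  return false;
}

/* Mirrors the patched edge_hot_enough_p, minus the auto-profile check. */
static bool
model_edge_hot_enough (const struct model_config *cfg, bool heuristic_enabled,
                       const struct model_edge *e)
{
  if (e->edge_hot)
    return true;
  if (!heuristic_enabled || !e->caller_hot)
    return false;
  if (cfg->require_useful_callee)
    return model_useful_cold_callee (e);
  return true;
}
```

With the defaults from the patch (mode 2, threshold 10000), a profile
whose working set has 15000 counters leaves the heuristic disabled in
this model, so only hot edges qualify. Below the threshold, a cold edge
in a hot caller qualifies only if the callee receives a pointer argument
that is not a pass-through.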