This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [GOOGLE] Fix AutoFDO size issue
- From: Dehao Chen <dehao at google dot com>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 13 Nov 2014 15:42:16 -0800
- Subject: Re: [GOOGLE] Fix AutoFDO size issue
We do not do sophisticated recursive call detection in the einline phase.
It only happens in the ipa-inline phase.
Dehao
On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>> IIRC, the actual iteration count for AutoFDO is mostly <3. But it
>> should not hurt to set max iter to 10.
>>
>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> After inline summary is recomputed, the large code growth problem will
>>> also be better controlled, right?
>>
>> For this case, recomputing the inline summary does not help because the
>> code was already bloated in the first einline phase.
>
> For recursive inlining, the inline summary for the cloned edges needs
> to be updated to prevent the growth?
>
> david
>
>>
>> Dehao
>>
>>>
>>> David
>>>
>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>> extensive code bloat if we have recursive calls like:
>>>>>
>>>>> dtor() {
>>>>> destroy(node);
>>>>> }
>>>>>
>>>>> destroy(node) {
>>>>> destroy(left)
>>>>> destroy(right)
>>>>> }
>>>>>
>>>>> In this case, the size growth will be around 8, which is smaller than
>>>>> the threshold (11). However, if we allow this to happen for 10 iterations,
>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>> does not rely on aggressive einline to gain more profile context.
>>>>>
>>>>> Also, in the AutoFDO pass, after we have processed a function, we need
>>>>> to recompute inline parameters because rebuild_cgraph_edges will zero
>>>>> out all inline parameters.
>>>>>
>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>> for google-4_9?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> Index: gcc/auto-profile.c
>>>>> ===================================================================
>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>> +++ gcc/auto-profile.c (working copy)
>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>> free_dominance_info (CDI_DOMINATORS);
>>>>> free_dominance_info (CDI_POST_DOMINATORS);
>>>>> rebuild_cgraph_edges ();
>>>>> + compute_inline_parameters (cgraph_get_node (current_function_decl), true);
>>>>> pop_cfun ();
>>>>> }
>>>>>
>>>>> Index: gcc/opts.c
>>>>> ===================================================================
>>>>> --- gcc/opts.c (revision 217523)
>>>>> +++ gcc/opts.c (working copy)
>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>> maybe_set_param_value (
>>>>> PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>> opts->x_param_values, opts_set->x_param_values);
>>>>> + maybe_set_param_value (
>>>>> + PARAM_EARLY_INLINING_INSNS, 4,
>>>>> + opts->x_param_values, opts_set->x_param_values);
>>>>> + maybe_set_param_value (
>>>>> + PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>> + opts->x_param_values, opts_set->x_param_values);
>>>>> value = true;
>>>>> /* No break here - do -fauto-profile processing. */
>>>>> case OPT_fauto_profile:
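
The code growth described in the original message can be illustrated with a small, simplified model. This is only a sketch under stated assumptions and is not GCC code: it assumes that every einline iteration inlines each outstanding call to destroy(), and that each inlined copy of the body brings two new recursive call sites with it. The function names and the calls_per_body parameter are hypothetical. Under this model the 1024X figure corresponds to 2^10, which is what ten iterations would give.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: each einline iteration inlines every
   outstanding call to destroy(), and each inlined copy of the body adds
   `calls_per_body` new recursive call sites.  */

/* Number of recursive call sites left after `iterations` rounds,
   starting from the single call in dtor().  */
uint64_t
call_sites_after (unsigned calls_per_body, unsigned iterations)
{
  uint64_t sites = 1;
  for (unsigned i = 0; i < iterations; i++)
    {
      /* Guard against overflow of the 64-bit counter.  */
      assert (calls_per_body == 0 || sites <= UINT64_MAX / calls_per_body);
      sites *= calls_per_body;
    }
  return sites;
}

/* Total number of copies of the destroy() body inlined into dtor()
   after `iterations` rounds.  */
uint64_t
inlined_copies_after (unsigned calls_per_body, unsigned iterations)
{
  uint64_t sites = 1;
  uint64_t copies = 0;
  for (unsigned i = 0; i < iterations; i++)
    {
      copies += sites;  /* One body copy per outstanding call site.  */
      assert (calls_per_body == 0 || sites <= UINT64_MAX / calls_per_body);
      sites *= calls_per_body;
    }
  return copies;
}
```

With calls_per_body = 2, the number of call sites doubles on every round, so the accumulated size depends exponentially on the iteration limit. In this model that is why a lower per-call size threshold is the safer knob when the iteration limit is raised to 10.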