[PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

luoxhu luoxhu@linux.ibm.com
Mon Jun 24 09:20:00 GMT 2019



On 2019/6/24 10:34, luoxhu wrote:
> Hi Honza,
> Thanks very much to get so many useful comments from you.
> As a newbie to GCC, not sure whether my questions are described clearly
> enough.  Thanks for your patience in advance.  :)
> 
> 
> On 2019/6/20 21:47, Jan Hubicka wrote:
>> Hi,
>> some comments on the ipa part of the patch
>> (and thanks for working on it - this was on my TODO list for years)
>>
>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>> index de82316d4b1..0d373a67d1b 100644
>>> --- a/gcc/cgraph.c
>>> +++ b/gcc/cgraph.c
>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>       fprintf (dump_file, "Introduced new external node "
>>>            "(%s) and turned into root of the clone tree.\n",
>>>            node->dump_name ());
>>> +      node->profile_id = first_clone->profile_id;
>>>       }
>>>     else if (dump_file)
>>>       fprintf (dump_file, "Introduced new external node "
>>
>> This is independent of the rest of changes.  Do you have example where
>> this matters? The inline clones are created in ipa-inline while
>> ipa-profile is run before it, so I can not think of such a scenario.
>> I see you also copy profile_id from function to clone.  I would like to
>> know why you needed that.
>>
>> Also you mention that you hit some ICEs. If fixes are independent of
>> rest of your changes, send them separately.
> 
> I copy the profile_id for cloned node as when in LTO ltrans, there is no
> references or referrings info for the specialized node/cloned node, so it 
> is difficult to track the node's reference in 
> cgraph_edge::speculative_call_info.  I use it mainly for debug purpose now.
> Will remove it and split the patches in later version to include ICE fixes.
> 
>>
>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>     int i;
>>>     cgraph_edge *e2;
>>>     cgraph_edge *e = this;
>>> +  cgraph_node *referred_node;
>>>     if (!e->indirect_unknown_callee)
>>>       for (e2 = e->caller->indirect_calls;
>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>       && ((ref->stmt && ref->stmt == e->call_stmt)
>>>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>         {
>>> -    reference = ref;
>>> -    break;
>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +      {
>>> +        referred_node = dyn_cast<cgraph_node *> (ref->referred);
>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>> +          {
>>> +        reference = ref;
>>> +        break;
>>> +          }
>>> +      }
>>> +    else
>>> +      {
>>> +        reference = ref;
>>> +        break;
>>> +      }
>>>         }
>>
>> This function is intended to return everything related to the
>> speculative call, so if you add multiple direct targets, i would expect
>> it to tage auto_vec of cgraph_nodes for direct and auto_vec of
>> references.
> 
> So will the signature becomes
> cgraph_edge::speculative_call_info (auto_vec<cgraph_edge *> *direct,
>                                      cgraph_edge *&indirect,
>                                      auto_vec<ipa_ref *> *reference)
> 
> Seems a lot of code related to it, maybe should split to another patch.
> And will the sequence of direct and reference in each auto_vec be strictly
> mapped for iteration convenience?
> Second question is "this" is a direct edge will be pushed to auto_vec 
> "direct", how can it get its next direct edge here?  From e->caller->callees?

There maybe some misunderstanding here.  The direct should be one edge
only, but reference could be multiple.

For example: two indirect edge on one single statement x = p(3);
the first speculative edge is main -> one;
the second speculative edge 2 is main -> two.
direct->call_stmt is: x_10 = p_3 (3);

call code in ipa-inline-transform.c:
for (e = node->callees; e; e = next)
   {
      next = e->next_callee;
      e->redirect_call_stmt_to_callee ();
   }

redirect_call_stmt_to_callee will call
e->speculative_call_info(e, e2, ref).

When e is “main -> one" being redirected, The returned auto_vec reference
length will be 2.
So the map should be 1:N instead of N:N.  (one direct edge will find N 
reference nodes, but only one of it is correct, need iterate to find it
out.)
e2 is the indirect call(e->caller->indirect_calls) can only be set to false
speculative if all indirect targets are redirected by "next=e->next_callee"
Or else, the next speculative edge couldn't finish the redirect as the e2
is not speculative again in next round iteration.
As a result, maybe still need similar logic to check the returned reference
length, only set "e2->speculative = false;" when the length is 1.  which
means all direct targets are redirected.

> 
> 
>>>     /* Speculative edge always consist of all three components - direct 
>>> edge,
>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>            in the functions inlined through it.  */
>>>       }
>>>     edge->count += e2->count;
>>> -  edge->speculative = false;
>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>> +    {
>>> +      edge->indirect_info->num_of_ics--;
>>> +      if (edge->indirect_info->num_of_ics == 0)
>>> +    edge->speculative = false;
>>> +    }
>>> +  else
>>> +    edge->speculative = false;
>>>     e2->speculative = false;
>>>     ref->remove_reference ();
>>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>>
>> This function should turn speculative call into direct call to DECL, so
>> I think it should remove all the other direct calls associated with stmt
>> and the indirect one.
>>
>> There are now two cases - in first case you want to turn speculative
>> call into direct call or give up on especulation completely, while in
>> other case you want to only remove one of speculations.
>>
>> I guess we want to have resolve_speculation(decl) for first and
>> remove_one_speculation(edge) for the second case?
>> The second case would be useful for the code below handling type
>> mismatches and also for inline when one of speculative targets seems not
>> useful to bother with.
> 
> So the logic will be:
> 
> if (edge->indirect_info->num_of_ics > 1)
>      cgraph_edge::resolve_speculation (tree callee_decl);
> else
>      remove_one_speculation(edge);
> 
> cgraph_edge::resolve_speculation will call edge->speculative_call_info (e2, 
> edge, ref) internally, at this time, e2 and ref will only contains one 
> direct target?
> 
> 
>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>                                false);
>>>         e->count = gimple_bb (e->call_stmt)->count;
>>> -      e2->speculative = false;
>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +        {
>>> +          e2->indirect_info->num_of_ics--;
>>> +          if (e2->indirect_info->num_of_ics == 0)
>>> +        e2->speculative = false;
>>> +        }
>>> +      else
>>> +        e2->speculative = false;
>>>         e2->count = gimple_bb (e2->call_stmt)->count;
>>>         ref->speculative = false;
>>>         ref->stmt = NULL;
>>
>>>   extern void debuginfo_early_init (void);
>>>   extern void debuginfo_init (void);
>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>     int param_index;
>>>     /* ECF flags determined from the caller.  */
>>>     int ecf_flags;
>>> -  /* Profile_id of common target obtrained from profile.  */
>>> +  /* Profile_id of common target obtained from profile.  */
>>>     int common_target_id;
>>>     /* Probability that call will land in function with 
>>> COMMON_TARGET_ID.  */
>>>     int common_target_probability;
>>> +  /* Profile_id of common target obtained from profile.  */
>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  /* Probabilities that call will land in function with 
>>> COMMON_TARGET_IDS.  */
>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>
>> I would use vec of pairs (profile_id,probability) to hold this and do
>> not wire in GCOV_ICALL_TOPN_NCOUTS.  Most of time this vec will be just
>> NULL pointer so it will result in less memory overhead and will avoid
>> hard limit on number of speculations we want to do.
>>
>> Note that the speculative edges may end up being redirected during IPA
>> optimization, for example when their target is cloned for particular
>> call context or when the function is detected identical to other
>> function.  So one can not preserve the mapping between targets and
>> profile ids.
>>
>> Also this infrastructure is useful even w/o profile because we could use
>> ipa-devirt to devirtualize even when multiple polymorphic targets are
>> found. So I would not wirte in the limit GCOV_ICALL_TOPN_NCOUNTS and
>> just use dynamically allocated vectors instead.
>>
>> With ipa-devirt it is possible that we know that there are precisely 2
>> possible polymorphic targets.  Other case i was considering was to
>> speculatively inline with -fPIC.  I.e. when one has interposiable call
>> for foo() one can create foo.localalias() for the definition visible to
>> compiler and then speculate
>>
>> if (foo == foo.localalias())
>>    inline_path
>> else
>>    foo();
>>
>> In these cases we may end up with indirect
>> call that has no associated indirect edge (which we do not support
>> currently), so it may be interesting to move speculative call info away
>> from indirect call info to cgraph edge structure (but that can be done
>> incrementally based on what you do now - this code is not completely
>> easy so lets do it step by step).
> 
> +
> +struct GTY (()) indirect_target_info
> +{
> +  /* Profile_id of common target obtained from profile.  */
> +  int common_target_id;
> +  /* Probability that call will land in function with COMMON_TARGET_ID.  */
> +  int common_target_probability;
> +};
> +
> 
> Tested with "vec<indirect_target_info, va_gc> *indirect_call_targets;"
> works quite good, really much better.
> 
> Is it correct that profile_id/common_target_id will only be used to 
> generate speculative edges in ipa-profile, all new specialized node/cloned
> node will take(not copy) the speculative property from original node, so
> was the reason that no need to clone profile id?
> 
> 
>>
>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>                 gimple_remove_histogram_value (DECL_STRUCT_FUNCTION 
>>> (node->decl),
>>>                                 stmt, h);
>>>               }
>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +            {
>>> +              unsigned j;
>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>> +              if (e && !e->indirect_unknown_callee)
>>> +            continue;
>>
>> I suppose you are going to change this for Martin's implementation, so i
>> have skipped it for now.
>>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>>       {
>>>         if (n->count.initialized_p ())
>>>           nindirect++;
>>> -      if (e->indirect_info->common_target_id)
>>> +      if (e->indirect_info->common_target_id
>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>           {
>>>             if (!node_map_initialized)
>>>               init_node_map (false);
>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>                 if (dump_file)
>>>               fprintf (dump_file,
>>>                    "Not speculating: "
>>> -                 "parameter count mistmatch\n");
>>> +                 "parameter count mismatch\n");
>>>               }
>>>             else if (e->indirect_info->polymorphic
>>>                  && !opt_for_fn (n->decl, flag_devirtualize)
>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>             nunknown++;
>>>           }
>>>           }
>>> -     }
>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>> +        {
>>> +          if (in_lto_p)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (dump_file,
>>> +                   "Updating hotness threshold in LTO mode.\n");
>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>> +                   (int64_t) threshold);
>>> +            }
>>> +          set_hot_bb_threshold (threshold
>>> +                    / e->indirect_info->num_of_ics);
>>> +        }
>>
>> Why do you need different paths for one target and for multiple targets?
>> Also why LTO is different here?
> Will remove the logic and the above change based on Martin'
> implementation.
> 
>>> +          if (!node_map_initialized)
>>> +        init_node_map (false);
>>> +          node_map_initialized = true;
>>> +          ncommon++;
>>> +          unsigned speculative = 0;
>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          n2 = find_func_by_profile_id (
>>> +            e->indirect_info->common_target_ids[i]);
>>> +          if (n2)
>>> +            {
>>> +              if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +                dump_file,
>>> +                "Indirect call -> direct call from"
>>> +                " other module %s => %s, prob %3.2f\n",
>>> +                n->dump_name (), n2->dump_name (),
>>> +                e->indirect_info->common_target_probabilities[i]
>>> +                  / (float) REG_BR_PROB_BASE);
>>> +            }
>>> +              if (e->indirect_info->common_target_probabilities[i]
>>> +              < REG_BR_PROB_BASE / 2)
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (
>>> +                  dump_file,
>>> +                  "Not speculating: probability is too low.\n");
>>> +            }
>>> +              else if (!e->maybe_hot_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: call is cold.\n");
>>> +            }
>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>> +                   && n2->can_be_discarded_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: target is overwritable "
>>> +                     "and can be discarded.\n");
>>> +            }
>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>> +                   && (!vec_safe_is_empty (
>>> +                 IPA_NODE_REF (n2)->descriptors))
>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                    != ipa_get_cs_argument_count (
>>> +                      IPA_EDGE_REF (e))
>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                     >= ipa_get_cs_argument_count (
>>> +                       IPA_EDGE_REF (e))
>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>> +            {
>>> +              nmismatch++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file, "Not speculating: "
>>> +                        "parameter count mismatch\n");
>>> +            }
>>> +              else if (e->indirect_info->polymorphic
>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>> +            {
>>> +              nimpossible++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: "
>>> +                     "function is not in the polymorphic "
>>> +                     "call target list\n");
>>> +            }
>>> +              else
>>> +            {
>>> +              /* Target may be overwritable, but profile says that
>>> +                 control flow goes to this particular implementation
>>> +                 of N2.  Speculate on the local alias to allow
>>> +                 inlining.
>>> +                 */
>>> +              if (!n2->can_be_discarded_p ())
>>> +                {
>>> +                  cgraph_node *alias;
>>> +                  alias = dyn_cast<cgraph_node *> (
>>> +                n2->noninterposable_alias ());
>>> +                  if (alias)
>>> +                n2 = alias;
>>> +                }
>>> +              nconverted++;
>>> +              e->make_speculative (
>>> +                n2, e->count.apply_probability (
>>> +                  e->indirect_info
>>> +                    ->common_target_probabilities[i]));
>>> +              update = true;
>>> +              speculative++;
>>> +            }
>>> +            }
>>> +          else
>>> +            {
>>> +              if (dump_file)
>>> +            fprintf (dump_file,
>>> +                 "Function with profile-id %i not found.\n",
>>> +                 e->indirect_info->common_target_ids[i]);
>>> +              nunknown++;
>>> +            }
>>> +        }
>>> +          if (speculative < e->indirect_info->num_of_ics)
>>> +        e->indirect_info->num_of_ics = speculative;
>>> +        }
>>> +    }
>>>          if (update)
>>>        ipa_update_overall_fn_summary (n);
>>>        }
>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>> index 79b250c3943..30347691029 100644
>>> --- a/gcc/ipa-utils.c
>>> +++ b/gcc/ipa-utils.c
>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>         update_max_bb_count ();
>>>         compute_function_frequency ();
>>>         pop_cfun ();
>>> +      /* When src is speculative, clone the referrings.  */
>>> +      if (src->indirect_call_target)
>>> +    for (e = src->callers; e; e = e->next_caller)
>>> +      if (e->callee == src && e->speculative)
>>> +        dst->clone_referring (src);
>>
>> This looks wrong. Why do you need to copy all references from src
>> to target?
> Clonning the referrings from callee to the merged callee, not references.
> the information from "caller to src" is not passed to "caller to dst" when
> merge profiles, it is a workaround when fixing a SPEC ICE.  Will root
> cause it later.
> 
> 
>>> +                /* Speculative calls consist of two edges - direct
>>> +                   and indirect.  Duplicate the whole thing and
>>> +                   distribute frequencies accordingly.  */
>>> +                if (edge->speculative)
>> I think here you want to handle the whole group with multiple targets.
> OK.  Also an ICE fix in SPEC.
> 
> 
> Thanks
> Xionghu
> 
>>> +                  {
>>> +                struct ipa_ref *ref;
>>> +
>>> +                gcc_assert (!edge->indirect_unknown_callee);
>>> +                old_edge->speculative_call_info (direct,
>>> +                                 indirect, ref);
>>> +
>>> +                profile_count indir_cnt = indirect->count;
>>> +                indirect
>>> +                  = indirect->clone (id->dst_node, call_stmt,
>>> +                             gimple_uid (stmt), num,
>>> +                             den, true);
>>> +
>>> +                profile_probability prob
>>> +                  = indir_cnt.probability_in (old_cnt
>>> +                                  + indir_cnt);
>>> +                indirect->count
>>> +                  = copy_basic_block->count.apply_probability (
>>> +                    prob);
>>> +                edge->count
>>> +                  = copy_basic_block->count - indirect->count;
>>> +                id->dst_node->clone_reference (ref, stmt);
>>> +                  }
>>> +                else
>>> +                  edge->count = copy_basic_block->count;
>>> +              }
>>> +            /* If the indirect call contains more than one indirect
>>> +               targets, need clone all speculative edges here.  */
>>> +            if (old_edge && old_edge->next_callee
>>> +                && old_edge->speculative && indirect
>>> +                && indirect->indirect_info
>>> +                && indirect->indirect_info->num_of_ics > 1)
>>> +              {
>>> +                edge = old_edge->next_callee;
>>> +                old_edge = old_edge->next_callee;
>>> +                if (edge->speculative)
>>> +                  next_speculative = true;
>>> +              }
>>> +              }
>>> +            while (next_speculative);
>>> +          }
>>>             break;
>>>           case CB_CGE_MOVE_CLONES:
> 



More information about the Gcc-patches mailing list