Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition


On Fri, Aug 30, 2013 at 7:50 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Can someone review and ok the attached patch for trunk? It has been
> bootstrapped and tested on x86-64-unknown-linux-gnu, and tested by
> enabling -freorder-blocks-and-partition for a
> profiledbootstrap as well.
>
> (Honza, see more responses inlined below. Rong, please see note below as well).
>
> Thanks,
> Teresa
>
> On Fri, Aug 30, 2013 at 2:14 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Great! Is this the LTO merging you were talking about in an earlier
>>> message, or the gcov runtime fix (that would presumably not be
>>> lto-specific)?
>>
>> It is the LTO path - we need to merge profiles there anyway for his code unification
>> work.
>
> Rong - can you send a summary of the approach you are working on? Is
> it LIPO-specific?

I'm also working to improve COMDAT handling in FDO/LIPO.  It's
applicable to regular FDO.
Our motivating case is different from the case discussed here:
  COMDAT function F is defined in module A and is picked by the linker as
the out-of-line copy in the profile-gen phase,
so it gets all the counters. In the profile-use compilation F may not be
emitted in A (e.g. all of its callsites are ipa-inlined in A), so we
choose the instance of F in module B as the out-of-line copy, and that
copy is not optimized (its counters went to A's copy).

We are seeing more cases of this kind in LIPO due to the multiple
COMDAT copies brought in by the auxiliary modules.

Since a COMDAT function may be inlined after instrumentation, multiple
copies of the counters may co-exist.
We want to differentiate inlined-copy counters from out-of-line-copy counters.
(In LIPO, we actually encourage inlining of COMDATs in IPA-inline to
get more context-sensitive counters.)

Our current solution is to add another instrumentation pass only for
COMDAT functions:
* For each comdat_key, we create a global var pointing to the
gcov_fn_info of the out-of-line copy.
* This global var is initialized by instrumentation code placed at
the function entry, to the gcov_fn_info of the current module.
   This instrumentation is done post IPA-inline, so at most one copy
of it (the one picked by the linker) is executed.
* gcov_fn_info is expanded to point to the global var.
At run time we can then tell whether the counters belong to the
out-of-line copy or to an inlined copy, and we set a tag in the gcda
file accordingly.
(In the case of both an out-of-line copy and an inlined copy, we treat
it as the out-of-line copy.)
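To make the mechanism concrete, here is a rough hand-written C sketch of
what the generated instrumentation amounts to. The struct layout and the
names (comdat_fn_info, __comdat_F_out_of_line, F_info, F_counters) are
made up for illustration and are not the actual gcov data structures:

/* Hypothetical sketch of the idea -- not the actual gcov implementation;
   all names below are made up for illustration.  */

struct comdat_fn_info            /* stands in for gcov's per-function record */
{
  unsigned ident;                /* function id */
  unsigned cfg_checksum;         /* lets us validate any later merge */
  long long *counters;           /* arc counters for this copy of F */
};

/* One global per comdat_key, shared by every module that defines F
   (emitted as a comdat/weak symbol so a single definition survives
   linking).  */
struct comdat_fn_info *__comdat_F_out_of_line;

/* This module's record for its copy of F.  */
static long long F_counters[4];
static struct comdat_fn_info F_info = { 1, 0x1234abcd, F_counters };

int
F (int x)
{
  /* Entry instrumentation, inserted post IPA-inline: inlined copies of F
     are expanded before this store exists, and the linker keeps only one
     out-of-line copy of F, so at most one module's store executes.  The
     pointer therefore records which module provided the out-of-line
     counters.  */
  __comdat_F_out_of_line = &F_info;

  /* ... original body with its ordinary arc counters ... */
  return x + 1;
}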

In the profile-use phase, we use this info to resolve the decls of
COMDAT functions, and also make sure we only emit
the out-of-line copy of the COMDAT function (even if it's not
referenced in the module).

This has been done and we are testing it now.

I talked to Teresa about her case some time back. It seems that
merging the counters would be an easy change on top of the above work,
because we have the address of the out-of-line gcov_fn_info and can do
a simple in-memory merge when the checksum matches. The concern is
that this may reduce the context-sensitive information.
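For reference, a minimal sketch of that checksum-guarded in-memory merge,
reusing the hypothetical comdat_fn_info record from the earlier sketch
(libgcov's real merging goes through the gcda machinery; this only
illustrates the idea):

/* Hypothetical in-memory merge: add one copy's counters into the
   out-of-line copy's counters.  Returns nonzero on success.
   Illustration only -- not the libgcov implementation.  */

static int
merge_into_out_of_line (struct comdat_fn_info *out_of_line,
                        const struct comdat_fn_info *other,
                        unsigned n_counters)
{
  unsigned i;

  /* Only merge when both copies were instrumented from the same CFG;
     otherwise the counter arrays do not line up.  */
  if (out_of_line->cfg_checksum != other->cfg_checksum)
    return 0;

  for (i = 0; i < n_counters; i++)
    out_of_line->counters[i] += other->counters[i];

  return 1;
}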

-Rong

>
>>
>>> > I have patch to track this.  Moreover vforks seems to produce repeated
>>> > merging of results.
>>>
>>> Aha, does this explain the gimp issue as well?
>>
>> Not really - we still need to debug why we hit the cold section so many times with
>> partitioning.  I still think an easier approach would be to lock the cold section and
>> then start, probably with the testsuite (i.e. write a script to compile the small testcases
>> with FDO + partitioning and see what crashes by hitting the cold section).
>
> Ok, that is on my todo list.
>
>>
>>> >
>>> > Is it really necessary to run this from internal loop of the cfgcleanup?
>>>
>>> The reason I added it here is that just below there is a call to
>>> verify_flow_info, and that will fail with the new verification.
>>
>> Hmm, OK, I suppose we run the cleanup after partitioning just once or twice, right?
>> We can track this incrementally - I am not sure that pulling it out of the internal iteration
>> loop would gain us anything substantial either.
>> Removing unreachable blocks twice is, however, ugly.
>
> When I was debugging the issue that led to this change I seemed to see
> 1-2 iterations typically, although I haven't measured it
> scientifically. It would be good to revisit that and see if we can
> pull both parts out of the loop, but as a separate patch.
>
>>
>>> +/* Ensure that all hot bbs are included in a hot path through the
>>> +   procedure. This is done by calling this function twice, once
>>> +   with WALK_UP true (to look for paths from the entry to hot bbs) and
>>> +   once with WALK_UP false (to look for paths from hot bbs to the exit).
>>> +   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
>>> +   to BBS_IN_HOT_PARTITION.  */
>>> +
>>> +static unsigned int
>>> +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
>>> +                    vec<basic_block> *bbs_in_hot_partition)
>>> +{
>>> +  /* Callers check this.  */
>>> +  gcc_checking_assert (cold_bb_count);
>>> +
>>> +  /* Keep examining hot bbs while we still have some left to check
>>> +     and there are remaining cold bbs.  */
>>> +  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
>>> +  while (! hot_bbs_to_check.is_empty ()
>>> +         && cold_bb_count)
>>> +    {
>>> +      basic_block bb = hot_bbs_to_check.pop ();
>>> +      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
>>> +      edge e;
>>> +      edge_iterator ei;
>>> +      int highest_probability = 0;
>>> +      int highest_freq = 0;
>>> +      gcov_type highest_count = 0;
>>> +      bool found = false;
>>> +
>>> +      /* Walk the preds/succs and check if there is at least one already
>>> +         marked hot. Keep track of the most frequent pred/succ so that we
>>> +         can mark it hot if we don't find one.  */
>>> +      FOR_EACH_EDGE (e, ei, edges)
>>> +        {
>>> +          basic_block reach_bb = walk_up ? e->src : e->dest;
>>> +
>>> +          if (e->flags & EDGE_DFS_BACK)
>>> +            continue;
>>> +
>>> +          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>>> +          {
>>> +            found = true;
>>> +            break;
>>> +          }
>>> +          /* The following loop will look for the hottest edge via
>>> +             the edge count; if that is zero, fall back to the edge
>>> +             frequency and finally the edge probability.  */
>>> +          if (e->count > highest_count)
>>> +            highest_count = e->count;
>>> +          int edge_freq = EDGE_FREQUENCY (e);
>>> +          if (edge_freq > highest_freq)
>>> +            highest_freq = edge_freq;
>>> +          if (e->probability > highest_probability)
>>> +            highest_probability = e->probability;
>>> +        }
>>> +
>>> +      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
>>> +         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
>>> +         then the most frequent pred (or succ) needs to be adjusted.  In the
>>> +         case where multiple preds/succs have the same frequency (e.g. a
>>> +         50-50 branch), then both will be adjusted.  */
>>> +      if (found)
>>> +        continue;
>>> +
>>> +      FOR_EACH_EDGE (e, ei, edges)
>>> +        {
>>> +          if (e->flags & EDGE_DFS_BACK)
>>> +            continue;
>>> +          /* Select the hottest edge using the edge count; if it is zero,
>>> +             then fall back to the edge frequency and finally the edge
>>> +             probability.  */
>>> +          if (highest_count)
>>> +            {
>>> +              if (e->count < highest_count)
>>> +                continue;
>>> +            }
>>> +          else if (highest_freq)
>>
>> The frequency condition needs to be done only when you walk predecessors - when
>> you walk down, the edge probabilities are just fine.
>
> True. For simplicity I think it should be fine to leave it as-is so there
> isn't more special-casing, since the current approach works in both
> directions.
>
>>
>> The patch seems OK to me now.  I will make our FDO tester use partitioning so we get
>> this benchmarked a bit.
>
> Ok thanks.
>
>>
>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

