This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition


Hi,
with Martin we did bit of progress on analyzing the problems.  We now have
COMDAT profile merging for FDO and we also noticed that forks can make your
basic blocks appear never executed even though they are executed every run:
the fork is accounted as 3 independnet runs of the program.  First run
is until fork, the second 2 runs are both variant.

I have patch to track this.  Moreover vforks seems to produce repeated
merging of results.

These two factors seems to help to riddle enought the firefox profiles
so it took us weeks to understand what happens.
> +         if (changed)
> +            {
> +              /* Edge forwarding in particular can cause hot blocks previously
> +                 reached by both hot and cold blocks to become dominated only
> +                 by cold blocks. This will cause the verification
> below to fail,
> +                 and lead to now cold code in the hot section. This is not easy
> +                 to detect and fix during edge forwarding, and in some cases
> +                 is only visible after newly unreachable blocks are deleted,
> +                 which will be done in fixup_partitions.  */
> +              fixup_partitions ();

Is it really necessary to run this from internal loop of the cfgcleanup?  It seems
you will play back and forth game where edge forwarding will remove your fallthru
and you will be re-adding it?

I would wait for cfgcleanup to finish its job (I don't really think it needs to be
iterative) and then do the fixup possibly cleaning up when the blocks was repoisitoined
(I suppose it is only case where the code above introduces new cfgcleanup oppurtunities).

> +      /* Walk the preds/succs and check if there is at least one already
> +         marked hot. Keep track of the most frequent pred/succ so that we
> +         can mark it hot if we don't find one.  */
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +
> +          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
> +          {
> +            found = true;
> +            break;
> +          }
> +          if (e->probability > highest_probability)
> +            highest_probability = e->probability;

When doing predecestor walk if you have two predecestors, one executing 100000
times and getting to the block with probability 1%, you want to chose it over
block executing once and getting to you with probability 100%.

You probably want to look for most likely predecestor here.  You need to look
for highest e->count and if they are all 0 for highest EDGE_FREQUENCY and for
maximal probability only if all fails?

> +        }
> +
> +      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
> +         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
> +         then the most frequent pred (or succ) needs to be adjusted.  In the
> +         case where multiple preds/succs have the same probability (e.g. a
> +         50-50 branch), then both will be adjusted.  */
> +      if (found)
> +        continue;
> +
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +          if (e->probability < highest_probability)
> +            continue;

Again for predecestor walk you need to wind down the bit crazy logic described above.
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 202021)
> +++ predict.c   (working copy)
> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>    return false;
>  }
> 
> +
> +/* Return true in case edge E is probably never executed.  */
> +
> +bool
> +probably_never_executed_edge_p (struct function *fun, edge e)
> +{
> +  gcc_checking_assert (fun);
> +  if (profile_info && flag_branch_probabilities)
> +    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
> +  if ((!profile_info || !flag_branch_probabilities)
> +      && (cgraph_get_node (fun->decl)->frequency
> +         == NODE_FREQUENCY_UNLIKELY_EXECUTED))
> +    return true;
> +  return false;
Instead of duplicating the conditional, break out the tests into
probably_never_executed_count_p, like we have for maybe_hot_count_p.

It would be nice to extend this to work w/o profile: probably_never_executed_edge_p
can return true for EH edges, setjmp edges and we can then walk bodies ignoring
EH/setjmp flagging blocks that are probably never executed enabling the partitioning
to do a job w/o profile.

Otherwise the patch look OK to me. Thanks for working on this. Do we have agreement on C++ way of mixing
declarations and code?

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]