This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: COMDAT missing profile handling (was Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition)

On Fri, Aug 30, 2013 at 2:31 AM, Jan Hubicka <> wrote:
>> > The esitmated profile is already there before reading the profile in, so we
>> > only do not want to overwrite it.  Does the following work for you?
>> It does get the estimated frequencies on the bbs.
> Good.

I ended up making some slight modifications, see below.

>> > We wil also need to solve problem that in this case cgraph edges will have 0 profile.
>> > We probably want to play the game there and just do the scaling for edge count,
>> > since IPA passes probably do not want to care about partial profiles.
>> The problem I am having is that my new freqs_to_counts routine takes
>> the sum of the incoming edge counts and computes the bb counts by
>> scaling by the estimated bb frequency. But in the case where the
>> caller was also missing profile data the edge count will have 0
>> profile as you note above, so the counts remain 0. I am not quite sure
>> how to handle this. Instead of using the sum of the incoming profile
>> counts, I could just pick a default profile count to use as the entry
>> bb count. The synthesized counts should get scaled down appropriately
>> during inlining.
> I am having problem to think of what default we should pick here.

What I tried was if the sum of the incoming counts was 0 then use the
BB_FREQ_MAX as the fake incoming profile count and apportioning that
to the bbs based on the guessed profile frequencies. Rather than
iterating, I am simply doing this for all 0 count functions. I put an
assert in the inliner when testing to ensure that we never inline a
0-count body. However, I realized after reading your proposed solution
below that this is not good since it will cause all of these (possibly
truly cold) functions to not be placed in the unlikely text section.

As an aside, the patch I am testing is hitting an issue in a
profiledbootstrap that I think is an unrelated bug that seems to be
provoked by my changes:

../../gcc_trunk_7/gcc/config/i386/i386.c:33752:34: error: array
subscript is above array bounds [-Werror=array-bounds]
     if (ipar[i] + 1 != ipar[i + 1])

which seems to be a bogus error (probably related to PR 56456 and
other specific instances of this warning being bogus that have been
filed). Not sure what to do about this. But I am guessing it will not
be triggered once I rework the patch in any case...

> Thinking about the whole problem, I think the following sounds like possible
> sollution:
> 1) use the patch above to get frequencies computed where counts are 0.

Do you mean my original patch, where I am doing this via new function
init_and_estimate_bb_frequencies called from branch_prob()? I assume
so - that will give us frequencies on the branches but no counts yet,
and leave the profile status as READ.

>    Make sure that branch probailities are guessed for functions with non-0
>    counts for each of branch that has count 0.

Not sure about the second sentence above. Do you mean also guess
probabilities within functions that have other non-zero counts in
them. Or do you mean those reached via non-zero cgraph call counts?
The init_and_estimate_bb_frequencies routine will only guess
frequencies for routines that are all-0. I thought that the
probabilities on branches that are 0 would be 0, which seems correct
for functions with other non-zero counts?

> 2) At a point we see inconsistency in the IPA profile (i.e. we get to fact
>    that we have direct calls with non-0 counts to a comdat with 0 count),
>    drop the profile_status for them to profile_guessed and let the backed
>    optimize them as if there was no FDO.
>    I do not think we can do better for those.

I assume you mean sometime later than where we were currently doing
this when reading counts. Maybe during ipa-profile. That way it can be
done iteratively to handle the case where a 0-count comdat is calling
another 0-count comdat where they both are going to be reached via a
non-zero call further up.

>    If we do not see this problem, we can keep status as READ so the function
>    will get size optimized.  Here we will lose for indirect calls, but I
>    do not see how to handle these short of making every 0 count COMDAT guessed.
>    Perhaps you can try how these variants work for you performance/code size
>    wise.

Yes, I think you are right. We will lose a bit on the out-of-line
copy, but still should get this handled properly when inlined due to
3) below.

> 3) at a time we inline function with guessed profile into function with read
>    profile, we will use your trick to feed the counts in based on frequencies.
>    We will do that in tree-inline when inlining. I.e. we will never feed fake
>    counts into non-inline functions

Right, so invoke a new routine like the following, when inlining and
before we invoke the copy_body, etc that scales the counts. Where the
count used can be the current sum of incoming cgraph edge counts
(presumably non-zero at this point since we decided to inline):

+/* Convert estimated frequencies into counts.  */
+freqs_to_counts (struct cgraph_node *node, gcov_type count)
+  basic_block bb;
+  edge_iterator ei;
+  edge e;
+  struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
+  FOR_ALL_BB_FN(bb, fn)
+  {
+    bb->count = apply_scale (count,
+                             GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
+    FOR_EACH_EDGE (e, ei, bb->succs)
+      e->count = apply_probability (e->src->count, e->probability);
+  }

> 4) we will want to sanitize callgarph, too.  This means looking for GUESSED
>    profile functions and feeding their counts/edge counts based on sum of
>    the incomming counts.

I assume here the sum of the incoming counts to use for the cgraph
edge is obtained from the bb counts of calls to those functions?

>    We can do this as part of ipa-profile pass (I will move it to separate file
>    as it is getting more complex) Here we will have the propagation issue you
>    mention above.  Either we can ignore it or we can iterate.
> 3) and 4) will absolutely need some capping to be sure we won't make the
> synthetized counts bigger than other counts in program.

How do we end up with a synthesized count that is too big? I.e. for 3
we would use the sum of the incoming cgraph edges.


> I think we have code for 1-3. If the plan looks sane to you, I think we can
> start getting it in?
> Of course we want to eliminate most of the problems by getting runtime to
> do the merging... (but the merging will not be posible always due to
> cfg differences comming from different optimization flags, anyway).
> Honza

Teresa Johnson | Software Engineer | | 408-460-2413

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]