This is the mail archive of the
mailing list for the GCC project.
Re: Compute precise counter histogram at LTO
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Teresa Johnson <tejohnson at google dot com>
- Cc: Rong Xu <xur at google dot com>, Jan Hubicka <hubicka at ucw dot cz>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, David Li <davidxl at google dot com>
- Date: Mon, 22 Apr 2013 20:16:14 +0200
- Subject: Re: Compute precise counter histogram at LTO
- References: <20130329181658 dot GB16079 at kam dot mff dot cuni dot cz> <CAAe5K+VAw8J90_KdNwin-y-6YZYfk-y1jzGOGVugwjeoBWsLFA at mail dot gmail dot com> <CAF1bQ=T8G_zhkLRT=FtQcnB-3LPXCPP_QhvJmMaecUKvwgvQsw at mail dot gmail dot com> <CAAe5K+VTo924jtrGE-UxXk1P6DR7Ohi8Nxe+zW8dgDJXnM7vKg at mail dot gmail dot com>
Sorry for getting back to this late.
> >> That's a larger error than I had expected from the merging, although
> >> as you note it is an approximation so there is going to be some amount
> >> of error. If the error is that large then maybe there is a better
> >> merging algorithm, since in the non-LTO case we won't be able to
> >> recompute it exactly. For cc1, what was your test case -
> >> profiledbootstrap or something simpler? I can try to reproduce this
> >> and see if it is another bug or just due to the approximation.
> I've been using Rong's tool to compute the exactly merged histogram
> from the gcov-merged histogram for perlbmk. I tried a couple test
> cases - with the 5 train inputs, and with the 3 ref inputs. In both
> cases I am seeing up to 200% or so difference in some of the working
> set min counter values, although the error is not as high for the
> higher working set percentages. But large enough to cause a potential
> performance issue nonetheless.
One thing that confuses me is why the error tends to be in the positive direction.
Since we are minimizing the counter during merging, I would expect us to
more or less consistently underestimate the counters. Do you have any
Also, if you have a setup with your tool, it would be nice to double-check that
the histograms produced by the LTO pass actually match the histograms produced
by Rong's external tool. I am not sure if his tool takes into account the
estimated execution times of basic blocks. If not, it may be an interesting
experiment by itself, since we will see how well the counts alone estimate
the times. (I would expect it to be rather good, but it is always better
to sanity check.)
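
To make the question about the direction of the error concrete, here is a
toy sketch in Python. It is not the libgcov merging code. It assumes a
merge that cannot tell which counter is which across runs and therefore
pairs counters by rank (hottest with hottest); whether that matches the
real merge is an assumption. The names working_set_min, exact_merge and
rank_merge are made up for this illustration.

```python
def working_set_min(counters, percent):
    """Smallest counter among the hottest ones that together cover
    `percent` of the total count (integer arithmetic only)."""
    total = sum(counters)
    covered = 0
    for value in sorted(counters, reverse=True):
        covered += value
        if covered * 100 >= total * percent:
            return value
    return 0


def exact_merge(runs):
    # Exact merge: add up each counter across runs, position by position.
    return [sum(values) for values in zip(*runs)]


def rank_merge(runs):
    # Assumed approximation: counter identity is lost, so the hottest
    # counter of one run is added to the hottest counter of the other.
    ranked = (sorted(run, reverse=True) for run in runs)
    return [sum(values) for values in zip(*ranked)]


# Two hypothetical runs whose hot spots are in different counters.
run_a = [1000, 10, 10, 10]
run_b = [10, 1000, 10, 10]

exact = exact_merge([run_a, run_b])   # [1010, 1010, 20, 20]
approx = rank_merge([run_a, run_b])   # [2000, 20, 20, 20]

print(working_set_min(exact, 90))     # 1010
print(working_set_min(approx, 90))    # 2000
```

In this made-up case the rank-paired merge reports a 90% working set min
counter roughly twice the exact one, i.e. an error in the positive
direction. It only shows one way an overestimate could arise under the
stated assumption.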
> It looks like the histogram min counters are always off in the larger
> direction with the gcov merged histogram, so one possibility is to
> reduce the min counter value by some divisor when the number of runs
> is >1. Unfortunately, at least in the perlbmk case, the magnitude of
> the error doesn't seem to correlate obviously with the # runs (in the
> 5 run case the error is actually a little less than in the 3 run case
> for perlbench). Honza, how many runs were merged in your 10x error
> case above?
It was the standard profiledbootstrap. I am not sure how many runs exactly
are merged, but it will be a couple hundred. There is also an issue with the
fact that libbackend is linked into more than one binary.
> One thing I noticed in the perlbmk case was that there were a number
> of large counter values sharing histogram buckets at the high end of
> the histogram in some of the individual run profiles, so there would
> be a large loss of precision when merging, since the range of counter
> values sharing a single histogram is larger at the high end of the
> histogram. I'll experiment with increasing the size of the histogram
> to see how much that would reduce the error.
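
The precision loss described in the quoted paragraph can be illustrated
with a toy bucketing scheme in Python. This is a sketch only: it assumes
log-scale buckets with a fixed number of sub-buckets per power of two, and
the names bucket_index, bucket_width and sub_bits are hypothetical, not
taken from the gcov sources.

```python
def bucket_index(value, sub_bits=2):
    """Map a counter value to a bucket: small values get their own
    bucket, larger ones are grouped by their top bit plus the next
    `sub_bits` bits."""
    if value < (1 << sub_bits):
        return value
    msb = value.bit_length() - 1          # position of the top set bit
    shift = msb - sub_bits
    sub = (value >> shift) & ((1 << sub_bits) - 1)
    return (shift + 1) * (1 << sub_bits) + sub


def bucket_width(value, sub_bits=2):
    """Number of distinct counter values sharing `value`'s bucket."""
    if value < (1 << sub_bits):
        return 1
    return 1 << (value.bit_length() - 1 - sub_bits)


print(bucket_width(100))                    # 16
print(bucket_width(1_000_000))              # 131072
print(bucket_width(1_000_000, sub_bits=4))  # 32768
```

With these assumptions a counter near one million shares its bucket with
131072 possible values, so two quite different hot counters (for example
920000 and 1000000) become indistinguishable. More sub-buckets, i.e. a
larger histogram, narrow that range.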
Thanks! I guess increasing the histogram to a size approximately matching the
number of counters in the binary should more or less eliminate the precision