Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)


> On Tue, Aug 21, 2012 at 6:56 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> > I can go ahead with the histogram approach. There is some roundoff
> >> > error from the working set scaling approach that can affect different
> >> > merging orders as you note, although I think this only really affects the
> >> > small counter values. The other place where saving/merging the histogram
> >
> > Do you have any intuition on why simple maximalization merging (that is safe wrt
> > ordering) would be a bad idea?
> 
> When you say "maximalization merging" are you talking about the
> histogram merging approach I mentioned a few emails back (my response
> on Aug 19) where we assume the same relative order of hotness in the
> counters between runs, and accumulate the counter values in the
> histogram in that order?

I was speaking of BB (counter) counts only here, and considering the simplest
strategy: maximize the BB counts in the corresponding buckets and sum the
cumulative values in the corresponding buckets.
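
A minimal sketch of that merge, assuming a hypothetical bucket record with a
counter count and a cumulative value (the type and field names below are
invented for illustration; they are not the actual gcov data structures):

/* Illustrative only: merge histogram B into A by taking the maximum of
   the per-bucket counter counts and summing the per-bucket cumulative
   values.  */

typedef long long gcov_type;

struct hist_bucket
{
  gcov_type num_counters;   /* counters whose value falls in this bucket */
  gcov_type cum_value;      /* sum of those counters' values */
};

static void
merge_histogram (struct hist_bucket *a, const struct hist_bucket *b,
                 unsigned n_buckets)
{
  unsigned i;
  for (i = 0; i < n_buckets; i++)
    {
      if (b[i].num_counters > a[i].num_counters)
        a[i].num_counters = b[i].num_counters;   /* maximize */
      a[i].cum_value += b[i].cum_value;          /* sum */
    }
}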

The scaling strategy is more sensible, but it will also be sensitive to the
order of train runs, i.e. it will give unstable results with parallel make.
> > OK, so I guess we went through
> >  1) two-pass updating with a race between passes.
> >  2) two-pass updating with the first pass updating counters and the second having a race only for the summary update.
> >     (i.e. no races for counters)
> >  3) two-pass updating with flocking (and some way to handle detected deadlocks)
> >  4) one-pass updating with histogram merging + maximalization of the working set.
> >     (we do not really need to scale the buckets; we can simply merge the histograms
> >      and then multiply them by nruns before comparing to the actual counters.)
> 
> By merging the histograms (and accumulating the counter values stored
> there as we merge), I don't think we need to multiply the counter
> values by nruns, do we?

With the simple approach we do, but...
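
(To illustrate the scaling point with a hypothetical example, not the actual
libgcov arithmetic: suppose two runs each drive a counter to 100.  Simple
max-merging keeps the histogram bucket value at 100, while the merged counter
in the gcda file is 200; multiplying the bucket value by nruns == 2 makes the
two comparable again.)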
> 
> >      This assumes that working sets of all runs are about the same, but should work
> >      reasonably in practice I think.
> >
> > I guess 3/4 are acceptable WRT bootstrap reproducibility.
> >
> > I have no experience with flocking a large number of files, or with the
> > portability of this solution, e.g. to Windows.  If you think that 2) would be
> > too inaccurate in practice and 3) has a chance to be portable, we could go
> > for this.  It will solve the precision problems and will also work for LIPO
> > summaries.  I would be curious about the effect on profiledbootstrap time if
> > you implement it.
> 
> I'm hoping that 2) will be accurate enough in practice, but it will
> need some investigation.

Perhaps, weighing all the pros/cons, you are right here.  I think getting
results that are consistent across different orders of runs is more important
than the possibility of a race on the last update, and 4) would probably get
quite a lot of GIGO issues.

We can implement a more consistent locking mechanism incrementally if this
turns out to be an issue.
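
For concreteness, a minimal sketch of what per-file locking could look like on
POSIX hosts (flock() itself is the portability problem mentioned above, since
it does not exist on Windows; the function name below is invented):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Illustrative only: take an exclusive lock on a .gcda file before the
   read-merge-write cycle, so that concurrent train runs serialize.  */
static int
lock_gcda_file (const char *path)
{
  int fd = open (path, O_RDWR | O_CREAT, 0666);
  if (fd < 0)
    return -1;
  /* Blocks until the lock is granted; a timeout or deadlock-detection
     scheme would have to go here.  */
  if (flock (fd, LOCK_EX) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;   /* release with flock (fd, LOCK_UN) and close (fd) */
}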

It is possible to actually detect the race: in the first round we update
nruns.  In the second round we can see whether nruns has changed.  If so, we
have a problem, since someone else has already updated the file and will update
the histograms.

I would suggest skiping update when nruns has changes, so the summary at least
match the counters in current file accurately (even though it will be off for
the whole program since whoever updated the file may have not seen all our
updates, just part of them).  Just to reduce changes that the trashed run is
the most important one. Theoretically we could also make libgcov to re-read all
counters in this case and try another update, but I would not worry about that
for the moment.
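
In pseudo-C, the two-round check could look like this (the helper names and
the gcov_file type are made up for illustration; only the nruns idea matters):

/* Hypothetical helpers, sketched as prototypes only.  */
struct gcov_file;
extern unsigned read_nruns (struct gcov_file *);
extern void merge_counters (struct gcov_file *);
extern void update_summary_and_histogram (struct gcov_file *);

static void
update_with_race_check (struct gcov_file *file)
{
  unsigned nruns_before = read_nruns (file);

  /* First round: merge our counters; this also writes back
     nruns_before + 1.  */
  merge_counters (file);

  /* Second round: if nruns moved beyond our own increment, someone
     else updated the file in between.  Skip the summary and histogram
     update so the summary still matches the counters we just wrote.  */
  if (read_nruns (file) != nruns_before + 1)
    return;

  update_summary_and_histogram (file);
}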

We will still have problems with bootstrap: the summary of libbackend.a will
depend on which of the compiler binaries executed last, since cc1 will have a
different summary from cc1plus because the frontend is different.  I would give
extra points to a volunteer who changes libgcov to do multiple summaries for
different programs as intended originally (if you search the archives in
2001-2003, somewhere there are patches that changed libgcov from not merging at
all to merging always; I guess changing it to merge just when the program
checksums match is not that hard).
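
The intended shape, very roughly, might be something like the following (the
struct and function names are invented for illustration and the real summary
layout differs):

#include <stdlib.h>

/* Illustrative only: keep one summary per program, keyed by the program
   checksum, and merge only into the matching entry, so cc1 and cc1plus
   no longer clobber each other's summaries.  */
struct program_summary
{
  unsigned checksum;              /* identifies cc1, cc1plus, ...  */
  long long sum_all;              /* stand-in for the real summary fields */
  struct program_summary *next;
};

static struct program_summary *
summary_for_program (struct program_summary **list, unsigned checksum)
{
  struct program_summary *p;

  for (p = *list; p; p = p->next)
    if (p->checksum == checksum)
      return p;                   /* merge into the matching summary */

  /* No match: start a fresh summary rather than merging into another
     program's data.  */
  p = (struct program_summary *) calloc (1, sizeof (*p));
  if (!p)
    return NULL;
  p->checksum = checksum;
  p->next = *list;
  *list = p;
  return p;
}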

Thanks,
Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> David
> >>
> >>
> >>
> >> >
> >> > Thanks,
> >> > Teresa
> >> >
> >> > >
> >> >> > >>
> >> >> > >>
> >> >> > >> >  2) Do we plan to add some features in the near future that will
> >> >> > >> >     anyway require global locking?
> >> >> > >> >     I guess LIPO itself does not count, since it streams its data
> >> >> > >> >     into an independent file as you
> >> >> > >> >     mentioned earlier, and locking the LIPO file is not that hard.
> >> >> > >> >     Does LIPO stream everything into that common file, or does it
> >> >> > >> >     use a combination of gcda files
> >> >> > >> >     and a common summary?
> >> >> > >>
> >> >> > >> Actually, LIPO module grouping information is stored in gcda files.
> >> >> > >> It is also stored in a separate .imports file (one per object) ---
> >> >> > >> this is primarily used by our build system for dependence
> >> >> > >> information.
> >> >> > >
> >> >> > > I see, getting LIPO safe WRT parallel updates will be fun. How does
> >> >> > > LIPO behave
> >> >> > > on GCC bootstrap?
> >> >> >
> >> >> > We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
> >> >> > main problem for application build -- the link time (for debug build)
> >> >> > is.
> >> >>
> >> >> I was primarily curious how LIPO's runtime analysis fares in the
> >> >> situation where
> >> >> you do very many small train runs on a rather large app (sure, GCC is
> >> >> small compared to
> >> >> google's use case ;).
> >> >> >
> >> >> > > (i.e. it does a lot more work in the libgcov module per
> >> >> > > invocation, so I am curious if it is practically useful at all).
> >> >> > >
> >> >> > > With an LTO-based solution a lot can probably be pushed to link
> >> >> > > time?  Before the actual GCC starts from the linker plugin, the
> >> >> > > LIPO module can read gcov CFGs from gcda files and do all the
> >> >> > > merging/updating/CFG construction that is currently performed at
> >> >> > > runtime, right?
> >> >> >
> >> >> > The dynamic cgraph build and analysis is still done at runtime.
> >> >> > However, with the new implementation, the FE is no longer involved.
> >> >> > The gcc driver is modified to understand module grouping, and LTO is
> >> >> > used to merge the streamed output from aux modules.
> >> >>
> >> >> I see.  Are there any fundamental reasons why it cannot be done at
> >> >> link time
> >> >> when all gcda files are available?  Why is the grouping not done inside
> >> >> the linker
> >> >> plugin?
> >> >>
> >> >> Honza
> >> >> >
> >> >> >
> >> >> > David
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Teresa Johnson | Software Engineer |  tejohnson@google.com |  408-460-2413
> >> >
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

