
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)


> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Xinliang David Li <davidxl@google.com> writes:
> >> >
> >> > Process level synchronization problems can happen when two processes
> >> > (running the instrumented binary) exit at the same time. The
> >> > updated/merged counters from one process may be overwritten by another
> >> > process -- this is true for both counter data and summary data.
> >> > Solution 3) does not introduce any new problems.
> >>
> >> You could just use lockf()?
> >
> > The issue here is holding locks for all the files (there can be many) versus
> > the limit on the number of locks and the possibility of deadlock (mind that updates
> > may happen in different orders on the same files for different programs built
> > from the same objects).
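For concreteness, a minimal sketch of the per-file lockf() variant (hypothetical
helper, error handling omitted; not what libgcov does today).  The deadlock risk
above appears once a process holds several such locks at the same time:

  #include <unistd.h>
  #include <fcntl.h>

  static void
  merge_gcda_locked (const char *filename)
  {
    int fd = open (filename, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
      return;
    if (lockf (fd, F_LOCK, 0) == 0)   /* blocks; locks offset 0 to EOF */
      {
        /* Read the on-disk counters, add the in-memory counts,
           seek back and rewrite the file.  */
      }
    close (fd);   /* closing the descriptor drops the lock */
  }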
> >
> > For David: there is no thread-safety code in mainline for the counters.
> > Long ago Zdenek implemented a poor man's TLS for counters (before TLS was invented)
> > http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down
> > as too memory-expensive per thread.  We could optionally do atomic updates like ICC
> > does, or a combination of both as discussed in the thread.
> > So far no one has implemented it, since the coverage fixups seem to work well enough in
> > practice for multithreaded programs, where reproducibility does not seem to be _that_
> > important.
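For reference, the atomic variant only changes the counter bump the compiler
emits; a sketch using the __atomic builtins ("counter" here stands for one slot
of the gcov counter array):

  static long long counter;

  static void
  bump_plain (void)
  {
    counter++;                   /* racy read-modify-write */
  }

  static void
  bump_atomic (void)
  {
    /* Relaxed ordering is enough for a pure count.  */
    __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
  }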
> >
> > For GCC profiled bootstrap, however, I would like the output binary to be
> > reproducible.  We really ought to make profile updates safe for multiple processes.
> > Trashing a whole process run is worse than racing on an increment: there is a good
> > chance that one of the runs is more important than the others, and that one gets trashed.
> >
> > I do not think we have serious update problems in the summaries at the moment.
> > We lock individual files as we update them, and the summary is simple enough to be safe:
> > sum_all is summed, max_all is the maximum over the individual runs.  Even when you combine
> > multiple programs the summary will end up the same.  Everything except max_all is ignored
> > anyway.
> >
> > Solution 2 (i.e. histogram streaming) will also have the property that it is safe
> > WRT multiple programs, just like sum_all.
> 
> I think the sum_all based scaling of the working set entries also has
> this property. What is your opinion on saving the histogram in the

I think the scaling will have at least roundoff issues WRT different merging orders.
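Toy example of the roundoff (made-up numbers; 64-bit integer division as a merge
would presumably do it):

  #include <stdio.h>

  int
  main (void)
  {
    long long w1 = 10, w2 = 10;   /* the same stored working-set count */

    /* Runs with sum_all 3, 5 and 7, merged in two different orders;
       each merge scales by new_sum_all / old_sum_all.  */
    w1 = w1 * (3 + 5) / 3;       w1 = w1 * (3 + 5 + 7) / (3 + 5);
    w2 = w2 * (3 + 7) / 3;       w2 = w2 * (3 + 7 + 5) / (3 + 7);

    printf ("%lld vs %lld\n", w1, w2);   /* prints 48 vs 49 */
    return 0;
  }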

> summary and merging histograms together as well as possible, compared
> to the alternative of saving the working-set information as now and
> scaling it up by the ratio between the new and old sum_all when
> merging?

So far I like this option best.  But David seems to lean towards the third option with
whole-file locking.  I can see it may prove more extensible in the future.
At the moment I do not understand two things:
 1) Why do we need info on the number of counters above a given threshold, when the
    hot/cold decisions usually depend purely on the count cutoff?
    Removing those would solve the merging issues with variant 2, and then it would
    probably be a good solution (see the bucket-merge sketch after these questions).
 2) Do we plan to add features in the near future that will require global locking anyway?
    I guess LIPO itself does not count, since it streams its data into an independent file as you
    mentioned earlier, and locking the LIPO file is not that hard.
    Does LIPO stream everything into that common file, or does it use a combination of gcda files
    and a common summary?

    What other stuff does Google plan to merge?
    (In general I would be curious about the merging plans WRT profile stuff, so we get more
    synchronized and effective at getting patches in.  We have about two months to get it done
    in stage1 and it would be nice to get as much in as possible.  Obviously some of the patches will
    need a bit of discussion, like this one.  Hope you do not find it frustrating; I actually think
    this is an important feature.)
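To make variant 2 concrete, here is roughly how I would expect the histogram
merge to look; the bucket layout below is made up (say, one bucket per log2 of
the counter value), but the point stands for any fixed layout: bucket-wise
addition is order-independent, just like sum_all:

  #define HIST_BUCKETS 64

  typedef long long gcov_type;   /* stand-in for the libgcov typedef */

  struct gcov_bucket
  {
    gcov_type num_counters;      /* number of counters in this bucket */
    gcov_type cum_value;         /* sum of their values */
  };

  static void
  histogram_merge (struct gcov_bucket *dst, const struct gcov_bucket *src)
  {
    int i;
    for (i = 0; i < HIST_BUCKETS; i++)
      {
        dst[i].num_counters += src[i].num_counters;
        dst[i].cum_value += src[i].cum_value;
      }
  }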

I also realized today that the common value counters (used by switch, indirect
call and div/mod value profiling) are non-stable WRT different merging orders
(i.e. parallel make in the train run).  I do not think there is a real solution to
that, except for not merging counter sections of this type in libgcov and instead
merging them in some canonical order at profile-feedback time.  Perhaps we just
want to live with this, since the discrepancy here is small (these counters are
quite rare and their outcome has only a local effect on the final binary, unlike
the global summaries/edge counters).
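To illustrate the instability, here is a toy model of the single-value merge
(the tournament scheme __gcov_merge_single uses: the same value adds to the
count, a different value fights it); the winning value depends on the order:

  #include <stdio.h>

  /* Most common value, its biased count, total number of measurements.  */
  struct single { long long value, count, all; };

  static void
  merge_single (struct single *dst, const struct single *src)
  {
    dst->all += src->all;
    if (dst->value == src->value)
      dst->count += src->count;
    else if (src->count > dst->count)
      {
        dst->value = src->value;
        dst->count = src->count - dst->count;
      }
    else
      dst->count -= src->count;
  }

  int
  main (void)
  {
    /* Run A sees value 1 twice, run B value 2 twice, run C value 3 once.  */
    struct single a = { 1, 2, 2 }, b = { 2, 2, 2 }, c = { 3, 1, 1 };
    struct single x = a, y = a;

    merge_single (&x, &b); merge_single (&x, &c);   /* A+B, then +C */
    merge_single (&y, &c); merge_single (&y, &b);   /* A+C, then +B */

    printf ("order 1: value %lld\n", x.value);      /* -> 3 */
    printf ("order 2: value %lld\n", y.value);      /* -> 2 */
    return 0;
  }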

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> -Andi
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

