This is the mail archive of the mailing list for the GCC project.

Re: [PATCH] profile feedback: -fprofile-use= and -fprofile-correction, correctness fixes and option semantic changes.

On Sun, Mar 30, 2008 at 12:05 AM, Jan Hubicka <> wrote:
> > >
>  > >  It was me.  I need to submit it again for trunk.  IIRC it did slow down
>  > >  the profiled runs somewhat, but not extremely.
>  >
>  > This depends on the application and the system you run it on.
>  > I.e. if you have many threads (16+) and a correspondingly high number
>  > of cores/chips, the slowdown becomes significant. I don't remember the
>  > exact numbers, but IIRC, the slowdown was on the order of 10 on such a
>  > system.
>  > This is no surprise, since counters for hot regions in a function are
>  > clustered together, occupying the same cache line, and any counter
>  > updates to the same cache line will essentially be serialized across
>  > the entire system, making the slowdown proportional to the thread/core
>  > count.
>  > On systems with a single chip and a single L2, the slowdown is
>  > naturally not that significant, since the coherence traffic will all
>  > be on chip and the cache line ping-ponging will be confined to the
>  > single chip.
>  > But as you add more chips, the slowdown rapidly becomes significant,
>  > as the average cost of grabbing the cache line from another core
>  > increases significantly, and the effect of serialization on the
>  > overall throughput also becomes proportionally bigger.
>  You are right that the costs of locking are only going to increase,
>  making the cost of the locking variant more noticeable.  I am leaning
>  towards simply having both solutions in the compiler, perhaps with the
>  locking variant enabled by default.
>  From a maintainability POV it is very good to have a safe way for the
>  compiler to realize that the profile is messed up.  This still happens
>  quite often and it is important that the problems are noticed and
>  reported.  I also think that issuing diagnostics instead of reading a
>  nonsensical profile will keep users from making a simple mistake that
>  misguides GCC into wrong optimizations and disappoints the user as a
>  result.  Also, I believe ICC and other compilers use this solution.
>  However, I am happy to have the "error tolerant" variant as an
>  alternative when profiling the performance of threaded programs is
>  important.

We already have -Wcoverage-mismatch, which, instead of aborting on
old profile data with new source, just ignores those cases it cannot match.
So this new code would probably fit in the same place.
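
To make the cache-line argument quoted above concrete, here is a minimal
sketch in C. It is not the code the patch emits: the 64-byte line size, the
struct names, and the helpers cache_line_of, lines_touched and bump are all
assumptions made for illustration. It only shows that eight adjacent 64-bit
counters share one cache line, so atomic updates to any of them contend for
that line, while padding each counter to its own line avoids the sharing at
the cost of eight times the memory.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (common on x86-64, not universal). */
#define CACHE_LINE 64
#define NUM_COUNTERS 8

/* Packed layout: adjacent 64-bit counters, as when the counters for
   neighbouring hot regions are stored next to each other. */
typedef struct {
    alignas(CACHE_LINE) int64_t counters[NUM_COUNTERS];
} packed_counters;

/* Padded layout: the alignment forces each counter onto its own line. */
typedef struct {
    alignas(CACHE_LINE) int64_t value;
} padded_counter;

typedef struct {
    padded_counter counters[NUM_COUNTERS];
} padded_counters;

/* Index of the cache line that an address falls into. */
static uintptr_t cache_line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE;
}

/* Number of distinct cache lines covered by n objects laid out
   `stride` bytes apart, starting at `base` (addresses ascending). */
static size_t lines_touched(const void *base, size_t stride, size_t n)
{
    size_t count = 0;
    uintptr_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t line = cache_line_of((const char *)base + i * stride);
        if (i == 0 || line != prev)
            count++;
        prev = line;
    }
    return count;
}

/* Atomic increment using the GCC __atomic builtin.  Each call needs
   exclusive ownership of the counter's cache line, which is what
   serializes updates when many counters share that line. */
static void bump(int64_t *counter)
{
    __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
}
```

Under these assumptions, calling lines_touched on the counters array of a
packed_counters object with a stride of sizeof(int64_t) reports a single
line, while the same call on padded_counters with a stride of
sizeof(padded_counter) reports NUM_COUNTERS lines.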

