This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] profile feedback: -fprofile-use= and -fprofile-correction, correctness fixes and option semantic changes.


On Thu, Mar 27, 2008 at 8:04 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>  On Thu, 27 Mar 2008, Jan Hubicka wrote:
>  > > missed counter updates in multithreaded profile collection.
>  > > Using atomic updates for counter increment adds significant overhead,
>  > > especially as the core count grows.
>  >
>  > I will need to play with this more tomorrow.  I would like to have the
>  > atomic operation being the default solution for coverage (I thought
>  > Zdenek had a patch for this, but it is not in mainline, I will need to
>  > dig out the history here).
>
>  It was me.  I need to submit it again for trunk.  IIRC it did slow down
>  the profiled runs somewhat, but not extremely.

This depends on the application and the system you run it on.
For example, if you have many threads (16+) and a correspondingly high
number of cores/chips, the slowdown becomes significant. I don't remember
the exact numbers, but IIRC the slowdown was on the order of 10 on such
a system.

This is no surprise: counters for hot regions in a function are clustered
together, occupying the same cache line, and all counter updates to the
same cache line are essentially serialized across the entire system,
making the slowdown proportional to the thread/core count.

On systems with a single chip and a single L2, the slowdown is naturally
less significant, since all the coherence traffic stays on chip and the
cache line ping-ponging is confined to that chip. But as you add more
chips, the slowdown rapidly becomes significant: the average cost of
grabbing the cache line from another core increases, and the effect of
serialization on overall throughput becomes proportionally bigger.
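
To make the cache line effect concrete, here is a rough sketch (not the
instrumentation GCC emits) of threads doing atomic increments on counters
that are either packed next to each other or padded apart. The names
run_counters, packed, padded and the 64-byte CACHE_LINE are assumptions
made for illustration only. Note that it isolates just the shared-line
cost: in real profiling the threads also hit the very same counter, and
padding would not help with that.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <assert.h>

#define MAX_THREADS 64
#define CACHE_LINE  64   /* assumed cache line size in bytes */

/* Packed layout: neighbouring counters share cache lines, much like
   the counters of a hot region would. */
static atomic_ulong packed[MAX_THREADS];

/* Padded layout: each counter sits on its own (assumed) cache line. */
struct padded_counter {
    _Alignas(CACHE_LINE) atomic_ulong value;
};
static struct padded_counter padded[MAX_THREADS];

struct worker_arg {
    atomic_ulong *counter;
    unsigned long iters;
};

/* Each thread atomically bumps its own counter 'iters' times. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (unsigned long i = 0; i < arg->iters; i++)
        atomic_fetch_add_explicit(arg->counter, 1, memory_order_relaxed);
    return NULL;
}

/* Run 'nthreads' workers and return the sum of all counters.
   use_padding selects the padded layout instead of the packed one. */
unsigned long run_counters(int nthreads, unsigned long iters, int use_padding)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    unsigned long total = 0;

    if (nthreads <= 0 || nthreads > MAX_THREADS)
        abort();

    for (int t = 0; t < nthreads; t++) {
        atomic_ulong *c = use_padding ? &padded[t].value : &packed[t];
        atomic_store(c, 0);
        args[t].counter = c;
        args[t].iters = iters;
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0)
            abort();
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        total += atomic_load(args[t].counter);
    }
    return total;
}
```

Built with -pthread, a call such as run_counters(16, 100000000UL, 0) can be
timed against run_counters(16, 100000000UL, 1). Both return the same total,
since the updates are atomic; any difference in wall time comes from the
counters sharing cache lines, and it will vary a lot with the machine.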

Seongbae

