Mainline merge part 14 - profiler improvements

Wed May 8 03:26:00 GMT 2002

> On Tue, May 07, 2002 at 10:27:23PM +0200, Zdenek Dvorak wrote:
> > Where? I only call them from libgcc2.c, which IMHO should be OK.
> 
> I was mistaken about the libcalls.
> 
> > I don't think adding instructions, one of them a jump, to
> > every edge is a good idea for performance.
> 
> It's not every edge.  Just the minimal spanning tree.
> 
> And what about the extra function call you wind up adding at
> the beginning of every function?  You're suggesting that that
> won't have a performance impact?
> 
> > I believe Honza defined it, but it is (number of edges - number of blocks) *
> > sizeof(counter). In my implementation it must be multiplied by number of
> > running threads.
> 
> No, I wanted a number like 10K per thread.  Or whatever.
> 
> What's the actual measured overhead of this on some real-life
> thread-using application?  Say, Mozilla.

I did not manage to set up everything needed to compile Mozilla, but for GCC
itself the .da files produced add up to 1.4MB (for a 4.5MB text section).
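As a rough sketch of the memory side of the trade-off, the estimate quoted
above, (number of edges - number of blocks) * sizeof(counter), multiplied by
the number of threads when each thread keeps its own copy, can be written as a
small C helper. The name `counter_bytes` and the 64-bit counter type are
assumptions for illustration, not GCC's actual code.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t counter_t;  /* assumption: 64-bit edge counters */

/* Hypothetical helper: bytes needed for one function's edge counters.
   Uses the estimate (edges - blocks) * sizeof(counter); with per-thread
   counters the whole array is duplicated for every running thread. */
static size_t counter_bytes(size_t n_edges, size_t n_blocks, size_t n_threads)
{
    /* Guard against unsigned underflow for degenerate inputs. */
    size_t n_counters = n_edges > n_blocks ? n_edges - n_blocks : 0;
    return n_counters * sizeof(counter_t) * n_threads;
}
```

For example, a function with 120 edges and 100 blocks needs 20 counters, i.e.
160 bytes per thread with 64-bit counters, or 1280 bytes with 8 threads.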

This is quite large. On the other hand, the overhead of thread-unsafe profiling
is about 20% for GCC, while a lock-prefixed instruction cost about 120 cycles
on the Pentium, and I would expect that to increase on newer CPUs. Using locked
updates could therefore raise the overhead to about 300-400%.
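The three ways of updating a counter discussed in this thread (plain
increment, locked increment, per-thread counters) can be sketched in C. This is
a minimal sketch, assuming 64-bit counters and a hypothetical fixed-size
counter array; it uses GCC's `__thread` storage class and the
`__atomic_fetch_add` builtin (the latter from later GCC releases), not the code
the profiler actually emits.

```c
#include <stdint.h>

#define N_COUNTERS 4          /* hypothetical: counters of one function */
typedef int64_t counter_t;    /* assumption: 64-bit counters */

static counter_t shared_counters[N_COUNTERS];

/* Strategy 1: plain increment. Cheap, but concurrent threads can lose
   updates because the load/add/store sequence is not atomic. */
static void count_plain(unsigned edge)
{
    shared_counters[edge]++;
}

/* Strategy 2: atomic increment. On x86 GCC emits a lock-prefixed add,
   which is correct under threads but pays the locked-instruction cost
   on every executed edge. */
static void count_atomic(unsigned edge)
{
    __atomic_fetch_add(&shared_counters[edge], 1, __ATOMIC_RELAXED);
}

/* Strategy 3: per-thread counters. No locking on the hot path, but the
   counter memory is multiplied by the number of running threads. */
static __thread counter_t thread_counters[N_COUNTERS];

static void count_per_thread(unsigned edge)
{
    thread_counters[edge]++;
}

/* Hypothetical merge step, run once when a thread finishes: fold this
   thread's counts into the shared array and reset the local copy. */
static void merge_thread_counters(void)
{
    for (unsigned i = 0; i < N_COUNTERS; i++) {
        __atomic_fetch_add(&shared_counters[i], thread_counters[i],
                           __ATOMIC_RELAXED);
        thread_counters[i] = 0;
    }
}
```

Strategy 1 corresponds to the ~20% figure, strategy 2 is where the lock-prefix
cost applies, and strategy 3 avoids that cost in exchange for memory that grows
with the thread count.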

Perhaps we can just allow both and let the user choose the more convenient
method.  We may still find a better choice, but I am not aware of any.
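Letting the user choose could be as simple as a build-time switch over the
counter update. A minimal sketch, assuming a hypothetical
`PROFILE_THREAD_SAFE` macro; for comparison, later GCC releases expose a
similar choice through `-fprofile-update=single` and `-fprofile-update=atomic`.

```c
#include <stdint.h>

typedef int64_t counter_t;  /* assumption: 64-bit counters */

/* Hypothetical switch: compile with -DPROFILE_THREAD_SAFE to get atomic
   (lock-prefixed on x86) updates; otherwise use the cheaper, thread-unsafe
   plain increment. */
#ifdef PROFILE_THREAD_SAFE
#define PROFILE_COUNT(c) ((void) __atomic_fetch_add(&(c), 1, __ATOMIC_RELAXED))
#else
#define PROFILE_COUNT(c) ((void) ((c)++))
#endif
```

Instrumented code would then call `PROFILE_COUNT(counters[i])` on each
instrumented edge, and the cost model is picked once per build.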

Honza