This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Instrumenting for a different profiling algorithm


Hello all,

[Sorry for the excessively long mail. It contains introduction, problem explanation, solution and a set of how-to questions].

== Introduction ==
Because gprof is completely useless in some cases (see below), I had to develop
(1999) a new profiling algorithm. Unfortunately, my implementation requires manual
instrumentation of all "interesting" functions with RAII objects (C++ only).
Aside form the manual instrumentation, the new profiler works perfectly, with
remarkable accuracy and low overhead (a bit faster than gcc -O2 -pg -g).


Instead of doing this manually, I would like to tell GCC instrument the code such that:

  1. Create a static variable per profiled function (could be void*).
  2. Call a profiling-function upon entry (and pass this pointer, and
     __FUNCTION__ - or whatever its name is in c99/c++0x).
  3. Call a conclude-function upon exit from the profiled function.

== Problem description ==

gcc -pg and gprof can't accumulate time of function's children. Instead, gprof
would interpolate based on call count. For example, look at the attached file.
In practice 'fast' takes less than a millisecond, and slow takes around a second,
yet gprof manages to reverse their order (when compiling gcc -pg -g t.c,
and running gprof a.out):


|                0.00    0.00       1/1001        slow [4]
               1.05    0.00    1000/1001        fast [3]
[1]    100.0    1.06    0.00    1001         f [1]
-----------------------------------------------
[2]    100.0    0.00    1.06                 main [2]
               0.00    1.05    1000/1000        fast [3]
               0.00    0.00       1/1           slow [4]
|The only correct  information is on the time that the leaf "f" takes.

Measuring the time manually (see t.c) give (contradicting gprof):

   |Slow took 1.15 seconds
   Sum of fast took 0.00 seconds
   |

(You can tune t.c to make the result as inaccurate as you like)

=== Solution ===

Record approximate time of entry and exit of each function (volatile tick
counter is updated by a SIGPROF handler). Accumulate time for each:

  1. Time for a function (unlike -pg this will be function+sibling time).
  2. Time for a pair of caller-callee functions. (The time that the
     callee  contributes to the caller's total)

In my experience, this scheme works much better than the current -pg+gprof (the latter is totally useless for my needs, I'd better flip a coin to decide which function I guess to be expensive).

To do:
I would like to add the instrumentation (as explained above). To avoid billing the overhead of the profiling on instrumented functions, I need to maintain
volatile unsigned in_profiler;
and avoid counting time when in_profiler!=0. in_profiler is updated as follows:


  1. in_profiler=1 right before parameters are passed to the call of
     start_profile_function(&state, __FUNCTION__); and
     finalize_profile_function(&state);
  2. in_profiler=0 immediately  after *_profile_function() cleans up
     the remains of the call (depending on the ABI, it is possible that
     no such clean up occurs).

== How to: ==

I would like to find how to force gcc (and its optimizers):

  1. Not to move stuff across in_profiler=*  assignments, for all
     optimizers. This was measured to skew profiling by 10-20% on x86
     Linux, and more than x2 on tiny functions on PPC-AIX (adding an
     __asm__ register+memory barrier lowered this to 1-4% on x86, but
     not for PPC-AIX's ABI)
  2. Instrument start_profile_function as early as possible in the
     profiled function (if possible before most of the preamble).
  3. Instrument finalize _profile_function as late as possible in the
     profiled function (if possible after most of the preamble).
  4. Take exceptions into account
  5. Extract the __FUNCTION__ information from inside
     start_profile_function() instead of passing it explicitly (debug
     info?)
        1. If __FUNCTION__ can be extracted from stack then maybe ditch
           in_profiler=* altogether (this way the profiler will be
           billed correctly for its time, without such variables).
        2. Use specialized ABI to call *_profile_function, saving up
           the need for register save/restore for profiler's function
           calls.

It took me about a week to implement it in C++ using STL (using RAII and manual instrumentation). I hope that it would take me no more than two weeks doing this for C. Anything more than will shift my benefit/cost ration to finding a weaker solution.

I would be happy to get any comments or suggestions (especially "don't waste your time, it is not needed because...").

Note that I am not 100% if I am allowed to assign copyright to FSF. I am doing this with my student hat (need it for accurate profiling of my research code), but my IBM researcher hat may (or may not) interfere, and I'll need advice from our legal department.

Thanks
 Michael

--
Michael Veksler
http:///tx.technion.ac.il/~mveksler

#include <stdio.h>
#include <time.h>
volatile long long total_count=0;
void f(unsigned repeat_count)
{
	unsigned i;
	for (i=0; i < repeat_count ; ++i)
		++total_count;
}
void fast()
{
	f(1);
}
void slow()
{
	f(300*1000*1000);
}
int main()
{
	unsigned i=0;
	clock_t start;
	start=clock();
	slow();
	printf("Slow took %.2f seconds \n",
		 (clock()-start)*1.0/CLOCKS_PER_SEC);
	start=clock();
	for (i=0 ; i < 1000 ; ++i)
		fast();
	printf("Sum of fast took %.2f seconds \n",
		 (clock()-start)*1.0/CLOCKS_PER_SEC);
	return 0;
}



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]