profiling weirdness

Mike Stump
Fri Sep 25 19:05:00 GMT 1998

> Date: Wed, 23 Sep 1998 15:54:24 -0600
> From: Dave Steffen <>

This is benchmarking 101 stuff...  Not appropriate for egcs,
and I probably shouldn't respond...

> 	I've got some numerical code I'm trying (desperately) to speed
> up. I'm compiling with 

> 	c++ -ansi -pg ....

> and (right now) am using no optimization, so I can tell what
> improvements I'm getting because of improving the algorithm.

I don't think benchmarking and trying to speed up -O0 programs is as
useful or meaningful as speeding up -O3 programs.  I'd recompile with
optimization enabled.

> 	So I run my program:
> helicon: kubo -XYHxyh -e ".1" -O "-5 5 .1" -P .66
> 	And I profile:
> helicon: gprof kubo | c++filt >
> 	And I get
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  56.86      0.29     0.29 19000000     0.02     0.02  double dot_product<Sparse_vector_STLvec<double>, double>(Sparse_vector_STLvec<double> const &, double const *)
>  15.69      0.37     0.08 19001940     0.00     0.00  vector<ai<int, double>,

You cannot meaningfully time routines this way when they run for too
little time.  It appears this is too little to measure.  Try numbers
for 100 seconds of run, then 50, then 25, then 8, then 4, 2, 1...  The
point where your numbers start to diverge and become unpredictable is
the point that you can't measure past with this method.

If you can't make your program consume more time, it isn't meaningful
to speed up the program.  :-)

> 	And in case you're wondering, 'time'ing the run gives results
> like "0.11user 47.19system 0:47.30elapsed 99%CPU", and these numbers
> are very consistent. Also, the code executes correctly and generates
> identical output for all the above runs.

That means that measuring the time that way _is_ valid, and the run is
long enough.

> 	SO: I'm very confused. Does anyone know what's going on? Is
> there any way for me to get reasonably accurate profiling
> information?

You can either learn to lengthen runs or learn to use different
techniques.  Personally, I like to use tick counters on the chips to
measure runtimes from within the software.  If you do this, you'll
find that you can tell the difference between running 3 machine
instructions or 4; no other method will do that, unless you put them
in a loop and run them more than once.  I believe your chip has tick
counters you can use for this type of timing; you just have to learn
how to use them.
