This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: profiling wierdness
- To: egcs at cygnus dot com, steffend at helicon dot physics dot colostate dot edu
- Subject: Re: profiling wierdness
- From: mrs at wrs dot com (Mike Stump)
- Date: Fri, 25 Sep 1998 11:26:37 -0700
> Date: Wed, 23 Sep 1998 15:54:24 -0600
> From: Dave Steffen <steffend@helicon.physics.colostate.edu>
This is benchmarking 101 stuff... Not appropriate for egcs,
and I probably shouldn't respond...
> I've got some numerical code I'm trying (desperately) to speed
> up. I'm compiling with
> c++ -ansi -pg ....
> and (right now) am using no optimization, so I can tell what
> improvements I'm getting because of improving the algorithm.
I don't think benchmarking and trying to speed up -O0 programs is as
useful or meaningful as speeding up -O3 programs. I'd recompile with
optimization turned on.
> So I run my program:
> helicon: kubo -XYHxyh -e ".1" -O "-5 5 .1" -P .66
> And I profile:
> helicon: gprof kubo | c++filt > kubo.prof
> And I get
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  56.86      0.29     0.29 19000000     0.02     0.02  double dot_product<Sparse_vector_STLvec<double>, double>(Sparse_vector_STLvec<double> const &, double const *)
>  15.69      0.37     0.08 19001940     0.00     0.00  vector<ai<int, double>,
You cannot meaningfully time routines this way when they run for too
little time, and this run appears to be too short to measure. Collect
numbers for a 100-second run, then 50, then 25, then 8, then 4, 2, 1...
The point where your numbers start to diverge and become unpredictable
is the limit below which you can't measure with this method.
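The shrinking-run experiment above can be sketched in C++ roughly as
follows. This is only an illustration under stated assumptions:
workload(), time_once() and relative_spread() are hypothetical names,
the dense dot product is a stand-in for the real program, and the
wall clock is std::chrono::steady_clock rather than gprof's sampling.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in workload: a 256-element dot product repeated
// `reps` times. Substitute the code you actually want to measure.
double workload(std::size_t reps) {
    static const std::vector<double> a(256, 1.5), b(256, 2.0);
    volatile double sink = 0.0;  // volatile keeps the loop from being optimized away
    for (std::size_t r = 0; r < reps; ++r) {
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
        sink = sink + sum;
    }
    return sink;
}

// Wall-clock seconds for a single run of `reps` repetitions.
double time_once(std::size_t reps) {
    auto start = std::chrono::steady_clock::now();
    workload(reps);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Relative spread (max - min) / min over `trials` timings of the same
// run length. Returns infinity if the clock could not resolve the run.
double relative_spread(std::size_t reps, int trials) {
    assert(trials > 0);
    double lo = time_once(reps);
    double hi = lo;
    for (int t = 1; t < trials; ++t) {
        double s = time_once(reps);
        lo = std::min(lo, s);
        hi = std::max(hi, s);
    }
    if (lo <= 0.0) return std::numeric_limits<double>::infinity();
    return (hi - lo) / lo;
}
```

Calling relative_spread() with a large repetition count and then
halving it step by step should show the spread staying small for long
runs and growing as the run gets short; the run length where it blows
up is roughly the floor for this style of measurement on that machine.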
If you can't make your program consume more time, it isn't meaningful
to speed up the program. :-)
> And in case you're wondering, 'time'ing the run gives results
> like "0.11user 47.19system 0:47.30elapsed 99%CPU", and these numbers
> are very consistent. Also, the code executes correctly and generates
> identical output for all the above runs.
That means that timing the run that way _is_ valid, and that the run
is long enough for it.
> SO: I'm very confused. Does anyone know what's going on? Is
> there any way for me to get reasonably accurate profiling
> information?
You can either learn to lengthen runs or learn to use different
techniques. Personally, I like to use the tick counters on the chips to
measure runtimes from within the software. If you do this, you'll find
that you can tell the difference between running 3 machine instructions
and 4; no other method will do that unless you put them in a loop and
run them more than once. I believe that your chip has tick counters you
can use for this type of timing; you just have to learn how to use them.
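As a rough sketch of timing with a tick counter from within the
software: the post does not say which chip or instruction is meant, so
this assumes an x86 processor and reads its time-stamp counter through
the __rdtsc() intrinsic that GCC and Clang provide in <x86intrin.h>.
On other targets it falls back to std::chrono::steady_clock, which is
a clock and not a cycle counter. read_ticks() and min_ticks() are
hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC and Clang (assumed toolchain)
#endif

// Read a free-running counter. On x86 this is the time-stamp counter;
// elsewhere it is the steady_clock tick count (assumption: good enough
// as a stand-in, but not a true cycle count).
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Run `body` `repeats` times and return the smallest tick delta seen.
// Taking the minimum discards runs disturbed by interrupts or
// scheduling. The delta includes the cost of reading the counter.
template <typename F>
std::uint64_t min_ticks(F&& body, int repeats) {
    assert(repeats > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        std::uint64_t t0 = read_ticks();
        body();
        std::uint64_t t1 = read_ticks();
        best = std::min(best, t1 - t0);
    }
    return best;
}
```

To compare two variants of a routine, pass each one to min_ticks() as
a lambda and compare the results. Measuring an empty lambda first
gives an estimate of the counter-read overhead to subtract.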