Hi,
Great that you are looking at this routine. A possible enhancement would be
the non-portable assembly snippets at
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IA32LinuxCluster/Doc/timing.html#linuxasm
for Linux. Similar hacks exist for most systems, and where they don't, the
routine could fall back to a standard Unix timing function. This could give
roughly nanosecond resolution on gigahertz machines for those who want to
time, e.g., single loops. In the ultimate implementation, the user could
adjust the resolution of the clock via an environment variable to
avoid wraparound in longer measurements.
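To make the idea concrete, here is a minimal sketch in C. It assumes GCC or Clang on x86/x86-64 for the assembly path, which reads the CPU's timestamp counter with the RDTSC instruction, and it falls back to POSIX clock_gettime(CLOCK_MONOTONIC) elsewhere. It is not necessarily the snippet on the linked page. The names raw_ticks, timer_shift and clock32, and the environment variable TIMER_SHIFT, are all hypothetical. Note that RDTSC counts cycles rather than nanoseconds, so converting to seconds would require a calibrated clock rate.

```c
#define _POSIX_C_SOURCE 200112L  /* for clock_gettime and setenv */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Raw high-resolution tick count.
 * x86 + GCC/Clang: RDTSC cycle counter (about 1 tick per ns at 1 GHz).
 * Otherwise: POSIX monotonic clock, in nanoseconds. */
static uint64_t raw_ticks(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical knob: TIMER_SHIFT (0..31) is the number of low-order bits
 * to drop. Invalid or missing values mean "full resolution" (0). */
static unsigned timer_shift(void)
{
    const char *s = getenv("TIMER_SHIFT");
    if (s == NULL)
        return 0;
    char *end;
    unsigned long v = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || v > 31)
        return 0;
    return (unsigned)v;
}

/* 32-bit clock in units of 2^shift raw ticks. It wraps modulo 2^32, so a
 * larger shift lets longer intervals be measured at coarser resolution. */
static uint32_t clock32(void)
{
    return (uint32_t)(raw_ticks() >> timer_shift());
}
```

A caller would bracket the code being timed with two clock32() calls and compute t1 - t0 in uint32_t. Unsigned subtraction yields the correct elapsed count even if the counter wraps once in between. At 1 GHz, an unshifted 32-bit counter wraps after about 4.3 seconds, and each extra bit of shift doubles that range while halving the resolution.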