Hello,
I'm running the program below twice with different command line arguments. The
argument is used as a floating point scaling factor in the code, but does not
change the algorithm in any way. I am baffled by the difference in run time
between the two runs, since the program flow is not altered by the argument.
$ gcc -O3 t.c
$ time ./a.out 0.1
real 0m7.300s
user 0m7.286s
sys 0m0.007s
$ time ./a.out 0.0001
real 0m0.060s
user 0m0.058s
sys 0m0.003s
The second run is about 120 times faster than the first.
I did some quick tests using the 'perf' profiling utility on Linux, and
it seems that the slow run has about 70% branch misses, which I guess
might kill performance drastically.
I am able to reproduce this on multiple i686 boxes using various gcc versions
(4.4, 4.6). Compiling on x86_64 does not show this behaviour.
Is anybody able to reproduce this issue, and how can it be explained?
Thanks,
Ico
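/*
 * gcc -O3 t.c && ./a.out NUMBER
 */

#include <stdio.h>
#include <stdlib.h>

#define N 4000
#define S 5000

struct t {
    double a, b, f;
};

int main(int argc, char **argv)
{
    int i, j;
    struct t t[N];
    double f;

    /* The scaling factor is required; do not hand a NULL pointer to atof() */
    if (argc < 2) {
        fprintf(stderr, "usage: %s NUMBER\n", argv[0]);
        return 1;
    }
    f = atof(argv[1]);

    /* Each element gets its own factor, scaled by the argument */
    for (i = 0; i < N; i++) {
        t[i].a = 0;
        t[i].b = 1;
        t[i].f = i * f;
    }

    /* Same loops and same operations, whatever the argument is */
    for (j = 0; j < S; j++) {
        for (i = 0; i < N; i++) {
            t[i].a += t[i].b * t[i].f;
            t[i].b -= t[i].a * t[i].f;
        }
    }

    /* Returning a computed value keeps the loops from being optimized away */
    return (int)t[1].a;
}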
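
Since the control flow is identical in both runs, one thing that may be worth
checking is whether the values being computed are of the same kind. The sketch
below is not part of the program above: count_nonfinite is a hypothetical
helper that applies the same two-line update to each element separately and
counts how many end up as Inf or NaN. It assumes a C99 compiler for isfinite().
If the counts differ between the two arguments, the floating point unit is
working on different kinds of values in the two runs, which might be related
to the timing difference; that is only a guess, not a confirmed explanation.

```c
#include <math.h>   /* isfinite() macro, C99 */

#define N 4000      /* number of elements, as in the posted program */
#define S 5000      /* number of update steps, as in the posted program */

/*
 * Hypothetical helper, not part of the posted program: run the same
 * recurrence for every element and count how many results are Inf or NaN.
 * The loops are interchanged compared to the original (all steps for one
 * element, then the next element); the arithmetic per element is the same.
 */
static int count_nonfinite(double scale)
{
    int i, j, count = 0;

    for (i = 0; i < N; i++) {
        double a = 0, b = 1, f = i * scale;

        for (j = 0; j < S; j++) {
            a += b * f;
            b -= a * f;
        }
        if (!isfinite(a) || !isfinite(b))
            count++;
    }
    return count;
}
```

Calling count_nonfinite(0.1) and count_nonfinite(0.0001) from a small main()
and printing both counts shows whether the two runs operate on comparable
values.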
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 11
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-7'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-7)