This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Floating point performance issue


On 12/20/2011 4:52 AM, Ico wrote:
Hello,

I'm running the program below twice with different command line arguments. The
argument is used as a floating point scaling factor in the code, but does not
change the algorithm in any way.  I am baffled by the difference in run time of
the two runs, since the program flow is not altered by the argument.

Evidently, your definition of program flow doesn't take into account changes in your choice of architecture or in the number of floating point exceptions.
Is your interest in x87 behavior due to historic considerations?

$ gcc -O3 t.c


$ time ./a.out 0.1

real	0m7.300s
user	0m7.286s
sys	0m0.007s

$ time ./a.out 0.0001

real	0m0.060s
user	0m0.058s
sys	0m0.003s


The second run is about 120 times faster than the first.


I did some quick tests using the 'perf' profiling utility on Linux, and
it seems that the slow run has about 70% branch misses, which I guess
might kill performance drastically.

I am able to reproduce this on multiple i686 boxes using various gcc versions
(4.4, 4.6). Compiling on x86_64 does not show this behaviour.

Is anybody able to reproduce this issue, and how can this be explained?

If you had turned on your search engine, you would have seen the articles about "x87 Floating Point Assist."
Did you also test SSE code with and without abrupt underflow?
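
For the SSE comparison, here is a minimal sketch of toggling abrupt underflow (flush-to-zero) from C. It assumes an x86 target where double arithmetic is done in SSE registers (the default on x86_64; on i686 something like -msse2 -mfpmath=sse is needed). The helper name third_of_dbl_min is made up for illustration. The _MM_*_FLUSH_ZERO_* macros come from <xmmintrin.h> and affect only SSE arithmetic, not x87.

```c
#include <float.h>
#include <xmmintrin.h>

/*
 * Hypothetical helper: divide the smallest normal double by 3 with
 * flush-to-zero either enabled or disabled, then restore the old mode.
 * The true result is below DBL_MIN, so it underflows.
 */
double third_of_dbl_min(int flush_to_zero)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    volatile double x = DBL_MIN;   /* volatile: keep the compiler from */
    volatile double y;             /* folding the division at build time */

    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);
    y = x / 3.0;                   /* underflows: gradual or abrupt */
    _MM_SET_FLUSH_ZERO_MODE(saved);

    return y;
}
```

With flush-to-zero off the function returns a small positive subnormal; with it on, the result is flushed to 0.0. Timing the test loop under both settings would separate the cost of subnormal handling from other effects.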

Thanks,


Ico



/*
  * gcc -O3 test.c && ./a.out NUMBER
  */

#include <stdio.h>
#include <stdlib.h>

#define N 4000
#define S 5000

struct t {
         double a, b, f;
};

int main(int argc, char **argv)
{
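         int i, j;
         struct t t[N];
         double f;

         /* Guard against a missing argument before calling atof(). */
         if (argc < 2) {
                 fprintf(stderr, "usage: %s NUMBER\n", argv[0]);
                 return 1;
         }
         f = atof(argv[1]);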

         for(i=0; i<N; i++) {
                 t[i].a = 0;
                 t[i].b = 1;
                 t[i].f = i * f;
         }

         for(j=0; j<S; j++) {
                 for(i=0; i<N; i++) {
                         t[i].a += t[i].b * t[i].f;
                         t[i].b -= t[i].a * t[i].f;
                 }
         }

         return t[1].a;
}
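
One way to check whether the two runs really do the same floating point work is to classify the values the loop ends up with. The sketch below is not part of the posted program; run_and_classify and struct fp_census are hypothetical names. It applies the same recurrence to each element and counts how many final values of a are finite and how many are infinite or NaN. The volatile qualifiers are an assumption, used to force rounding to double on every step the way the stores to the array do. The loops are swapped (all steps for one element, then the next), which does not change the results because the elements are independent.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical tally of how the final values of a are classified. */
struct fp_census {
    size_t finite;     /* zero, normal, or subnormal */
    size_t nonfinite;  /* infinity or NaN */
};

/* Run the recurrence from the test program and classify the final a values. */
struct fp_census run_and_classify(double scale, int n, int steps)
{
    struct fp_census c = {0, 0};
    int i, j;

    for (i = 0; i < n; i++) {
        volatile double a = 0.0, b = 1.0;  /* same start values as t[i] */
        double f = i * scale;              /* same per-element factor */
        double last;

        for (j = 0; j < steps; j++) {
            a += b * f;
            b -= a * f;
        }

        last = a;
        if (isfinite(last))
            c.finite++;
        else
            c.nonfinite++;
    }
    return c;
}
```

Calling run_and_classify(0.1, 4000, 5000) and run_and_classify(0.0001, 4000, 5000) and comparing the nonfinite counts shows whether the larger scale pushes most elements to infinity or NaN. That would be consistent with the exception handling cost mentioned in the reply.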





processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz
stepping	: 11


Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-7' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-7)


--
Tim Prince

