complex<double>::norm() -- huge slowdown from egcs-2.91.66 to 3.0.3
Jeroen Nijhof
Jeroen.Nijhof@marconi.com
Wed Feb 20 09:16:00 GMT 2002
>Originator: Jeroen Nijhof
>Organization: Marconi
>Confidential: no
>Synopsis: complex<double>::norm() -- huge slowdown from egcs-2.91.66 to 3.0.3
>Severity: non-critical
>Priority: low
>Class: pessimizes-code
>Release: 3.0.3
>Environment:
System: Linux sfup03.stratford 2.2.17-mosix #7 SMP Tue Oct 24 14:33:47 BST 2000 i686 unknown
Architecture: i686
host: i686-pc-linux-gnu (2 x Pentium III 667 MHz)
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../gcc-3.0.3/configure --prefix=/usr/local
>Description:
gcc-3.0.3 (or rather its libstdc++) generates code for complex<double>::norm
which might be more accurate, but which is a lot slower than 're * re + im * im'.
Context: in the split-step Fourier algorithm, I need to calculate
u[j] <- u[j] * exp(I * cst * |u[j]|^2) (in the "time domain", interleaved with FFTs and multiplications in the "frequency domain").
With gcc-3.0.3, my program runs 40% slower than with egcs-2.91.66. The example below takes twice as long.
I do not know how much more accurate the 3.0.3 calculation is, but for my purposes it is certainly
not worth the 40% extra running time. I guess I could override the norm() definition with a
non-templatized one, but I'd rather have the fast definition in the library.
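The workaround mentioned above could look roughly like the following. This is a minimal sketch, not part of the original report: the names fast_norm and nonlinear_step are hypothetical, and std::polar is used here in place of separate cos()/sin() calls.

```cpp
#include <complex>
#include <cstddef>

typedef std::complex<double> Complex;

// Hypothetical helper: |z|^2 computed directly as re*re + im*im,
// with no scaling and no sqrt().
inline double fast_norm(const Complex& z) {
    const double re = z.real();
    const double im = z.imag();
    return re * re + im * im;
}

// Nonlinear step: u[j] <- u[j] * exp(I * cst * |u[j]|^2).
// std::polar(1.0, phi) builds the unit-magnitude factor exp(I * phi).
inline void nonlinear_step(Complex* u, std::size_t n, double cst) {
    for (std::size_t j = 0; j < n; ++j) {
        u[j] *= std::polar(1.0, cst * fast_norm(u[j]));
    }
}
```

Calling a local helper like this sidesteps the library's norm() without touching libstdc++ itself, at the cost of giving up whatever accuracy the scaled version provides.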
The current implementation in __Norm_helper<true> boils down to: (z = x + I * y)
s = max (abs(x), abs(y)); a = s * sqrt( (x/s)^2 + (y/s)^2); norm = a * a.
Would the increased accuracy disappear if the sqrt() is eliminated, by returning (s*s) * ( (x/s)^2 + (y/s)^2)?
By the way, the max() doesn't seem to get inlined at -O3 with the default -finline-limit.
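The three formulations discussed above can be written side by side for comparison. This is only a sketch built from the description in this report, not the libstdc++ source: the function names are hypothetical, and the check for s == 0 is an assumption added to avoid dividing zero by zero.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>

typedef std::complex<double> Complex;

// Direct form: re * re + im * im.
inline double norm_direct(const Complex& z) {
    const double x = z.real();
    const double y = z.imag();
    return x * x + y * y;
}

// Scaled form as described in this report:
// s = max(|x|, |y|); a = s * sqrt((x/s)^2 + (y/s)^2); norm = a * a.
inline double norm_scaled_sqrt(const Complex& z) {
    const double x = z.real();
    const double y = z.imag();
    const double s = std::max(std::fabs(x), std::fabs(y));
    if (s == 0.0) return 0.0;  // assumption: guard against 0/0
    const double xs = x / s;
    const double ys = y / s;
    const double a = s * std::sqrt(xs * xs + ys * ys);
    return a * a;
}

// Proposed variant without sqrt(): (s*s) * ((x/s)^2 + (y/s)^2).
inline double norm_scaled_nosqrt(const Complex& z) {
    const double x = z.real();
    const double y = z.imag();
    const double s = std::max(std::fabs(x), std::fabs(y));
    if (s == 0.0) return 0.0;  // assumption: guard against 0/0
    const double xs = x / s;
    const double ys = y / s;
    return (s * s) * (xs * xs + ys * ys);
}
```

Running all three over the same inputs and comparing the results bit for bit would be one way to check whether dropping the sqrt() changes the accuracy in practice.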
>How-To-Repeat:
example.cc given below; g++ is egcs-2.91.66 (Redhat 6.2)'s g++.
g++ -O3 -mcpu=pentiumpro -DUSE_NORM -o old_norm example.cc
g++-3.0.3 -O3 -mcpu=pentiumpro -DUSE_NORM -o new_norm example.cc -finline-limit=9999
g++ -O3 -mcpu=pentiumpro -o old_separate example.cc
g++-3.0.3 -O3 -mcpu=pentiumpro -o new_separate example.cc -finline-limit=9999
times:
//                 elapsed   user   system
// old_norm           1.10   1.10     0.01
// new_norm           2.20   2.19     0.01
// old_separate       1.10   1.10     0.00
// new_separate       1.13   1.13     0.00
// example.cc
#include <cmath>
#include <complex>
typedef std::complex<double> Complex;
int main(int argc, char *argv[]) {
    Complex u[2048];
    for (int i = 0; i < 2048; ++i)
        u[i] = 1.0;
    for (int i = 0; i < 2000; ++i) {
        Complex *p = u;
        for (unsigned int j = 0; j < 2048; ++j) {
#ifdef USE_NORM
            double u2 = norm(*p);
#else
            double ur = real(*p); double ui = imag(*p);
            double u2 = ur * ur + ui * ui;
#endif
            double t = u2 * 0.1;
            *p *= Complex(cos(t), sin(t));
            // in my real program, I define _GNU_SOURCE and use glibc's sincos() instead of sin(), cos().
            ++p;
        }
    }
}
// end of example.cc