slow complex<double>'s with g++

While writing a C++ version of the Mandelbrot benchmark over at
"The Great Computer Language Shootout"...

...I've come across the issue that complex<double>'s seem painfully slow
unless compiled with -ffast-math.  Of course, doing that results in
incorrect answers because of rounding issues.  The speed difference for
the program below is between 5x and 8x, depending on the version of g++.
It is also about 5 times slower than the corresponding gcc version at...

...I'd be interested in learning the reason for the speed difference.
Does it have something to do with temporaries not being optimized away,
or somesuch?  A limitation of the x87 instruction set?  Is it inherent
in the way the C++ Standard requires complex<double>'s to be calculated?
My bad coding style?


Greg Buchholz

// Takes an integer argument "n" on the command line and generates a
// PBM bitmap of the Mandelbrot set on stdout.
// see also: ( )

#include <complex>
#include <cstdlib>
#include <iostream>

int main (int argc, char **argv)
{
  char  bit_num = 0, byte_acc = 0;
  const int iter = 50;
  const double limit_sqr = 2.0 * 2.0;
  int n = std::atoi(argv[1]);   // image is n x n pixels

  std::cout << "P4\n" << n << " " << n << std::endl;

  for(int y=0; y<n; ++y)
    for(int x=0; x<n; ++x)
    {
       std::complex<double> Z(0.0,0.0);
       std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);

       // iterate until |Z|^2 exceeds the limit or "iter" steps are done
       for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i)  Z = Z*Z + C;

       // one bit per pixel: 1 if the point stayed bounded, 0 if it escaped
       byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

       // emit a full byte, or pad out the last partial byte of a row
       if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
       else if(x == n-1) { byte_acc  <<= (8-n%8);
                           std::cout << byte_acc;
                           bit_num = byte_acc = 0; }
    }

  return 0;
}

