This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

std::norm improvement


Hello,

following a recent discussion on comp.lang.c++ about QOI vs.
performance of GCC's implementation of std::norm, it seems I succeeded
to create an implementation that retains the numerical robustness of
the present implementation, yet apparently achieves a higher
performance (avoids sqrt).
The current implementation, used for floating point types, simply
squares the result of std::abs.

I include the patch against gcc trunk (mady by svn diff) as well as
the test program I used to measure the impact.

With g++ 4.3.1, compiling by g++ -O3 test.cc, I get on a Core 2 Duo @ 2.83GHz:

testing floats...
time elapsed = 270000
residual = 7.62815e-20
testing doubles...
time elapsed = 240000
residual = 2.36858e-163

whereas with the patched header, I get:

testing floats...
time elapsed = 60000
residual = 5.40582e-20
testing doubles...
time elapsed = 110000
residual = 1.41387e-163

so it seems the patched version wins in terms of both accuracy and performance.

Please let me know if there's something more to do. The change is
minor, but here's a changelog entry just in case:

2009-03-04  Jaroslav Hajek  <highegg@gmail.com>

	* include/std/complex (_Norm_helper<true>::_S_do_it): Optimize.

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
Index: libstdc++-v3/include/std/complex
===================================================================
--- libstdc++-v3/include/std/complex	(revision 144596)
+++ libstdc++-v3/include/std/complex	(working copy)
@@ -649,8 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE(std)
       template<typename _Tp>
         static inline _Tp _S_do_it(const complex<_Tp>& __z)
         {
-          _Tp __res = std::abs(__z);
-          return __res * __res;
+          const _Tp __x = abs(__z.real());
+          const _Tp __y = abs(__z.imag());
+          const _Tp __s = std::max(__x, __y);
+          const _Tp __r = std::min(__x, __y);
+          if (__s == _Tp())
+            return __s;
+          return __s * (__s + __r * (__r / __s));
         }
     };
   
#include <cmath>
#include <complex>
#include <algorithm>
#include <ctime>
#include <iostream>
#include <cfloat>

using namespace std;

template <typename _Tp>
clock_t 
waste_time (const complex<_Tp>& z, _Tp& resid, int nrep = 10000000)
{
  complex<_Tp> *za = new complex<_Tp>[nrep];
  complex<_Tp> fi = polar (static_cast<_Tp> (1), 2*static_cast<_Tp> (M_PI) / nrep);
  za[0] = z;
  for (int i = 1; i < nrep; i++)
    za[i] = za[i-1] * fi;

  _Tp *ra = new _Tp[nrep];
  // First, measure the time.
  clock_t tic = clock ();
  for (int i = 0; i < nrep; i++)
    ra[i] = std::norm (za[i]);

  tic = clock () - tic;
  // Here we take advantage of the fact that doing z /= norm(z) twice
  // should be a no-op mathematically, so let's check how far we get from that.
  _Tp res = 0;
  for (int i = 0; i < nrep; i++)
    {
      complex<_Tp> zx = za[i] / ra[i];
      zx /= std::norm (zx);
      // Add residual.
      res += std::abs (zx - za[i]);
    }

  resid = res;
  delete [] ra;
  delete [] za;
  return tic;
}

int main ()
{
  clock_t t;

  // Generate the overflow issue.
  float lowf = sqrt(FLT_MIN / 2.0f), resf;
  std::complex<float> zf(lowf * 0.9f, lowf * 1.1f);

  std::cout << "testing floats...\n";
  t = waste_time(zf, resf);
  std::cout << "time elapsed = " << t << '\n';
  std::cout << "residual = " << resf << '\n';

  // Ditto for double.
  double lowd = sqrt(DBL_MIN / 2.0), resd;
  std::complex<double> zd(lowd * 0.9f, lowd * 1.1f);

  std::cout << "testing doubles...\n";
  t = waste_time(zd, resd);
  std::cout << "time elapsed = " << t << '\n';
  std::cout << "residual = " << resd << '\n';
}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]