This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Performance breakdown for gcc-4.{6,7} vs. gcc-4.5 using std::vector in matrix vector multiplication


Dear experts,

I hope I'm posting this to the right mailing list as STL's vector is partly involved. If not, any pointers to a more appropriate place would be appreciated.


I'm a PhD student at the Institute for Numerical Simulation at Bonn University. In our group we employ a self-written C++ library for our numerical computations. This library also includes a set of "typical" programs used for benchmarking compilers or assessing effects of new compiler flags.


When gcc-4.6 was released, we noticed a massive performance breakdown in one of these benchmark problems. However, we did not investigate further and waited for gcc-4.7 instead. Unfortunately, the problem persisted. Digging deeper produced a minimal stand-alone example, which I am attaching to this mail.

The example simply performs a matrix-vector multiplication 1000 times. However, the matrix has a highly specific structure that arises in numerical computations using the Finite Element Method (FEM), namely:

std::vector<MinimalVec3> rows[9];

Thus it consists of 9 bands of triples of doubles. The length of each band corresponds to the length of the vector it is applied to.
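
To make the band layout concrete, here is a small sketch of how an entry of one band maps to an index of the input vector. It is derived from the index expression in the innermost loop of the attached program; the helper name columnIndex and its parameters are hypothetical and do not appear in the attachment.

```cpp
// Hypothetical helper, not part of the attached program.
// row  : linear index i*w*w + k*w + m of the result entry
// band : index 0..8 into rows[], i.e. j*3 + l
// n    : position 0..2 within the triple of doubles
// Returns the index of the input-vector entry that this matrix entry multiplies.
long columnIndex(long row, int band, int n, int w) {
  const long wsqr = static_cast<long>(w) * w;
  const int j = band / 3;  // offset -1, 0, +1 in the slowest direction
  const int l = band % 3;  // offset -1, 0, +1 in the middle direction
  return row + (j - 1) * wsqr + (l - 1) * w + (n - 1);
}
```

With this reading, band 4 with n = 1 is the diagonal of the matrix, and the 9 bands times 3 entries give the 27 neighbours of each grid point.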


Timings reported by the 'time' command:

  Compiled with gcc-4.5.0 (our standard compiler):        1m13.246s
  Compiled with gcc-4.7.0:                                2m6.623s
  After removing the member variable "double stuff":      1m9.636s

Using a C array instead of std::vector above resolves this issue.
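
The C array variant is not part of the attachment. One plausible reading, given here only as a sketch, is to replace each std::vector<MinimalVec3> band by a plain heap-allocated array. The names Vec3, applyAdd and demoCenterValue are hypothetical, and the grid is kept tiny (w = 5) so the result can be checked by hand.

```cpp
struct Vec3 {  // hypothetical stand-in for MinimalVec3
  double c[3];
};

// One application of the banded operator: res += A * img.
// rows[b] points to band b; every band has w*w*w entries.
void applyAdd(const Vec3* const* rows, const double* img, double* res, int w) {
  const int wsqr = w * w;
  for (int i = 1; i < w - 1; ++i)
    for (int j = 0; j < 3; ++j)
      for (int k = 1; k < w - 1; ++k)
        for (int l = 0; l < 3; ++l)
          for (int m = 1; m < w - 1; ++m)
            for (int n = 0; n < 3; ++n)
              res[i*wsqr + k*w + m] +=
                  img[(i + j - 1)*wsqr + (k + l - 1)*w + m + n - 1]
                  * rows[j*3 + l][i*wsqr + k*w + m].c[n];
}

// Self-check: all matrix and vector entries are 1, so each interior
// result entry sums its 27 neighbours.
double demoCenterValue() {
  const int w = 5;
  const int wcub = w * w * w;

  Vec3* rows[9];  // plain arrays instead of std::vector<MinimalVec3>
  for (int b = 0; b < 9; ++b) {
    rows[b] = new Vec3[wcub];
    for (int r = 0; r < wcub; ++r)
      for (int n = 0; n < 3; ++n)
        rows[b][r].c[n] = 1.0;
  }

  double* img = new double[wcub];
  double* res = new double[wcub];
  for (int r = 0; r < wcub; ++r) {
    img[r] = 1.0;
    res[r] = 0.0;
  }

  applyAdd(rows, img, res, w);
  const double center = res[2*w*w + 2*w + 2];  // grid point (2, 2, 2)

  delete[] res;
  delete[] img;
  for (int b = 0; b < 9; ++b) delete[] rows[b];
  return center;
}
```

Calling demoCenterValue() returns 27. Timing this kernel against the std::vector version at the original size (w = 129, 1000 repetitions) would be needed to reproduce the comparison above.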


It is probably a demanding question to ask, but anyway: Do you have any clue what could be causing this problem and what could prevent it from happening?


We could of course use another matrix class but in comparison to other matrix implementations (using gcc-4.5) this one here performs best.



We'd be grateful for any advice.
Best regards
Benedict

#include <vector>

class MinimalVec3
{
protected:
  double coords[3];

public:

  MinimalVec3( ) {
    for ( int i = 0; i < 3; ++i )
      coords[i] = 0.;
  }

  inline const double& operator[] ( int I ) const {
    return coords[I];
  }
};

class MinimalVector
{
protected:
  double *_pData;
  double stuff; // EVIL

public:
  explicit MinimalVector ( int length ) {
    _pData = new double[length];
    for (int i = 0; i < length; ++i) _pData[i] = 0.;
  }

  inline double& operator[] ( int I ) {
    return _pData[I];
  }

  inline const double& operator[] ( int I ) const {
    return _pData[I];
  }
};


int main ( int /*argc*/, char** /*argv*/ ) {
    int w = ( 1 << 7 )+1;
    int wsqr = w*w;
    int wcub = w*w*w;

    std::vector<MinimalVec3> rows[9];
    for ( int i = 0; i < 9; ++i ) {
      rows[i].resize ( wcub );
    }

    MinimalVector img ( wcub ), res ( wcub );

    for ( int c = 0; c < 1000; ++c )  {
//       matrix.applyAdd( img, res );
      for ( int i = 1; i < w-1; ++i )
        for ( int j = 0; j < 3; ++j )  {
//           matrix.subApplyAdd ( i*wsqr, ( i + j - 1 ) *wsqr, j, img, res );
          for ( int k = 1; k < w - 1; ++k )
            for ( int l = 0; l < 3; ++l )  {
//               matrix.tripleDiagApplyAdd ( i*wsqr + k*w, ( i + j - 1 ) *wsqr + ( k + l - 1 ) *w, j*3 + l, img, res );
              for ( int m = 1; m < w - 1; ++m )
                for ( int n = 0; n < 3; ++n )
                  res[i*wsqr + k*w + m] += img[( i + j - 1 ) *wsqr + ( k + l - 1 ) *w + m + n - 1] * rows[j*3 + l][i*wsqr + k*w + m][n];

            }
        }
    }
    return 0;
}

