This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Performance of copy algorithm


In conversation with Nathan Myers, I started to explore the
performance of the copy algorithm.  The standard doesn't permit memcpy
to be used outright, since the ranges can overlap.  The current
implementation uses memmove whenever possible as an optimization
technique.

Curious, I tried a simple test using an array of chars copied into
another array to try out different copy algorithms.  The test program
was as follows:

#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

int main()
{
  char x[] = "1234567890123456789012345678901";
  char y[32];

  for (long long i = 0; i < 100000000; i++) {
    // A
//    for (int j=0; j < sizeof(x); j++)
//       y[j] = x[j];

    // B
//    memcpy(y, x, sizeof(x));

    // C
//    for (int j = 0; j < sizeof(x)/sizeof(double); j++)
//      ((double*)y)[j] = ((double*)x)[j];

    // D
//    memmove(y, x, sizeof(x));

    // E
//    copy(x, x+sizeof(x), y);

  }
}


I compiled it with -O2 (with 3.3 branch) and was quite surprised by
the results:

A) Open coded loop on char			 7.20s
B) memcpy					 0.80s
C) Loop using doubles				 1.68s
 (cheats on alignment and block size)
D) memmove					18.96s
E) std::copy (which translates to memmove)	18.96s

A couple of observations.  First, memcpy is very fast - the compiler
doesn't have enough optimization to squeeze the same performance out
of the straight loop.  Second, we are taking a HUGE performance hit by
using memmove for this case.  I figured memmove might be about
equivalent, but apparently not.

I was just going to suggest we change __copy_trivial to check for
range overlap and use memcpy if there's no overlap and a straight loop
otherwise.  But a quick check, leading to gutting the function and
using memcpy, indicates that memcpy isn't getting inlined.  It is
taking about 19 seconds as in the memmove case.

The case B above apparently makes use of the memcpy builtin.  Changing
the innards of __copy_trivial makes calls to the library (verified
with ltrace).

Suggestions on how I should proceed here?

Jerry Quinn


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]