This is the mail archive of the mailing list for the libstdc++ project.


Re: Performance of copy algorithm

On Tue, Feb 04, 2003 at 05:43:02PM -0500, wrote:
> > This is highly system dependent, and you don't say
> > what OS you're using for your test.
> Debian gnu/linux woody+testing (glibc 2.2.5) on an
> Intel P4.

The performance characteristics of the P4 are radically different from 
the P3's.  It is not surprising that code optimized for the PPro is 
pessimal on a P4.

In particular, I gather that code like "*p++ = *q++" (unless rewritten 
by the compiler) is much slower than "a[i] = b[i]; ++i" on a P4, and 
that shifts are much, much slower too.
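For concreteness, the two loop shapes being compared might look like the sketch below. The function names copy_ptr and copy_idx are made up for illustration and do not come from libstdc++; an optimizing compiler may well turn both into the same machine code, so any speed difference would have to be measured on the target in question.

```cpp
#include <cstddef>

// Pointer-increment form: "*p++ = *q++".
// (copy_ptr is a hypothetical name used only for this illustration.)
void copy_ptr(int* dst, const int* src, std::size_t n) {
    while (n--) {
        *dst++ = *src++;
    }
}

// Indexed form: "a[i] = b[i]; ++i".
// (copy_idx is likewise a hypothetical name.)
void copy_idx(int* a, const int* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i];
    }
}
```

Both functions copy n ints and differ only in how the addresses are formed.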

> BTW, what would be involved in getting the compiler
> to have a __builtin_stdcopy?  

Probably we need a family of built-ins:

  __builtin_copy_up_1  __builtin_copy_down_1  
  __builtin_copy_up_2  __builtin_copy_down_2  
  __builtin_copy_up_4  __builtin_copy_down_4  
  __builtin_copy_up_8  __builtin_copy_down_8  

std::copy<> always knows its alignment at compile-time, and
std::copy_backward<> shouldn't be slower.

It probably isn't a lot of work until somebody wants them to 
be fast; then it's a separate chunk of work optimizing for each 
target.  Since std::copy<> is appallingly slow now, it should be 
OK to do unoptimized implementations first.
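As a rough sketch of what unoptimized first versions of such a family could look like, the code below uses ordinary templates in place of compiler built-ins. Everything here is an assumption for illustration: copy_up<N> and copy_down<N> stand in for the proposed __builtin_copy_up_N and __builtin_copy_down_N, and copy_trivial and copy_backward_trivial are hypothetical dispatchers that pick N from alignof(T) at compile time. It only handles trivially copyable types.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for __builtin_copy_up_N: copies `bytes` bytes in
// N-byte units, lowest address first. Safe when dst is below src.
// Assumes `bytes` is a multiple of N.
template <std::size_t N>
void copy_up(void* dst, const void* src, std::size_t bytes) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i + N <= bytes; i += N) {
        unsigned char unit[N];
        std::memcpy(unit, s + i, N);   // load one N-byte unit
        std::memcpy(d + i, unit, N);   // store it
    }
}

// Hypothetical stand-in for __builtin_copy_down_N: same, but highest
// address first. Safe when dst is above src.
template <std::size_t N>
void copy_down(void* dst, const void* src, std::size_t bytes) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = bytes; i >= N; i -= N) {
        unsigned char unit[N];
        std::memcpy(unit, s + i - N, N);
        std::memcpy(d + i - N, unit, N);
    }
}

// Unit size chosen from the element type's alignment, capped at 8.
template <class T>
struct copy_unit {
    static const std::size_t value = alignof(T) >= 8 ? 8 : alignof(T);
};

// Hypothetical std::copy-like dispatcher for trivially copyable T.
template <class T>
T* copy_trivial(const T* first, const T* last, T* out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable types");
    const std::size_t n = static_cast<std::size_t>(last - first);
    copy_up<copy_unit<T>::value>(out, first, n * sizeof(T));
    return out + n;
}

// Hypothetical std::copy_backward-like dispatcher.
template <class T>
T* copy_backward_trivial(const T* first, const T* last, T* out_last) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable types");
    const std::size_t n = static_cast<std::size_t>(last - first);
    T* out_first = out_last - n;
    copy_down<copy_unit<T>::value>(out_first, first, n * sizeof(T));
    return out_first;
}
```

Because sizeof(T) is always a multiple of alignof(T), the byte count handed to each routine is a multiple of the chosen unit size. A real built-in would presumably replace the inner loops with target-specific code.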

On some architectures, some of them (most typically the "_up_1"
varieties, I suppose) should be implemented identically to memcpy.

Nathan Myers
