This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Re: Performance of copy algorithm
- From: Nathan Myers <ncm-nospam at cantrip dot org>
- To: libstdc++ at gcc dot gnu dot org
- Date: Tue, 4 Feb 2003 15:39:31 -0800
- Subject: Re: Performance of copy algorithm
- References: <d27bfd079f.d079fd27bf@optonline.net>
On Tue, Feb 04, 2003 at 05:43:02PM -0500, jlquinn@optonline.net wrote:
> > This is highly system dependent, and you don't say
> > what OS you're using for your test.
>
> Debian gnu/linux woody+testing (glibc 2.2.5) on an
> Intel P4.
The performance characteristics of the P4 are radically different from
those of the P3. It is not surprising that code optimized for the Pentium
Pro (ppro) is pessimal on a P4.
In particular, I gather that code like "*p++ = *q++" (unless the compiler
rewrites it) is much slower on a P4 than "a[i] = b[i]; ++i", and that
shifts are much, much slower too.
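For illustration, the two loop shapes being compared look like this in C++. The function names copy_ptr and copy_idx are made up for this sketch. It only shows that both forms copy the same elements; it makes no claim about their relative speed on any particular CPU.

```cpp
#include <cassert>
#include <cstddef>

// Pointer-increment form: both pointers advance on every element,
// i.e. the "*p++ = *q++" shape.
void copy_ptr(int* dst, const int* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    const int* end = src + n;
    while (src != end)
        *dst++ = *src++;
}

// Indexed form: a single induction variable shared by both arrays,
// i.e. the "a[i] = b[i]; ++i" shape.
void copy_idx(int* dst, const int* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Whether the second form is actually faster depends on the compiler and the target; as noted above, a compiler may rewrite one form into the other.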
> BTW, what would be involved in getting the compiler
> to have a __builtin_stdcopy?
Probably we need a family of built-ins:
__builtin_copy_up_1 __builtin_copy_down_1
__builtin_copy_up_2 __builtin_copy_down_2
__builtin_copy_up_4 __builtin_copy_down_4
__builtin_copy_up_8 __builtin_copy_down_8
std::copy<> always knows its alignment at compile time, and
std::copy_backward<> shouldn't be any slower.
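A minimal sketch of how a std::copy-like front end could pick one of these routines from the element type's alignment at compile time. The names copy_up, copy_down, unit_for, outside, sketch_copy and sketch_copy_backward are hypothetical. The routines are plain unoptimized loops standing in for the proposed builtins, and reading "up" as ascending addresses and "down" as descending addresses is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <type_traits>

// Hypothetical stand-ins for the proposed __builtin_copy_up_N and
// __builtin_copy_down_N family; they are not real GCC builtins.
// Assumption: "up" = ascending addresses (as std::copy walks),
// "down" = descending addresses (as std::copy_backward walks).
// U is the unit size in bytes (1, 2, 4 or 8); n counts units, not bytes.
template <std::size_t U>
void copy_up(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        std::memmove(dst + i * U, src + i * U, U);  // U is a compile-time constant
}

template <std::size_t U>
void copy_down(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = n; i-- > 0;)
        std::memmove(dst + i * U, src + i * U, U);
}

// Widest unit (at most 8 bytes) that the alignment guarantees.
// alignof values are powers of two, so each comparison below is exact.
constexpr std::size_t unit_for(std::size_t align) {
    return align >= 8 ? 8 : align >= 4 ? 4 : align >= 2 ? 2 : 1;
}

// True if p lies outside the half-open range [first, last).
template <class T>
bool outside(const T* p, const T* first, const T* last) {
    std::less<const T*> lt;  // total order even for unrelated pointers
    return lt(p, first) || !lt(p, last);
}

// std::copy-like front end for trivially copyable T. alignof(T) is a
// compile-time constant, so the unit size is chosen without a run-time test.
template <class T>
T* sketch_copy(const T* first, const T* last, T* d_first) {
    static_assert(std::is_trivially_copyable<T>::value, "trivially copyable T only");
    assert(outside<T>(d_first, first, last));  // std::copy's precondition
    constexpr std::size_t U = unit_for(alignof(T));
    const std::size_t n = static_cast<std::size_t>(last - first);
    copy_up<U>(reinterpret_cast<unsigned char*>(d_first),
               reinterpret_cast<const unsigned char*>(first), n * sizeof(T) / U);
    return d_first + n;
}

// std::copy_backward-like front end: same unit choice, descending order.
template <class T>
T* sketch_copy_backward(const T* first, const T* last, T* d_last) {
    static_assert(std::is_trivially_copyable<T>::value, "trivially copyable T only");
    // std::copy_backward's precondition: d_last is not in (first, last].
    assert(!std::less<const T*>()(first, d_last) || std::less<const T*>()(last, d_last));
    constexpr std::size_t U = unit_for(alignof(T));
    const std::size_t n = static_cast<std::size_t>(last - first);
    T* d_first = d_last - n;
    copy_down<U>(reinterpret_cast<unsigned char*>(d_first),
                 reinterpret_cast<const unsigned char*>(first), n * sizeof(T) / U);
    return d_first;
}
```

On a typical target where int is 4-byte aligned, a struct of three ints would go through the 4-byte variant at three units per element, and no size or alignment test would happen at run time.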
Adding them probably isn't a lot of work until somebody wants them to
be fast; then optimizing them is a separate chunk of work for each
target. Since std::copy<> is appallingly slow now, it should be
OK to do unoptimized implementations first.
On some architectures, some of them (most typically the "_up_1"
varieties, I suppose) should be implemented identically to memcpy.
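As a sketch of that last point, assuming "_up_1" means a byte copy in ascending address order: ISO C leaves memcpy undefined when source and destination overlap, while std::copy allows overlap when the destination starts before the source, so this portable version forwards to memmove. Forwarding to memcpy instead would rely on knowing how the target's memcpy behaves. The name copy_up_1 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical "_up_1" routine: copies n bytes as std::copy on
// char-sized elements would, returning one past the last byte written.
// memmove is used because memcpy is undefined for overlapping buffers;
// a target whose memcpy is known to copy upward could use memcpy instead.
inline unsigned char* copy_up_1(unsigned char* dst, const unsigned char* src,
                                std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (n != 0)
        std::memmove(dst, src, n);
    return dst + n;
}
```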
Nathan Myers
ncm@cantrip.org