This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
The XMM code is still more than 3 times faster than rep movsl when data are aligned by 4 or 8, but not by 16.I tend to doubt that odd-byte aligned large memcpys are anywhere near typical. malloc and mmap both return well-aligned buffers (say, 8 byte aligned). Static and on-stack objects are also at least word-aligned 99% of the time.
memcpy can just use "relatively simple" code for copies in which
either src or dst is not word aligned. This cuts possibilities down
from 16 to 4 (or even 2?).
You forgot to look at PowerPC :.. and slow. Why doesn't it use Altivec?
http://cvs.opensolaris.org/source/xref/ppc-dev/ppc-dev/usr/src/lib/libc/ppc/gen/memcpy.s
is that nice and small ?
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |