Efficiency of memmove vs. generic typed copy

Sun Dec 11 10:38:00 GMT 2005

Gabriel Dos Reis wrote:

>Do we have evidence that code B is generally faster than code A for
>the majority of say primary platforms?
>  
>
A few days ago I had the very same curiosity. I haven't done extensive
tests, and in particular I have not carefully checked the effects of
restrict, but two points seemed obvious:
1- For short copies, say fewer than ~100 chars, the open-coded loop was
*much* faster.
2- For long copies, memcpy was a win, probably not by the same margin,
but a clear win.
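
For anyone who wants to repeat the comparison, a minimal sketch of such a
micro-benchmark follows. It is not the code I used: the names copy_loop,
copy_libc, time_copy and sink are hypothetical, and I am only assuming that
"code A" is an open-coded loop and "code B" a call to memcpy. Keep in mind
that an optimizer may itself turn the loop into a library call, so check the
generated code before trusting any numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed shape of "code A": open-coded, element-by-element copy.
inline void copy_loop(char* dst, const char* src, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Assumed shape of "code B": hand the copy essentially as-is to libc.
inline void copy_libc(char* dst, const char* src, std::size_t n)
{
  std::memcpy(dst, src, n);
}

// Written after every copy so the copies cannot be discarded as dead code.
volatile char sink;

// Time `reps` copies of `n` chars and return the elapsed nanoseconds.
// Buffers hold n + 1 chars so that data() is never null, even for n == 0.
template<typename Copy>
long long time_copy(Copy copy, std::size_t n, int reps)
{
  std::vector<char> src(n + 1, 'x'), dst(n + 1, 0);
  const auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r)
    {
      copy(dst.data(), src.data(), n);
      sink = dst[0];
    }
  const auto t1 = std::chrono::steady_clock::now();
  // Sanity check: the first n chars must match after copying.
  assert(std::memcmp(dst.data(), src.data(), n) == 0);
  return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling time_copy(copy_loop, n, reps) and time_copy(copy_libc, n, reps) for a
range of n, from a handful of chars up to a few kilobytes, should show where
the crossover sits on a given target and libc.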

An additional consideration is that memcpy, as provided by glibc, for
example, depends on the target: there are certainly targets providing
versions specially optimized for short copies, e.g., x86_64.
I noticed this short vs. long issue a long time ago, and I can verify it
in almost any benchmark I do, on a daily basis, at least on x86 and
x86_64 (and tell libcs optimized for short copies).

Of course the above considerations apply to an operation that is
eventually handed essentially as-is to libc. I'm not talking about
special cases where the effect of the compiler on the recognized builtin
is clearly evident, e.g., number of chars known at compile time, which
can be optimized much, much better than the general case (noticed a few
weeks ago for string::swap, remember?)
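
To make the distinction concrete, here is a small sketch with hypothetical
names (copy_fixed, copy_runtime, copies_agree). Both functions call memcpy,
but only the second one is normally handed as-is to libc; with the count
fixed at compile time, a compiler that recognizes the builtin can typically
expand the copy inline. What exactly gets emitted depends on the compiler,
the flags and the target, so treat this as an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Count known at compile time: the recognized builtin can usually be
// expanded inline (for instance as a few wide moves), with no libc call.
inline void copy_fixed(char* dst, const char* src)
{
  std::memcpy(dst, src, 16);
}

// Count known only at run time: this is the general case discussed above,
// which typically ends up as a call into the target's memcpy.
inline void copy_runtime(char* dst, const char* src, std::size_t n)
{
  std::memcpy(dst, src, n);
}

// Both variants must of course produce the same bytes.
inline bool copies_agree()
{
  const char src[16] = "0123456789abcde";
  char a[16] = {0};
  char b[16] = {0};
  copy_fixed(a, src);
  copy_runtime(b, src, sizeof b);
  return std::memcmp(a, b, sizeof a) == 0
         && std::memcmp(a, src, sizeof a) == 0;
}
```

Comparing the assembly of the two functions (e.g., with -O2 -S) is the
quickest way to see whether the fixed-size case was expanded inline.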

I think we (as library implementors) should look much more into the
details. However, I'm not sure we can easily figure out a good strategy
that is completely target-independent... At least, it will take some time;
maybe it's better to deal with the general valarray problems first, IMHO ;)

Paolo.