This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: Efficiency of memmove vs. generic typed copy


Ian Lance Taylor <ian@airs.com> writes:

| Gabriel Dos Reis <gdr@integrable-solutions.net> writes:
| 
| > Our typed std::copy() is specialized for T*, where T is a POD, as a
| > forwarding function to generic memmove() from the C library. 
| > Do we have evidence that for the majority of the targets supported by
| > GCC, that is a win compared to the typed generic copy()
| > implementation that exposes alignment and other goodies to the optimizer?
| > On which targets do we know copy() wins?
| 
| Note that gcc will recognize a call to memmove, and handle it
| specially (unless you use -fno-builtin).
| 
| That said, the special handling is not all that clever, and could be
| improved.  For example, it could try to use alias analysis to prove
| that the buffers can not overlap, and call memcpy in that case (the
| special handling of memcpy is fairly clever).  And where the size is
| constant and small, it would sometimes make sense to load it all and
| then store it all.  And at high optimization levels, if the special
| handling of memcpy will do something clever, it would make sense to
| compare the pointers at runtime and call the appropriate form of
| memcpy (move_by_pieces should be able to support a general reverse
| move, perhaps with a little tweaking).
| 
| The code is expand_builtin_memmove and fold_builtin_memmove in
| builtins.c.
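
To make the "compare the pointers at runtime" suggestion concrete, here is a
minimal library-level sketch of the idea. It is not what
expand_builtin_memmove does; the helper names ranges_disjoint and move_bytes
are hypothetical, and the address comparison assumes a flat address space in
which converting a pointer to std::uintptr_t preserves ordering.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when [dst, dst+n) and [src, src+n) share no byte.
// Assumes a flat address space (pointer -> uintptr_t keeps relative order).
inline bool ranges_disjoint(const void* dst, const void* src, std::size_t n) {
    const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
    return (d >= s) ? (d - s >= n) : (s - d >= n);
}

// Hypothetical dispatcher: take the memcpy path only when the buffers are
// proven disjoint at runtime, otherwise fall back to memmove.
inline void* move_bytes(void* dst, const void* src, std::size_t n) {
    if (ranges_disjoint(dst, src, n))
        return std::memcpy(dst, src, n);
    return std::memmove(dst, src, n);
}
```

Whether the extra compare and branch pays off depends on how much better the
memcpy expansion is on a given target, which is exactly the open question.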

Thanks for the pointer.  I'll look into that.

(I think we should change our std::copy() to call memcpy() instead of
memmove())

Assume for a moment that I'll use memcpy() instead of memmove().  The
issue I'm interested in is this:

   code A:
   const T* restrict p = ...;
   T* restrict q = ...;
   for (size_t i = 0; i < n; ++i)
       q[i] = p[i];


Code A always does aligned accesses: if T is int, the compiler will
use integer registers; if T is double, it will use floating-point
registers, and so on.  In general, the compiler will use a fast
data-move pattern, chunk by chunk.

Now, assume code A is rewritten as

   code B:
   const T* restrict p = ...;
   T* restrict q = ... ;
   memcpy(q, p, n * sizeof (T));

Here, memcpy() is usually coded as a generic data move and assumes
potentially unaligned accesses.  I'm aware of some work outside GCC
that special-cases the memcpy() implementation so that it uses, for
example, VIS instructions where available for data known to be
aligned.  But I don't believe that is commonplace for the majority of
targets GCC supports.  So I'm wondering whether GCC has machinery that
detects the alignment of memcpy() arguments (from the source code) so
that faster paths are used to move data.
Do we have evidence that code B is generally faster than code A for
the majority of, say, the primary platforms?
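
For anyone who wants to measure this, the two variants can be written as
compilable templates along the following lines.  This is only a sketch: the
names copy_typed and copy_bytes are made up for illustration, and __restrict
is the compiler-extension spelling (accepted by GCC, among others) standing
in for the C99 restrict used above.  It makes no claim about which one is
faster.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Code A: typed, element-by-element copy. The element type, and therefore
// its alignment, is visible to the optimizer.
template <typename T>
void copy_typed(T* __restrict q, const T* __restrict p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        q[i] = p[i];
}

// Code B: the same copy expressed as a byte-wise memcpy().
template <typename T>
void copy_bytes(T* __restrict q, const T* __restrict p, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy() is only valid for trivially copyable types");
    if (n != 0)  // skip the call entirely for empty ranges
        std::memcpy(q, p, n * sizeof(T));
}
```

Both functions produce the same bytes in q for trivially copyable T, so
comparing the generated code or timing them for a given T and n on each
target gives a like-for-like comparison.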


-- Gaby

