Bug 32667

Summary: block copy with exact overlap is expanded as memcpy
Product: gcc Reporter: merkert
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: fang, fweimer, gcc-bugs, g_sauthoff, hugh, jakub, lopresti, lu_zero, mans, msebor, rguenth
Priority: P3 Keywords: wrong-code
Version: 4.2.0   
Target Milestone: ---   
Host: i686-pc-linux-gnu Target:
Build: Known to work:
Known to fail: 4.7.0 Last reconfirmed: 2007-07-08 19:46:19

Description merkert 2007-07-07 17:55:22 UTC
This code generates a warning when run with valgrind:
#include <vector>

using namespace ::std;

struct X {
  double values[10];

int main()
  vector<X> x;

  for (vector<X>::iterator i=x.begin();i!=x.end();++i) {
    *i = *(x.end()-1);
  return 0;

g++ test.cpp -o foo -O3 

Valgrind error:
valgrind --tool=memcheck foo
==24513== Memcheck, a memory error detector for x86-linux.
==24513== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==24513== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==24513== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==24513== For more details, rerun with: -v
==24513== Source and destination overlap in memcpy(0x1BB68028, 0x1BB68028, 80)
==24513==    at 0x1B9057E5: memcpy (in /usr/lib/valgrind/vgpreload_memcheck.so)
==24513==    by 0x8048696: main (in /home/ray/tmp/foo)
==24513== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 1)
==24513== malloc/free: in use at exit: 0 bytes in 0 blocks.
==24513== malloc/free: 1 allocs, 1 frees, 80 bytes allocated.
==24513== For a detailed leak analysis,  rerun with: --leak-check=yes
==24513== For counts of detected errors, rerun with: -v

On my machine "man memcpy" says "...The memory areas may not overlap. Use memmove(3) if the memory areas do overlap. ..."
Comment 1 Paolo Carlini 2007-07-07 18:19:08 UTC
Interesting: mainline is not affected by the problem. I would guess thanks to fixing libstdc++/29286 ??? 
Comment 2 merkert 2007-07-07 22:36:14 UTC
This may be an old bug and may have crept in between 3.3.3 and 3.4.0 (latter has it, former doesn't)
Comment 3 Richard Biener 2007-07-08 19:46:19 UTC
No, this doesn't have anything to do with aliasing.  4.3 simply inlines the memcpys.  This is probably a middle-end issue with expanding structure assignments.  2.95 and 3.3 also expanded the calls inline.  3.4 does not,
likewise 4.0, 4.1 and 4.2.

Note that this is likely not a problem in practice as memcpy (p, p, sizeof (*p))
is difficult to implement in a way that would make it not work.  So, downgrading
Comment 4 Paolo Carlini 2007-07-08 20:53:20 UTC
Thanks for the clarification, Richard.
Comment 5 Paul Pluzhnikov 2009-05-06 16:36:24 UTC
(In reply to comment #3)

> Note that this is likely not a problem in practice as 
>  memcpy (p, p, sizeof (*p)) is difficult to implement
> in a way that would make it not work.

From Julian Seward:

JS> AIUI, POSIX says the src==dst case is not allowed (along with all other
JS> overlapping cases) because (eg) on PowerPC, it is possible to make a high
JS> performance memcpy that preallocates the destination area in D1 using
JS> dcbz instructions, which create the line in D1 and fill it full of
JS> zeroes.  This avoids dragging the destination line up the memory
JS> hierarchy only to completely overwrite it with stuff from the source.
JS> Result is however that if the src and dst overlap, in any way, including
JS> completely, then this causes zeroes to be written into the src area (!)
JS> which is certainly not what you want.

This bug is likely fixed by:
Comment 6 lu_zero 2011-12-05 16:59:07 UTC
Doesn't seem.

Here a reduced testcase courtesy of Mans Rullgard

struct foo {
    int x[64];

void __attribute__((noinline)) foo(struct foo *a, struct foo *b)
    *a = *b;

int main(void)
    struct foo a = { 0 };
    foo(&a, &a);
    return 0;
Comment 7 Mans Rullgard 2011-12-05 20:37:18 UTC
Note that the test case in comment #6 does not trigger the problem on x86 where the copying is inlined. It does trigger on ARM and other targets where this assignment results in a memcpy() call. All current stable releases show the bug. I did not test with trunk.
Comment 8 Richard Biener 2011-12-06 08:44:46 UTC
This is a generic middle-end issue (at best).  There is at least one duplicate
of this bug (can't find it with a quick search).
Comment 9 Andrew Pinski 2015-02-12 05:38:22 UTC
*** Bug 65029 has been marked as a duplicate of this bug. ***
Comment 10 Richard Biener 2015-02-12 09:00:08 UTC
One way to "fix" this is to emit the memcpy as

  if (p != q)
    memcpy (p, q, ...);

but of course that comes at a cost in code-size and runtime for no obvious good
reason (in practice).
Comment 11 Jakub Jelinek 2015-02-12 09:42:40 UTC
And another way to fix this is add a new library entrypoint, that would have the same semantics as memcpy, but would allow full overlap (dst == src).
When we e.g. expand this without a library call, we could do exactly what we do for memcpy, because we know we do handle the dst == src case fine.  Similarly, e.g. in glibc it could very well be just another alias to memcpy, because we know it handles that too.  On targets which would not have the library function we'd use the #c10 approach.  Of course, this would require coordination in between glibc, gcc, valgrind, libsanitizer, memstomp etc.
Comment 12 D. Hugh Redelmeier 2015-09-28 18:18:43 UTC
JJ in #11 is right.

The compiler should never generate calls to functions with names in the user's space.  That's because the user is allowed to (re)define anything with those names and is not constrained to preserve expected semantics.

There should be a function in the reserved-to-system namespace that does what is needed.  GCC should then call this function.

Let's say that GCC uses __memcpy.  The default standard definition of memcpy could be a synonym (until redefinition) for __memcpy.

On the other hand, there are compile-time constraints on struct assignments that could yield even better performance.  For example, GCC might know that the object is a multiple of 8 bytes, aligned on an 8-byte boundary.  GCC would certainly know that the struct's length is non-zero, conceptually eliminating one test.  This suggests implementing specialized variants, perhaps having names starting with __struct_assign.

(I know GCC has an extension allowing 0 length objects.  The assignment for such objects could be eliminated.)