This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix C++ strict-aliasing issues with memcpy folding
- From: Richard Guenther <rguenther at suse dot de>
- To: Gabriel Dos Reis <gdr at integrable-solutions dot net>
- Cc: Mark Mitchell <mark at codesourcery dot com>, Richard Guenther <richard dot guenther at gmail dot com>, Paolo Bonzini <bonzini at gnu dot org>, gcc-patches at gcc dot gnu dot org, Diego Novillo <dnovillo at google dot com>
- Date: Tue, 2 Feb 2010 12:13:41 +0100 (CET)
- Subject: Re: [PATCH] Fix C++ strict-aliasing issues with memcpy folding
- References: <alpine.LNX.2.00.1001221549140.7863@zhemvz.fhfr.qr> <4B5ABF96.1030804@gnu.org> <84fc9c001001230409xa74cab6iea45fa74da008bf@mail.gmail.com> <4B5AF989.6080002@gnu.org> <84fc9c001001230537lb8f5e9co42869f1965cc17cb@mail.gmail.com> <4B5C8C89.80108@codesourcery.com> <alpine.LNX.2.00.1001242151460.7863@zhemvz.fhfr.qr> <206fcf961002020024u421874b4w313c904d116db819@mail.gmail.com> <alpine.LNX.2.00.1002021120580.29631@zhemvz.fhfr.qr> <206fcf961002020301h7457f79r73bce737182a7143@mail.gmail.com>
On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
> On Tue, Feb 2, 2010 at 4:25 AM, Richard Guenther <rguenther@suse.de> wrote:
> > On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
> >
> >> On Sun, Jan 24, 2010 at 3:18 PM, Richard Guenther <rguenther@suse.de> wrote:
> >> > On Sun, 24 Jan 2010, Mark Mitchell wrote:
> >> >
> >> >> Richard Guenther wrote:
> >> >>
> >> >> > The type of the first argument of memcpy doesn't mean anything, so no.
> >> >>
> >> >> I don't claim to be up on the current state of the standards with
> >> >> respect to these issues. ?The last time I tried to understand all the
> >> >> issues, it appeared that the committees had simply punted; they didn't
> >> >> seem to want to think about what these things meant.
> >> >>
> >> >> But, what evidence do we have that:
> >> >>
> >> >> + ? __builtin_memcpy (&i, p, 4);
> >> >> + ? if (*(float *)&i != 1.0)
> >> >>
> >> >> is in fact legal and intended to change the dynamic type of the storage?
> >> >> ?(In terms of the standard, that has to be "memcpy", not
> >> >> "__builtin_memcpy", but we can understand those to be the same in this
> >> >> context.)
> >> >
> >> > The evidence is 3.8/4 - 'A program may end the lifetime of any object
> >> > by reusing the storage which the object occupies ...'.
> >>
> >> I read "ending the lifetime" of an object very differently from changing
> >> the dynamic type of an object. ?I would be very surprised to see any
> >> evidence in the C++ standard to support the construct shown above.
> >>
> >> > ?Re-using
> >> > the storage involves constructing a new object in place.
> >>
> >> That is correct. ?But, that does not necessarily imply that
> >> the above construct is blessed.
> >>
> >> > ?For POD
> >> > types a copy using memcpy should be a standard conforming way
> >> > (I'm of course always using middle-end view here and do not see
> >> > a reason to not allow this).
> >>
> >> I believe the standard explicitly blesses using a roundtrip memcpy().
> >> I do not see any mention of changing declared object's dynamic type
> >> with memcpy().
> >>
> >> >
> >> > I failed to find any explicit reference to the effect of memcpy
> >> > on dynamic types in the C++ standard.
> >>
> >> I believe in that case, we have "undefined behaviour" :-)
> >>
> >> >?The C standard however
> >> > has in 6.5/6 'If a value is copied into an object having no
> >> > declared type using memcpy or memmove, or is copied as an
> >> > array of character type, then the effective type of the
> >> > modified object for that access and for subsequent accesses that
> >> > do not modify the values is the effective type of the object
> >> > from which the value is copied, if it has one'. ?The C standard
> >> > restricts itself to speak about anonymous memory only here
> >> > which effectively would make the testcase undefined in C
> >> > (at least that's my reading).
> >>
> >> yeah, I think that means objects allocated from free store.
> >> However, C++ uses "object" in a more restricted form than C.
> >
> > Do you have any opinion on the libstdc++ case which effectively does
> >
> > ?struct Y;
> > ?struct X { union T { long alignme; char mem[32]; } m; } x, y, tem;
> > ?new (&x.m.mem) Y;
>
> At this point x.m.mem holds a Y object (the character array essentially
> acts like an object container, due to the ambiguous status of char,
> and C++ lacking proper 'byte' types.)
>
> > ?tem = x;
>
> C++03, curiously, does not define copy-n-assignment of union
> objects. C++0x will fix that by prescribing a memcpy
>
> http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#653
>
> which copies the union-subobject x.m, therefore morally transferring
> the active dynamic type of the 'x.m' sub-object to tem.m
>
> > ?x = y;
>
> at this point 'y' was never initialized, so you should get
> an undefined behaviour on the read of 'y.m'. I don't know
> that is important for the discussion though.
>
> > ?y = tem;
>
> Assuming copy-n-assignment of unions is fixed as above,
> this should make y.m have the same active dynamic type
> as that of tem.m.
>
> >
> > ? ?Basically using an lvalue of type X to access the object of type Y it
> > constructed in-place in mem?
>
> I do not see an lvalue of type X used to access the object Y.
Isn't 'x' an lvalue of type X? As the in-place constructed
object of type Y is embedded in 'x', isnt 'x' also accessing
that object?
Basically the middle-end sees
new (&x.m.mem[0]) Y;
((Y *)&x.m.mem[0])->foo = 1;
tem = x;
and the store does not conflict with the load of x as far
as TBAA is concerned (there is no Y in the alias-subsets of X).
So the scheduler ends up re-ordering the accesses.
Thus I'm looking for a way to claim that the copy as above
invokes undefined behavior (because of that in-place construction),
and that the only valid way to copy 'x' is by using memcpy
(independed on that being broken by the middle-end).
So, is the copy tem = x above indeed invoking undefined behavior?
> > This is what started the whole discussion, as the proposed "fix",
> >
> > ?memcpy (&tem, &x, sizeof(X));
> > ?memcpy (&x, &y, sizeof (X));
> > ?memcpy (&y, &tem, sizeof (x));
> >
> > is folded to the plain assignments again.
>
> ah, but that transformation to plain assignment is middle-end
> gimmicks, not C++ source-level constructs. I think the middle-end
> should probably annotate that transformation (probably as
> a VIEW_CONVERT_EXPR?) since the standard explicitly says that one can
> move valid objects around with memcpy() (for objects of POD type.)
> And surely the library should use memcpy() to copy objects with
> statically ambiguous type.
>
> Does the annotation idea makes sense to the middle-end?
Yes. I was playing with that during the weekend and have some
prototype patch that I need to clean up a bit.
Thanks,
Richard.