This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix C++ strict-aliasing issues with memcpy folding


On Tue, Feb 2, 2010 at 6:44 AM, Richard Guenther <rguenther@suse.de> wrote:
> On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
>
>> On Tue, Feb 2, 2010 at 5:13 AM, Richard Guenther <rguenther@suse.de> wrote:
>> > On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
>> >
>> >> On Tue, Feb 2, 2010 at 4:25 AM, Richard Guenther <rguenther@suse.de> wrote:
>> >> > On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
>> >> >
>> >> >> On Sun, Jan 24, 2010 at 3:18 PM, Richard Guenther <rguenther@suse.de> wrote:
>> >> >> > On Sun, 24 Jan 2010, Mark Mitchell wrote:
>> >> >> >
>> >> >> >> Richard Guenther wrote:
>> >> >> >>
>> >> >> >> > The type of the first argument of memcpy doesn't mean anything, so no.
>> >> >> >>
>> >> >> >> I don't claim to be up on the current state of the standards with
>> >> >> >> respect to these issues. ?The last time I tried to understand all the
>> >> >> >> issues, it appeared that the committees had simply punted; they didn't
>> >> >> >> seem to want to think about what these things meant.
>> >> >> >>
>> >> >> >> But, what evidence do we have that:
>> >> >> >>
>> >> >> >> + ? __builtin_memcpy (&i, p, 4);
>> >> >> >> + ? if (*(float *)&i != 1.0)
>> >> >> >>
>> >> >> >> is in fact legal and intended to change the dynamic type of the storage?
>> >> >> >> ?(In terms of the standard, that has to be "memcpy", not
>> >> >> >> "__builtin_memcpy", but we can understand those to be the same in this
>> >> >> >> context.)
>> >> >> >
>> >> >> > The evidence is 3.8/4 - 'A program may end the lifetime of any object
>> >> >> > by reusing the storage which the object occupies ...'.
>> >> >>
>> >> >> I read "ending the lifetime" of an object very differently from changing
>> >> >> the dynamic type of an object. ?I would be very surprised to see any
>> >> >> evidence in the C++ standard to support the construct shown above.
>> >> >>
>> >> >> > ?Re-using
>> >> >> > the storage involves constructing a new object in place.
>> >> >>
>> >> >> That is correct. ?But, that does not necessarily imply that
>> >> >> the above construct is blessed.
>> >> >>
>> >> >> > ?For POD
>> >> >> > types a copy using memcpy should be a standard conforming way
>> >> >> > (I'm of course always using middle-end view here and do not see
>> >> >> > a reason to not allow this).
>> >> >>
>> >> >> I believe the standard explicitly blesses using a roundtrip memcpy().
>> >> >> I do not see any mention of changing declared object's dynamic type
>> >> >> with memcpy().
>> >> >>
>> >> >> >
>> >> >> > I failed to find any explicit reference to the effect of memcpy
>> >> >> > on dynamic types in the C++ standard.
>> >> >>
>> >> >> I believe in that case, we have "undefined behaviour" :-)
>> >> >>
>> >> >> >?The C standard however
>> >> >> > has in 6.5/6 'If a value is copied into an object having no
>> >> >> > declared type using memcpy or memmove, or is copied as an
>> >> >> > array of character type, then the effective type of the
>> >> >> > modified object for that access and for subsequent accesses that
>> >> >> > do not modify the values is the effective type of the object
>> >> >> > from which the value is copied, if it has one'. ?The C standard
>> >> >> > restricts itself to speak about anonymous memory only here
>> >> >> > which effectively would make the testcase undefined in C
>> >> >> > (at least that's my reading).
>> >> >>
>> >> >> yeah, I think that means objects allocated from free store.
>> >> >> However, C++ uses "object" in a more restricted form than C.
>> >> >
>> >> > Do you have any opinion on the libstdc++ case which effectively does
>> >> >
>> >> > ?struct Y;
>> >> > ?struct X { union T { long alignme; char mem[32]; } m; } x, y, tem;
>> >> > ?new (&x.m.mem) Y;
>> >>
>> >> At this point x.m.mem holds a Y object (the character array essentially
>> >> acts like an object container, due to the ambiguous status of char,
>> >> and C++ lacking proper 'byte' types.)
>> >>
>> >> > ?tem = x;
>> >>
>> >> C++03, curiously, does not define copy-n-assignment of union
>> >> objects. ?C++0x will fix that by prescribing a memcpy
>> >>
>> >> ? ?http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#653
>> >>
>> >> which copies the union-subobject x.m, therefore morally transferring
>> >> the active dynamic type of the 'x.m' sub-object to tem.m
>> >>
>> >> > ?x = y;
>> >>
>> >> at this point 'y' was never initialized, so you should get
>> >> an undefined behaviour on the read of 'y.m'. ?I don't know
>> >> that is important for the discussion though.
>> >>
>> >> > ?y = tem;
>> >>
>> >> Assuming copy-n-assignment of unions is fixed as above,
>> >> this should make y.m have the same active dynamic type
>> >> as that of tem.m.
>> >>
>> >> >
>> >> > ? ?Basically using an lvalue of type X to access the object of type Y it
>> >> > constructed in-place in mem?
>> >>
>> >> I do not see an lvalue of type X used to access the object Y.
>> >
>> > Isn't 'x' an lvalue of type X?
>>
>> yes, 'x' is lvalue of type X. ?But, I do not see it used to access
>> the Y object. ?At most, we would have either 'x.m' or 'x.m.mem',
>> none of which is of type X.
>>
>> >?As the in-place constructed object of type Y is embedded in 'x',
>> > isnt 'x' also accessing that object?
>>
>> I do not see that access in the sample code.
>
> See below...
>
>> In general, when we write 'obj.member' to access a sub-object of
>> an object 'obj', we don't say we use 'obj' to access the data designated
>> by 'obj.member' -- rather, we talk of the complete path 'obj.member'.
>> Otherwise, we would have to throw away almost all of C++ type rules.
>>
>> >
>> > Basically the middle-end sees
>> >
>> > ?new (&x.m.mem[0]) Y;
>> > ?((Y *)&x.m.mem[0])->foo = 1;
>> > ?tem = x;
>
> ... the copy-assignment is seen as a load from 'x' and a store to 'tem'
> by the middle-end. ?Supposedly this is what the frontend emits
> for copy-assignments of POD types.

OK.  So, the middle-end does not care about the distinction
between sub-objects and complete objects?

If I understand what you are saying so far, it looks to me as
if the g++ front-ends is not constructing trees with sufficient
information for the middle-end not to get confused.

>> > and the store does not conflict with the load of x as far
>> > as TBAA is concerned (there is no Y in the alias-subsets of X).
>> > So the scheduler ends up re-ordering the accesses.
>> >
>> > Thus I'm looking for a way to claim that the copy as above
>> > invokes undefined behavior (because of that in-place construction),
>> > and that the only valid way to copy 'x' is by using memcpy
>> > (independed on that being broken by the middle-end).
>>
>> Is there a way to tell the middle end that in the assignment
>>
>> ? ? tem = x;
>>
>> both members of x.m are indistinguishable in terms of aliasing?
>> Or would that be too pessimistic an assumption for the optimizer?
>
> That wouldn't be even enough. ?Or I don't understand what you
> mean with indistinguishable.
>
>> > So, is the copy tem = x above indeed invoking undefined behavior?
>>
>> Technically, that assignment is undefined behaviour in C++03, but
>> I think a compiler that goes by that would be terrible from practical
>> point of view -- that omission was not intended, as indicated in the DR.
>> That will be fixed soon, by requiring the compiler to do a memcpy().
>
> So every copy-assignment of a POD type will have to be a memcpy?

The suggestion of memcpy() for

    tem = x;

is only of copy-n-assignments of union objects.  This is because at the
point of the assignment, we don't necessarily have the information about
which branch of the union was last assigned to -- so, a memberwise
assignmnet would be invalid.  For other POD types,  I think the language
is sufficiently clear to keep whatever we currently  have working.

>
>> Now, I do not know whether the g++ front-end will insert that memcpy(),
>> or if it would let the middle end do it. ?Either way, I think g++ today
>> should treat that assignment as making 'tem.m' have the same active
>> dynamic type as 'x.m'. ?Now, you're the middle-end expert, and I have
>> no clue about how difficult that is to teach GCC with the current
>> infrastructure.
>
> It would be the frontends responsibility to translate tem = x
> with using memcpy as only the frontend knows that the copy has
> to conflict with stores through arbitrary types.

OK, that makes sense.  Assuming the front-ends issues memcpy for
union objects, would we still be running into this issue?  That is I'm
trying to understand whether there are other aspects that we may have
forgotten (from the middle-end point of view.)

>
> Whether using memcpy (or any other middle-end notion with the
> same effect) or simply disabling type-based aliasing for C++ will
> end up more efficient remains to be seen ...

OK. (It would be a pity if we had to disable TBAA for g++.)


-- Gaby


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]