This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix C++ strict-aliasing issues with memcpy folding
On Tue, Feb 2, 2010 at 8:58 AM, Richard Guenther <rguenther@suse.de> wrote:
> On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
>
>> On Tue, Feb 2, 2010 at 6:44 AM, Richard Guenther <rguenther@suse.de> wrote:
>> > On Tue, 2 Feb 2010, Gabriel Dos Reis wrote:
>> >> >
>> >> > Basically the middle-end sees
>> >> >
>> >> > ?new (&x.m.mem[0]) Y;
>> >> > ?((Y *)&x.m.mem[0])->foo = 1;
>> >> > ?tem = x;
>> >
>> > ... the copy-assignment is seen as a load from 'x' and a store to 'tem'
>> > by the middle-end. ?Supposedly this is what the frontend emits
>> > for copy-assignments of POD types.
>>
>> OK. ?So, the middle-end does not care about the distinction
>> between sub-objects and complete objects?
>
> Correct.
Aha, OK. Indeed, that makes some of C++'s distinction a bit
tricky to translate without the VIEW_CONVERT_EXPR construct.
>
>> If I understand what you are saying so far, it looks to me as
>> if the g++ front-ends is not constructing trees with sufficient
>> information for the middle-end not to get confused.
>
> Correct.
>
>> >> > and the store does not conflict with the load of x as far
>> >> > as TBAA is concerned (there is no Y in the alias-subsets of X).
>> >> > So the scheduler ends up re-ordering the accesses.
>> >> >
>> >> > Thus I'm looking for a way to claim that the copy as above
>> >> > invokes undefined behavior (because of that in-place construction),
>> >> > and that the only valid way to copy 'x' is by using memcpy
>> >> > (independed on that being broken by the middle-end).
>> >>
>> >> Is there a way to tell the middle end that in the assignment
>> >>
>> >> ? ? tem = x;
>> >>
>> >> both members of x.m are indistinguishable in terms of aliasing?
>> >> Or would that be too pessimistic an assumption for the optimizer?
>> >
>> > That wouldn't be even enough. ?Or I don't understand what you
>> > mean with indistinguishable.
>> >
>> >> > So, is the copy tem = x above indeed invoking undefined behavior?
>> >>
>> >> Technically, that assignment is undefined behaviour in C++03, but
>> >> I think a compiler that goes by that would be terrible from practical
>> >> point of view -- that omission was not intended, as indicated in the DR.
>> >> That will be fixed soon, by requiring the compiler to do a memcpy().
>> >
>> > So every copy-assignment of a POD type will have to be a memcpy?
>>
>> The suggestion of memcpy() for
>>
>> ? ? tem = x;
>>
>> is only of copy-n-assignments of union objects. ?This is because at the
>> point of the assignment, we don't necessarily have the information about
>> which branch of the union was last assigned to -- so, a memberwise
>> assignmnet would be invalid. ?For other POD types, ?I think the language
>> is sufficiently clear to keep whatever we currently ?have working.
>
> Ok, so if you have a union embedded in a struct like
>
> ?struct X {
> ? ?union { ... } u;
> ? ?int i;
> ?} a, b;
>
> then
>
> ?a = b;
>
> would be translated into
>
> ?memcpy (&a, &b, sizeof (a));
>
> ? ?or would it be
>
> ?memcpy (&a.u, &b.u, sizeof (a.u));
> ?a.i = b.i;
>
> ? ?I suppose the latter.
yes, you're correct that the latter is the abstract semantics.
>
> If other POD types are supposed to be fine then you probably share
> Marks view of disallowing in-place construction of arbitrary
> types in arbitrary storage with a declared type? ?Thus only
> allow it to be constructed in place of a union, supposedly with
> a char[] member?
Yes, that is my view.
> It would be nice to clarify the standard here.
I'll also raise the issue on the CWG list.
>
>> >> Now, I do not know whether the g++ front-end will insert that memcpy(),
>> >> or if it would let the middle end do it. ?Either way, I think g++ today
>> >> should treat that assignment as making 'tem.m' have the same active
>> >> dynamic type as 'x.m'. ?Now, you're the middle-end expert, and I have
>> >> no clue about how difficult that is to teach GCC with the current
>> >> infrastructure.
>> >
>> > It would be the frontends responsibility to translate tem = x
>> > with using memcpy as only the frontend knows that the copy has
>> > to conflict with stores through arbitrary types.
>>
>> OK, that makes sense. ?Assuming the front-ends issues memcpy for
>> union objects, would we still be running into this issue? ?That is I'm
>> trying to understand whether there are other aspects that we may have
>> forgotten (from the middle-end point of view.)
>
> Well, if it would be undefined to do
>
> ?struct X { int i[2]; } a, b;
> ?double *p = new (&a.i[0]) double;
> ?*p = 1.
> ?b = a;
> ?return *(double *)&b.i[0];
>
> because we place the double not into a union then yes, I think
> issueing memcpy for union objects should be enough.
OK. It looks to me as we have a plan.
>
> Note that the middle-end is fine with
>
> ?union { int i; short s; float f; } a, b;
> ?a.i = 0;
> ?b = a;
> ?return b.i;
>
> The active member of a is transferred to b.
OK.
>
> It is just that if there is a type constructed at the place of the
> union that is not in the type list of its members (as in the
> original example), then we are lost. ?Note that placing a char
> member in the union doesn't fix this from the middle-end - only
> using memcpy for the assignment does.
I believe that is a reasonable requirement -- one that is actually being
followed in spirit to resolve the copy-n-assignment semantics of union
objects by CWG.
Thanks!
-- Gaby