This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] accept flexible arrays in struct in unions (c++/71912 - [6/7 regression])
- From: Jason Merrill <jason at redhat dot com>
- To: Martin Sebor <msebor at gmail dot com>
- Cc: Gcc Patch List <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 1 Aug 2016 10:21:39 -0400
- Subject: Re: [PATCH] accept flexible arrays in struct in unions (c++/71912 - [6/7 regression])
- Authentication-results: sourceware.org; auth=none
- References: <57927079.5080400@gmail.com> <5793A6F4.2090900@gmail.com> <69dc2b8d-1841-92b7-20ed-6e0a899435f2@redhat.com> <579BE53E.7070002@gmail.com> <6a6e323e-dbbb-f903-2492-e8ed26e01d65@redhat.com> <579E5F1E.8060603@gmail.com>
On Sun, Jul 31, 2016 at 4:27 PM, Martin Sebor <msebor@gmail.com> wrote:
> On 07/31/2016 10:28 AM, Jason Merrill wrote:
>> On Fri, Jul 29, 2016 at 7:22 PM, Martin Sebor <msebor@gmail.com> wrote:
>>> On 07/26/2016 12:53 PM, Jason Merrill wrote:
>>>> On 07/23/2016 01:18 PM, Martin Sebor wrote:
>>>>>
>>>>> + /* A pair of the first non-static non-empty data members following
>>>>> + either the flexible array member, if found, or the zero-length
>>>>> + array member otherwise. AFTER[1] refers to the first such data
>>>>> + member of a union that the struct containing the flexible array
>>>>> + member or zero-length array is a member, or NULL when no such
>>>>> + union exists. AFTER[0] refers to the first such data member
>>>>> + that is not a member of such a union. */
>>>>
>>>> This is pretty hard to follow, could you add an example? Why do we want
>>>> to track these separately rather than look at DECL_CONTEXT at diagnostic
>>>> time?
>>>
>>> Sure, I've added an example.
>>>
>>> Whether or not a given flexible array member is valid depends not
>>> only on the context in which it's defined (its enclosing class) but
>>> also on the members of other classes whose objects may be defined
>>> in the same or other contexts (e.g., enclosing structs). I don't
>>> know of a way to reach those other members from the context of
>>> the ARRAY.
>>
>> Not from the context of the array, but when we see a field following the
>> flexible array (or aggregate ending in a flexible array), can't we look
>> at the DECL_CONTEXT of that field and see whether it's a union or not?
>
> I don't think that would work in cases like this:
>
> struct S1 {
> struct S2 { int i, a[]; } s2;
> union U { int x; } u;
> };
>
> that need to be treated differently from this one:
>
> union U1 {
> struct S { int i, a[]; } s;
> union U2 { int x; } u2;
> };
Ah, I'm thinking of the following field as u/u2 rather than x. Why
does it improve clarity to look inside U/U2 for a following field?
>>> + For example, in the following, the flexible array member
>>> + S::U::X::a overlaps S::U::Y::i and so AFTER[1] is set to refer to
>>> + the latter. This potential problem is independent of union U's
>>> + membership in struct S. In addition, in the definition of struct
>>> + S, S::U::x::a is followed by S::z, and so AFTER[0] is set to refer
>>> + to the latter. The two problems result in two diagnostics, the
>>> + first one being a pedantic warning and the second a hard error.
>>> +
>>> + struct S {
>>> + union U {
>>> + struct X { int i, a[]; } x;
>>> + struct Y { long i, a[]; } y;
>>> + } u;
>>> + int z;
>>> + };
>>
>>
>> Hmm, I'm not convinced that the first problem is really a problem. Only
>> one of the union members is active at any time, so overlapping with
>> another union member seems is irrelevant.
>>
>> I also can't find the language in the C standard that prohibits nesting
>> a struct ending with a flexible array in another struct or union, do you
>> have a citation?
>
> 6.7.2.1, p3:
>
> A structure or union shall not contain a member with incomplete
> or function type [...], except that the last member of a structure
> with more than one named member may have incomplete array type; such
> a structure (and any union containing, possibly recursively, a member
> that is such a structure) shall not be a member of a structure or
> an element of an array.
Ah, thanks. I note that this says "structure or union" at the
beginning of the paragraph but not at the end, which suggests strongly
to me that such a structure can be a member of a union.
>>>>> + /* The type in which an anonymous struct or union containing ARRAY
>>>>> + is defined or null if no such anonymous struct or union exists. */
>>>>> + tree anonctx;
>>>>
>>>> It seems clearer to find this at diagnostic time by following
>>>> TYPE_CONTEXT.
>>>
>>> I tried this approach and while it's doable (with recursion) I'm
>>> not happy with the results. The diagnostics point to places that
>>> I think are unclear. For example, given:
>>>
>>> struct A { int i, a[]; };
>>> struct B { long j, b[]; };
>>> struct D: A, B { };
>>
>> I don't see any anonymous aggregates in this example, so how does
>> anonctx come into it?
>
> If there is no anonctx then diagnose_flexrrays will have to figure
> out whether the array is a member of an anonymous struct and if so,
> find the struct of which it's effectively a member, and otherwise
> use the array's immediate context. The above was an example of
> the result of a simple implementation of your suggestion.
And I don't understand why it would change the output on this testcase.
> As I said, the code could probably be tweaked to produce the same result
> as it does now in both cases (anonymous and not), but at the cost
> of additional complexity and, IMO, to the detriment of clarity.
> What aspect of clarity do you find lacking in the current approach?
It just seems unnecessarily complicated. anonctx is set to the
innermost non-anonymous-aggregate class containing the flexible array,
yes? That is trivially derived from the array decl.
Looking at this more closely, it seems that the problem is that
"innermost non-anon class" isn't what you want for the example above;
for a that is A, and a *is* at the end of A. So it seems that if a
were inside an anonymous struct inside A, you would get the same wrong
diagnostic output you were seeing with your anon_context function.
>>>>> if (fmem == &flexmems
>>>>> && !TYPE_ANONYMOUS_P (t) && !anon_aggrname_p (TYPE_IDENTIFIER (t)))
>>>>
>>>> I think you want ANON_AGGR_TYPE_P here, too.
>>>
>>> Here, ANON_AGGR_TYPE_P(t) returns false for anonymous structs such
>>> as the one below and using it would cause false positives:
>>>
>>> struct S { int i; struct { int a[]; }; };
>>
>> Ah, it hasn't been set yet here because we don't know yet whether we're
>> declaring a named member or an anonymous aggregate. So yes,
>> TYPE_ANONYMOUS_P is the right test here. But why do you also check
>> anon_aggrname_p directly, when it's used to implement TYPE_ANONYMOUS_P?
>
> Because TYPE_ANONYMOUS_P is set for unnamed structs that are not
> anonymous, as in
>
> struct S { int i; struct { int a[]; } s; };
Yes, but so is anon_aggrname_p (TYPE_IDENTIFIER). And isn't that what
you want here? When we call check_flexarrays for the nested struct
itself, we've only seen as far as the closing brace, so we don't know
if the struct will declare a named member or not.
> It looks like what I actually need here is anon_aggrname_p alone.
Why is that better than the anon_aggrname_p call inside
TYPE_ANONYMOUS_P? The difference between TYPE_ANONYMOUS_P and
anon_aggrname_p (TYPE_IDENTIFIER (t)) is in its handling of
typedef struct { ... } name;
where TYPE_ANONYMOUS_P is false (because it has a name for linkage
purposes) but anon_aggrname_p (TYPE_IDENTIFIER (t)) is true. I don't
see why you would care about this distinction. The typedef doesn't
declare a member, so it seems uninteresting to check_flexarrays.
Speaking of which, why does your latest patch look at typedefs? What
are you missing by looking just at FIELD_DECLs? Even anonymous unions
have a FIELD_DECL, though you may need to exempt them from the
DECL_ARTIFICIAL check.
> Let me change that before committing the patch or posting a new
> version if one is still needed.
>
> FWIW, I spent lots of time last week learning how these different
> macros map on to language features and under what circumstances.
> I find some of them quite hard to work with because they add
> a layer of complexity between concepts in the language and their
> implementation in GCC. In cases like this one when their names
> are deceptively close to the language concepts but they don't
> quite map 1-to-1, they can be downright misleading.
Do you have ideas about how to improve the naming? Perhaps change
TYPE_ANONYMOUS_P to TYPE_NO_LINKAGE_NAME?
Jason