Bug 108896 - provide "element_count" attribute to give more context to __builtin_dynamic_object_size() and -fsanitize=bounds
Summary: provide "element_count" attribute to give more context to __builtin_dynamic_o...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: unknown
: P3 enhancement
Target Milestone: 15.0
Assignee: qinzhao
URL:
Keywords:
Depends on:
Blocks: 111567
  Show dependency treegraph
 
Reported: 2023-02-22 21:26 UTC by Kees Cook
Modified: 2024-07-25 13:54 UTC (History)
16 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
patch for C FE to add size expressions to VM types in structs (2.50 KB, patch)
2023-03-29 16:12 UTC, Martin Uecker
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kees Cook 2023-02-22 21:26:58 UTC
Frequently a structure containing a flexible array member will also contain a member where the count of array elements is stored. For example:

struct foo {
    ...
    unsigned int count;
    ...
    int data[];
};

struct foo *allocate_foo(unsigned int how_many)
{
    struct foo *p;

    p = malloc(sizeof(*p) + how_many * sizeof(*byte_array));
    p->count = how_many;

    return p;
}

While __builtin_dynamic_object_size(p->data, 1) will know the size within "allocate_foo" due to malloc's __alloc_size hinting, this information is immediately lost on return. However, the information _is_ still available in p->count, but the compiler has no way to know about it.

Please provide a struct member attribute "element_count" that can be used to associate the size of a flexible array to another struct member. For example:

struct foo {
    ...
    unsigned int count;
    ...
    int data[] __attribute__((__element_count__(count)));
};

Now any later examination of the size of "data" can be calculated. For example, this equality will hold true:

    __builtin_dynamic_object_size(p->data) == p->count * sizeof(*p->data)

and -fsanitize-bounds can examine this as well, to trap:

    p->data[index] = ...; /* traps when index < 0, or index >= p->count */
Comment 1 Kees Cook 2023-02-22 21:31:16 UTC
The corresponding Clang feature request is here: https://github.com/llvm/llvm-project/issues/60928
Comment 2 Richard Biener 2023-02-23 08:44:07 UTC
Iff only (GNU) C would accept the following ...

struct foo {
    ...
    unsigned int count;
    ...
    int data[count];
};
Comment 3 Jakub Jelinek 2023-02-23 09:10:07 UTC
(In reply to Richard Biener from comment #2)
> Iff only (GNU) C would accept the following ...
> 
> struct foo {
>     ...
>     unsigned int count;
>     ...
>     int data[count];
> };

Well, that I think conflicts with the variable length structures GNU extension, where
if the array size of a field isn't a constant expression, it is an expression evaluated at runtime once (SAVE_EXPR) to determine the field size.
Here we are talking about something similar to what Fortran wants with its deferred length arrays, essentially to have the size evaluated each time it is accessed.
With the data[count] form even if it would be disambiguated the question is if we want to otherwise treat it like normal flexible array member e.g. for sizeof etc. and only treat it specially for __bdos, or if it would affect say sizeof too.
I don't see how sizeof (struct foo) could be treated differently from flexible array member, because one doesn't have an object on which count can be evaluated, but perhaps
struct foo f;
f.count = 24;
sizeof (f.data) could change.
Comment 4 Martin Uecker 2023-02-24 15:44:32 UTC
What could work is using a subset of designator syntax.

struct {
  int count;
  char data[.count];
};

But I do not think this should be a flexible array member that is simply ignored except for sizeof but a complete type (maybe with some restrictions). 

Then we could also have

struct {
  int count;
  char (*buf)[.count];
};

which would be incredible useful.
Comment 5 qinzhao 2023-03-01 22:54:22 UTC
(In reply to Jakub Jelinek from comment #3)
> (In reply to Richard Biener from comment #2)
> > Iff only (GNU) C would accept the following ...
> > 
> > struct foo {
> >     ...
> >     unsigned int count;
> >     ...
> >     int data[count];
> > };
> 
> Well, that I think conflicts with the variable length structures GNU
> extension, where
> if the array size of a field isn't a constant expression, it is an
> expression evaluated at runtime once (SAVE_EXPR) to determine the field size.

VLA is only legal inside function scopes. 
but this new extension will be legal at file scope.

will this fact be used to distinguish these two?

> Here we are talking about something similar to what Fortran wants with its
> deferred length arrays, essentially to have the size evaluated each time it
> is accessed.
> With the data[count] form even if it would be disambiguated the question is
> if we want to otherwise treat it like normal flexible array member e.g. for
> sizeof etc. and only treat it specially for __bdos, or if it would affect
> say sizeof too.

the immediate purpose of this new extension is used for __bdos.
is there any benefit if it will affect sizeof()?

> I don't see how sizeof (struct foo) could be treated differently from
> flexible array member, because one doesn't have an object on which count can
> be evaluated, but perhaps
> struct foo f;
> f.count = 24;
> sizeof (f.data) could change.

what's the major purpose of enabling this?
Comment 6 Kees Cook 2023-03-01 23:27:57 UTC
I really want to avoid the changes to sizeof() -- this will confuse a lot of other things. Sizeof is expected to be a constant expression, for example.

I think the attribute is best since it avoids colliding with anything else.
Comment 7 Martin Uecker 2023-03-02 15:50:36 UTC
An attribute is certainly simpler and should be easy to add.

I proposed similar extension for C23 and there was some interest,
but I did not have time to follow up.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2660.pdf


Sizeof is not a constant expression in ISO C for a VLA and it is not a constant expression if the struct contains a VLA  (GNU extension).  So this is already the case and nothing would need to change.  It would also potentially avoid mistakes when computing the size of such a struct. But the rules for initialization are not so clear.

I do not think it is a good idea to differentiate between file scope structs and others. This would be confusing. 

Considering that the GNU extensions is rarely used, one could consider redefining the meaning of

int n = 1;
struct {
  int n;
  char buf[n];
};

so that the 'n' refers to the member. Or we add a new syntax similar to designators (which intuitively makes sense to me).
Comment 8 qinzhao 2023-03-02 17:34:37 UTC
(In reply to Martin Uecker from comment #7)
> An attribute is certainly simpler and should be easy to add.
yes.
> 
> I proposed similar extension for C23 and there was some interest,
> but I did not have time to follow up.
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2660.pdf
very interesting proposal!
are there any discussions on this proposal? if so, can you point me to them?
> 
> 
> Sizeof is not a constant expression in ISO C for a VLA and it is not a
> constant expression if the struct contains a VLA  (GNU extension).  So this
> is already the case and nothing would need to change. It would also potentially  
> avoid mistakes when computing the size of such a struct.
agreed.
However, my understanding is: VLA is only valid inside a function scope. GCC use a special SAVE_EXPR to record its size expression. and evaluated during runtime only once. 
when this variable length concept is extended to global scope, not sure how to implement the size expression? need some study here.

>  But the
> rules for initialization are not so clear.
shall we make this clear?
> 
> I do not think it is a good idea to differentiate between file scope structs
> and others. This would be confusing.
Yes. agreed.

this proposal basically is to extend the VLA concept from function scope to global scope. is my understanding correct?

> 
> Considering that the GNU extensions is rarely used, one could consider
> redefining the meaning of
> 
> int n = 1;
> struct {
>   int n;
>   char buf[n];
> };
> 
> so that the 'n' refers to the member. Or we add a new syntax similar to
> designators (which intuitively makes sense to me).
designator might be better IMO.

a question here is:

for the following nested structure: 

struct object {
        ...
        char items;
        ...
        struct inner {
                ...
                int flex[];
        };
} *ptr;

what kind of syntax is good to represent the upper bound of "flex" in the inner struct with "items" in the outer structure? any suggestion?
Comment 9 Martin Uecker 2023-03-02 18:17:02 UTC
Am Donnerstag, dem 02.03.2023 um 17:34 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #8 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #7)
> > An attribute is certainly simpler and should be easy to add.
> yes.
> > 
> > I proposed similar extension for C23 and there was some interest,
> > but I did not have time to follow up.
> > 
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2660.pdf
> very interesting proposal!
> are there any discussions on this proposal? if so, can you point me to them?

One has to check the minutes from the WG14 meetings. You
will find those at the website. But I forgot at which
meeting it was discussed. Although completeness
and quality of the minutes varies, so I am not sure how
interesting this is.

https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log

There were thee follow-up proposals which I advanced for 
C23, but I changed jobs and most of them got delayed too
much for C23.

But we made variably modified types mandatory in C23 to
help with bounds checking and this already works quite
nicely with GCC / Clang:

https://godbolt.org/z/ddfsdWPMj

> > 
> > 
> > Sizeof is not a constant expression in ISO C for a VLA and it is not a
> > constant expression if the struct contains a VLA  (GNU extension).  So this
> > is already the case and nothing would need to change. It would also potentially  
> > avoid mistakes when computing the size of such a struct.
> agreed.
> However, my understanding is: VLA is only valid inside a function scope. GCC
> use a special SAVE_EXPR to record its size expression. and evaluated during
> runtime only once. 

Yes, this is correct.

> when this variable length concept is extended to global scope, not sure how to
> implement the size expression? need some study here.

Here, we want to use a member of the struct as a size 
expression. This could work equally at function and file scope.
But the semantics need to be worked out.  I have started to work
on a patch for GCC a couple of weeks ago using PLACEHOLDER_EXPR,
but did not get very far.

The idea is to evaluate the size expression whenever the member
with the size is accesses. If the size is not set before, this
would be undefined behavior.

Other languages such as Ada support this, so in principle this
should be a piece of cake.

> 
> >  But the
> > rules for initialization are not so clear.
> shall we make this clear?

We should...

> > 
> > I do not think it is a good idea to differentiate between file scope structs
> > and others. This would be confusing.
> Yes. agreed.
> 
> this proposal basically is to extend the VLA concept from function scope to
> global scope. is my understanding correct?

I would say the idea is to allow size expressions to refer
to member of a struct instead of only automatic variables.

> 
> > 
> > Considering that the GNU extensions is rarely used, one could consider
> > redefining the meaning of
> > 
> > int n = 1;
> > struct {
> >   int n;
> >   char buf[n];
> > };
> > 
> > so that the 'n' refers to the member. Or we add a new syntax similar to
> > designators (which intuitively makes sense to me).
> designator might be better IMO.
> 
> a question here is:
> 
> for the following nested structure: 
> 
> struct object {
>         ...
>         char items;
>         ...
>         struct inner {
>                 ...
>                 int flex[];
>         };
> } *ptr;
> 
> what kind of syntax is good to represent the upper bound of "flex" in the inner
> struct with "items" in the outer structure? any suggestion?

I would disallow it. At least at first. It also raises some
questions: For example, one could form a pointer to the inner
struct, and then it is not clear how 'items' could be accessed
anymore.



Martin
Comment 10 Martin Uecker 2023-03-02 18:34:47 UTC
And to get bounds checking for the flexible array member today,
one could use a macro to attach the bound back to the array
on member access.

https://godbolt.org/z/GbaoYrhav

But the bound from the type is not fed back  into
 __builtin_dynamic_object_size(), so this currently
only helps for direct array refs. But this is something we
may be able to improve.
Comment 11 qinzhao 2023-03-02 19:47:56 UTC
(In reply to Martin Uecker from comment #9)
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log
thanks for the info. 
> 
> But we made variably modified types mandatory in C23 to
> help with bounds checking and this already works quite
> nicely with GCC / Clang:
> 
> https://godbolt.org/z/ddfsdWPMj
nice!
can you provide a pointer to the section in C23 that made this change?
> 
> > when this variable length concept is extended to global scope, not sure how to
> > implement the size expression? need some study here.
> 
> Here, we want to use a member of the struct as a size 
> expression. This could work equally at function and file scope.
> But the semantics need to be worked out.  I have started to work
> on a patch for GCC a couple of weeks ago using PLACEHOLDER_EXPR,
> but did not get very far.
> 
> The idea is to evaluate the size expression whenever the member
> with the size is accesses. If the size is not set before, this
> would be undefined behavior.
> 
> Other languages such as Ada support this, so in principle this
> should be a piece of cake.
Oh, Ada can support this already?
how does Ada implement this?
then we can just borrow Ada's implementation idea to implement this in C if this is approved as an GCC extension for C. 

> > this proposal basically is to extend the VLA concept from function scope to
> > global scope. is my understanding correct?
> 
> I would say the idea is to allow size expressions to refer
> to member of a struct instead of only automatic variables.
> 
Okay.

> > a question here is:
> > 
> > for the following nested structure: 
> > 
> > struct object {
> >         ...
> >         char items;
> >         ...
> >         struct inner {
> >                 ...
> >                 int flex[];
> >         };
> > } *ptr;
> > 
> > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > struct with "items" in the outer structure? any suggestion?
> 
> I would disallow it. At least at first. It also raises some
> questions: For example, one could form a pointer to the inner
> struct, and then it is not clear how 'items' could be accessed
> anymore.
> 
Okay.
Comment 12 qinzhao 2023-03-02 19:56:53 UTC
(In reply to Martin Uecker from comment #9)
>
> Here, we want to use a member of the struct as a size 
> expression. This could work equally at function and file scope.
> But the semantics need to be worked out.  I have started to work
> on a patch for GCC a couple of weeks ago using PLACEHOLDER_EXPR,
> but did not get very far.
what kind of semantics your patch will support?
Comment 13 Martin Uecker 2023-03-02 20:07:19 UTC
Am Donnerstag, dem 02.03.2023 um 19:47 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #11 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #9)
> > 
> > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log
> thanks for the info. 
> > 
> > But we made variably modified types mandatory in C23 to
> > help with bounds checking and this already works quite
> > nicely with GCC / Clang:
> > 
> > https://godbolt.org/z/ddfsdWPMj
> nice!
> can you provide a pointer to the section in C23 that made this change?

VLAs and VM types exist since C99 and were made optional in C11.
The minimal change we adopted to make support for VM types 
(but not VLAs) mandatory again was:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2778.pdf

UBSan support in GCC to diagnose such out of bounds accesses
was added here:

https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=04fd785e38c4c37ae4f71704397a27a924baf4d9

> > 
> > > when this variable length concept is extended to global scope, not sure how to
> > > implement the size expression? need some study here.
> > 
> > Here, we want to use a member of the struct as a size 
> > expression. This could work equally at function and file scope.
> > But the semantics need to be worked out.  I have started to work
> > on a patch for GCC a couple of weeks ago using PLACEHOLDER_EXPR,
> > but did not get very far.
> > 
> > The idea is to evaluate the size expression whenever the member
> > with the size is accesses. If the size is not set before, this
> > would be undefined behavior.
> > 
> > Other languages such as Ada support this, so in principle this
> > should be a piece of cake.

> Oh, Ada can support this already?
> how does Ada implement this?

I think using PLACEHOLDER_EXPR that are insert into the size
expression and then replaced later by the struct being accessed, 
e.g.

struct foo {
 int len;
 char buf[PLACEHOLDER_EXPR.len]
};

and then later when we have

struct foo x;

x->buf

we would replace in the size of the type for x->buf the placeholder
with x itself.


> then we can just borrow Ada's implementation idea to implement this in C if
> this is approved as an GCC extension for C. 

Yes, this was what I wanted to do...  My main use case is not flexible
array members but VM types in struct:

struct foo {
  int len;
  char (*buf)[.len];
};


This has less issues because the size of the struct then does not depend
on the length.

But I am still trying to understand how this all works in GCC.


Martin
Comment 14 Bill Wendling 2023-03-03 20:27:58 UTC
(In reply to Martin Uecker from comment #9)
> > > Considering that the GNU extensions is rarely used, one could consider
> > > redefining the meaning of
> > > 
> > > int n = 1;
> > > struct {
> > >   int n;
> > >   char buf[n];
> > > };
> > > 
> > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > designators (which intuitively makes sense to me).
> > designator might be better IMO.
> > 
> > a question here is:
> > 
> > for the following nested structure: 
> > 
> > struct object {
> >         ...
> >         char items;
> >         ...
> >         struct inner {
> >                 ...
> >                 int flex[];
> >         };
> > } *ptr;
> > 
> > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > struct with "items" in the outer structure? any suggestion?
> 
> I would disallow it. At least at first. It also raises some
> questions: For example, one could form a pointer to the inner
> struct, and then it is not clear how 'items' could be accessed
> anymore.
> 

That would be limiting its use in the Linux kernel. It seems that there are ways to refer to struct members already using something like "offsetof":

struct object {
    ...
    char items;
    ...
    struct inner {
        ...
        int flex[] __attribute__((__element_count__(offsetof(struct object, items))));
    };
} *ptr;

The object referenced by "offsetof" would be from the containing structure (see "container_of" macro in Linux).

It has the benefit of not needing to extend C's syntax to allow for designated initializers outside of initialization lists. It also has the benefit of allowing one to reference a variable not in the structure:

const int items;
struct object {
    ...
    char items;
    ...
    struct inner {
        ...
        int flex[] __attribute__((__element_count__(items))); /* References global "items" */
    };
} *ptr;

-bw
Comment 15 Martin Uecker 2023-03-03 21:32:42 UTC
Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail dot com:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #14 from Bill Wendling <isanbard at gmail dot com> ---
> (In reply to Martin Uecker from comment #9)
> > > > Considering that the GNU extensions is rarely used, one could consider
> > > > redefining the meaning of
> > > > 
> > > > int n = 1;
> > > > struct {
> > > >   int n;
> > > >   char buf[n];
> > > > };
> > > > 
> > > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > > designators (which intuitively makes sense to me).
> > > designator might be better IMO.
> > > 
> > > a question here is:
> > > 
> > > for the following nested structure: 
> > > 
> > > struct object {
> > >         ...
> > >         char items;
> > >         ...
> > >         struct inner {
> > >                 ...
> > >                 int flex[];
> > >         };
> > > } *ptr;
> > > 
> > > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > > struct with "items" in the outer structure? any suggestion?
> > 
> > I would disallow it. At least at first. It also raises some
> > questions: For example, one could form a pointer to the inner
> > struct, and then it is not clear how 'items' could be accessed
> > anymore.
> > 
> 
> That would be limiting its use in the Linux kernel. It seems that there are
> ways to refer to struct members already using something like "offsetof":
> 
> struct object {
>     ...
>     char items;
>     ...
>     struct inner {
>         ...
>         int flex[] __attribute__((__element_count__(offsetof(struct object,
> items))));
>     };
> } *ptr;

This seems to be something completely different. offsetof
computes the offset from the type given in its argument.
But it would not access the value of the member of the
enclosing struct. But it would not work in your example,
because the struct object is incomplete at this point.

So no, you can not use offsetof to reference a member
of an enclosing struct.

> 
> The object referenced by "offsetof" would be from the containing structure (see
> "container_of" macro in Linux).
> 
> It has the benefit of not needing to extend C's syntax to allow for designated
> initializers outside of initialization lists. 

Yes, but that syntax would be intuitive which I would see
as an advantage.

But I am not saying we shouldn't have the attribute first.


> It also has the benefit of
> allowing one to reference a variable not in the structure:
> 
> const int items;
> struct object {
>     ...
>     char items;
>     ...
>     struct inner {
>         ...
>         int flex[] __attribute__((__element_count__(items))); /* References
> global "items" */
>     };
> } *ptr;

Whether you allow this or not has nothing to do with the syntax.

The question is what semantics you attach to this and this is
also a question in your example. 

If you define

struct inner* a = ...

What does it say for  a->flex  ?


Martin










>
Comment 16 Bill Wendling 2023-03-03 23:18:55 UTC
(In reply to Martin Uecker from comment #15)
> Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail dot com:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > 
> > --- Comment #14 from Bill Wendling <isanbard at gmail dot com> ---
> > (In reply to Martin Uecker from comment #9)
> > > > > Considering that the GNU extensions is rarely used, one could consider
> > > > > redefining the meaning of
> > > > > 
> > > > > int n = 1;
> > > > > struct {
> > > > >   int n;
> > > > >   char buf[n];
> > > > > };
> > > > > 
> > > > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > > > designators (which intuitively makes sense to me).
> > > > designator might be better IMO.
> > > > 
> > > > a question here is:
> > > > 
> > > > for the following nested structure: 
> > > > 
> > > > struct object {
> > > >         ...
> > > >         char items;
> > > >         ...
> > > >         struct inner {
> > > >                 ...
> > > >                 int flex[];
> > > >         };
> > > > } *ptr;
> > > > 
> > > > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > > > struct with "items" in the outer structure? any suggestion?
> > > 
> > > I would disallow it. At least at first. It also raises some
> > > questions: For example, one could form a pointer to the inner
> > > struct, and then it is not clear how 'items' could be accessed
> > > anymore.
> > > 
> > 
> > That would be limiting its use in the Linux kernel. It seems that there are
> > ways to refer to struct members already using something like "offsetof":
> > 
> > struct object {
> >     ...
> >     char items;
> >     ...
> >     struct inner {
> >         ...
> >         int flex[] __attribute__((__element_count__(offsetof(struct object,
> > items))));
> >     };
> > } *ptr;
> 
> This seems to be something completely different. offsetof
> computes the offset from the type given in its argument.
> But it would not access the value of the member of the
> enclosing struct. But it would not work in your example,
> because the struct object is incomplete at this point.
> 
> So no, you can not use offsetof to reference a member
> of an enclosing struct.
> 
"offsetof(struct foo, count)" is a fancy wrapper for "((struct foo *)0)->count", which is resolved during sema, where it does have the full structure definition. For instance, this compiles in C++:

struct a {
	int count;
	int y = ((struct a *)0)->count;
} x;

void foo(struct a *);


> > 
> > The object referenced by "offsetof" would be from the containing structure (see
> > "container_of" macro in Linux).
> > 
> > It has the benefit of not needing to extend C's syntax to allow for designated
> > initializers outside of initialization lists. 
> 
> Yes, but that syntax would be intuitive which I would see
> as an advantage.
> 
Sure, but you have a similar issue to your objection above because you're referencing members of a struct before it's completely defined. :-)

> But I am not saying we shouldn't have the attribute first.
> 
I personally prefer using an attribute than the suggestion to use some other syntax, like:

struct foo {
    int fam[.count];
};

It becomes less intuitive what's going on here. And might conflict with VLA's in structures.

> > It also has the benefit of
> > allowing one to reference a variable not in the structure:
> > 
> > const int items;
> > struct object {
> >     ...
> >     char items;
> >     ...
> >     struct inner {
> >         ...
> >         int flex[] __attribute__((__element_count__(items))); /* References
> > global "items" */
> >     };
> > } *ptr;
> 
> Whether you allow this or not has nothing to do with the syntax.
> 
> The question is what semantics you attach to this and this is
> also a question in your example. 
> 
> If you define
> 
> struct inner* a = ...
> 
> What does it say for  a->flex  ?
> 
I need to point out that I used "offsetof" only as an example. It's possible to create something more robust which will carry along type information, etc., which the current offsetof macro throws away. I should have made that clear.

The sanitizers that see "a->flex" will try to find the correct variable. If they can't, then they won't generate a check. In the case of it referencing a non-field member, it'll use that if it's within scope. If it refers to a field member of a parent container that's not within scope, it'll also not generate a check. It's unfortunate that these checks are done as a "best effort," but it could lead to software changes to improve security checks (like passing a parent structure into a function rather than an inner structure.
Comment 17 Martin Uecker 2023-03-04 07:52:39 UTC
Am Freitag, dem 03.03.2023 um 23:18 +0000 schrieb isanbard at gmail dot com:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #16 from Bill Wendling <isanbard at gmail dot com> ---
> (In reply to Martin Uecker from comment #15)
> > Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail dot com:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > > 
> > > --- Comment #14 from Bill Wendling <isanbard at gmail dot com> ---
> > > (In reply to Martin Uecker from comment #9)
> > > > > > Considering that the GNU extensions is rarely used, one could consider
> > > > > > redefining the meaning of
> > > > > > 
> > > > > > int n = 1;
> > > > > > struct {
> > > > > >   int n;
> > > > > >   char buf[n];
> > > > > > };
> > > > > > 
> > > > > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > > > > designators (which intuitively makes sense to me).
> > > > > designator might be better IMO.
> > > > > 
> > > > > a question here is:
> > > > > 
> > > > > for the following nested structure: 
> > > > > 
> > > > > struct object {
> > > > >         ...
> > > > >         char items;
> > > > >         ...
> > > > >         struct inner {
> > > > >                 ...
> > > > >                 int flex[];
> > > > >         };
> > > > > } *ptr;
> > > > > 
> > > > > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > > > > struct with "items" in the outer structure? any suggestion?
> > > > 
> > > > I would disallow it. At least at first. It also raises some
> > > > questions: For example, one could form a pointer to the inner
> > > > struct, and then it is not clear how 'items' could be accessed
> > > > anymore.
> > > > 
> > > 
> > > That would be limiting its use in the Linux kernel. It seems that there are
> > > ways to refer to struct members already using something like "offsetof":
> > > 
> > > struct object {
> > >     ...
> > >     char items;
> > >     ...
> > >     struct inner {
> > >         ...
> > >         int flex[] __attribute__((__element_count__(offsetof(struct object,
> > > items))));
> > >     };
> > > } *ptr;
> > 
> > This seems to be something completely different. offsetof
> > computes the offset from the type given in its argument.
> > But it would not access the value of the member of the
> > enclosing struct. But it would not work in your example,
> > because the struct object is incomplete at this point.
> > 
> > So no, you can not use offsetof to reference a member
> > of an enclosing struct.
> > 
> "offsetof(struct foo, count)" is a fancy wrapper for "((struct foo
> *)0)->count", which is resolved during sema, where it does have the full
> structure definition. For instance, this compiles in C++:
> 
> struct a {
>         int count;
>         int y = ((struct a *)0)->count;
> } x;
> 
> void foo(struct a *);

This is not the case in C: https://godbolt.org/z/P7M6cdnoa

But even we make it resolve the name, which we
have to do for all proposals, it is something  different.

If offsetof it would resolve the count of a different
struct of the same *type* (here a non-existent one at 
address zero). Here we need a self reference to the
same *object*.

...


> > But I am not saying we shouldn't have the attribute first.
> > 
> I personally prefer using an attribute than the suggestion to use some other
> syntax, like:
> 
> struct foo {
>     int fam[.count];
> };
> 
> It becomes less intuitive what's going on here. And might conflict with VLA's
> in structures.

The syntax with the dot would make it not conflict.  But I need
this for this use case

struct foo {
  int count;
  int (*buf)[.count];
};

so that ARRAY_SIZE(*foo->buf) works correctly and also accesses
to foo->buf are bounds checkked.  So it would make sense to 
solve to treat flexible array members in the same way.

But I agree that we should simply add the attribute now also
because it makes it possible to use it for existing code bases.

> > > It also has the benefit of
> > > allowing one to reference a variable not in the structure:
> > > 
> > > const int items;
> > > struct object {
> > >     ...
> > >     char items;
> > >     ...
> > >     struct inner {
> > >         ...
> > >         int flex[] __attribute__((__element_count__(items))); /* References
> > > global "items" */
> > >     };
> > > } *ptr;
> > 
> > Whether you allow this or not has nothing to do with the syntax.
> > 
> > The question is what semantics you attach to this and this is
> > also a question in your example. 
> > 
> > If you define
> > 
> > struct inner* a = ...
> > 
> > What does it say for  a->flex  ?
> > 
> I need to point out that I used "offsetof" only as an example. It's possible to
> create something more robust which will carry along type information, etc.,
> which the current offsetof macro throws away. I should have made that clear.
> 
> The sanitizers that see "a->flex" will try to find the correct variable. If
> they can't, then they won't generate a check. In the case of it referencing a
> non-field member, it'll use that if it's within scope. If it refers to a field
> member of a parent container that's not within scope, it'll also not generate a
> check. It's unfortunate that these checks are done as a "best effort," but it
> could lead to software changes to improve security checks (like passing a
> parent structure into a function rather than an inner structure.

Yes. We could also have an optional warning which warns about accessing
'flex' in a context where 'items' is not accessible.  My point is that
this feature of potentially referring to stuff which may not be accessible
in all cases makes implementation more complicated.

Martin

>
Comment 18 Bill Wendling 2023-03-06 19:15:00 UTC
(In reply to Martin Uecker from comment #17)
> Am Freitag, dem 03.03.2023 um 23:18 +0000 schrieb isanbard at gmail dot com:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > 
> > --- Comment #16 from Bill Wendling <isanbard at gmail dot com> ---
> > (In reply to Martin Uecker from comment #15)
> > > Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail dot com:
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > > > 
> > > > --- Comment #14 from Bill Wendling <isanbard at gmail dot com> ---
> > > > (In reply to Martin Uecker from comment #9)
> > > > > > > Considering that the GNU extensions is rarely used, one could consider
> > > > > > > redefining the meaning of
> > > > > > > 
> > > > > > > int n = 1;
> > > > > > > struct {
> > > > > > >   int n;
> > > > > > >   char buf[n];
> > > > > > > };
> > > > > > > 
> > > > > > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > > > > > designators (which intuitively makes sense to me).
> > > > > > designator might be better IMO.
> > > > > > 
> > > > > > a question here is:
> > > > > > 
> > > > > > for the following nested structure: 
> > > > > > 
> > > > > > struct object {
> > > > > >         ...
> > > > > >         char items;
> > > > > >         ...
> > > > > >         struct inner {
> > > > > >                 ...
> > > > > >                 int flex[];
> > > > > >         };
> > > > > > } *ptr;
> > > > > > 
> > > > > > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > > > > > struct with "items" in the outer structure? any suggestion?
> > > > > 
> > > > > I would disallow it. At least at first. It also raises some
> > > > > questions: For example, one could form a pointer to the inner
> > > > > struct, and then it is not clear how 'items' could be accessed
> > > > > anymore.
> > > > > 
> > > > 
> > > > That would be limiting its use in the Linux kernel. It seems that there are
> > > > ways to refer to struct members already using something like "offsetof":
> > > > 
> > > > struct object {
> > > >     ...
> > > >     char items;
> > > >     ...
> > > >     struct inner {
> > > >         ...
> > > >         int flex[] __attribute__((__element_count__(offsetof(struct object,
> > > > items))));
> > > >     };
> > > > } *ptr;
> > > 
> > > This seems to be something completely different. offsetof
> > > computes the offset from the type given in its argument.
> > > But it would not access the value of the member of the
> > > enclosing struct. But it would not work in your example,
> > > because the struct object is incomplete at this point.
> > > 
> > > So no, you can not use offsetof to reference a member
> > > of an enclosing struct.
> > > 
> > "offsetof(struct foo, count)" is a fancy wrapper for "((struct foo
> > *)0)->count", which is resolved during sema, where it does have the full
> > structure definition. For instance, this compiles in C++:
> > 
> > struct a {
> >         int count;
> >         int y = ((struct a *)0)->count;
> > } x;
> > 
> > void foo(struct a *);
> 
> This is not the case in C: https://godbolt.org/z/P7M6cdnoa
> 
Right, it's not legal C syntax. I wanted to point out that it's not an impossible syntax. The lexer and parser have no idea of, nor should they care about, legal types, complete struct definitions, etc. All it sees is a member access. It's semantic analysis that actually determines whether or not it's correct, and if so what it means.

> But even we make it resolve the name, which we
> have to do for all proposals, it is something  different.
> 
> If offsetof it would resolve the count of a different
> struct of the same *type* (here a non-existent one at 
> address zero). Here we need a self reference to the
> same *object*.
> 
If we make it an attribute, we have more control over what the arguments mean, what they refer to, etc.

For the record, I'm not opposed to the idea of using the designated initializer syntax. I just think it's good to explore other ideas before settling on extending C (which is a bit heavy-handed).

> > > But I am not saying we shouldn't have the attribute first.
> > > 
> > I personally prefer using an attribute than the suggestion to use some other
> > syntax, like:
> > 
> > struct foo {
> >     int fam[.count];
> > };
> > 
> > It becomes less intuitive what's going on here. And might conflict with VLA's
> > in structures.
> 
> The syntax with the dot would make it not conflict.  But I need
> this for this use case
> 
> struct foo {
>   int count;
>   int (*buf)[.count];
> };
> 
> so that ARRAY_SIZE(*foo->buf) works correctly and also accesses
> to foo->buf are bounds checkked.  So it would make sense to 
> solve to treat flexible array members in the same way.
> 
> But I agree that we should simply add the attribute now also
> because it makes it possible to use it for existing code bases.
> 
Agreed. I believe we're on the same page (despite what I may write here :-).

> > > > It also has the benefit of
> > > > allowing one to reference a variable not in the structure:
> > > > 
> > > > const int items;
> > > > struct object {
> > > >     ...
> > > >     char items;
> > > >     ...
> > > >     struct inner {
> > > >         ...
> > > >         int flex[] __attribute__((__element_count__(items))); /* References
> > > > global "items" */
> > > >     };
> > > > } *ptr;
> > > 
> > > Whether you allow this or not has nothing to do with the syntax.
> > > 
> > > The question is what semantics you attach to this and this is
> > > also a question in your example. 
> > > 
> > > If you define
> > > 
> > > struct inner* a = ...
> > > 
> > > What does it say for  a->flex  ?
> > > 
> > I need to point out that I used "offsetof" only as an example. It's possible to
> > create something more robust which will carry along type information, etc.,
> > which the current offsetof macro throws away. I should have made that clear.
> > 
> > The sanitizers that see "a->flex" will try to find the correct variable. If
> > they can't, then they won't generate a check. In the case of it referencing a
> > non-field member, it'll use that if it's within scope. If it refers to a field
> > member of a parent container that's not within scope, it'll also not generate a
> > check. It's unfortunate that these checks are done as a "best effort," but it
> > could lead to software changes to improve security checks (like passing a
> > parent structure into a function rather than an inner structure.
> 
> Yes. We could also have an optional warning which warns about accessing
> 'flex' in a context where 'items' is not accessible.  My point is that
> this feature of potentially referring to stuff which may not be accessible
> in all cases makes implementation more complicated.
> 
It does make it more complicated, that's true.

I haven't seen comments on Kees's first example, where "malloc" returns an "__alloc_size" hint that's lost when "p" is returned from the function (at least in Clang). If that information is kept around, it would expand the number of bounds checks we could perform. (Kudos if this is currently the case in GCC.)

-bw
Comment 19 Jakub Jelinek 2023-03-06 19:18:07 UTC
(In reply to Bill Wendling from comment #18)
> > This is not the case in C: https://godbolt.org/z/P7M6cdnoa
> > 
> Right, it's not legal C syntax. I wanted to point out that it's not an
> impossible syntax. The lexer and parser have no idea of, nor should they
> care about, legal types, complete struct definitions, etc. All it sees is a

They do though, in lots of ways.  The reason it works in C++ is that the language
explicitly requires it and the parser just needs to save token sequences of NSDMIs and
member functions defined inside of class and parse them when the class is complete.
We don't want to do that in C.
Comment 20 Martin Uecker 2023-03-06 19:38:15 UTC
Am Montag, dem 06.03.2023 um 19:15 +0000 schrieb isanbard at gmail dot com:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #18 from Bill Wendling <isanbard at gmail dot com> ---
> (In reply to Martin Uecker from comment #17)
> > Am Freitag, dem 03.03.2023 um 23:18 +0000 schrieb isanbard at gmail dot com:
> > 

...
> 
> > But even we make it resolve the name, which we
> > have to do for all proposals, it is something  different.
> > 
> > If offsetof it would resolve the count of a different
> > struct of the same *type* (here a non-existent one at 
> > address zero). Here we need a self reference to the
> > same *object*.
> > 
> If we make it an attribute, we have more control over what the arguments mean,
> what they refer to, etc.
> 
> For the record, I'm not opposed to the idea of using the designated initializer
> syntax. I just think it's good to explore other ideas before settling on
> extending C (which is a bit heavy-handed).

I agree. An attribute is simple and extending C will take
more care (and work).

The reason I think we should also extend C (together with
WG14) is because this would allow writing code where the
bound is never lost because it is encoded in the type,
while the  __builtin_dynamic_object_size is extremely
useful to enhance existing code bases, but is best
effort only.

So I think we should do both.

...
> 
> > > > > It also has the benefit of
> > > > > allowing one to reference a variable not in the structure:
> > > > > 
> > > > > const int items;
> > > > > struct object {
> > > > >     ...
> > > > >     char items;
> > > > >     ...
> > > > >     struct inner {
> > > > >         ...
> > > > >         int flex[] __attribute__((__element_count__(items))); /* References
> > > > > global "items" */
> > > > >     };
> > > > > } *ptr;
> > > > 
> > > > Whether you allow this or not has nothing to do with the syntax.
> > > > 
> > > > The question is what semantics you attach to this and this is
> > > > also a question in your example. 
> > > > 
> > > > If you define
> > > > 
> > > > struct inner* a = ...
> > > > 
> > > > What does it say for  a->flex  ?
> > > > 
> > > I need to point out that I used "offsetof" only as an example. It's possible to
> > > create something more robust which will carry along type information, etc.,
> > > which the current offsetof macro throws away. I should have made that clear.
> > > 
> > > The sanitizers that see "a->flex" will try to find the correct variable. If
> > > they can't, then they won't generate a check. In the case of it referencing a
> > > non-field member, it'll use that if it's within scope. If it refers to a field
> > > member of a parent container that's not within scope, it'll also not generate a
> > > check. It's unfortunate that these checks are done as a "best effort," but it
> > > could lead to software changes to improve security checks (like passing a
> > > parent structure into a function rather than an inner structure.
> > 
> > Yes. We could also have an optional warning which warns about accessing
> > 'flex' in a context where 'items' is not accessible.  My point is that
> > this feature of potentially referring to stuff which may not be accessible
> > in all cases makes implementation more complicated.
> > 
> It does make it more complicated, that's true.
> 
> I haven't seen comments on Kees's first example, where "malloc" returns an
> "__alloc_size" hint that's lost when "p" is returned from the function (at
> least in Clang). If that information is kept around, it would expand the number
> of bounds checks we could perform. (Kudos if this is currently the case in
> GCC.)

I think not.  But this would not work across TU boundaries anyway.

https://godbolt.org/z/eW9GT579r

Martin
Comment 21 Martin Uecker 2023-03-06 19:57:25 UTC
(In reply to Jakub Jelinek from comment #19)
> (In reply to Bill Wendling from comment #18)
> > > This is not the case in C: https://godbolt.org/z/P7M6cdnoa
> > > 
> > Right, it's not legal C syntax. I wanted to point out that it's not an
> > impossible syntax. The lexer and parser have no idea of, nor should they
> > care about, legal types, complete struct definitions, etc. All it sees is a
> 
> They do though, in lots of ways.  The reason it works in C++ is that the
> language
> explicitly requires it and the parser just needs to save token sequences of
> NSDMIs and
> member functions defined inside of class and parse them when the class is
> complete.
> We don't want to do that in C.

For the parser I have a prototype. It only works for size expressions using members that came before in the source and not members which are not yet declared. For flexible array members this is sufficient. It works by storing the field decl in a list during declaration of struct and then uses the list to lookup the field decl in build_component_ref. 

Martin
Comment 22 Siddhesh Poyarekar 2023-03-06 20:05:49 UTC
(In reply to Martin Uecker from comment #20)
> > I haven't seen comments on Kees's first example, where "malloc" returns an
> > "__alloc_size" hint that's lost when "p" is returned from the function (at
> > least in Clang). If that information is kept around, it would expand the number
> > of bounds checks we could perform. (Kudos if this is currently the case in
> > GCC.)
> 
> I think not.  But this would not work across TU boundaries anyway.
> 
> https://godbolt.org/z/eW9GT579r

It's not that the information is lost in this case, it is just not visible, unlike bug 96503 for example where the information is actually lost.  The example above doesn't work because the compiler is instructed to *not* look inside foo; I'd expect something like LTO to work even across TU boundaries whenever possible.  There's the external linkage gap, but I don't know if that's a relevant motivation for the kernel.

From the kernel (or distribution builds) perspective, __element_count__ will cover situations where it's not straightforward to propagate sizes, e.g. where the target code is an allocator that itself wants to enforce size constraints on objects it returns, beyond the input size request.

This really should have been the way __access__ was implemented, but we tied that attribute to only functions.  Would it be a terrible idea to make __element_count__ more general purpose so that it ends up deprecating __access__?
Comment 23 qinzhao 2023-03-08 16:56:59 UTC
(In reply to Martin Uecker from comment #13)
> 
> VLAs and VM types exist since C99 and were made optional in C11.
> The minimal change we adopted to make support for VM types 
> (but not VLAs) mandatory again was:
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2778.pdf

So, this will be official in C23? i.e, VM types will be mandatory, but VLA will be optional (and later might be deprecated?)

> 
> I think using PLACEHOLDER_EXPR that are insert into the size
> expression and then replaced later by the struct being accessed, 
> e.g.
> 
> struct foo {
>  int len;
>  char buf[PLACEHOLDER_EXPR.len]
> };
> 
> and then later when we have
> 
> struct foo x;
> 
> x->buf
> 
> we would replace in the size of the type for x->buf the placeholder
> with x itself.

I see. Yes, this will resolve the implementation difficulty for filling the size of the FAM field when the size is the previous declared field in the same structure. 
 

> Yes, this was what I wanted to do...  My main use case is not flexible
> array members but VM types in struct:
> 
> struct foo {
>   int len;
>   char (*buf)[.len];
> };
> 
> 
> This has less issues because the size of the struct then does not depend
> on the length.

a little confused here:
what's the definition of VM type? it's size will not depend on the ".len" ?
Comment 24 qinzhao 2023-03-08 17:13:58 UTC
(In reply to Martin Uecker from comment #15)
> 
> Yes, but that syntax would be intuitive which I would see
> as an advantage.

Yes, I agree.
> 
> But I am not saying we shouldn't have the attribute first.

both the new attribute and the C's syntax extension might be needed at the same time, I think.

1. Attribute might be better for changing the existing source code to make them bound-checking friendly;
2. new code can use the C's syntax change, and hopefully this new syntax extension can be made into next C language standard.

however, I think that both the new attribute and the new C syntax extension should support the similar user interface. We might need to decide on this first.

right now, the user interface we cannot agreed on is:

whether we should support the following nested annotation (either with attribute or with the C syntax extension):

struct object {
        ...
        unsigned int items;
        ...
        struct inner {
                ...
                int flex[];
        };
} *ptr;


My opinion is: No, we should not support this, it will make the implementation much more complicated both for attribute and for C syntax extension. 

But I am not very sure on this yet. 

Is the PLACEHOLDER_EXPR able to resolve this?
Comment 25 qinzhao 2023-03-08 17:36:39 UTC
(In reply to Martin Uecker from comment #17)
> The syntax with the dot would make it not conflict.  But I need
> this for this use case
> 
> struct foo {
>   int count;
>   int (*buf)[.count];
> };
> 
> so that ARRAY_SIZE(*foo->buf) works correctly and also accesses
> to foo->buf are bounds checkked.  So it would make sense to 
> solve to treat flexible array members in the same way.
Yes, I agree. 

then the size of the array type (even though it's not a constant) will be embedded in the TYPE consistently. therefore simplify compiler's implementation and make it consistently. 

> 
> But I agree that we should simply add the attribute now also
> because it makes it possible to use it for existing code bases.
Yes.
Comment 26 qinzhao 2023-03-08 17:38:49 UTC
(In reply to Martin Uecker from comment #20)
> 
> I agree. An attribute is simple and extending C will take
> more care (and work).
> 
> The reason I think we should also extend C (together with
> WG14) is because this would allow writing code where the
> bound is never lost because it is encoded in the type,
> while the  __builtin_dynamic_object_size is extremely
> useful to enhance existing code bases, but is best
> effort only.
> 
> So I think we should do both.
> 
Agreed.
Comment 27 qinzhao 2023-03-08 17:43:23 UTC
(In reply to Siddhesh Poyarekar from comment #22)

> This really should have been the way __access__ was implemented, but we tied
> that attribute to only functions.  Would it be a terrible idea to make
> __element_count__ more general purpose so that it ends up deprecating
> __access__?
I feel that __access__ and __element_count_ have some overlapping, but serve different purposes, might not be good to merge them?
Comment 28 Martin Uecker 2023-03-08 17:48:12 UTC
Am Mittwoch, dem 08.03.2023 um 16:56 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #23 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #13)
> > 
> > VLAs and VM types exist since C99 and were made optional in C11.
> > The minimal change we adopted to make support for VM types 
> > (but not VLAs) mandatory again was:
> > 
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2778.pdf
> 
> So, this will be official in C23? i.e, VM types will be mandatory, but VLA will
> be optional (and later might be deprecated?)

Yes, this is in C23. But no, the plan is *not* to deprecate VLAs.

I really hope we can make VLAs mandatory at some point as well
and I will push very much in this direction.

This reason is that VLAs are very good properties:

- automatic memory management
- run-time bounds (which can be used for checking)

The problems with VLA are in my opinion caused by poor
implementation (e.g. no stack probing etc) and bad
code generation (Linus was not happy about this) and
not because anything is fundamentally bad about them
from the point of language semantics.


They are now often replaced by fixed size buffers on
the stack, which I think is a step backwards form the
point of security because the information about the
precise run-time bound is lost.


> > 
> > I think using PLACEHOLDER_EXPR that are insert into the size
> > expression and then replaced later by the struct being accessed, 
> > e.g.
> > 
> > struct foo {
> >  int len;
> >  char buf[PLACEHOLDER_EXPR.len]
> > };
> > 
> > and then later when we have
> > 
> > struct foo x;
> > 
> > x->buf
> > 
> > we would replace in the size of the type for x->buf the placeholder
> > with x itself.
> 
> I see. Yes, this will resolve the implementation difficulty for filling the
> size of the FAM field when the size is the previous declared field in the same
> structure. 
> 
> 
> > Yes, this was what I wanted to do...  My main use case is not flexible
> > array members but VM types in struct:
> > 
> > struct foo {
> >   int len;
> >   char (*buf)[.len];
> > };
> > 
> > 
> > This has less issues because the size of the struct then does not depend
> > on the length.
> 
> a little confused here:
> what's the definition of VM type? it's size will not depend on the ".len" ?

VM = variably modified.  In C it is a type which is derived from
a VLA which is not necessarily itself a VLA, e.g. a pointer to
a VLA. But a VLA is also a VM type.

Martin


>
Comment 29 Martin Uecker 2023-03-08 18:37:35 UTC
Am Mittwoch, dem 08.03.2023 um 17:13 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #24 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #15)
> > 

...
> 
> however, I think that both the new attribute and the new C syntax extension
> should support the similar user interface. We might need to decide on this
> first.

I think there are some fundamental differences: In one
case the size is encoded into the type and in the other
it is just an annotation that can be ignored.

> 
> right now, the user interface we cannot agreed on is:
> 
> whether we should support the following nested annotation (either with
> attribute or with the C syntax extension):
> 
> struct object {
>         ...
>         unsigned int items;
>         ...
>         struct inner {
>                 ...
>                 int flex[];
>         };
> } *ptr;
> 
> 
> My opinion is: No, we should not support this, it will make the implementation
> much more complicated both for attribute and for C syntax extension. 
> 
> But I am not very sure on this yet. 

I am fine with supporting it, but one needs to decide
on the semantics in the case where inner is not accessed
via the outer struct.
> 
> Is the PLACEHOLDER_EXPR able to resolve this?

I don't know yet. 

Martin


>
Comment 30 qinzhao 2023-03-08 19:20:09 UTC
(In reply to Martin Uecker from comment #28)

> The problems with VLA are in my opinion caused by poor
> implementation (e.g. no stack probing etc) and bad
> code generation (Linus was not happy about this) and
> not because anything is fundamentally bad about them
> from the point of language semantics.

you mean gcc's implementation? how about other compilers?

> VM = variably modified.  In C it is a type which is derived from
> a VLA which is not necessarily itself a VLA, e.g. a pointer to
> a VLA. But a VLA is also a VM type.

Okay.
> 
> > struct foo {
> >   int len;
> >   char (*buf)[.len];
> > };
> > 
> > 
> > This has less issues because the size of the struct then does not depend
> > on the length.

but I am still not clear on why "the size of the above struct 'foo' does not depend on the .len?"  in my opinion, it should depend on .len. do I miss anything here?
Comment 31 qinzhao 2023-03-08 19:47:45 UTC
(In reply to Martin Uecker from comment #29)
> > however, I think that both the new attribute and the new C syntax extension
> > should support the similar user interface. We might need to decide on this
> > first.
> 
> I think there are some fundamental differences: In one
> case the size is encoded into the type and in the other
> it is just an annotation that can be ignored.

Yes, understood.

I might not make myself clear in the previous message. what I mean by the "the similar user interfaces" is:

both might support the following:

1. a declared variable as the size:

**C syntax extension:
unsigned int length;
struct object {
   int others;
   char flex[length];  
}

**corresponding attribute:
unsigned int length;
struct object {
   int others;
   char flex[] __attribute__((__element_count__(length)));  
}

2. a declared field of the same structure as the size:
**C syntax extension:
struct object {
   unsigned int length;
   char flex[.length];  
}

**corresponding attribute:
struct object {
   unsigned int length;
   char flex[] __attribute__((__element_count__(.length)));  
}

3. expressions including declared variables and declared fields of the same structure as the size:

**C syntax extension:
unsigned int item;
struct object {
   unsigned int ratio;
   char flex[item * .ratio];  
}

**corresponding attribute:
unsigned int item;
struct object {
   unsigned int ratio;
   char flex[] __attribute__((__element_count__(item * .ratio)));  
}

Not sure on the following case:

4. a declared field of the containing structure as the size:

**C syntax extension:
struct object {
   unsigned int length;
   struct inner {
     ...
     int flex[..length];
   };
}

**corresponding attribute:
struct object {
   unsigned int length;
   struct inner {
     ...
     char flex[] __attribute__((__element_count__(..length)));
   }  
}

what else from the user's point of view?
> 
> > 
> > right now, the user interface we cannot agreed on is:
> > 
> > whether we should support the following nested annotation (either with
> > attribute or with the C syntax extension):
> > 
> > struct object {
> >         ...
> >         unsigned int items;
> >         ...
> >         struct inner {
> >                 ...
> >                 int flex[];
> >         };
> > } *ptr;
> > 
> I am fine with supporting it, but one needs to decide
> on the semantics in the case where inner is not accessed
> via the outer struct.
Yes, that's right.
Comment 32 Martin Uecker 2023-03-08 20:20:43 UTC
Am Mittwoch, dem 08.03.2023 um 19:20 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #30 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #28)
> 
> > The problems with VLA are in my opinion caused by poor
> > implementation (e.g. no stack probing etc) and bad
> > code generation (Linus was not happy about this) and
> > not because anything is fundamentally bad about them
> > from the point of language semantics.
> 
> you mean gcc's implementation? how about other compilers?

I mostly mean compilers without stack probing (which GCC
has but is not activated by default). Code quality is
probably still bad for most compilers.

> 
> > VM = variably modified.  In C it is a type which is derived from
> > a VLA which is not necessarily itself a VLA, e.g. a pointer to
> > a VLA. But a VLA is also a VM type.
> 
> Okay.
> > 
> > > struct foo {
> > >   int len;
> > >   char (*buf)[.len];
> > > };
> > > 
> > > 
> > > This has less issues because the size of the struct then does not depend
> > > on the length.
> 
> but I am still not clear on why "the size of the above struct 'foo' does not
> depend on the .len?"  in my opinion, it should depend on .len. do I miss
> anything here?

Here the last element is not a flexible array member but
a pointer to an array of size len. The size of the pointer is
fixed.

Martin



>
Comment 33 qinzhao 2023-03-08 20:47:57 UTC
(In reply to Martin Uecker from comment #32)

> > > > struct foo {
> > > >   int len;
> > > >   char (*buf)[.len];
> > > > };
> Here the last element is not a flexible array member but
> a pointer to an array of size len. The size of the pointer is
> fixed.

Oh, that's right.

then, just curious on how did you embed the '.len' information into IR?
Comment 34 Martin Uecker 2023-03-29 16:12:42 UTC
Created attachment 54787 [details]
patch for C FE to add size expressions to VM types in structs


Here is a preliminary patch for C FE just to see how this could work
look like VM-types in structs.

struct foo {
  int n;
  int (*buf)[.n];
};

It works with UBSan, but it isn't clear how this could pass the
information to the object size pass. This also does not work for
parameters: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109334

So it seems for an attribute it might make sense to implement
it differently.
Comment 35 qinzhao 2023-04-03 20:29:41 UTC
(In reply to Martin Uecker from comment #34)
> Created attachment 54787 [details]
> patch for C FE to add size expressions to VM types in structs

thanks a lot for the patch.

could you please provide a small testing case that works with this patch that I can take a further look?
I tried very simple one, didn't compile:

struct P {
  int k;
  int x[.k];
};

void
foo (int n)
{
  struct P p;
  p.k = n;
  return;
}

> 
> It works with UBSan, but it isn't clear how this could pass the
> information to the object size pass. This also does not work for
> parameters: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109334

From your change, I can see that you put the ".k" info to the index of the array type for x[.k], so I guess that other passes can refer the index of the array type to get the max size of this array.

> 
> So it seems for an attribute it might make sense to implement
> it differently.
implement should be different. but the functionality of the user interface is better to be kept consistent between attribute and language extension. i.e

in addition to the basic:

struct P {
  int k;
  int x[.k];
};

will you support expressions:

struct P {
  int k;
  int x[.k * 4];
}
?
or other more complicated syntax?
Comment 36 Martin Uecker 2023-04-03 21:53:42 UTC
Am Montag, dem 03.04.2023 um 20:29 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #35 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #34)
> > Created attachment 54787 [details]
> > patch for C FE to add size expressions to VM types in structs
> 
> thanks a lot for the patch.
> 
> could you please provide a small testing case that works with this patch that I
> can take a further look?

I considered pointers to arrays:

struct P {
  int n;
  char (*buf)[.n];
};

the FAM case needs more work and I guess there are
still many other problems with the patch.

> I tried very simple one, didn't compile:
> 
> struct P {
>   int k;
>   int x[.k];
> };
> 
> void
> foo (int n)
> {
>   struct P p;
>   p.k = n;
>   return;
> }
> 
> > 
> > It works with UBSan, but it isn't clear how this could pass the
> > information to the object size pass. This also does not work for
> > parameters: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109334
> 
> From your change, I can see that you put the ".k" info to the index of the
> array type for x[.k], so I guess that other passes can refer the index of the
> array type to get the max size of this array.
> 

The comments in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104970

imply that the size information does not survive long enough

> > 
> > So it seems for an attribute it might make sense to implement
> > it differently.
> implement should be different. but the functionality of the user interface is
> better to be kept consistent between attribute and language extension. i.e
> 
> in addition to the basic:
> 
> struct P {
>   int k;
>   int x[.k];
> };
> 
> will you support expressions:
> 
> struct P {
>   int k;
>   int x[.k * 4];
> }
> ?
> or other more complicated syntax?

I am not sure.  I am still experimenting with this to gain experience.

It is easy to move the parser code elsehwere and then
complicated expressions can be used.

But the size expression is evaluated each time when the member is
accessed.  Do we really want to run arbitrary code at this point? 
Maybe the size expressions should be limited to very simple
expressions without side effects.

Martin

>
Comment 37 qinzhao 2023-04-04 15:07:14 UTC
(In reply to Martin Uecker from comment #36)

> 
> I considered pointers to arrays:
> 
> struct P {
>   int n;
>   char (*buf)[.n];
> };
> 

Okay. 
Then for the field "buf", it has a pointer type, the size of this field is a compile-time constant. where in the IR do you put the ".n" (the size of the array this pointer points to). That's the place I am trying to understand from your patch or from the IR dump of a working small testing case. 

 
> the FAM case needs more work and I guess there are
> still many other problems with the patch.
> 
For the FAM case, since the field itself is an ARRAY type, then the ".n" can be naturally put to the SIZE of the type of the field. 

Another thing I'd like to point out, for the original intention of this PR, FAM case is more important than the pointer to array case, I think. So, should we focus on FAM first?

> 
> The comments in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104970
> 
> imply that the size information does not survive long enough
need to study this a little bit more.

> But the size expression is evaluated each time when the member is
> accessed. 
How to represent this in IR?

> Maybe the size expressions should be limited to very simple 
expressions without side effects.
agreed. but I think we might want to focus on FAM first.
Comment 38 Martin Uecker 2023-04-04 16:33:27 UTC
Am Dienstag, dem 04.04.2023 um 15:07 +0000 schrieb qinzhao at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> 
> --- Comment #37 from qinzhao at gcc dot gnu.org ---
> (In reply to Martin Uecker from comment #36)
> 
> > 
> > I considered pointers to arrays:
> > 
> > struct P {
> >   int n;
> >   char (*buf)[.n];
> > };
> > 
> 
> Okay. 
> Then for the field "buf", it has a pointer type, the size of this field is a
> compile-time constant. where in the IR do you put the ".n" (the size of the
> array this pointer points to). That's the place I am trying to understand from
> your patch or from the IR dump of a working small testing case. 

Try this:

struct foo {
   int n;
   char (*buf)[.n];
};

void store(struct foo* p, int a, int b) { (*p->buf)[a] = b; }

int main()
{
  struct foo p = { 7, &(char[7]){ 0 } };
  store(&p, 7, 1);
}

With UBSan, this should give this error:
ex.c:8:52: runtime error: index 7 out of bounds for type 'char [*]'


In the parser for '.n' I create a component ref with a placeholder
expression inside it.  But then later when creating a component
ref for p->buf  I replace all placeholder expressions in the
*type* of p->buf  with *p. So the type of p->buf has a
<placeholder>.n  as size expression somewwere and this 
becomes p.n.

So the middle-end never sees the placeholder, but only the size
expression with the p.n in it. According to the comment
in tree.def  I thought the later (replacement) part happens
automatically in the middle end, but this does not seem to be
the case in this scenario, so I implemented it in the FE.

> > the FAM case needs more work and I guess there are
> > still many other problems with the patch.
> > 
> For the FAM case, since the field itself is an ARRAY type, then the ".n" can be
> naturally put to the SIZE of the type of the field.

Yes, potentially. But a FAM has special semantics and the
code for this detects this based on the missing size (I think).

> Another thing I'd like to point out, for the original intention of this PR, FAM
> case is more important than the pointer to array case, I think. So, should we
> focus on FAM first?

I tried the other thing first, because it seemed simpler and
could potentially serve as a basis for the FAM case.  But I am
not so sure anymore, because of the issue mentioned below.

> > 
> > The comments in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104970
> > 
> > imply that the size information does not survive long enough
> need to study this a little bit more.
> 
> > But the size expression is evaluated each time when the member is
> > accessed. 
> How to represent this in IR?

This is done in the FE by putting the size in save_expr which
and building the C_MAYBE_CONST_EXPR.

> 
> > Maybe the size expressions should be limited to very simple 
> expressions without side effects.
> agreed. but I think we might want to focus on FAM first.

For the FAM case we should probably attach the attribute
to the member and read it directly in the object size pass,
similar how this is now done for the access attributes for
parameters.

I also thought that passing the information via the type
would be better, but I am not sure how to do this at the
moment.

In any case, I am also just learning about a lot of things
in the middle end, so if others with more experience
have advice, you should better listen to them.

Martin




>
Comment 39 qinzhao 2023-04-04 20:08:38 UTC
(In reply to Martin Uecker from comment #38)
> struct foo {
>    int n;
>    char (*buf)[.n];
> };
> 
> void store(struct foo* p, int a, int b) { (*p->buf)[a] = b; }
> 
> int main()
> {
>   struct foo p = { 7, &(char[7]){ 0 } };
>   store(&p, 7, 1);
> }
> 
> With UBSan, this should give this error:
> ex.c:8:52: runtime error: index 7 out of bounds for type 'char [*]'
yes, this worked. and with -fdump-tree-gimple, I can see the IR representation for this test case. thanks.

> In the parser for '.n' I create a component ref with a placeholder
> expression inside it.  But then later when creating a component
> ref for p->buf  I replace all placeholder expressions in the
> *type* of p->buf  with *p. So the type of p->buf has a
> <placeholder>.n  as size expression somewwere and this 
> becomes p.n.

the above explanation is very clean. thanks.
but the thing that confused me is: is the *type* of p->buf Pointer or Array? 

From the source code, it's a pointer, right? if so, why embed the array size to it?

> Yes, potentially. But a FAM has special semantics and the
> code for this detects this based on the missing size (I think).

Yes, that's true. currently, we determine a FAM as the following (c-decl.cc):

/* Determine whether TYPE is a ISO C99 flexible array memeber type "[]".  */
static bool
flexible_array_member_type_p (const_tree type)
{
  if (TREE_CODE (type) == ARRAY_TYPE
      && TYPE_SIZE (type) == NULL_TREE
      && TYPE_DOMAIN (type) != NULL_TREE
      && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE)
    return true;

  return false;
}

So, if we want to change the syntax of FAM to include size info, and embedded this size info to the TYPE, then we need to adjust all the places in FE or in Middle end that depend on the current FAM IR. Yes, potentially much bigger changes. 

therefore, directly extending the FAM syntax to include the size info might involve bigger change than attribute approach. 

We can implement attribute first, and then later do the syntax extension. 
that might be safer. 

> I tried the other thing first, because it seemed simpler and
> could potentially serve as a basis for the FAM case.  But I am
> not so sure anymore, because of the issue mentioned below.
Okay, I see now.

> For the FAM case we should probably attach the attribute
> to the member and read it directly in the object size pass,
> similar how this is now done for the access attributes for
> parameters.

Yes, implementation of attribute should be simpler and safer. 
I will try to come up with a patch on this attribute. 

> 
> I also thought that passing the information via the type
> would be better, but I am not sure how to do this at the
> moment.

Yes, passing the Size info through the type will be better, we can implement this syntax extension later if the attribute works very well, and we can promote it to a real language standard at that time.
Comment 40 qinzhao 2023-04-19 16:32:16 UTC
I had an initial patch to support the element_count attribute and use this attribute in builtin_dynamic_object_size(), for the following testing case:

  1 #include <stdlib.h>
  2 #include <assert.h>
  3 #ifdef ENABLE_ELEMENT_COUNT_ATTRIBUTE  
  4 int k;
  5 struct P {
  6   int k;
  7   int x[] __attribute__ ((element_count ("k")));
  8 } *p;
  9 #else
 10 struct P {
 11   int k;
 12   int x[];
 13 } *p;
 14 #endif
 15 
 16 void store(int a, int b)
 17 {
 18   p = (struct P *)malloc (sizeof (struct P) + a * sizeof (int));
 19   p->k = a;
 20   p->x[b] = 0;
 21   assert (__builtin_dynamic_object_size (p->x, 1) == (p->k * sizeof (int)));
 22   assert (__builtin_dynamic_object_size (p, 0) == (p->k * sizeof (int) + sizeof (struct P)));
 23   return;
 24 }
 25 
 26 int main()
 27 {
 28  store(7, 7);
 29  p->x[8] = 1;
 30 #ifdef ENABLE_ELEMENT_COUNT_ATTRIBUTE 
 31  assert (__builtin_dynamic_object_size (p->x, 1) == (p->k * sizeof (int)));
 32  assert (__builtin_dynamic_object_size (p, 0) == (p->k * sizeof (int) + sizeof (struct P)));
 33 #endif
 34 }

when compiled with my gcc as:

opc@qinzhao-ol8u3-x86 108896]$ sh t
/home/opc/Install/latest-d/bin/gcc -O -DENABLE_ELEMENT_COUNT_ATTRIBUTE t.c
a.out: t.c:32: main: Assertion `__builtin_dynamic_object_size (p, 0) == (p->k * sizeof (int) + sizeof (struct P))' failed.
t: line 21: 3266310 Aborted                 (core dumped) ./a.out

with this private gcc, the __builtin_dynamic_object_size (p->x,1) at line 31 can be correctly computed based on the attribute.

However, due to PR109557, the __builtin_dynamic_object_size (p, 0) at line 32 cannot be computed. 
Due to bug PR109557, the assertion at line 32 failed.
Comment 41 qinzhao 2023-05-03 13:57:05 UTC
my private gcc is able to use the element_count information in bound sanitizer now:
[opc@qinzhao-ol8u3-x86 108896]$ cat t6.c
#include <stdlib.h>
#include <assert.h>
struct P {
  int k;
  int x[] __attribute__ ((element_count ("k")));
} *p;

void __attribute__ ((noinline)) store(int a, int b) 
{
  p = (struct P *)malloc (sizeof (struct P) + a * sizeof (int));
  p->k = a;
  return;
}

int main()
{
 store(7, 7);
 p->x[7] = 1;
}
[opc@qinzhao-ol8u3-x86 108896]$ sh t
/home/opc/Install/latest-d/bin/gcc -O -fsanitize=bounds-strict t6.c
t6.c:18:6: runtime error: index 7 out of bounds for type 'int [*]'
[opc@qinzhao-ol8u3-x86 108896]$
Comment 42 Kees Cook 2023-05-03 15:32:25 UTC
Exciting! Are you able to attach the latest patch? I'd love to try it out. I've been testing Clang's version as well:
https://reviews.llvm.org/D148381
Comment 43 Martin Uecker 2023-05-04 15:16:24 UTC
Yes, this is great! I am also looking forward to your patch!  It seems it works with the bounds checking code?  Does it also interact with the object size pass?
Comment 44 qinzhao 2023-05-04 15:30:58 UTC
(In reply to Martin Uecker from comment #43)
> works with the bounds checking code?  Does it also interact with the object
> size pass?

both array-bounds sanitizer and object size pass are updated to use this new attribute. 
I will refine the patch a little bit before posting it in the next week.
Comment 47 Thorsten Glaser 2023-10-05 19:54:29 UTC
Regarding the proposed attribute: it would also be really great if I could do…

struct foo {
        int type;
        char name[] __attribute__((__element_count__(0)));
};

… to denote name[] is a NUL-terminated string.

This also allows, although not static, bounds checking for most of the use cases not covered by a counting member (optionally multiplied by an extra factor).
Comment 48 Martin Uecker 2023-10-05 20:21:27 UTC
Indicating a null terminated string should certainly use a different attribute name.
Comment 49 Sean 2023-12-27 06:31:00 UTC
>>will you support expressions:
>>
>>struct P {
>>  int k;
>>  int x[.k * 4];
>>}
> 
>Maybe the size expressions should be limited to very simple
>expressions without side effects.

Simple expressions without side effects would be useful for libusb, whose public API has structures like:

struct libusb_bos_dev_capability_descriptor {
	/** Size of this descriptor (in bytes) */
	uint8_t  bLength;

	uint8_t  other1;
	uint8_t  other2;

	/** Device Capability data (bLength - 3 bytes) */
	uint8_t  dev_capability_data[];
};

Here it seems we'd want to tag it with __attribute__((__element_count__(bLength - 3)));

See https://github.com/libusb/libusb/blob/37dee8f9dd429558e4d5bc4a804ab586abe7262a/libusb/libusb.h#L859C1-L859C51
Comment 52 Cristian Rodríguez 2024-07-25 13:54:15 UTC
Any hope for something that could apply to for example struct sockaddr ? where the last member is AT LEAST 14 bytes but it size is otherwise unspecified? or do we need to turn sa_data into a c99 flexible array and add 14 bytes of padding..?
I don't think that is going to fly with library/kernel maint..
Comment 53 Cristian Rodríguez 2024-07-25 13:54:32 UTC
Any hope for something that could apply to for example struct sockaddr ? where the last member is AT LEAST 14 bytes but it size is otherwise unspecified? or do we need to turn sa_data into a c99 flexible array and add 14 bytes of padding..?
I don't think that is going to fly with library/kernel maint..