[00/10][RFC] Splitting the C and C++ concept of "complete type"

Wed Oct 17 12:54:00 GMT 2018

[ Sorry that there were so many typos in my last reply, will try to do better
  this time... ]

Joseph Myers <joseph@codesourcery.com> writes:
> On Tue, 16 Oct 2018, Richard Sandiford wrote:
>> The patches therefore add a new "__sizeless_struct" keyword to denote
>> structures that are sizeless rather than sized.  Unlike normal
>> structures, these structures can have members of sizeless type in
>> addition to members of sized type.  On the other hand, they have all
>> the same limitations as other sizeless types (described in earlier
>> sections).
>
> I don't see anything here disallowing offsetof on such structures.

I didn't think this needed to be done explicitly since:

	offsetof(type, member-designator)

    which expands to an integer constant expression that has type size_t,
    the value of which is the offset in bytes, to the structure
    member (designated by member-designator), from the beginning of its
    structure (designated by type). The type and member designator shall be
    such that given

        static type t;

    then the expression &(t.member-designator) evaluates to an address
    constant. (If the specified member is a bit-field, the behavior is
    undefined.)

implicitly rejects sizeless types on the basis that "static type t;"
would be invalid.  I think that's the same way that it rejects
incomplete structure types.

But yeah, it looks like I forgot to handle this in GCC. :-(
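
A minimal sketch of that reading, using only ordinary sized and incomplete
structures (the names "pair", "opaque" and "offset_of_second" are made up for
illustration and are not part of the patches):

```c
#include <stddef.h>

/* A complete (sized) structure: "static struct pair t;" is valid.  */
struct pair {
  int first;
  double second;
};

/* An incomplete structure: "static struct opaque t;" would be invalid,
   so offsetof (struct opaque, member) is rejected as well.  */
struct opaque;

/* Offset of "second" within struct pair, an integer constant expression.  */
size_t
offset_of_second (void)
{
  return offsetof (struct pair, second);
}
```

The argument above is that a sizeless structure would fall into the same
bucket as "struct opaque" here, not "struct pair".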

> On Tue, 16 Oct 2018, Richard Sandiford wrote:
>> > as Joseph pointed out, there are some related discussions
>> > on the WG14 reflector. How about moving the discussion
>> > there?
>> 
>> The idea was to get a feel for what would be acceptable to GCC
>> maintainers.  When Arm presented an extension of P0214 to support SVE
>> at the last C++ committee meeting, using this sizeless type extension
>> as a possible way of providing the underlying vector types, the feeling
>> seemed to be that it wouldn't be considered unless it had already been
>> proven in compilers.
>
> But as shown in the related discussions, there are other possible features 
> that might also involve non-VLA types whose size is not a compile-time 
> constant.  And so it's necessary to work with the people interested in 
> those features in order to clarify what the underlying concepts ought to 
> look like to support different such features.

Could you give pointers to the specific proposals/papers you mean?

>> I think it is for some people though.  If the vectors don't decay to
>> pointers, they're more akin to a VLA wrapped in a structure rather than
>> a stand-alone VLA.  There is a GNU extension for that, e.g.:
>> 
>>   int
>>   f (int n)
>>   {
>>     struct s {
>>       int x[n];
>>     } foo;
>>     return sizeof (foo.x);
>>   }
>> 
>> But even though clang supports VLAs (of course), it rejects the
>> above with:
>> 
>>   error: fields must have a constant size: 'variable length array in structure' extension will never be supported
>> 
>> This gives a strong impression that wrapping a VLA type like this
>> is a bridge too far for some :-)  The message makes it clear that's
>> a case of "don't even bother asking".
>
> What are the clang concerns about VLAs in structs that are the reason for 
> not supporting them?

The user manual says:

-  clang does not support the gcc extension that allows variable-length
   arrays in structures. This is for a few reasons: one, it is tricky to
   implement, two, the extension is completely undocumented, and three,
   the extension appears to be rarely used. Note that clang *does*
   support flexible array members (arrays with a zero or unspecified
   size at the end of a structure).

So I guess defining it would remove the second objection.

> How do the sizeless structs with sizeless members in your proposal
> avoid those concerns about the definition of VLAs in structs?

The key difference is that the size, offset and layout don't have to be
known to the frontend and available during semantic analysis (unlike for
VLAs in structs).  In the clang implementation of sizeless types those
details only start to matter when translating clang ASTs into LLVM IR.
(With GCC it's a bit different, since TYPE_SIZE is set as soon as the
type definition is complete, even though for SVE TYPE_SIZE should only
matter in the mid and backend.)
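
As a rough illustration of what "known to the frontend" means for VLAs, here
is a sketch with a plain C99 VLA rather than a VLA in a struct (since clang
rejects the latter).  The function name "vla_bytes" is hypothetical:

```c
#include <stddef.h>

/* Size in bytes of a VLA with N int elements; N must be positive.
   The frontend has to model sizeof here as a run-time expression in N.  */
size_t
vla_bytes (int n)
{
  int a[n];
  return sizeof a;  /* evaluated at run time, not a constant */
}
```

With sizeless types the equivalent sizeof would simply be invalid, so the
frontend never needs an expression for the size.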

>> The problem isn't so much that the size is only known at runtime,
>> but that the size isn't necessarily invariant, and the size of an
>> object doesn't carry the size information with it.
>> 
>> This means you can't tell what size a given object is, even at runtime.
>
> How then is e.g. passing a pointer to such a struct (containing such 
> unknown-size members) to another function supposed to work?  Or is there 
> something in your proposed standard text edits that would disallow passing 
> such a pointer, or disallow using "->" with it to access members?

The idea here...

>> All you can tell is what size the object would be if you created it
>> from scratch.  E.g.:
>> 
>>   svint8_t *ptr;  // pointer to variable-length vector type
>> 
>>   void thread1 (void)
>>   {
>>     svint8_t local;
>>     ptr = &local;
>>     ...run for a long time...
>>   }
>> 
>>   void thread2 (void)
>>   {
>>     ... sizeof (*ptr); ...;
>>   }
>> 
>> If thread1 and thread2 have different vector lengths, thread2 has no way
>> of knowing what size *ptr is.
>> 
>> Of course, thread2 can't validly use *ptr if it has wider vectors than
>> thread1, but if we resort to saying "undefined behavior" for the above,
>> then it becomes difficult to define when the size actually is defined.
>
> What in your standard text edits serves to make that undefined?  
> Generally, what in those edits serves to say when conversions involving 
> such types, or pointers thereto, or accesses through compatible types in 
> different places, are or are not defined?
>
> In standard C, for example, we have for VLAs 6.7.6.2#6, "If the two array 
> types are used in a context which requires them to be compatible, it is 
> undefined behavior if the two size specifiers evaluate to unequal 
> values.".  What is the analogue of this for sizeless types?  Since unlike 
> VLAs you're allowing these types, and sizeless structs containing them, to 
> be passed by value, assigned, etc., you need something like that to 
> determine whether assignment, conditional expression, function argument 
> passing, function return, access via a pointer, etc., are valid.

...and here is that any size changes come only from changes in the
implementation-defined built-in sizeless types.  The user can't define
a new type whose size varies in new ways.  E.g. the size and layout of:

    __sizeless_struct s1 { svuint32_t v0, v1; };

can only vary because the size of the implementation-defined svuint32_t
can vary.  Something like:

    __sizeless_struct s2 { uint32_t v0, v1; };

would never change size or layout dynamically and the above scenarios
would be valid whenever they would be valid for "struct s2".

Since the built-in types are implementation-defined, the idea was
that the situations in which their size could vary should be
implementation-defined as well.

So passing a pointer to a sizeless struct containing a vector is OK if
nothing causes the vector length to change in the meantime.  The exact
circumstances in which that could happen are implementation-defined.
The same goes for the thread example above: in normal SVE usage it
would be fine for thread2 to use the vector at *ptr, since normally
all threads would run with the same vector length.  But there are some
implementation-defined situations in which the length could be different.
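
For reference, the sized half of that comparison can be sketched with the
ordinary "struct s2" (the helper "sum_s2" is a hypothetical name).  The
intent is that the same pointer passing and "->" accesses would be valid for
the __sizeless_struct version of s2 too:

```c
#include <stdint.h>

/* Ordinary sized structure with the same members as s2 above.  */
struct s2 { uint32_t v0, v1; };

/* Accesses both members through a pointer passed from the caller.  */
uint32_t
sum_s2 (const struct s2 *p)
{
  return p->v0 + p->v1;
}
```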

The situations in which the vector length can change for SVE (and thus
the situations in which the above examples would be undefined behaviour)
are very SVE-specific.  It's likely that the rules for any future sizeless
types would also be very specific to those types.  E.g. my understanding
of the proposed RISC-V vector extension is that the vector length would
change much more often than it does for SVE.

> Can these types be used with _Atomic?  I don't see anything to say they 
> can't.

At the moment that's allowed, although of course it probably isn't
useful in practice.

> Can these types be passed to variadic functions and named in va_arg?  
> Again, I don't see anything to say they can't.

Yes, this is allowed (and covered by the tests FWIW).

> Can you have file-scope, and so static-storage-duration, compound literals 
> with these types?  You're allowing compound literals with these types, and 
> what you have disallowing objects with these types with non-automatic 
> storage duration seems to be specific to the case of "an identifier for an 
> object".

Ah, you mean something like:

    typedef __sizeless_struct s { int i; } s;
    s *ptr = &(s) { 1 };

?  Yeah, good point, hadn't thought of that.  That should be invalid too.
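
For comparison, the same construct with an ordinary structure is valid C,
precisely because a file-scope compound literal has static storage duration.
This is only a sketch; "sized_s" and "sized_ptr" are made-up names:

```c
/* Ordinary sized structure standing in for the sizeless one.  */
typedef struct sized_s { int i; } sized_s;

/* File-scope compound literal: the unnamed object has static storage
   duration, so its address is an address constant.  */
sized_s *sized_ptr = &(sized_s) { 1 };
```

That static storage duration is what the rule for sizeless types needs to
rule out, and it is not covered by wording about "an identifier for an
object".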

> I don't see any change proposed to 6.2.6.1#2 corresponding to what you say 
> elsewhere about discontiguous representations of sizeless structures.

Yeah, good point.  It should be edited to say:

    Except for bit-fields *and sizeless structures*, objects are
    composed of contiguous sequences of one or more bytes, the number,
    order, and encoding of which are either explicitly specified or
    implementation-defined.

TBH the possibility of a discontiguous representation was an early idea
that we've never actually used so far, so if that's a problem, we could
probably drop it.  It just seemed to be a natural extension of the
principle that the layout is completely implementation-defined.

>> But do you have any feel for whether this would ever be acceptable
>> in C++?  One of the main requirements for this was that it needs
>> to work in both C and C++, with the same ABI representation.
>> I thought VLAs were added to an early draft of C++14 and then
>> removed before it was published.  They weren't added back for C++17,
>> and I'd seen other proposals about classes having a "sizeof field"
>> instead (i.e. the type would carry the size information with it,
>> which we don't want).  So the prospects didn't look good.
>
> What were the C++ concerns with VLAs,

The sizeless type proposal was drawn up while SVE was still
confidential, so it wasn't something we could openly talk about.
At the time it wasn't obvious from the online trail why VLAs
(well, ARBs, i.e. arrays of runtime bound) were removed.  The proposal was:

   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3639.html

which says that it "was approved by EWG during the Portland meeting of
WG21 and was updated to reflect CWG review".  At the time we were doing
the sizeless type proposal, we were in pretty much the same situation
as this reddit poster:

    https://www.reddit.com/r/cpp_questions/comments/3clm34/why_was_n3639_runtimesized_arrays_with_automatic/

in that all we could find was a statement of its removal, with no
explanation of the reasons.

I see now there's a stackoverflow answer with some of the history
(this postdates the original sizeless type work):

    https://stackoverflow.com/questions/40633344/variable-length-arrays-in-c14

It ends with:

    Sadly [...] there are no future plans to resurrect ARBs/VLAs with
    C++ in the simple c99 VLA form.

> and how do the variable-size types in this proposal avoid those concerns?

I think the key difference between sizeless types and ARBs is that
ARBs were intended to provide user-controlled sources of variability.
That would only be useful if having variable-length arrays as part of
the core language rather than the library is useful.  Here we're simply
providing canned built-in types with a fixed source of variability,
with that variability not being modelled in the language at all.
Users can combine those canned types together but they can't create
new sources of variability.  The sizeless type proposal is mostly just
a means to an end: a core language change that allows new library
extensions to be defined for targets like SVE.

I think the key difference between sizeless types and full C99-style
VLAs is that the size and layout of sizeless types never matter for
semantic analysis.  Rather than the sizes of types becoming variable
(and the offsets of members becoming variable, and constexprs becoming
variable-sized, etc.), we simply don't make those concepts available
for sizeless types.

So nothing at the language level becomes variable that was constant before.
All that happens is that some things become invalid for sizeless types
that would be valid for sized ones.

The idea was really for the language to provide a framework for
implementations to define implementation-specific types with
implementation-specific rules while disturbing the language itself
as little as possible.

Thanks,
Richard