This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RFC: C extension to support variable-length vector types


Summary
=======

This is an RFC about some C language changes to support ARM's Scalable
Vector Extension (SVE).  A detailed description of SVE is available here:

    https://static.docs.arm.com/ddi0584/a/DDI0584A_a_SVE_supp_armv8A.pdf

but the only feature that really matters for this RFC is that SVE has
no fixed or preferred vector length.  Implementations can instead choose
from a range of possible vector lengths, with 128 bits being the minimum
and 2048 bits being the maximum.

SVE code will generally be written in a "vector-length agnostic" way;
i.e. it generally won't (need to) assume a particular vector length.
The practical upshot of this for compilers is that the size of an SVE
vector is not normally known until runtime.

ARM has defined a set of types and intrinsic functions (known as the
"ACLE") for using SVE operations directly in C and C++ code.
For reference, the ACLE specification is available here:

    https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf

but I'll try to keep the RFC self-contained.

Since the length of an SVE vector is not normally known until runtime,
the sizes of these ACLE types are likewise not normally known until runtime.
The ACLE handles this by treating the vector types as a new form of
"incomplete" type, with rules that are more relaxed than for normal
incomplete types.  The RFC is specifically about this approach, which
I'll describe in more detail below.  The main questions are:

  (1) Does the approach seem reasonable?

  (2) Would it be acceptable in principle to add this extension to the
      GCC C frontend (only enabled where necessary)?

  (3) Should we submit this to the standards committee?


Scope
=====

The RFC only discusses the C semantics.  The ACLE has a similar set of
changes for C++, but the fundamental approach is very similar, so we
thought it would be better to concentrate on C to start with.

Ideally this would be discussed in parallel with the GCC and clang
communities, but since cfe-dev is subscriber-only, it wouldn't really
be appropriate to cross-post.  We'll therefore ask on the clang lists
later, folding in any outcome of this RFC.

As far as question (3) above goes, we'd be happy to turn this into a
formal submission to the standards committee if that seems appropriate.
We just thought it would be good to get some feedback here first.


Contents
========

1. The types in more detail
2. Requirements
3. Possible approaches
4. Outline of the type system changes
5. Rationale for choosing this approach
6. Edits to the C standard
7. User-defined sizeless types
8. Implementation


1. The types in more detail
===========================

The ACLE defines a vector type sv<base>_t for each supported element type
<base>_t, so that the complete set is:

    svint8_t      svint16_t     svint32_t     svint64_t
    svuint8_t     svuint16_t    svuint32_t    svuint64_t
                  svfloat16_t   svfloat32_t   svfloat64_t

The types in each column have the same number of lanes and have twice
as many lanes as those in the column to the right.

These types can be combined into tuples of 2, 3 or 4 vectors using
sv<base>xN_t, with the individual vectors being fields with the names
"v0", "v1", etc.  For example, svint8x4_t contains four separate vectors
of type svint8_t, with the vectors being in fields named "v0", "v1",
"v2" and "v3".

The ACLE also defines a single predicate type:

    svbool_t

that has the same number of lanes as svint8_t and svuint8_t.


2. Requirements
===============

One of the main questions that we needed to answer for the ACLE was:
how do we add the variable-length types above to the type system?
The key requirements were:

  * The approach must work in both C and C++.

  * It must be possible to define automatic variables with these types.

  * It must be possible to pass and return objects of these types
    (since that's what intrinsics and vector library routines need to do).

  * It must be possible to use the types in _Generic associations
    (since the ACLE uses _Generic to provide tgmath.h-style overloads).

  * It must be possible to create pointers or references to the types
    (for passing or returning by pointer or reference, and because not
    allowing references would be semantically difficult in C++).


3. Possible approaches
======================

It seems that any approach to defining the ACLE types would fall into
one of three categories:

  (1) Limit the types in such a way that there is no concept of size.

  (2) Define the size of the types to be variable.

  (3) Define the size of the types to be constant, either with the
      constant being large enough for all possible vector lengths or
      with the types pointing to separate memory (as for C++ classes
      like std::string).

The approach we chose comes under (1).  The next sections describe this
approach informally in more detail, explain the rationale for chosing it,
and then give a more formal definition, as an edit to the standard.


4. Outline of the type system changes
=====================================

C classifies types as "complete" (the size of objects can be calculated)
or "incomplete" (the size of objects can't be calculated).  There's very
little you can do with a type until it becomes complete.

The approach we took was to treat all the SVE types as permanently
incomplete.  On its own, this would put them in a similar situation to
"void" (although they wouldn't be exactly the same, since there are some
specific rules for "void" that don't apply to incomplete types in general).
We then relaxed specific rules until the types were actually useful.

To do this, we classified types as:

  * "indefinite" (lacking sufficient information to create an object of
    that type) or "definite" (having sufficient information)

  * "sized" (will have a known size when definite) or "sizeless" (will
    never have a known size)

  * "incomplete" (lacking sufficient information to determine the size of
    objects of that type) or "complete" (having sufficient information)

where the wording for the final bullet is taken verbatim from the
standard.  "Complete" is now equivalent to "sized and definite".
All standard types are "sized" (even "void", although it's always
indefinite).

We then needed to make some rules use the distinction between "indefinite"
and "definite" rather than "incomplete" and "complete".  Referring back
to the requirements above, the specific things we wanted to allow were:

  * automatic variables with sizeless type
  * function parameters and return values with sizeless type
  * use of sizeless types with _Generic
  * pointers to sizeless types

Other useful things to allow are:

  * compound literals with sizeless type (so that's it's possible to use
    compound literals to create x2, x3 and x4 tuples)

Specific things we wanted to remain invalid -- by inheriting the rules from
incomplete types -- were:

  * creating or accessing arrays that have sizeless types
  * doing pointer arithmetic on pointers to sizeless types
  * using sizeof and _Alignof with a sizeless type (or object of sizeless type)
  * unions or structures with sizeless members
  * applying _Atomic to a sizeless type

It also seemed worth adding an extra restriction:

  * variables with sizeless type must not have static or thread-local
    storage duration

In practice it's impossible to define such variables with incomplete type,
but having an explicit rule means that things like:

    extern svint8_t foo;

are outright invalid rather than simply useless (because no other
translation unit could ever define foo).  Similarly, without an
explicit rule:

    svint8_t foo;

would be a valid tentative definition at the point it occurs and only
become invalid at the end of the translation unit, because svint8_t is
never completed.

This restriction isn't critical but it should allow better diagnostics.


5. Rationale for choosing this approach
=======================================

To recap the classification above, any approach would fall into
one of three categories:

  (1) Limit the types in such a way that there is no concept of size.

  (2) Define the size of the types to be variable.

  (3) Define the size of the types to be constant, either with the
      constant being large enough for all possible vector lengths or
      with the types pointing to separate memory (as for C++ classes
      like std::string).

(2) seemed initially appealing since C already has the concept of
variable-length arrays.  However, variable-length built-in types
would work in a significantly different way.  Arrays often decay to
pointers (which of course are fixed-length types), whereas the vector
types never would.  Unlike arrays, it should be possible to pass
variable-length vectors to functions, return them from functions,
and assign them by value.

One particular difficulty is that the semantics of variable-length arrays
rely on having a point at which the array size is evaluated.  It would
be difficult to extend this approach to declarations of functions that
pass or return variable-length types.

Also, as described above, we'd need to be able to create x2, x3 and
x4 aggregates of vectors, and this would introduce the concept of
variable-length aggregates that have variable-length fields in the
middle (not just at the end, like for flexible array members).
Although Ada (I think) has this concept, it would be new ground for C,
and again would be an invasive change.

As well as the extension itself being relatively complex (especially
for C++), it might be difficult to define it in a way that interacts
naturally with other (unseen) extensions, even those that are aware of
variable-length arrays.  Also, AIUI, variable-length arrays were added
to an early draft of C++14, but were later removed as too controversial
and didn't make it into the final standard.  C++17 still requires sizeof
to be constant and C11 makes variable-length arrays optional.

(3) can be divided into two:

(3a) The vector types have a constant size and are large enough for all
     possible vector lengths.

The main problem with this is that the maximum size of 2048 bits is much
larger than the minimum of 128 bits.  Using a fixed size of 2048 bits
would be extremely inefficient for smaller vector lengths, and of course
the whole point of the ACLE is to make things *more* efficient.

Also, we would need to define the types such that only the bytes
associated with the actual vector length are significant.  This would
make it possible to pass or return the types in registers and treat
them as register values when copying.  This perhaps has some similarity
with overaligned structures such as:

    struct s { _Alignas(16) int i; };

except that the amount of padding is only known at runtime.

There's also a significant conceptual problem: encoding a fixed size
goes against the guiding principle of SVE, in which there is no preferred
vector length.  There's nothing particularly magical about the current
limit of 2048 bits and we wouldn't want to have to create an ABI break
if it ever did increase in future.

(3b) The vector types have a constant size and refer to separate storage
     (as for std::string etc.)

This would be difficult to do without C++-style constructor, destructor,
copy and move semantics, so wouldn't work well in C.  And in C++ it would
be less efficient than the proposed approach, since presumably an Allocator
would be needed to allocate the separate storage.


A more positive justification of the ACLE approach is that it seems
to meet the requirements in the most efficient way possible.  The
vectors can use their natural (native) representation, and the type
system prevents uses that would make that representation problematic.

Also, the approach of starting with very restricted types and then
specifically allowing certain things should be more future-proof
and interact better with other (unseen) language extensions.  By default,
any language extension would treat the new types like other incomplete
types and choose conservatively-correct behaviour.  It would then be
possible to relax the language extension if this default behaviour
turns out to be too restrictive.

(That said, treating the types as permanently incomplete still won't
avoid all clashes with other extensions.  For example, we need to
allow objects of automatic storage duration to have certain forms of
incomplete type, whereas an extension might implicitly assume that all
such objects must already have complete type.  The approach should still
avoid the worst effects though.)


6. Edits to the C standard
==========================

This section specifies the behaviour for sizeless types as an edit to N1570.

6.2.5 Types
-----------

In 6.2.5/1, replace:

    At various points within a translation unit an object type may be
    /incomplete/ …

onwards with:

    Object types are further partitioned into /sized/ and /sizeless/; all
    basic and derived types defined in this standard are sized, but an
    implementation may provide additional sizeless types.

and add two additional clauses:

  * At various points within a translation unit an object type may be
    /indefinite/ (lacking sufficient information to construct an object
    of that type) or /definite/ (having sufficient information).
    An object type is said to be /complete/ if it is both sized and
    definite; all other object types are said to be /incomplete/.
    Complete types have sufficient information to determine the size
    of an object of that type while incomplete types do not.

  * Arrays, structures, unions and enumerated types are always sized,
    so for them the term /incomplete/ is equivalent to (and used
    interchangeably with) the term /indefinite/.

Change 6.2.5/19 to:

    The void type comprises an empty set of values; it is a sized
    indefinite object type that cannot be completed (made definite).

Replace "incomplete" with "indefinite" and "complete" with "definite" in
6.2.5/37, which describes how a type's state can change throughout a
translation unit.

6.3.2.1 Lvalues, arrays, and function designators
-------------------------------------------------

Replace "incomplete" with "indefinite" in 6.3.2.1/1, so that sizeless
definite types are modifiable lvalues.

Make the same replacement in 6.3.2.1/2, to prevent undefined behaviour
when lvalues have sizeless definite type.

6.5.1.1 Generic selection
-------------------------

Replace "complete object type" with "definite object type" in 6.5.1.1/2,
so that the type name in a generic association can be a sizeless definite
type.

6.5.2.2 Function calls
----------------------

Replace "complete object type" with "definite object type" in 6.5.2.2/1,
so that functions can return sizeless definite types.

Make the same change in 6.5.2.2/4, so that arguments can also have
sizeless definite type.

6.5.2.5 Compound literals
-------------------------

Replace "complete object type" with "definite object type" in 6.5.2.5/1,
so that compound literals can have sizeless definite type.

6.7 Declarations
----------------

Insert the following new clause after 6.7/4:

  * If an identifier for an object does not have automatic storage
    duration, its type must be sized rather than sizeless.

Replace "complete" with "definite" in 6.7/7, which describes when the
type of an object becomes definite.

6.7.6.3 Function declarators (including prototypes)
---------------------------------------------------

Replace "incomplete type" with "indefinite type" in 6.7.6.3/4, so that
parameters can also have sizeless definite type.

Make the same change in 6.7.6.3/12, which allows even indefinite types
to be function parameters if no function definition is present.

6.7.9 Initialization
--------------------

Replace "complete object type" with "definite object type" in 6.7.9/3,
to allow initialization of identifiers with sizeless definite type.

6.9.1 Function definitions
--------------------------

Replace "complete object type" with "definite object type" in 6.9.1/3,
so that functions can return sizeless definite types.

Make the same change in 6.9.1/7, so that adjusted parameter types can be
sizeless definite types.

J.2 Undefined behavior
----------------------

Update the entries that refer to the clauses above.


7. User-defined sizeless types
==============================

One thing not covered above is how the compiler would define x2, x3 and
x4 tuples.  We do have a follow-on proposal for allowing such aggregates
to be defined directly in C, but this would of course only be useful if
the basic concept of sizeless types seems reasonable.  Since the message
is quite long already, I thought it would be better to leave them out
for now.


8. Implementation
=================

So far we have implemented the extension in a local version of clang,
for both C and C++.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]