C++ API for vectors

Contents

C++ API for vectors
1. Vector class
2. Converting existing code

This page describes the new API used for vectors in GCC. The previous implementation was based on pre-processor macros that would create new structure types for each vector element type. Vector operations were all macros that required the element type and (often) the allocation strategy for the vector. Additionally, all vector instances were required to be pointers.

The new implementation is based on C++ templates. This allows more flexibility in the internal structure of the vector, improves type safety and allows vectors to be regular instances. This page describes how to transition from the old interface to the new interface by providing examples.

Vector class

The vector class vec is a template class with 3 arguments:

Element type (T): This is the type of the elements in the vector. Any valid type can be used here.
Allocation strategy (A): This is a type controlling the allocation of the vector. Valid values are:
1. va_heap: Allocate the vector in the heap. This is the default allocation strategy.
2. va_gc: Allocate the vector in GC memory.
3. va_gc_atomic: Allocate the vector in GC memory, but garbage collection should not traverse the vector marking each element (i.e., the elements are 'atomic' w.r.t. GC)
4. va_stack: Allocate the vector on the stack using alloca(). The initial allocation of these vectors is on the stack, but if the vector is ever forced to grow beyond the initial allocation size, it will be relocated to the heap. So, make sure that these vectors never have to grow if you want to keep them on the stack.
Layout (L): This controls the structural layout of the vec class. The different layout strategies provide space/time tradeoffs. We support two layouts:
1. vl_embed: This layout uses the trailing array idiom for the vector. The data structure contains three fields: the number of used slots, the number of allocated slots and a trailing array of type T of a single element. When using this layout, the number of allocated slots may not grow. Initial allocation reserves a single block of memory that covers the control data (num_ and alloc_) plus enough room to hold all the requested slots of type T. This layout is useful when declaring small, fixed-size vectors that are embedded inside a data structure. Access to the vector is always fast because its control and data are laid out together. This is the default layout for all vectors using the va_gc or va_gc_atomic allocation. We use this layout because vectors in GC memory must be pointers (see the section on pointers to vectors below).
2. vl_ptr: This layout is the same used by the previous VEC implementation. It is defined to be a pointer to a vl_embed vector. This is useful for vectors that are part of structures. Before initial allocation, this vector takes up a single word of storage. Once allocated, its control and data are laid out together, and the vector can dynamically change size without changing its address (only the internal pointer changes). This is the default layout.

Converting existing code

The old VEC_* macros have been converted to member functions of class vec. Additionally, the vast majority of vectors are no longer pointers. They are regular instance variables. The main exception are some vectors using GC allocation (i.e., va_gc). Due to limitations in the GTY machinery, every vector that acts as a GC root, must be declared as a pointer. Failing to do so will result in an error from gengtype.

The names of the member functions are the same as the old macro names with the following changes:

The prefix VEC_ disappears.
The vector is no longer a parameter to the function.
The vector type is no longer required as a parameter to the function.
The allocation strategy is no longer required as a parameter to the function.

For example, to push a new element VAL into an array V (allocated on GC):

Before	After
`VEC_safe_push (MyType, gc, V, VAL);`	`V.safe_push (VAL);`

Declaring vectors

You no longer need to use the macros DEF_VEC_P, DEF_VEC_O, DEF_VEC_ALLOC_P and DEF_VEC_ALLOC_O. The following code snippet declares a vector named v of int elements on the heap, with the embedded pointer layout:

vec<int> v;

Notice that since va_heap and vl_ptr are default values for the template, they do not need to be specified.

One limitation of the current implementation is that the vec class must be a POD because these vectors are often stored in unions and we are using C++ 2003 (which does not support that). This means that class vec may not have constructors nor destructors, so instances of class vec must be explicitly initialized.

To initialize a vector, you can use the create method or simply assign an empty instance to it. For example,

vec<int> v = vec<int>();

vec<tree, va_gc> a;
a.create (0);

The declarations above initialize vectors v and a to empty using two different idioms. The first idiom is useful when passing empty vectors as parameters to functions.

Similarly, to deallocate the memory used by a vector, use the member function release.

Handling pointers to vectors

In some cases, you will need to create a pointer to a vector (for instance, when the vector must be in GC memory). To simplify allocation and deallocation, you can use the free functions vec_alloc and vec_free. Both functions take as argument a reference to a pointer to a vector. They allocate memory for the pointer and they allocate the internal vector.

Vectors in GC memory are currently required to be pointers. The current implementation of GC requires that every object in GC memory be allocated via a pointer.

To allocate a pointer to a vector with 20 slots in GC:

vec<MyType, va_gc> *vec_ptr;

vec_alloc (vec_ptr, 20);
... operate on vec_ptr ...
vec_free (vec_ptr);

Note that the call to vec_free will deallocate the internal vector and vec_ptr itself. Additionally, it will set vec_ptr to NULL.

Additionally, any operation that (a) may need to handle a NULL pointer, or (b) may reallocate the vector, will have to be done via one of the free functions named vec_*. All these functions take a vec pointer or a vec pointer reference (if they need to modify it). The following operations are included:

vec_safe_space(v, nelems): v->space(nelems). Returns false if v is NULL.
vec_safe_length(v): v->length() or 0 if v is NULL.
vec_safe_address(v): v->address() or NULL if v is NULL.
vec_safe_is_empty(v): v->is_empty() or true if v is NULL.
vec_safe_reserve(v, nelems, exact=false): Allocates space for nelems elements if v is NULL or has no room. It modifies v. exact should be true if the caller wants to allocate exactly, and false to allocate exponentially.
vec_safe_reserve_exact(v, nelems): Calls vec_safe_reserve(v, nelems, true). It modifies v.
vec_alloc(v, nelems): Sets v to NULL and calls vec_safe_reserve(v, nelems).
vec_free(v): Frees the memory used by v and sets it to NULL.
vec_safe_grow(v, nelems): Grows v to nelems. It modifies v if it needs to reallocate it.
vec_safe_grow_cleared(v, nelems): Same as vec_safe_grow and it clears the new slots.
vec_safe_iterate(v, ix, ptr): Returns false if v is NULL or if ix is past the end of the vector. The associated macro FOR_EACH_VEC_SAFE_ELT calls this function.
vec_safe_push(v, obj): If v is NULL, it allocates a new vector, it adds one more slot to the vector and then calls v->quick_push.
vec_safe_insert(v, ix, obj): If v is NULL, it allocates a new vector, it adds one more slot to the vector and then calls v->quick_insert.
vec_safe_truncate(v, nelems): If v is NULL, it does nothing. Otherwise, it calls v->truncate(nelems).
vec_safe_splice(v_dst, v_src): If v_dst is NULL, it allocates it. It then calls v_dst->splice(*v_src).

In general, if you can avoid calling the vec_* functions, your code will be slightly faster because you will avoid doing the test on the vec pointer. However, any operation that may cause the vector to grow must always be done via these free functions (most notably, vec_safe_push).

Indexing into a vector

The old macros VEC_index and VEC_replace have disappeared and have been replaced by operator[].

Before	After
`x = VEC_index (MyType, v, 10);`	`MyType x = v[10];`
`VEC_replace (MyType, v, 10, Value);`	`v[10] = Value;`

Presence tests

In the old implementation, every VEC was a pointer type. To determine whether a vector V had been allocated, one could test whether V was NULL. This no longer holds on regular vector instances. Instead, you can use the member function exists to determine if the internal vector has been allocated or not.

Note that you can distinguish from a vector that is merely empty (i.e., v.length() == 0) from a non-existent vector (v.exists() == false).