= C++ API for vectors = <> This page describes the new API used for vectors in GCC. The previous implementation was based on pre-processor macros that would create new structure types for each vector element type. Vector operations were all macros that required the element type and (often) the allocation strategy for the vector. Additionally, all vector instances were required to be pointers. The new implementation is based on C++ templates. This allows more flexibility in the internal structure of the vector, improves type safety and allows vectors to be regular instances. This page describes how to transition from the old interface to the new interface by providing examples. == Vector class == The vector class {{{vec}}} is a template class with 3 arguments: 1. '''Element type''' ({{{T}}}): This is the type of the elements in the vector. Any valid type can be used here. 2. '''Allocation strategy''' ({{{A}}}): This is a type controlling the allocation of the vector. Valid values are: 1. {{{va_heap}}}: Allocate the vector in the heap. '''This is the default allocation strategy'''. 2. {{{va_gc}}}: Allocate the vector in GC memory. 3. {{{va_gc_atomic}}}: Allocate the vector in GC memory, but garbage collection should not traverse the vector marking each element (i.e., the elements are 'atomic' w.r.t. GC) 4. {{{va_stack}}}: Allocate the vector on the stack using alloca(). The initial allocation of these vectors is on the stack, but if the vector is ever forced to grow beyond the initial allocation size, it will be relocated to the heap. So, make sure that these vectors never have to grow if you want to keep them on the stack. 3. '''Layout''' ({{{L}}}): This controls the structural layout of the {{{vec}}} class. The different layout strategies provide space/time tradeoffs. We support two layouts: 1. {{{vl_embed}}}: This layout uses the trailing array idiom for the vector. The data structure contains three fields: the number of used slots, the number of allocated slots and a trailing array of type {{{T}}} of a single element. When using this layout, the number of allocated slots may not grow. Initial allocation reserves a single block of memory that covers the control data ({{{num_}}} and {{{alloc_}}}) plus enough room to hold all the requested slots of type {{{T}}}. This layout is useful when declaring small, fixed-size vectors that are embedded inside a data structure. Access to the vector is always fast because its control and data are laid out together. This is the '''default layout''' for all vectors using the {{{va_gc}}} or {{{va_gc_atomic}}} allocation. We use this layout because vectors in GC memory '''must''' be pointers (see the section on pointers to vectors below). 2. {{{vl_ptr}}}: This layout is the same used by the previous {{{VEC}}} implementation. It is defined to be a pointer to a {{{vl_embed}}} vector. This is useful for vectors that are part of structures. Before initial allocation, this vector takes up a single word of storage. Once allocated, its control and data are laid out together, and the vector can dynamically change size without changing its address (only the internal pointer changes). '''This is the default layout'''. == Converting existing code == The old {{{VEC_*}}} macros have been converted to member functions of {{{class vec}}}. Additionally, the vast majority of vectors are no longer pointers. They are regular instance variables. The main exception are '''some''' vectors using GC allocation (i.e., {{{va_gc}}}). Due to limitations in the {{{GTY}}} machinery, every vector that acts as a GC root, must be declared as a pointer. Failing to do so will result in an error from {{{gengtype}}}. The names of the member functions are the same as the old macro names with the following changes: 1. The prefix {{{VEC_}}} disappears. 2. The vector is no longer a parameter to the function. 3. The vector type is no longer required as a parameter to the function. 4. The allocation strategy is no longer required as a parameter to the function. For example, to push a new element {{{VAL}}} into an array {{{V}}} (allocated on GC): || Before || After || || {{{VEC_safe_push (MyType, gc, V, VAL);}}} || {{{V.safe_push (VAL);}}} || === Declaring vectors === You no longer need to use the macros {{{DEF_VEC_P}}}, {{{DEF_VEC_O}}}, {{{DEF_VEC_ALLOC_P}}} and {{{DEF_VEC_ALLOC_O}}}. The following code snippet declares a vector named {{{v}}} of {{{int}}} elements on the heap, with the embedded pointer layout: {{{ vec v; }}} Notice that since {{{va_heap}}} and {{{vl_ptr}}} are default values for the template, they do not need to be specified. One limitation of the current implementation is that the {{{vec}}} class must be a [[http://en.wikipedia.org/wiki/Plain_old_data_structure|POD]] because these vectors are often stored in unions and we are using C++ 2003 (which does not support that). This means that {{{class vec}}} may not have constructors nor destructors, so instances of {{{class vec}}} must be '''explicitly initialized'''. To initialize a vector, you can use the {{{create}}} method or simply assign an empty instance to it. For example, {{{ vec v = vec(); vec a; a.create (0); }}} The declarations above initialize vectors {{{v}}} and {{{a}}} to empty using two different idioms. The first idiom is useful when passing empty vectors as parameters to functions. Similarly, to deallocate the memory used by a vector, use the member function {{{release}}}. === Handling pointers to vectors === In some cases, you will need to create a pointer to a vector (for instance, when the vector must be in GC memory). To simplify allocation and deallocation, you can use the free functions {{{vec_alloc}}} and {{{vec_free}}}. Both functions take as argument a reference to a pointer to a vector. They allocate memory for the pointer and they allocate the internal vector. Vectors in GC memory are currently required to be pointers. The current implementation of GC requires that every object in GC memory be allocated via a pointer. To allocate a pointer to a vector with 20 slots in GC: {{{ vec *vec_ptr; vec_alloc (vec_ptr, 20); ... operate on vec_ptr ... vec_free (vec_ptr); }}} Note that the call to {{{vec_free}}} will deallocate the internal vector '''and''' {{{vec_ptr}}} itself. Additionally, it will set {{{vec_ptr}}} to {{{NULL}}}. Additionally, any operation that (a) may need to handle a {{{NULL}}} pointer, or (b) may reallocate the vector, will have to be done via one of the free functions named {{{vec_*}}}. All these functions take a {{{vec}}} pointer or a {{{vec}}} pointer reference (if they need to modify it). The following operations are included: 1. {{{vec_safe_space(v, nelems)}}}: {{{v->space(nelems)}}}. Returns {{{false}}} if {{{v}}} is {{{NULL}}}. 1. {{{vec_safe_length(v)}}}: {{{v->length()}}} or 0 if {{{v}}} is {{{NULL}}}. 1. {{{vec_safe_address(v)}}}: {{{v->address()}}} or {{{NULL}}} if {{{v}}} is {{{NULL}}}. 1. {{{vec_safe_is_empty(v)}}}: {{{v->is_empty()}}} or {{{true}}} if {{{v}}} is {{{NULL}}}. 1. {{{vec_safe_reserve(v, nelems, exact=false)}}}: Allocates space for {{{nelems}}} elements if {{{v}}} is {{{NULL}}} or has no room. It modifies {{{v}}}. {{{exact}}} should be {{{true}}} if the caller wants to allocate exactly, and {{{false}}} to allocate exponentially. 1. {{{vec_safe_reserve_exact(v, nelems)}}}: Calls {{{vec_safe_reserve(v, nelems, true)}}}. It modifies {{{v}}}. 1. {{{vec_alloc(v, nelems)}}}: Sets {{{v}}} to {{{NULL}}} and calls {{{vec_safe_reserve(v, nelems)}}}. 1. {{{vec_free(v)}}}: Frees the memory used by {{{v}}} and sets it to {{{NULL}}}. 1. {{{vec_safe_grow(v, nelems)}}}: Grows {{{v}}} to {{{nelems}}}. It modifies {{{v}}} if it needs to reallocate it. 1. {{{vec_safe_grow_cleared(v, nelems)}}}: Same as {{{vec_safe_grow}}} and it clears the new slots. 1. {{{vec_safe_iterate(v, ix, ptr)}}}: Returns false if {{{v}}} is {{{NULL}}} or if {{{ix}}} is past the end of the vector. The associated macro {{{FOR_EACH_VEC_SAFE_ELT}}} calls this function. 1. {{{vec_safe_push(v, obj)}}}: If {{{v}}} is {{{NULL}}}, it allocates a new vector, it adds one more slot to the vector and then calls {{{v->quick_push}}}. 1. {{{vec_safe_insert(v, ix, obj)}}}: If {{{v}}} is {{{NULL}}}, it allocates a new vector, it adds one more slot to the vector and then calls {{{v->quick_insert}}}. 1. {{{vec_safe_truncate(v, nelems)}}}: If {{{v}}} is {{{NULL}}}, it does nothing. Otherwise, it calls {{{v->truncate(nelems)}}}. 1. {{{vec_safe_splice(v_dst, v_src)}}}: If {{{v_dst}}} is {{{NULL}}}, it allocates it. It then calls {{{v_dst->splice(*v_src)}}}. In general, if you can avoid calling the {{{vec_*}}} functions, your code will be slightly faster because you will avoid doing the test on the {{{vec}}} pointer. However, any operation that may cause the vector to grow must always be done via these free functions (most notably, {{{vec_safe_push}}}). === Indexing into a vector === The old macros {{{VEC_index}}} and {{{VEC_replace}}} have disappeared and have been replaced by {{{operator[]}}}. || Before || After || || {{{x = VEC_index (MyType, v, 10);}}} || {{{MyType x = v[10]; }}} || || {{{VEC_replace (MyType, v, 10, Value); }}} || {{{v[10] = Value; }}} || === Presence tests === In the old implementation, every {{{VEC}}} was a pointer type. To determine whether a vector {{{V}}} had been allocated, one could test whether {{{V}}} was {{{NULL}}}. This no longer holds on regular vector instances. Instead, you can use the member function {{{exists}}} to determine if the internal vector has been allocated or not. Note that you can distinguish from a vector that is merely empty (i.e., {{{v.length() == 0}}}) from a non-existent vector ({{{v.exists() == false}}}).