18. Trees: The intermediate representation used by the C and C++ front ends

This chapter documents the internal representation used by GCC and C++ to represent C and C++ source programs. When presented with a C or C++ source program, GCC parses the program, performs semantic analysis (including the generation of error messages), and then produces the internal representation described here. This representation contains a complete representation for the entire translation unit provided as input to the front end. This representation is then typically processed by a code-generator in order to produce machine code, but could also be used in the creation of source browsers, intelligent editors, automatic documentation generators, interpreters, and any other programs needing the ability to process C or C++ code.

This chapter explains the internal representation. In particular, it documents the internal representation for C and C++ source constructs, and the macros, functions, and variables that can be used to access these constructs. The C++ representation which is largely a superset of the representation used in the C front end. There is only one construct used in C that does not appear in the C++ front end and that is the GNU "nested function" extension. Many of the macros documented here do not apply in C because the corresponding language constructs do not appear in C.

If you are developing a "back end", be it is a code-generator or some other tool, that uses this representation, you may occasionally find that you need to ask questions not easily answered by the functions and macros available here. If that situation occurs, it is quite likely that GCC already supports the functionality you desire, but that the interface is simply not documented here. In that case, you should ask the GCC maintainers (via mail to gcc@gcc.gnu.org) about documenting the functionality you require. Similarly, if you find yourself writing functions that do not deal directly with your back end, but instead might be useful to other people using the GCC front end, you should submit your patches for inclusion in GCC.

18.1 Deficiencies    Topics net yet covered in this document.

18.2 Overview    All about trees.

18.3 Types    Fundamental and aggregate types.

18.4 Scopes    Namespaces and classes.

18.6 Functions    Overloading, function bodies, and linkage.

18.5 Declarations    Type declarations and variables.

18.7 Attributes in trees    Declaration and type attributes.

18.8 Expressions    From typeid to throw.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.1 Deficiencies

There are many places in which this document is incomplet and incorrekt. It is, as of yet, only preliminary documentation.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.2 Overview

The central data structure used by the internal representation is the tree. These nodes, while all of the C type tree, are of many varieties. A tree is a pointer type, but the object to which it points may be of a variety of types. From this point forward, we will refer to trees in ordinary type, rather than in this font, except when talking about the actual C type tree.

You can tell what kind of node a particular tree is by using the TREE_CODE macro. Many, many macros take a trees as input and return trees as output. However, most macros require a certain kinds of tree node as input. In other words, there is a type-system for trees, but it is not reflected in the C type-system.

For safety, it is useful to configure GCC with `--enable-checking'. Although this results in a significant performance penalty (since all tree types are checked at run-time), and is therefore inappropriate in a release version, it is extremely helpful during the development process.

Many macros behave as predicates. Many, although not all, of these predicates end in `_P'. Do not rely on the result type of these macros being of any particular type. You may, however, rely on the fact that the type can be compared to 0, so that statements like

if (TEST_P (t) && !TEST_P (y)) x = 1;
and

int i = (TEST_P (t) != 0);
are legal. Macros that return int values now may be changed to return tree values, or other pointers in the future. Even those that continue to return int may return multiple non-zero codes where previously they returned only zero and one. Therefore, you should not write code like

if (TEST_P (t) == 1)
as this code is not guaranteed to work correctly in the future.

You should not take the address of values returned by the macros or functions described here. In particular, no guarantee is given that the values are lvalues.

In general, the names of macros are all in uppercase, while the names of functions are entirely in lower case. There are rare exceptions to this rule. You should assume that any macro or function whose name is made up entirely of uppercase letters may evaluate its arguments more than once. You may assume that a macro or function whose name is made up entirely of lowercase letters will evaluate its arguments only once.

The error_mark_node is a special tree. Its tree code is ERROR_MARK, but since there is only ever one node with that code, the usual practice is to compare the tree against error_mark_node. (This test is just a test for pointer equality.) If an error has occurred during front-end processing the flag errorcount will be set. If the front end has encountered code it cannot handle, it will issue a message to the user and set sorrycount. When these flags are set, any macro or function which normally returns a tree of a particular kind may instead return the error_mark_node. Thus, if you intend to do any processing of erroneous code, you must be prepared to deal with the error_mark_node.

Occasionally, a particular tree slot (like an operand to an expression, or a particular field in a declaration) will be referred to as "reserved for the back end." These slots are used to store RTL when the tree is converted to RTL for use by the GCC back end. However, if that process is not taking place (e.g., if the front end is being hooked up to an intelligent editor), then those slots may be used by the back end presently in use.

If you encounter situations that do not match this documentation, such as tree nodes of types not mentioned here, or macros documented to return entities of a particular kind that instead return entities of some different kind, you have found a bug, either in the front end or in the documentation. Please report these bugs as you would any other bug.

18.2.1 Trees    Macros and functions that can be used with all trees.

18.2.2 Identifiers    The names of things.

18.2.3 Containers    Lists and vectors.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.2.1 Trees

This section is not here yet.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.2.2 Identifiers

An IDENTIFIER_NODE represents a slightly more general concept that the standard C or C++ concept of identifier. In particular, an IDENTIFIER_NODE may contain a `$', or other extraordinary characters.

There are never two distinct IDENTIFIER_NODEs representing the same identifier. Therefore, you may use pointer equality to compare IDENTIFIER_NODEs, rather than using a routine like strcmp.

You can use the following macros to access identifiers:

IDENTIFIER_POINTER: The string represented by the identifier, represented as a char*. This string is always NUL-terminated, and contains no embedded NUL characters.
IDENTIFIER_LENGTH: The length of the string returned by IDENTIFIER_POINTER, not including the trailing NUL. This value of IDENTIFIER_LENGTH (x) is always the same as strlen (IDENTIFIER_POINTER (x)).
IDENTIFIER_OPNAME_P: This predicate holds if the identifier represents the name of an overloaded operator. In this case, you should not depend on the contents of either the IDENTIFIER_POINTER or the IDENTIFIER_LENGTH.
IDENTIFIER_TYPENAME_P: This predicate holds if the identifier represents the name of a user-defined conversion operator. In this case, the TREE_TYPE of the IDENTIFIER_NODE holds the type to which the conversion operator converts.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.2.3 Containers

Two common container data structures can be represented directly with tree nodes. A TREE_LIST is a singly linked list containing two trees per node. These are the TREE_PURPOSE and TREE_VALUE of each node. (Often, the TREE_PURPOSE contains some kind of tag, or additional information, while the TREE_VALUE contains the majority of the payload. In other cases, the TREE_PURPOSE is simply NULL_TREE, while in still others both the TREE_PURPOSE and TREE_VALUE are of equal stature.) Given one TREE_LIST node, the next node is found by following the TREE_CHAIN. If the TREE_CHAIN is NULL_TREE, then you have reached the end of the list.

A TREE_VEC is a simple vector. The TREE_VEC_LENGTH is an integer (not a tree) giving the number of nodes in the vector. The nodes themselves are accessed using the TREE_VEC_ELT macro, which takes two arguments. The first is the TREE_VEC in question; the second is an integer indicating which element in the vector is desired. The elements are indexed from zero.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.3 Types

All types have corresponding tree nodes. However, you should not assume that there is exactly one tree node corresponding to each type. There are often several nodes each of which correspond to the same type.

For the most part, different kinds of types have different tree codes. (For example, pointer types use a POINTER_TYPE code while arrays use an ARRAY_TYPE code.) However, pointers to member functions use the RECORD_TYPE code. Therefore, when writing a switch statement that depends on the code associated with a particular type, you should take care to handle pointers to member functions under the RECORD_TYPE case label.

In C++, an array type is not qualified; rather the type of the array elements is qualified. This situation is reflected in the intermediate representation. The macros described here will always examine the qualification of the underlying element type when applied to an array type. (If the element type is itself an array, then the recursion continues until a non-array type is found, and the qualification of this type is examined.) So, for example, CP_TYPE_CONST_P will hold of the type const int ()[7], denoting an array of seven ints.

The following functions and macros deal with cv-qualification of types:

CP_TYPE_QUALS: This macro returns the set of type qualifiers applied to this type. This value is TYPE_UNQUALIFIED if no qualifiers have been applied. The TYPE_QUAL_CONST bit is set if the type is const-qualified. The TYPE_QUAL_VOLATILE bit is set if the type is volatile-qualified. The TYPE_QUAL_RESTRICT bit is set if the type is restrict-qualified.
CP_TYPE_CONST_P: This macro holds if the type is const-qualified.
CP_TYPE_VOLATILE_P: This macro holds if the type is volatile-qualified.
CP_TYPE_RESTRICT_P: This macro holds if the type is restrict-qualified.
CP_TYPE_CONST_NON_VOLATILE_P: This predicate holds for a type that is const-qualified, but not volatile-qualified; other cv-qualifiers are ignored as well: only the const-ness is tested.
TYPE_MAIN_VARIANT: This macro returns the unqualified version of a type. It may be applied to an unqualified type, but it is not always the identity function in that case.

A few other macros and functions are usable with all types:

TYPE_SIZE: The number of bits required to represent the type, represented as an INTEGER_CST. For an incomplete type, TYPE_SIZE will be NULL_TREE.
TYPE_ALIGN: The alignment of the type, in bits, represented as an int.
TYPE_NAME: This macro returns a declaration (in the form of a TYPE_DECL) for the type. (Note this macro does not return a IDENTIFIER_NODE, as you might expect, given its name!) You can look at the DECL_NAME of the TYPE_DECL to obtain the actual name of the type. The TYPE_NAME will be NULL_TREE for a type that is not a built-in type, the result of a typedef, or a named class type.
CP_INTEGRAL_TYPE: This predicate holds if the type is an integral type. Notice that in C++, enumerations are not integral types.
ARITHMETIC_TYPE_P: This predicate holds if the type is an integral type (in the C++ sense) or a floating point type.
CLASS_TYPE_P: This predicate holds for a class-type.
TYPE_BUILT_IN: This predicate holds for a built-in type.
TYPE_PTRMEM_P: This predicate holds if the type is a pointer to data member.
TYPE_PTR_P: This predicate holds if the type is a pointer type, and the pointee is not a data member.
TYPE_PTRFN_P: This predicate holds for a pointer to function type.
TYPE_PTROB_P: This predicate holds for a pointer to object type. Note however that it does not hold for the generic pointer to object type void *. You may use TYPE_PTROBV_P to test for a pointer to object type as well as void *.
same_type_p: This predicate takes two types as input, and holds if they are the same type. For example, if one type is a typedef for the other, or both are typedefs for the same type. This predicate also holds if the two trees given as input are simply copies of one another; i.e., there is no difference between them at the source level, but, for whatever reason, a duplicate has been made in the representation. You should never use == (pointer equality) to compare types; always use same_type_p instead.

Detailed below are the various kinds of types, and the macros that can be used to access them. Although other kinds of types are used elsewhere in G++, the types described here are the only ones that you will encounter while examining the intermediate representation.

VOID_TYPE

Used to represent the void type.

INTEGER_TYPE

Used to represent the various integral types, including char, short, int, long, and long long. This code is not used for enumeration types, nor for the bool type. Note that GCC's CHAR_TYPE node is not used to represent char. The TYPE_PRECISION is the number of bits used in the representation, represented as an unsigned int. (Note that in the general case this is not the same value as TYPE_SIZE; suppose that there were a 24-bit integer type, but that alignment requirements for the ABI required 32-bit alignment. Then, TYPE_SIZE would be an INTEGER_CST for 32, while TYPE_PRECISION would be 24.) The integer type is unsigned if TREE_UNSIGNED holds; otherwise, it is signed.

The TYPE_MIN_VALUE is an INTEGER_CST for the smallest integer that may be represented by this type. Similarly, the TYPE_MAX_VALUE is an INTEGER_CST for the largest integer that may be represented by this type.

REAL_TYPE

Used to represent the float, double, and

long
double

types. The number of bits in the floating-point representation is given by TYPE_PRECISION, as in the INTEGER_TYPE case.

COMPLEX_TYPE

Used to represent GCC built-in __complex__ data types. The TREE_TYPE is the type of the real and imaginary parts.

ENUMERAL_TYPE

Used to represent an enumeration type. The TYPE_PRECISION gives (as an int), the number of bits used to represent the type. If there are no negative enumeration constants, TREE_UNSIGNED will hold. The minimum and maximum enumeration constants may be obtained with TYPE_MIN_VALUE and TYPE_MAX_VALUE, respectively; each of these macros returns an INTEGER_CST.

The actual enumeration constants themselves may be obtained by looking at the TYPE_VALUES. This macro will return a TREE_LIST, containing the constants. The TREE_PURPOSE of each node will be an IDENTIFIER_NODE giving the name of the constant; the TREE_VALUE will be an INTEGER_CST giving the value assigned to that constant. These constants will appear in the order in which they were declared. The TREE_TYPE of each of these constants will be the type of enumeration type itself.

BOOLEAN_TYPE

Used to represent the bool type.

POINTER_TYPE

Used to represent pointer types, and pointer to data member types. The TREE_TYPE gives the type to which this type points. If the type is a pointer to data member type, then TYPE_PTRMEM_P will hold. For a pointer to data member type of the form `T X::*', TYPE_PTRMEM_CLASS_TYPE will be the type X, while TYPE_PTRMEM_POINTED_TO_TYPE will be the type T.

REFERENCE_TYPE

Used to represent reference types. The TREE_TYPE gives the type to which this type refers.

FUNCTION_TYPE

Used to represent the type of non-member functions and of static member functions. The TREE_TYPE gives the return type of the function. The TYPE_ARG_TYPES are a TREE_LIST of the argument types. The TREE_VALUE of each node in this list is the type of the corresponding argument; the TREE_PURPOSE is an expression for the default argument value, if any. If the last node in the list is void_list_node (a TREE_LIST node whose TREE_VALUE is the void_type_node), then functions of this type do not take variable arguments. Otherwise, they do take a variable number of arguments.

Note that in C (but not in C++) a function declared like void f() is an unprototyped function taking a variable number of arguments; the TYPE_ARG_TYPES of such a function will be NULL.

METHOD_TYPE

Used to represent the type of a non-static member function. Like a FUNCTION_TYPE, the return type is given by the TREE_TYPE. The type of *this, i.e., the class of which functions of this type are a member, is given by the TYPE_METHOD_BASETYPE. The TYPE_ARG_TYPES is the parameter list, as for a FUNCTION_TYPE, and includes the this argument.

ARRAY_TYPE

Used to represent array types. The TREE_TYPE gives the type of the elements in the array. If the array-bound is present in the type, the TYPE_DOMAIN is an INTEGER_TYPE whose TYPE_MIN_VALUE and TYPE_MAX_VALUE will be the lower and upper bounds of the array, respectively. The TYPE_MIN_VALUE will always be an INTEGER_CST for zero, while the TYPE_MAX_VALUE will be one less than the number of elements in the array, i.e., the highest value which may be used to index an element in the array.

RECORD_TYPE

Used to represent struct and class types, as well as pointers to member functions. If TYPE_PTRMEMFUNC_P holds, then this type is a pointer-to-member type. In that case, the TYPE_PTRMEMFUNC_FN_TYPE is a POINTER_TYPE pointing to a METHOD_TYPE. The METHOD_TYPE is the type of a function pointed to by the pointer-to-member function. If TYPE_PTRMEMFUNC_P does not hold, this type is a class type. For more information, see see section 18.4.2 Classes.

UNKNOWN_TYPE

This node is used to represent a type the knowledge of which is insufficient for a sound processing.

OFFSET_TYPE

This node is used to represent a data member; for example a pointer-to-data-member is represented by a POINTER_TYPE whose TREE_TYPE is an OFFSET_TYPE. For a data member X::m the TYPE_OFFSET_BASETYPE is X and the TREE_TYPE is the type of m.

TYPENAME_TYPE

Used to represent a construct of the form typename T::A. The TYPE_CONTEXT is T; the TYPE_NAME is an IDENTIFIER_NODE for A. If the type is specified via a template-id, then TYPENAME_TYPE_FULLNAME yields a TEMPLATE_ID_EXPR. The TREE_TYPE is non-NULL if the node is implicitly generated in support for the implicit typename extension; in which case the TREE_TYPE is a type node for the base-class.

TYPEOF_TYPE

Used to represent the __typeof__ extension. The TYPE_FIELDS is the expression the type of which is being represented.

UNION_TYPE

Used to represent union types. For more information, see section 18.4.2 Classes.

There are variables whose values represent some of the basic types. These include:

void_type_node: A node for void.
integer_type_node: A node for int.
unsigned_type_node.: A node for unsigned int.
char_type_node.: A node for char.

It may sometimes be useful to compare one of these variables with a type in hand, using same_type_p.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.4 Scopes

The root of the entire intermediate representation is the variable global_namespace. This is the namespace specified with :: in C++ source code. All other namespaces, types, variables, functions, and so forth can be found starting with this namespace.

Besides namespaces, the other high-level scoping construct in C++ is the class. (Throughout this manual the term class is used to mean the types referred to in the ANSI/ISO C++ Standard as classes; these include types defined with the class, struct, and union keywords.)

18.4.1 Namespaces Member functions, types, etc.

18.4.2 Classes Members, bases, friends, etc.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.4.1 Namespaces

A namespace is represented by a NAMESPACE_DECL node.

However, except for the fact that it is distinguished as the root of the representation, the global namespace is no different from any other namespace. Thus, in what follows, we describe namespaces generally, rather than the global namespace in particular.

The ::std namespace, however, is special when flag_honor_std is not set. When flag_honor_std is set, the std namespace is just like any other namespace. When flag_honor_std is not set, however, the ::std namespace is treated as a synonym for the global namespace, thereby allowing users to write code that will work with compilers that put the standard library in the ::std namespace. The std namespace is represented by the variable std_node. Although std_node is a NAMESPACE_DECL, it does not have all the fields required of a real namespace, and the macros and functions described here do not work, in general. It is safest simply to ignore std_node should you encounter it while examining the internal representation. In particular, you will encounter std_node while looking at the members of the global namespace. Just skip it without attempting to examine its members.

The following macros and functions can be used on a NAMESPACE_DECL:

DECL_NAME

This macro is used to obtain the IDENTIFIER_NODE corresponding to the unqualified name of the name of the namespace (see section 18.2.2 Identifiers). The name of the global namespace is `::', even though in C++ the global namespace is unnamed. However, you should use comparison with global_namespace, rather than DECL_NAME to determine whether or not a namespaces is the global one. An unnamed namespace will have a DECL_NAME equal to anonymous_namespace_name. Within a single translation unit, all unnamed namespaces will have the same name.

DECL_CONTEXT

This macro returns the enclosing namespace. The DECL_CONTEXT for the global_namespace is NULL_TREE.

DECL_NAMESPACE_ALIAS

If this declaration is for a namespace alias, then DECL_NAMESPACE_ALIAS is the namespace for which this one is an alias.

Do not attempt to use cp_namespace_decls for a namespace which is an alias. Instead, follow DECL_NAMESPACE_ALIAS links until you reach an ordinary, non-alias, namespace, and call cp_namespace_decls there.

DECL_NAMESPACE_STD_P

This predicate holds if the namespace is the special ::std namespace.

cp_namespace_decls

This function will return the declarations contained in the namespace, including types, overloaded functions, other namespaces, and so forth. If there are no declarations, this function will return NULL_TREE. The declarations are connected through their TREE_CHAIN fields.

Although most entries on this list will be declarations, TREE_LIST nodes may also appear. In this case, the TREE_VALUE will be an OVERLOAD. The value of the TREE_PURPOSE is unspecified; back ends should ignore this value. As with the other kinds of declarations returned by cp_namespace_decls, the TREE_CHAIN will point to the next declaration in this list.

For more information on the kinds of declarations that can occur on this list, See section 18.5 Declarations. Some declarations will not appear on this list. In particular, no FIELD_DECL, LABEL_DECL, or PARM_DECL nodes will appear here.

This function cannot be used with namespaces that have DECL_NAMESPACE_ALIAS set.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.4.2 Classes

A class type is represented by either a RECORD_TYPE or a UNION_TYPE. A class declared with the union tag is represented by a UNION_TYPE, while classes declared with either the struct or the class tag are represented by RECORD_TYPEs. You can use the CLASSTYPE_DECLARED_CLASS macro to discern whether or not a particular type is a class as opposed to a struct. This macro will be true only for classes declared with the class tag.

Almost all non-function members are available on the TYPE_FIELDS list. Given one member, the next can be found by following the TREE_CHAIN. You should not depend in any way on the order in which fields appear on this list. All nodes on this list will be `DECL' nodes. A FIELD_DECL is used to represent a non-static data member, a VAR_DECL is used to represent a static data member, and a TYPE_DECL is used to represent a type. Note that the CONST_DECL for an enumeration constant will appear on this list, if the enumeration type was declared in the class. (Of course, the TYPE_DECL for the enumeration type will appear here as well.) There are no entries for base classes on this list. In particular, there is no FIELD_DECL for the "base-class portion" of an object.

The TYPE_VFIELD is a compiler-generated field used to point to virtual function tables. It may or may not appear on the TYPE_FIELDS list. However, back ends should handle the TYPE_VFIELD just like all the entries on the TYPE_FIELDS list.

The function members are available on the TYPE_METHODS list. Again, subsequent members are found by following the TREE_CHAIN field. If a function is overloaded, each of the overloaded functions appears; no OVERLOAD nodes appear on the TYPE_METHODS list. Implicitly declared functions (including default constructors, copy constructors, assignment operators, and destructors) will appear on this list as well.

Every class has an associated binfo, which can be obtained with TYPE_BINFO. Binfos are used to represent base-classes. The binfo given by TYPE_BINFO is the degenerate case, whereby every class is considered to be its own base-class. The base classes for a particular binfo can be obtained with BINFO_BASETYPES. These base-classes are themselves binfos. The class type associated with a binfo is given by BINFO_TYPE. It is always the case that BINFO_TYPE (TYPE_BINFO (x)) is the same type as x, up to qualifiers. However, it is not always the case that TYPE_BINFO (BINFO_TYPE (y)) is always the same binfo as y. The reason is that if y is a binfo representing a base-class B of a derived class D, then BINFO_TYPE (y) will be B, and TYPE_INFO (BINFO_TYPE (y)) will be B as its own base-class, rather than as a base-class of D.

The BINFO_BASETYPES is a TREE_VEC (see section 18.2.3 Containers). Base types appear in left-to-right order in this vector. You can tell whether or public, protected, or private inheritance was used by using the TREE_VIA_PUBLIC, TREE_VIA_PROTECTED, and TREE_VIA_PRIVATE macros. Each of these macros takes a BINFO and is true if and only if the indicated kind of inheritance was used. If TREE_VIA_VIRTUAL holds of a binfo, then its BINFO_TYPE was inherited from virtually.

FIXME: Talk about TYPE_NONCOPIED_PARTS.

The following macros can be used on a tree node representing a class-type.

LOCAL_CLASS_P: This predicate holds if the class is local class i.e. declared inside a function body.
TYPE_POLYMORPHIC_P: This predicate holds if the class has at least one virtual function (declared or inherited).
TYPE_HAS_DEFAULT_CONSTRUCTOR: This predicate holds whenever its argument represents a class-type with default constructor.
CLASSTYPE_HAS_MUTABLE
TYPE_HAS_MUTABLE_P: These predicates hold for a class-type having a mutable data member.
CLASSTYPE_NON_POD_P: This predicate holds only for class-types that are not PODs.
TYPE_HAS_NEW_OPERATOR: This predicate holds for a class-type that defines operator new.
TYPE_HAS_ARRAY_NEW_OPERATOR: This predicate holds for a class-type for which operator new[] is defined.
TYPE_OVERLOADS_CALL_EXPR: This predicate holds for class-type for which the function call operator() is overloaded.
TYPE_OVERLOADS_ARRAY_REF: This predicate holds for a class-type that overloads operator[]
TYPE_OVERLOADS_ARROW: This predicate holds for a class-type for which operator-> is overloaded.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.5 Declarations

This section covers the various kinds of declarations that appear in the internal representation, except for declarations of functions (represented by FUNCTION_DECL nodes), which are described in 18.6 Functions.

Some macros can be used with any kind of declaration. These include:

DECL_NAME

This macro returns an IDENTIFIER_NODE giving the name of the entity.

TREE_TYPE

This macro returns the type of the entity declared.

DECL_SOURCE_FILE

This macro returns the name of the file in which the entity was declared, as a char*. For an entity declared implicitly by the compiler (like __builtin_memcpy), this will be the string "<internal>".

DECL_SOURCE_LINE

This macro returns the line number at which the entity was declared, as an int.

DECL_ARTIFICIAL

This predicate holds if the declaration was implicitly generated by the compiler. For example, this predicate will hold of an implicitly declared member function, or of the TYPE_DECL implicitly generated for a class type. Recall that in C++ code like:

struct S {};

is roughly equivalent to C code like:

struct S {};
typedef struct S S;

The implicitly generated typedef declaration is represented by a TYPE_DECL for which DECL_ARTIFICIAL holds.

DECL_NAMESPACE_SCOPE_P

This predicate holds if the entity was declared at a namespace scope.

DECL_CLASS_SCOPE_P

This predicate holds if the entity was declared at a class scope.

DECL_FUNCTION_SCOPE_P

This predicate holds if the entity was declared inside a function body.

The various kinds of declarations include:

LABEL_DECL

These nodes are used to represent labels in function bodies. For more information, see 18.6 Functions. These nodes only appear in block scopes.

CONST_DECL

These nodes are used to represent enumeration constants. The value of the constant is given by DECL_INITIAL which will be an INTEGER_CST with the same type as the TREE_TYPE of the CONST_DECL, i.e., an ENUMERAL_TYPE.

RESULT_DECL

These nodes represent the value returned by a function. When a value is assigned to a RESULT_DECL, that indicates that the value should be returned, via bitwise copy, by the function. You can use DECL_SIZE and DECL_ALIGN on a RESULT_DECL, just as with a VAR_DECL.

TYPE_DECL

These nodes represent typedef declarations. The TREE_TYPE is the type declared to have the name given by DECL_NAME. In some cases, there is no associated name.

VAR_DECL

These nodes represent variables with namespace or block scope, as well as static data members. The DECL_SIZE and DECL_ALIGN are analogous to TYPE_SIZE and TYPE_ALIGN. For a declaration, you should always use the DECL_SIZE and DECL_ALIGN rather than the TYPE_SIZE and TYPE_ALIGN given by the TREE_TYPE, since special attributes may have been applied to the variable to give it a particular size and alignment. You may use the predicates DECL_THIS_STATIC or DECL_THIS_EXTERN to test whether the storage class specifiers static or extern were used to declare a variable.

If this variable is initialized (but does not require a constructor), the DECL_INITIAL will be an expression for the initializer. The initializer should be evaluated, and a bitwise copy into the variable performed. If the DECL_INITIAL is the error_mark_node, there is an initializer, but it is given by an explicit statement later in the code; no bitwise copy is required.

GCC provides an extension that allows either automatic variables, or global variables, to be placed in particular registers. This extension is being used for a particular VAR_DECL if DECL_REGISTER holds for the VAR_DECL, and if DECL_ASSEMBLER_NAME is not equal to DECL_NAME. In that case, DECL_ASSEMBLER_NAME is the name of the register into which the variable will be placed.

PARM_DECL

Used to represent a parameter to a function. Treat these nodes similarly to VAR_DECL nodes. These nodes only appear in the DECL_ARGUMENTS for a FUNCTION_DECL.

The DECL_ARG_TYPE for a PARM_DECL is the type that will actually be used when a value is passed to this function. It may be a wider type than the TREE_TYPE of the parameter; for example, the ordinary type might be short while the DECL_ARG_TYPE is int.

FIELD_DECL

These nodes represent non-static data members. The DECL_SIZE and DECL_ALIGN behave as for VAR_DECL nodes. The DECL_FIELD_BITPOS gives the first bit used for this field, as an INTEGER_CST. These values are indexed from zero, where zero indicates the first bit in the object.

If DECL_C_BIT_FIELD holds, this field is a bit-field.

NAMESPACE_DECL

See section 18.4.1 Namespaces.

TEMPLATE_DECL

These nodes are used to represent class, function, and variable (static data member) templates. The DECL_TEMPLATE_SPECIALIZATIONS are a TREE_LIST. The TREE_VALUE of each node in the list is a TEMPLATE_DECLs or FUNCTION_DECLs representing specializations (including instantiations) of this template. Back ends can safely ignore TEMPLATE_DECLs, but should examine FUNCTION_DECL nodes on the specializations list just as they would ordinary FUNCTION_DECL nodes.

For a class template, the DECL_TEMPLATE_INSTANTIATIONS list contains the instantiations. The TREE_VALUE of each node is an instantiation of the class. The DECL_TEMPLATE_SPECIALIZATIONS contains partial specializations of the class.

USING_DECL

Back ends can safely ignore these nodes.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.6 Functions

A function is represented by a FUNCTION_DECL node. A set of overloaded functions is sometimes represented by a OVERLOAD node.

An OVERLOAD node is not a declaration, so none of the `DECL_' macros should be used on an OVERLOAD. An OVERLOAD node is similar to a TREE_LIST. Use OVL_CURRENT to get the function associated with an OVERLOAD node; use OVL_NEXT to get the next OVERLOAD node in the list of overloaded functions. The macros OVL_CURRENT and OVL_NEXT are actually polymorphic; you can use them to work with FUNCTION_DECL nodes as well as with overloads. In the case of a FUNCTION_DECL, OVL_CURRENT will always return the function itself, and OVL_NEXT will always be NULL_TREE.

To determine the scope of a function, you can use the DECL_REAL_CONTEXT macro. This macro will return the class (either a RECORD_TYPE or a UNION_TYPE) or namespace (a NAMESPACE_DECL) of which the function is a member. For a virtual function, this macro returns the class in which the function was actually defined, not the base class in which the virtual declaration occurred. If a friend function is defined in a class scope, the DECL_CLASS_CONTEXT macro can be used to determine the class in which it was defined. For example, in

class C { friend void f() {} };
the DECL_REAL_CONTEXT for f will be the global_namespace, but the DECL_CLASS_CONTEXT will be the RECORD_TYPE for C.

The DECL_REAL_CONTEXT and DECL_CLASS_CONTEXT are not available in C; instead you should simply use DECL_CONTEXT. In C, the DECL_CONTEXT for a function maybe another function. This representation indicates that the GNU nested function extension is in use. For details on the semantics of nested functions, see the GCC Manual. The nested function can refer to local variables in its containing function. Such references are not explicitly marked in the tree structure; back ends must look at the DECL_CONTEXT for the referenced VAR_DECL. If the DECL_CONTEXT for the referenced VAR_DECL is not the same as the function currently being processed, and neither DECL_EXTERNAL nor DECL_STATIC hold, then the reference is to a local variable in a containing function, and the back end must take appropriate action.

18.6.1 Function Basics Function names, linkage, and so forth.

18.6.2 Function Bodies The statements that make up a function body.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.6.1 Function Basics

The following macros and functions can be used on a FUNCTION_DECL:

DECL_MAIN_P

This predicate holds for a function that is the program entry point ::code.

DECL_NAME

This macro returns the unqualified name of the function, as an IDENTIFIER_NODE. For an instantiation of a function template, the DECL_NAME is the unqualified name of the template, not something like f<int>. The value of DECL_NAME is undefined when used on a constructor, destructor, overloaded operator, or type-conversion operator, or any function that is implicitly generated by the compiler. See below for macros that can be used to distinguish these cases.

DECL_ASSEMBLER_NAME

This macro returns the mangled name of the function, also an IDENTIFIER_NODE. This name does not contain leading underscores on systems that prefix all identifiers with underscores. The mangled name is computed in the same way on all platforms; if special processing is required to deal with the object file format used on a particular platform, it is the responsibility of the back end to perform those modifications. (Of course, the back end should not modify DECL_ASSEMBLER_NAME itself.)

DECL_EXTERNAL

This predicate holds if the function is undefined.

TREE_PUBLIC

This predicate holds if the function has external linkage.

DECL_LOCAL_FUNCTION_P

This predicate holds if the function was declared at block scope, even though it has a global scope.

DECL_ANTICIPATED

This predicate holds if the function is a built-in function but its prototype is not yet explicitly declared.

DECL_EXTERN_C_FUNCTION_P

This predicate holds if the function is declared as an `extern "C"' function.

DECL_LINKONCE_P

This macro holds if multiple copies of this function may be emitted in various translation units. It is the responsibility of the linker to merge the various copies. Template instantiations are the most common example of functions for which DECL_LINKONCE_P holds; G++ instantiates needed templates in all translation units which require them, and then relies on the linker to remove duplicate instantiations.

FIXME: This macro is not yet implemented.

DECL_FUNCTION_MEMBER_P

This macro holds if the function is a member of a class, rather than a member of a namespace.

DECL_STATIC_FUNCTION_P

This predicate holds if the function a static member function.

DECL_NONSTATIC_MEMBER_FUNCTION_P

This macro holds for a non-static member function.

DECL_CONST_MEMFUNC_P

This predicate holds for a const-member function.

DECL_VOLATILE_MEMFUNC_P

This predicate holds for a volatile-member function.

DECL_CONSTRUCTOR_P

This macro holds if the function is a constructor.

DECL_NONCONVERTING_P

This predicate holds if the constructor is a non-converting constructor.

DECL_COMPLETE_CONSTRUCTOR_P

This predicate holds for a function which is a constructor for an object of a complete type.

DECL_BASE_CONSTRUCTOR_P

This predicate holds for a function which is a constructor for a base class sub-object.

DECL_COPY_CONSTRUCTOR_P

This predicate holds for a function which is a copy-constructor.

DECL_DESTRUCTOR_P

This macro holds if the function is a destructor.

DECL_COMPLETE_DESTRUCTOR_P

This predicate holds if the function is the destructor for an object a complete type.

DECL_OVERLOADED_OPERATOR_P

This macro holds if the function is an overloaded operator.

DECL_CONV_FN_P

This macro holds if the function is a type-conversion operator.

DECL_GLOBAL_CTOR_P

This predicate holds if the function is a file-scope initialization function.

DECL_GLOBAL_DTOR_P

This predicate holds if the function is a file-scope finalization function.

DECL_THUNK_P

This predicate holds if the function is a thunk.

These functions represent stub code that adjusts the this pointer and then jumps to another function. When the jumped-to function returns, control is transferred directly to the caller, without returning to the thunk. The first parameter to the thunk is always the this pointer; the thunk should add THUNK_DELTA to this value. (The THUNK_DELTA is an int, not an INTEGER_CST.)

Then, if THUNK_VCALL_OFFSET (an INTEGER_CST) is non-zero the adjusted this pointer must be adjusted again. The complete calculation is given by the following pseudo-code:

this += THUNK_DELTA if (THUNK_VCALL_OFFSET) this += (*((ptrdiff_t **) this))[THUNK_VCALL_OFFSET]

Finally, the thunk should jump to the location given by DECL_INITIAL; this will always be an expression for the address of a function.

DECL_NON_THUNK_FUNCTION_P

This predicate holds if the function is not a thunk function.

GLOBAL_INIT_PRIORITY

If either DECL_GLOBAL_CTOR_P or DECL_GLOBAL_DTOR_P holds, then this gives the initialization priority for the function. The linker will arrange that all functions for which DECL_GLOBAL_CTOR_P holds are run in increasing order of priority before main is called. When the program exits, all functions for which DECL_GLOBAL_DTOR_P holds are run in the reverse order.

DECL_ARTIFICIAL

This macro holds if the function was implicitly generated by the compiler, rather than explicitly declared. In addition to implicitly generated class member functions, this macro holds for the special functions created to implement static initialization and destruction, to compute run-time type information, and so forth.

DECL_ARGUMENTS

This macro returns the PARM_DECL for the first argument to the function. Subsequent PARM_DECL nodes can be obtained by following the TREE_CHAIN links.

DECL_RESULT

This macro returns the RESULT_DECL for the function.

TREE_TYPE

This macro returns the FUNCTION_TYPE or METHOD_TYPE for the function.

TYPE_RAISES_EXCEPTIONS

This macro returns the list of exceptions that a (member-)function can raise. The returned list, if non NULL, is comprised of nodes whose TREE_VALUE represents a type.

TYPE_NOTHROW_P

This predicate holds when the exception-specification of its arguments if of the form `()'.

DECL_ARRAY_DELETE_OPERATOR_P

This predicate holds if the function an overloaded operator delete[].

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.6.2 Function Bodies

A function that has a definition in the current translation unit will have a non-NULL DECL_INITIAL. However, back ends should not make use of the particular value given by DECL_INITIAL.

The DECL_SAVED_TREE macro will give the complete body of the function. This node will usually be a COMPOUND_STMT representing the outermost block of the function, but it may also be a TRY_BLOCK, a RETURN_INIT, or any other valid statement.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.6.2.1 Statements

There are tree nodes corresponding to all of the source-level statement constructs. These are enumerated here, together with a list of the various macros that can be used to obtain information about them. There are a few macros that can be used with all statements:

STMT_LINENO

This macro returns the line number for the statement. If the statement spans multiple lines, this value will be the number of the first line on which the statement occurs. Although we mention CASE_LABEL below as if it were a statement, they do not allow the use of STMT_LINENO. There is no way to obtain the line number for a CASE_LABEL.

Statements do not contain information about the file from which they came; that information is implicit in the FUNCTION_DECL from which the statements originate.

STMT_IS_FULL_EXPR_P

In C++, statements normally constitute "full expressions"; temporaries created during a statement are destroyed when the statement is complete. However, G++ sometimes represents expressions by statements; these statements will not have STMT_IS_FULL_EXPR_P set. Temporaries created during such statements should be destroyed when the innermost enclosing statement with STMT_IS_FULL_EXPR_P set is exited.

Here is the list of the various statement nodes, and the macros used to access them. This documentation describes the use of these nodes in non-template functions (including instantiations of template functions). In template functions, the same nodes are used, but sometimes in slightly different ways.

Many of the statements have substatements. For example, a while loop will have a body, which is itself a statement. If the substatement is NULL_TREE, it is considered equivalent to a statement consisting of a single ;, i.e., an expression statement in which the expression has been omitted. A substatement may in fact be a list of statements, connected via their TREE_CHAINs. So, you should always process the statement tree by looping over substatements, like this:

void process_stmt (stmt) tree stmt; { while (stmt) { switch (TREE_CODE (stmt)) { case IF_STMT: process_stmt (THEN_CLAUSE (stmt)); /* More processing here. */ break; ... } stmt = TREE_CHAIN (stmt); } }
In other words, while the then clause of an if statement in C++ can be only one statement (although that one statement may be a compound statement), the intermediate representation will sometimes use several statements chained together.

ASM_STMT

Used to represent an inline assembly statement. For an inline assembly statement like:

asm ("mov x, y");
The ASM_STRING macro will return a STRING_CST node for "mov x, y". If the original statement made use of the extended-assembly syntax, then ASM_OUTPUTS, ASM_INPUTS, and ASM_CLOBBERS will be the outputs, inputs, and clobbers for the statement, represented as STRING_CST nodes. The extended-assembly syntax looks like:

asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));
The first string is the ASM_STRING, containing the instruction template. The next two strings are the output and inputs, respectively; this statement has no clobbers. As this example indicates, "plain" assembly statements are merely a special case of extended assembly statements; they have no cv-qualifiers, outputs, inputs, or clobbers. All of the strings will be NUL-terminated, and will contain no embedded NUL-characters.

If the assembly statement is declared volatile, or if the statement was not an extended assembly statement, and is therefore implicitly volatile, then the predicate ASM_VOLATILE_P will hold of the ASM_STMT.

BREAK_STMT

Used to represent a break statement. There are no additional fields.

CASE_LABEL

Use to represent a case label, range of case labels, or a default label. If CASE_LOW is NULL_TREE, then this is a a default label. Otherwise, if CASE_HIGH is NULL_TREE, then this is an ordinary case label. In this case, CASE_LOW is an expression giving the value of the label. Both CASE_LOW and CASE_HIGH are INTEGER_CST nodes. These values will have the same type as the condition expression in the switch statement.

Otherwise, if both CASE_LOW and CASE_HIGH are defined, the statement is a range of case labels. Such statements originate with the extension that allows users to write things of the form:

case 2 ... 5:
The first value will be CASE_LOW, while the second will be CASE_HIGH.

CLEANUP_STMT

Used to represent an action that should take place upon exit from the enclosing scope. Typically, these actions are calls to destructors for local objects, but back ends cannot rely on this fact. If these nodes are in fact representing such destructors, CLEANUP_DECL will be the VAR_DECL destroyed. Otherwise, CLEANUP_DECL will be NULL_TREE. In any case, the CLEANUP_EXPR is the expression to execute. The cleanups executed on exit from a scope should be run in the reverse order of the order in which the associated CLEANUP_STMTs were encountered.

COMPOUND_STMT

Used to represent a brace-enclosed block. The first substatement is given by COMPOUND_BODY. Subsequent substatements are found by following the TREE_CHAIN link from one substatement to the next. The COMPOUND_BODY will be NULL_TREE if there are no substatements.

CONTINUE_STMT

Used to represent a continue statement. There are no additional fields.

CTOR_STMT

Used to mark the beginning (if CTOR_BEGIN_P holds) or end (if CTOR_END_P holds of the main body of a constructor. See also SUBOBJECT for more information on how to use these nodes.

DECL_STMT

Used to represent a local declaration. The DECL_STMT_DECL macro can be used to obtain the entity declared. This declaration may be a LABEL_DECL, indicating that the label declared is a local label. (As an extension, GCC allows the declaration of labels with scope.) In C, this declaration may be a FUNCTION_DECL, indicating the use of the GCC nested function extension. For more information, see section 18.6 Functions.

DO_STMT

Used to represent a do loop. The body of the loop is given by DO_BODY while the termination condition for the loop is given by DO_COND. The condition for a do-statement is always an expression.

EMPTY_CLASS_EXPR

Used to represent a temporary object of a class with no data whose address is never taken. (All such objects are interchangeable.) The TREE_TYPE represents the type of the object.

EXPR_STMT

Used to represent an expression statement. Use EXPR_STMT_EXPR to obtain the expression.

FOR_STMT

Used to represent a for statement. The FOR_INIT_STMT is the initialization statement for the loop. The FOR_COND is the termination condition. The FOR_EXPR is the expression executed right before the FOR_COND on each loop iteration; often, this expression increments a counter. The body of the loop is given by FOR_BODY. Note that FOR_INIT_STMT and FOR_BODY return statements, while FOR_COND and FOR_EXPR return expressions.

GOTO_STMT

Used to represent a goto statement. The GOTO_DESTINATION will usually be a LABEL_DECL. However, if the "computed goto" extension has been used, the GOTO_DESTINATION will be an arbitrary expression indicating the destination. This expression will always have pointer type.

IF_STMT

Used to represent an if statement. The IF_COND is the expression.

If the condition is a TREE_LIST, then the TREE_PURPOSE is a statement (usually a DECL_STMT). Each time the condition is evaluated, the statement should be executed. Then, the TREE_VALUE should be used as the conditional expression itself. This representation is used to handle C++ code like this:

if (int i = 7) ...

where there is a new local variable (or variables) declared within the condition.

The THEN_CLAUSE represents the statement given by the then condition, while the ELSE_CLAUSE represents the statement given by the else condition.

LABEL_STMT

Used to represent a label. The LABEL_DECL declared by this statement can be obtained with the LABEL_STMT_LABEL macro. The IDENTIFIER_NODE giving the name of the label can be obtained from the LABEL_DECL with DECL_NAME.

RETURN_INIT

If the function uses the G++ "named return value" extension, meaning that the function has been defined like:

S f(int) return s {...}
then there will be a RETURN_INIT. There is never a named returned value for a constructor. The first argument to the RETURN_INIT is the name of the object returned; the second argument is the initializer for the object. The object is initialized when the RETURN_INIT is encountered. The object referred to is the actual object returned; this extension is a manual way of doing the "return-value optimization." Therefore, the object must actually be constructed in the place where the object will be returned.

RETURN_STMT

Used to represent a return statement. The RETURN_EXPR is the expression returned; it will be NULL_TREE if the statement was just

return;

SCOPE_STMT

A scope-statement represents the beginning or end of a scope. If SCOPE_BEGIN_P holds, this statement represents the beginning of a scope; if SCOPE_END_P holds this statement represents the end of a scope. On exit from a scope, all cleanups from CLEANUP_STMTs occurring in the scope must be run, in reverse order to the order in which they were encountered. If SCOPE_NULLIFIED_P or SCOPE_NO_CLEANUPS_P holds of the scope, back ends should behave as if the SCOPE_STMT were not present at all.

START_CATCH_STMT

These statements represent the location to which control is transferred when an exception is thrown. The START_CATCH_TYPE is the type of exception that will be caught by this handler; it is equal (by pointer equality) to CATCH_ALL_TYPE if this handler is for all types.

SUBOBJECT

In a constructor, these nodes are used to mark the point at which a subobject of this is fully constructed. If, after this point, an exception is thrown before a CTOR_STMT with CTOR_END_P set is encountered, the SUBOBJECT_CLEANUP must be executed. The cleanups must be executed in the reverse order in which they appear.

SWITCH_STMT

Used to represent a switch statement. The SWITCH_COND is the expression on which the switch is occurring. See the documentation for an IF_STMT for more information on the representation used for the condition. The SWITCH_BODY is the body of the switch statement.

TRY_BLOCK

Used to represent a try block. The body of the try block is given by TRY_STMTS. Each of the catch blocks is a HANDLER node. The first handler is given by TRY_HANDLERS. Subsequent handlers are obtained by following the TREE_CHAIN link from one handler to the next. The body of the handler is given by HANDLER_BODY.

If CLEANUP_P holds of the TRY_BLOCK, then the TRY_HANDLERS will not be a HANDLER node. Instead, it will be an expression that should be executed if an exception is thrown in the try block. It must rethrow the exception after executing that code. And, if an exception is thrown while the expression is executing, terminate must be called.

USING_STMT

Used to represent a using directive. The namespace is given by USING_STMT_NAMESPACE, which will be a NAMESPACE_DECL. This node is needed inside template functions, to implement using directives during instantiation.

WHILE_STMT

Used to represent a while loop. The WHILE_COND is the termination condition for the loop. See the documentation for an IF_STMT for more information on the representation used for the condition.

The WHILE_BODY is the body of the loop.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.7 Attributes in trees

Attributes, as specified using the __attribute__ keyword, are represented internally as a TREE_LIST. The TREE_PURPOSE is the name of the attribute, as an IDENTIFIER_NODE. The TREE_VALUE is a TREE_LIST of the arguments of the attribute, if any, or NULL_TREE if there are no arguments; the arguments are stored as the TREE_VALUE of successive entries in the list, and may be identifiers or expressions. The TREE_CHAIN of the attribute is the next attribute in a list of attributes applying to the same declaration or type, or NULL_TREE if there are no further attributes in the list.

Attributes may be attached to declarations and to types; these attributes may be accessed with the following macros. At present only machine-dependent attributes are stored in this way (other attributes cause changes to the declaration or type or to other internal compiler data structures, but are not themselves stored along with the declaration or type), but in future all attributes may be stored like this.

Tree Macro: tree DECL_MACHINE_ATTRIBUTES (tree decl): This macro returns the attributes on the declaration decl.

Tree Macro: tree TYPE_ATTRIBUTES (tree type): This macro returns the attributes on the type type.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

18.8 Expressions

The internal representation for expressions is for the most part quite straightforward. However, there are a few facts that one must bear in mind. In particular, the expression "tree" is actually a directed acyclic graph. (For example there may be many references to the integer constant zero throughout the source program; many of these will be represented by the same expression node.) You should not rely on certain kinds of node being shared, nor should rely on certain kinds of nodes being unshared.

The following macros can be used with all expression nodes:

TREE_TYPE: Returns the type of the expression. This value may not be precisely the same type that would be given the expression in the original program.

In what follows, some nodes that one might expect to always have type bool are documented to have either integral or boolean type. At some point in the future, the C front end may also make use of this same intermediate representation, and at this point these nodes will certainly have integral type. The previous sentence is not meant to imply that the C++ front end does not or will not give these nodes integral type.

Below, we list the various kinds of expression nodes. Except where noted otherwise, the operands to an expression are accessed using the TREE_OPERAND macro. For example, to access the first operand to a binary plus expression expr, use:

TREE_OPERAND (expr, 0)
As this example indicates, the operands are zero-indexed.

The table below begins with constants, moves on to unary expressions, then proceeds to binary expressions, and concludes with various other kinds of expressions:

INTEGER_CST

These nodes represent integer constants. Note that the type of these constants is obtained with TREE_TYPE; they are not always of type int. In particular, char constants are represented with INTEGER_CST nodes. The value of the integer constant e is given by @example ((TREE_INT_CST_HIGH (e) << HOST_BITS_PER_WIDE_INT) + TREE_INST_CST_LOW (e)) HOST_BITS_PER_WIDE_INT is at least thirty-two on all platforms. Both TREE_INT_CST_HIGH and TREE_INT_CST_LOW return a HOST_WIDE_INT. The value of an INTEGER_CST is interpreted as a signed or unsigned quantity depending on the type of the constant. In general, the expression given above will overflow, so it should not be used to calculate the value of the constant.

The variable integer_zero_node is a integer constant with value zero. Similarly, integer_one_node is an integer constant with value one. The size_zero_node and size_one_node variables are analogous, but have type size_t rather than int.

The function tree_int_cst_lt is a predicate which holds if its first argument is less than its second. Both constants are assumed to have the same signedness (i.e., either both should be signed or both should be unsigned.) The full width of the constant is used when doing the comparison; the usual rules about promotions and conversions are ignored. Similarly, tree_int_cst_equal holds if the two constants are equal. The tree_int_cst_sgn function returns the sign of a constant. The value is 1, 0, or -1 according on whether the constant is greater than, equal to, or less than zero. Again, the signedness of the constant's type is taken into account; an unsigned constant is never less than zero, no matter what its bit-pattern.

REAL_CST

FIXME: Talk about how to obtain representations of this constant, do comparisons, and so forth.

COMPLEX_CST

These nodes are used to represent complex number constants, that is a __complex__ whose parts are constant nodes. The TREE_REALPART and TREE_IMAGPART return the real and the imaginary parts respectively.

STRING_CST

These nodes represent string-constants. The TREE_STRING_LENGTH returns the length of the string, as an int. The TREE_STRING_POINTER is a char* containing the string itself. The string may not be NUL-terminated, and it may contain embedded NUL characters. Therefore, the TREE_STRING_LENGTH includes the trailing NUL if it is present.

For wide string constants, the TREE_STRING_LENGTH is the number of bytes in the string, and the TREE_STRING_POINTER points to an array of the bytes of the string, as represented on the target system (that is, as integers in the target endianness). Wide and non-wide string constants are distinguished only by the TREE_TYPE of the STRING_CST.

FIXME: The formats of string constants are not well-defined when the target system bytes are not the same width as host system bytes.

PTRMEM_CST

These nodes are used to represent pointer-to-member constants. The PTRMEM_CST_CLASS is the class type (either a RECORD_TYPE or UNION_TYPE within which the pointer points), and the PTRMEM_CST_MEMBER is the declaration for the pointed to object. Note that the DECL_CONTEXT for the PTRMEM_CST_MEMBER is in general different from from the PTRMEM_CST_CLASS. For example, given:

struct B { int i; };
struct D : public B {};
int D::*dp = &D::i;

The PTRMEM_CST_CLASS for &D::i is D, even though the DECL_CONTEXT for the PTRMEM_CST_MEMBER is B, since B::i is a member of B, not D.

VAR_DECL

These nodes represent variables, including static data members. For more information, see section 18.5 Declarations.

NEGATE_EXPR

These nodes represent unary negation of the single operand, for both integer and floating-point types. The type of negation can be determined by looking at the type of the expression.

BIT_NOT_EXPR

These nodes represent bitwise complement, and will always have integral type. The only operand is the value to be complemented.

TRUTH_NOT_EXPR

These nodes represent logical negation, and will always have integral (or boolean) type. The operand is the value being negated.

PREDECREMENT_EXPR

PREINCREMENT_EXPR

POSTDECREMENT_EXPR

POSTINCREMENT_EXPR

These nodes represent increment and decrement expressions. The value of the single operand is computed, and the operand incremented or decremented. In the case of PREDECREMENT_EXPR and PREINCREMENT_EXPR, the value of the expression is the value resulting after the increment or decrement; in the case of POSTDECREMENT_EXPR and POSTINCREMENT_EXPR is the value before the increment or decrement occurs. The type of the operand, like that of the result, will be either integral, boolean, or floating-point.

ADDR_EXPR

These nodes are used to represent the address of an object. (These expressions will always have pointer or reference type.) The operand may be another expression, or it may be a declaration.

As an extension, GCC allows users to take the address of a label. In this case, the operand of the ADDR_EXPR will be a LABEL_DECL. The type of such an expression is void*.

If the object addressed is not an lvalue, a temporary is created, and the address of the temporary is used.

INDIRECT_REF

These nodes are used to represent the object pointed to by a pointer. The operand is the pointer being dereferenced; it will always have pointer or reference type.

FIX_TRUNC_EXPR

These nodes represent conversion of a floating-point value to an integer. The single operand will have a floating-point type, while the the complete expression will have an integral (or boolean) type. The operand is rounded towards zero.

FLOAT_EXPR

These nodes represent conversion of an integral (or boolean) value to a floating-point value. The single operand will have integral type, while the complete expression will have a floating-point type.

FIXME: How is the operand supposed to be rounded? Is this dependent on `-mieee'?

COMPLEX_EXPR

These nodes are used to represent complex numbers constructed from two expressions of the same (integer or real) type. The first operand is the real part and the second operand is the imaginary part.

CONJ_EXPR

These nodes represent the conjugate of their operand.

REALPART_EXPR

IMAGPART_EXPR

These nodes represent respectively the real and the imaginary parts of complex numbers (their sole argument).

NON_LVALUE_EXPR

These nodes indicate that their one and only operand is not an lvalue. A back end can treat these identically to the single operand.

NOP_EXPR

These nodes are used to represent conversions that do not require any code-generation. For example, conversion of a char* to an int* does not require any code be generated; such a conversion is represented by a NOP_EXPR. The single operand is the expression to be converted. The conversion from a pointer to a reference is also represented with a NOP_EXPR.

CONVERT_EXPR

These nodes are similar to NOP_EXPRs, but are used in those situations where code may need to be generated. For example, if an int* is converted to an int code may need to be generated on some platforms. These nodes are never used for C++-specific conversions, like conversions between pointers to different classes in an inheritance hierarchy. Any adjustments that need to be made in such cases are always indicated explicitly. Similarly, a user-defined conversion is never represented by a CONVERT_EXPR; instead, the function calls are made explicit.

THROW_EXPR

These nodes represent throw expressions. The single operand is an expression for the code that should be executed to throw the exception. However, there is one implicit action not represented in that expression; namely the call to __throw. This function takes no arguments. If setjmp/longjmp exceptions are used, the function __sjthrow is called instead. The normal GCC back end uses the function emit_throw to generate this code; you can examine this function to see what needs to be done.

LSHIFT_EXPR

RSHIFT_EXPR

These nodes represent left and right shifts, respectively. The first operand is the value to shift; it will always be of integral type. The second operand is an expression for the number of bits by which to shift. Right shift should be treated as arithmetic, i.e., the high-order bits should be zero-filled when the expression has unsigned type and filled with the sign bit when the expression has signed type.

BIT_IOR_EXPR

BIT_XOR_EXPR

BIT_AND_EXPR

These nodes represent bitwise inclusive or, bitwise exclusive or, and bitwise and, respectively. Both operands will always have integral type.

TRUTH_ANDIF_EXPR

TRUTH_ORIF_EXPR

These nodes represent logical and and logical or, respectively. These operators are not strict; i.e., the second operand is evaluated only if the value of the expression is not determined by evaluation of the first operand. The type of the operands, and the result type, is always of boolean or integral type.

TRUTH_AND_EXPR

TRUTH_OR_EXPR

TRUTH_XOR_EXPR

These nodes represent logical and, logical or, and logical exclusive or. They are strict; both arguments are always evaluated. There are no corresponding operators in C or C++, but the front end will sometimes generate these expressions anyhow, if it can tell that strictness does not matter.

PLUS_EXPR

MINUS_EXPR

MULT_EXPR

TRUNC_DIV_EXPR

TRUNC_MOD_EXPR

RDIV_EXPR

These nodes represent various binary arithmetic operations. Respectively, these operations are addition, subtraction (of the second operand from the first), multiplication, integer division, integer remainder, and floating-point division. The operands to the first three of these may have either integral or floating type, but there will never be case in which one operand is of floating type and the other is of integral type.

The result of a TRUNC_DIV_EXPR is always rounded towards zero. The TRUNC_MOD_EXPR of two operands a and b is always a - a/b where the division is as if computed by a TRUNC_DIV_EXPR.

ARRAY_REF

These nodes represent array accesses. The first operand is the array; the second is the index. To calculate the address of the memory accessed, you must scale the index by the size of the type of the array elements.

EXACT_DIV_EXPR

Document.

LT_EXPR

LE_EXPR

GT_EXPR

GE_EXPR

EQ_EXPR

NE_EXPR

These nodes represent the less than, less than or equal to, greater than, greater than or equal to, equal, and not equal comparison operators. The first and second operand with either be both of integral type or both of floating type. The result type of these expressions will always be of integral or boolean type.

MODIFY_EXPR

These nodes represent assignment. The left-hand side is the first operand; the right-hand side is the second operand. The left-hand side will be a VAR_DECL, INDIRECT_REF, COMPONENT_REF, or other lvalue.

These nodes are used to represent not only assignment with `=' but also compound assignments (like `+='), by reduction to `=' assignment. In other words, the representation for `i += 3' looks just like that for `i = i + 3'.

INIT_EXPR

These nodes are just like MODIFY_EXPR, but are used only when a variable is initialized, rather than assigned to subsequently.

COMPONENT_REF

These nodes represent non-static data member accesses. The first operand is the object (rather than a pointer to it); the second operand is the FIELD_DECL for the data member.

COMPOUND_EXPR

These nodes represent comma-expressions. The first operand is an expression whose value is computed and thrown away prior to the evaluation of the second operand. The value of the entire expression is the value of the second operand.

COND_EXPR

These nodes represent ?: expressions. The first operand is of boolean or integral type. If it evaluates to a non-zero value, the second operand should be evaluated, and returned as the value of the expression. Otherwise, the third operand is evaluated, and returned as the value of the expression. As a GNU extension, the middle operand of the ?: operator may be omitted in the source, like this:

x ? : 3
which is equivalent to

x ? x : 3

assuming that x is an expression without side-effects. However, in the case that the first operation causes side effects, the side-effects occur only once. Consumers of the internal representation do not need to worry about this oddity; the second operand will be always be present in the internal representation.

CALL_EXPR

These nodes are used to represent calls to functions, including non-static member functions. The first operand is a pointer to the function to call; it is always an expression whose type is a POINTER_TYPE. The second argument is a TREE_LIST. The arguments to the call appear left-to-right in the list. The TREE_VALUE of each list node contains the expression corresponding to that argument. (The value of TREE_PURPOSE for these nodes is unspecified, and should be ignored.) For non-static member functions, there will be an operand corresponding to the this pointer. There will always be expressions corresponding to all of the arguments, even if the function is declared with default arguments and some arguments are not explicitly provided at the call sites.

STMT_EXPR

These nodes are used to represent GCC's statement-expression extension. The statement-expression extension allows code like this:

int f() { return ({ int j; j = 3; j + 7; }); }

In other words, an sequence of statements may occur where a single expression would normally appear. The STMT_EXPR node represents such an expression. The STMT_EXPR_STMT gives the statement contained in the expression; this is always a COMPOUND_STMT. The value of the expression is the value of the last sub-statement in the COMPOUND_STMT. More precisely, the value is the value computed by the last EXPR_STMT in the outermost scope of the COMPOUND_STMT. For example, in:

({ 3; })

the value is 3 while in:

({ if (x) { 3; } })

(represented by a nested COMPOUND_STMT), there is no value. If the STMT_EXPR does not yield a value, it's type will be void.

BIND_EXPR

These nodes represent local blocks. The first operand is a list of temporary variables, connected via their TREE_CHAIN field. These will never require cleanups. The scope of these variables is just the body of the BIND_EXPR. The body of the BIND_EXPR is the second operand.

LOOP_EXPR

These nodes represent "infinite" loops. The LOOP_EXPR_BODY represents the body of the loop. It should be executed forever, unless an EXIT_EXPR is encountered.

EXIT_EXPR

These nodes represent conditional exits from the nearest enclosing LOOP_EXPR. The single operand is the condition; if it is non-zero, then the loop should be exited. An EXIT_EXPR will only appear within a LOOP_EXPR.

CLEANUP_POINT_EXPR

These nodes represent full-expressions. The single operand is an expression to evaluate. Any destructor calls engendered by the creation of temporaries during the evaluation of that expression should be performed immediately after the expression is evaluated.

CONSTRUCTOR

These nodes represent the brace-enclosed initializers for a structure or array. The first operand is reserved for use by the back end. The second operand is a TREE_LIST. If the TREE_TYPE of the CONSTRUCTOR is a RECORD_TYPE or UNION_TYPE, then the TREE_PURPOSE of each node in the TREE_LIST will be a FIELD_DECL and the TREE_VALUE of each node will be the expression used to initialize that field. You should not depend on the fields appearing in any particular order, nor should you assume that all fields will be represented. Unrepresented fields may be assigned any value.

If the TREE_TYPE of the CONSTRUCTOR is an ARRAY_TYPE, then the TREE_PURPOSE of each element in the TREE_LIST will be an INTEGER_CST. This constant indicates which element of the array (indexed from zero) is being assigned to; again, the TREE_VALUE is the corresponding initializer. If the TREE_PURPOSE is NULL_TREE, then the initializer is for the next available array element.

Conceptually, before any initialization is done, the entire area of storage is initialized to zero.

SAVE_EXPR

A SAVE_EXPR represents an expression (possibly involving side-effects) that is used more than once. The side-effects should occur only the first time the expression is evaluated. Subsequent uses should just reuse the computed value. The first operand to the SAVE_EXPR is the expression to evaluate. The side-effects should be executed where the SAVE_EXPR is first encountered in a depth-first preorder traversal of the expression tree.

TARGET_EXPR

A TARGET_EXPR represents a temporary object. The first operand is a VAR_DECL for the temporary variable. The second operand is the initializer for the temporary. The initializer is evaluated, and copied (bitwise) into the temporary.

Often, a TARGET_EXPR occurs on the right-hand side of an assignment, or as the second operand to a comma-expression which is itself the right-hand side of an assignment, etc. In this case, we say that the TARGET_EXPR is "normal"; otherwise, we say it is "orphaned". For a normal TARGET_EXPR the temporary variable should be treated as an alias for the left-hand side of the assignment, rather than as a new temporary variable.

The third operand to the TARGET_EXPR, if present, is a cleanup-expression (i.e., destructor call) for the temporary. If this expression is orphaned, then this expression must be executed when the statement containing this expression is complete. These cleanups must always be executed in the order opposite to that in which they were encountered. Note that if a temporary is created on one branch of a conditional operator (i.e., in the second or third operand to a COND_EXPR), the cleanup must be run only if that branch is actually executed.

See STMT_IS_FULL_EXPR_P for more information about running these cleanups.

AGGR_INIT_EXPR

An AGGR_INIT_EXPR represents the initialization as the return value of a function call, or as the result of a constructor. An AGGR_INIT_EXPR will only appear as the second operand of a TARGET_EXPR. The first operand to the AGGR_INIT_EXPR is the address of a function to call, just as in a CALL_EXPR. The second operand are the arguments to pass that function, as a TREE_LIST, again in a manner similar to that of a CALL_EXPR. The value of the expression is that returned by the function.

If AGGR_INIT_VIA_CTOR_P holds of the AGGR_INIT_EXPR, then the initialization is via a constructor call. The address of the third operand of the AGGR_INIT_EXPR, which is always a VAR_DECL, is taken, and this value replaces the first argument in the argument list. In this case, the value of the expression is the VAR_DECL given by the third operand to the AGGR_INIT_EXPR; constructors do not return a value.

[ << ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

This document was generated by GCC Administrator on August, 28 2001 using texi2html

18.1 Deficiencies		Topics net yet covered in this document.
18.2 Overview		All about `tree`s.
18.3 Types		Fundamental and aggregate types.
18.4 Scopes		Namespaces and classes.
18.6 Functions		Overloading, function bodies, and linkage.
18.5 Declarations		Type declarations and variables.
18.7 Attributes in trees		Declaration and type attributes.
18.8 Expressions		From `typeid` to `throw`.

18.2.1 Trees		Macros and functions that can be used with all trees.
18.2.2 Identifiers		The names of things.
18.2.3 Containers		Lists and vectors.

18.4.1 Namespaces		Member functions, types, etc.
18.4.2 Classes		Members, bases, friends, etc.

18.6.1 Function Basics		Function names, linkage, and so forth.
18.6.2 Function Bodies		The statements that make up a function body.