

The internal representation for expressions is for the most part quite straightforward. However, there are a few facts that one must bear in mind. In particular, the expression "tree" is actually a directed acyclic graph. (For example, there may be many references to the integer constant zero throughout the source program; many of these will be represented by the same expression node.) You should not rely on certain kinds of nodes being shared, nor should you rely on certain kinds of nodes being unshared.

The following macros can be used with all expression nodes:

TREE_TYPE returns the type of the expression. This value may not be precisely the same type that would be given the expression in the original program.

In what follows, some nodes that one might expect to always have type bool are documented to have either integral or boolean type. At some point in the future, the C front end may also make use of this same intermediate representation, and at that point these nodes will certainly have integral type. The previous sentence is not meant to imply that the C++ front end does not or will not give these nodes integral type.

Below, we list the various kinds of expression nodes. Except where noted otherwise, the operands to an expression are accessed using the TREE_OPERAND macro. For example, to access the first operand to a binary plus expression expr, use:

     TREE_OPERAND (expr, 0)

As this example indicates, the operands are zero-indexed.
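
The zero-based operand indexing, and the sharing of nodes mentioned earlier, can be sketched with a toy model. Everything named toy_ or TOY_ below is hypothetical and stands in for the real tree structures; it is not GCC's actual representation.

```c
#include <stddef.h>

/* Hypothetical toy node: NOT GCC's real tree structure. */
enum toy_code { TOY_INTEGER_CST, TOY_PLUS_EXPR };

struct toy_node {
  enum toy_code code;
  long value;                   /* used only by TOY_INTEGER_CST */
  struct toy_node *operands[2]; /* used only by TOY_PLUS_EXPR */
};

/* Stand-in for TREE_OPERAND: operand I of NODE, counting from zero. */
#define TOY_OPERAND(node, i) ((node)->operands[(i)])

/* Recursively evaluate a toy expression. */
long toy_eval(const struct toy_node *e)
{
  if (e->code == TOY_INTEGER_CST)
    return e->value;
  return toy_eval(TOY_OPERAND(e, 0)) + toy_eval(TOY_OPERAND(e, 1));
}

/* Build "v + v" where both operands are the SAME node, so the
   expression is a directed acyclic graph rather than a strict tree. */
long toy_shared_sum(long v)
{
  struct toy_node leaf = { TOY_INTEGER_CST, v, { NULL, NULL } };
  struct toy_node sum = { TOY_PLUS_EXPR, 0, { &leaf, &leaf } };
  return toy_eval(&sum);
}
```

A consumer written this way works whether or not the two operands happen to be the same node, which is the safe assumption to make.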

The table below begins with constants, moves on to unary expressions, then proceeds to binary expressions, and concludes with various other kinds of expressions:

These nodes represent integer constants. Note that the type of these constants is obtained with TREE_TYPE; they are not always of type int. In particular, char constants are represented with INTEGER_CST nodes. The value of the integer constant e is given by
          ((TREE_INT_CST_HIGH (e) << HOST_BITS_PER_WIDE_INT)
          + TREE_INT_CST_LOW (e))

HOST_BITS_PER_WIDE_INT is at least thirty-two on all platforms. Both TREE_INT_CST_HIGH and TREE_INT_CST_LOW return a HOST_WIDE_INT. The value of an INTEGER_CST is interpreted as a signed or unsigned quantity depending on the type of the constant. In general, the expression given above will overflow, so it should not be used to calculate the value of the constant.
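
To see why the two-word formula cannot simply be evaluated in a single host word, here is a minimal sketch that assumes, purely for illustration, a 32-bit HOST_WIDE_INT so that the combined value fits in a 64-bit host integer. The names toy_combine, toy_as_signed and TOY_HOST_BITS are hypothetical.

```c
#include <stdint.h>

/* Assumption for illustration: HOST_WIDE_INT is modeled as 32 bits wide,
   so a two-word constant fits in a 64-bit host integer. */
#define TOY_HOST_BITS 32

/* Combine the two words after widening, so the shift cannot overflow. */
uint64_t toy_combine(uint32_t high, uint32_t low)
{
  return ((uint64_t) high << TOY_HOST_BITS) | (uint64_t) low;
}

/* Read the combined bit pattern as a signed (two's complement) value,
   as one would for a constant whose type is signed. */
int64_t toy_as_signed(uint32_t high, uint32_t low)
{
  uint64_t bits = toy_combine(high, low);
  if (bits <= (uint64_t) INT64_MAX)
    return (int64_t) bits;
  /* Negative value: avoid an out-of-range conversion. */
  return -(int64_t) (~bits) - 1;
}
```

The same pair of words yields different values depending on the signedness of the constant's type, which is why the type must be consulted before interpreting the words.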

The variable integer_zero_node is an integer constant with value zero. Similarly, integer_one_node is an integer constant with value one. The size_zero_node and size_one_node variables are analogous, but have type size_t rather than int.

The function tree_int_cst_lt is a predicate which holds if its first argument is less than its second. Both constants are assumed to have the same signedness (i.e., either both should be signed or both should be unsigned). The full width of the constant is used when doing the comparison; the usual rules about promotions and conversions are ignored. Similarly, tree_int_cst_equal holds if the two constants are equal. The tree_int_cst_sgn function returns the sign of a constant. The value is 1, 0, or -1 according to whether the constant is greater than, equal to, or less than zero. Again, the signedness of the constant's type is taken into account; an unsigned constant is never less than zero, no matter what its bit-pattern.
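
The effect of signedness on these predicates can be illustrated with a small model that treats a constant as a 64-bit pattern plus a flag for the signedness of its type. This is only a sketch of the documented behavior; toy_cst_sgn and toy_cst_lt are hypothetical names, not the GCC functions themselves.

```c
#include <stdint.h>

#define TOY_SIGN_BIT (UINT64_C(1) << 63)

/* Models the documented result of tree_int_cst_sgn: 1, 0, or -1.
   An unsigned constant is never negative, whatever its bit pattern. */
int toy_cst_sgn(uint64_t bits, int is_unsigned)
{
  if (bits == 0)
    return 0;
  if (!is_unsigned && (bits & TOY_SIGN_BIT))
    return -1;
  return 1;
}

/* Models the documented result of tree_int_cst_lt.  Both patterns are
   assumed to share the signedness given by IS_UNSIGNED. */
int toy_cst_lt(uint64_t a, uint64_t b, int is_unsigned)
{
  if (is_unsigned)
    return a < b;
  /* Flipping the sign bit maps two's complement order onto unsigned order. */
  return (a ^ TOY_SIGN_BIT) < (b ^ TOY_SIGN_BIT);
}
```

With the all-ones pattern, the signed interpretation is -1 (negative, and less than 1), while the unsigned interpretation is the largest representable value.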

FIXME: Talk about how to obtain representations of this constant, do comparisons, and so forth.
These nodes are used to represent complex number constants, that is, a __complex__ whose parts are constant nodes. TREE_REALPART and TREE_IMAGPART return the real and the imaginary parts respectively.
These nodes are used to represent vector constants, whose parts are constant nodes. Each individual constant node is either an integer or a double constant node. The first operand is a TREE_LIST of the constant nodes and is accessed through TREE_VECTOR_CST_ELTS.
These nodes represent string-constants. The TREE_STRING_LENGTH returns the length of the string, as an int. The TREE_STRING_POINTER is a char* containing the string itself. The string may not be NUL-terminated, and it may contain embedded NUL characters. Therefore, the TREE_STRING_LENGTH includes the trailing NUL if it is present.

For wide string constants, the TREE_STRING_LENGTH is the number of bytes in the string, and the TREE_STRING_POINTER points to an array of the bytes of the string, as represented on the target system (that is, as integers in the target endianness). Wide and non-wide string constants are distinguished only by the TREE_TYPE of the STRING_CST.

FIXME: The formats of string constants are not well-defined when the target system bytes are not the same width as host system bytes.
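
Because a string constant may contain embedded NUL characters and need not be NUL-terminated, code that inspects the bytes should be driven by the length, never by a search for a terminator. A minimal sketch, in which toy_has_embedded_nul is a hypothetical helper and the pointer and length stand in for TREE_STRING_POINTER and TREE_STRING_LENGTH:

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the LEN bytes at PTR contain a NUL before the final byte.
   memchr is bounded by LEN, so PTR need not be NUL-terminated. */
int toy_has_embedded_nul(const char *ptr, size_t len)
{
  const char *first = memchr(ptr, '\0', len);
  return first != NULL && (size_t) (first - ptr) + 1 < len;
}
```

For the four bytes of "a\0b" plus its trailing NUL the helper reports an embedded NUL; a strlen-based scan would have stopped after the first byte.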

These nodes are used to represent pointer-to-member constants. The PTRMEM_CST_CLASS is the class type (either a RECORD_TYPE or UNION_TYPE) within which the pointer points, and the PTRMEM_CST_MEMBER is the declaration for the pointed-to object. Note that the DECL_CONTEXT for the PTRMEM_CST_MEMBER is in general different from the PTRMEM_CST_CLASS. For example, given:
          struct B { int i; };
          struct D : public B {};
          int D::*dp = &D::i;

The PTRMEM_CST_CLASS for &D::i is D, even though the DECL_CONTEXT for the PTRMEM_CST_MEMBER is B, since B::i is a member of B, not D.

These nodes represent variables, including static data members. For more information, see Declarations.
These nodes represent unary negation of the single operand, for both integer and floating-point types. The type of negation can be determined by looking at the type of the expression.
These nodes represent bitwise complement, and will always have integral type. The only operand is the value to be complemented.
These nodes represent logical negation, and will always have integral (or boolean) type. The operand is the value being negated.
These nodes represent increment and decrement expressions. The value of the single operand is computed, and the operand incremented or decremented. In the case of PREDECREMENT_EXPR and PREINCREMENT_EXPR, the value of the expression is the value resulting after the increment or decrement; in the case of POSTDECREMENT_EXPR and POSTINCREMENT_EXPR, it is the value before the increment or decrement occurs. The type of the operand, like that of the result, will be either integral, boolean, or floating-point.
These nodes are used to represent the address of an object. (These expressions will always have pointer or reference type.) The operand may be another expression, or it may be a declaration.

As an extension, GCC allows users to take the address of a label. In this case, the operand of the ADDR_EXPR will be a LABEL_DECL. The type of such an expression is void*.

If the object addressed is not an lvalue, a temporary is created, and the address of the temporary is used.

These nodes are used to represent the object pointed to by a pointer. The operand is the pointer being dereferenced; it will always have pointer or reference type.
These nodes represent conversion of a floating-point value to an integer. The single operand will have a floating-point type, while the complete expression will have an integral (or boolean) type. The operand is rounded towards zero.
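
Rounding towards zero is the same behavior as C's own floating-point to integer conversion, so it can be demonstrated directly. In this sketch toy_fix_trunc is a hypothetical name, and the argument is assumed to be within the range of int.

```c
/* Convert X to int, discarding the fractional part (rounding toward
   zero).  X is assumed to lie within the range of int. */
int toy_fix_trunc(double x)
{
  return (int) x;
}
```

Note that both 2.9 and -2.9 move toward zero, giving 2 and -2 respectively, rather than both rounding down.
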
These nodes represent conversion of an integral (or boolean) value to a floating-point value. The single operand will have integral type, while the complete expression will have a floating-point type.

FIXME: How is the operand supposed to be rounded? Is this dependent on -mieee?

These nodes are used to represent complex numbers constructed from two expressions of the same (integer or real) type. The first operand is the real part and the second operand is the imaginary part.
These nodes represent the conjugate of their operand.

These nodes represent respectively the real and the imaginary parts of complex numbers (their sole argument).
These nodes indicate that their one and only operand is not an lvalue. A back end can treat these identically to the single operand.
These nodes are used to represent conversions that do not require any code-generation. For example, conversion of a char* to an int* does not require any code be generated; such a conversion is represented by a NOP_EXPR. The single operand is the expression to be converted. The conversion from a pointer to a reference is also represented with a NOP_EXPR.
These nodes are similar to NOP_EXPRs, but are used in those situations where code may need to be generated. For example, if an int* is converted to an int, code may need to be generated on some platforms. These nodes are never used for C++-specific conversions, like conversions between pointers to different classes in an inheritance hierarchy. Any adjustments that need to be made in such cases are always indicated explicitly. Similarly, a user-defined conversion is never represented by a CONVERT_EXPR; instead, the function calls are made explicit.
These nodes represent throw expressions. The single operand is an expression for the code that should be executed to throw the exception. However, there is one implicit action not represented in that expression; namely the call to __throw. This function takes no arguments. If setjmp/longjmp exceptions are used, the function __sjthrow is called instead. The normal GCC back end uses the function emit_throw to generate this code; you can examine this function to see what needs to be done.
These nodes represent left and right shifts, respectively. The first operand is the value to shift; it will always be of integral type. The second operand is an expression for the number of bits by which to shift. Right shift should be treated as arithmetic, i.e., the high-order bits should be zero-filled when the expression has unsigned type and filled with the sign bit when the expression has signed type. Note that the result is undefined if the second operand is larger than or equal to the first operand's type size.
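
The two fill rules for right shifts can be spelled out on a 32-bit pattern without relying on how the host compiler shifts negative values. This is a sketch only; toy_rshift is a hypothetical helper and the shift count is assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Right-shift the 32-bit pattern BITS by COUNT (0..31).  The vacated
   high-order bits are zero-filled for an unsigned type and filled with
   copies of the sign bit for a signed type. */
uint32_t toy_rshift(uint32_t bits, unsigned count, int is_signed)
{
  uint32_t result = bits >> count;          /* zero-fills the high bits */
  if (is_signed && (bits >> 31) && count > 0)
    result |= ~(UINT32_MAX >> count);       /* replicate the sign bit */
  return result;
}
```

Shifting the pattern 0x80000000 right by four gives 0xF8000000 when the type is signed and 0x08000000 when it is unsigned.
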
These nodes represent bitwise inclusive or, bitwise exclusive or, and bitwise and, respectively. Both operands will always have integral type.
These nodes represent logical and and logical or, respectively. These operators are not strict; i.e., the second operand is evaluated only if the value of the expression is not determined by evaluation of the first operand. The type of the operands, and the result type, is always of boolean or integral type.
These nodes represent logical and, logical or, and logical exclusive or. They are strict; both arguments are always evaluated. There are no corresponding operators in C or C++, but the front end will sometimes generate these expressions anyhow, if it can tell that strictness does not matter.
These nodes represent various binary arithmetic operations. Respectively, these operations are addition, subtraction (of the second operand from the first), multiplication, integer division, integer remainder, and floating-point division. The operands to the first three of these may have either integral or floating type, but there will never be a case in which one operand is of floating type and the other is of integral type.

The result of a TRUNC_DIV_EXPR is always rounded towards zero. The TRUNC_MOD_EXPR of two operands a and b is always a - (a/b)*b where the division is as if computed by a TRUNC_DIV_EXPR.
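
The remainder is what is left of a after removing b times the truncated quotient, that is a - (a/b)*b. Since C99, C's integer division also truncates towards zero, so the relationship can be checked directly. In this sketch toy_trunc_mod is a hypothetical name and b is assumed to be nonzero.

```c
/* Remainder derived from truncating division.  B must be nonzero.
   In C99 and later, integer '/' rounds toward zero, so this agrees
   with the '%' operator. */
int toy_trunc_mod(int a, int b)
{
  return a - (a / b) * b;
}
```

A consequence of truncation is that a nonzero remainder takes the sign of the first operand: -7 and 2 give -1, while 7 and -2 give 1.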

These nodes represent array accesses. The first operand is the array; the second is the index. To calculate the address of the memory accessed, you must scale the index by the size of the type of the array elements. The type of these expressions must be the type of a component of the array.
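
The scaling step can be sketched in plain C: the address of an element is the address of the array plus the index multiplied by the element size. The helper toy_element_address and the array toy_sample are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Sample array used only to exercise the helper below. */
static const int toy_sample[4] = { 10, 20, 30, 40 };

/* Address of element INDEX in ARRAY, whose elements are ELEM_SIZE bytes
   each: the index is scaled by the element size before being added. */
const void *toy_element_address(const void *array, size_t index,
                                size_t elem_size)
{
  return (const char *) array + index * elem_size;
}
```

For example, toy_element_address (toy_sample, 2, sizeof (int)) points at the same object as &toy_sample[2].
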
These nodes represent access to a range (or "slice") of an array. The operands are the same as those for ARRAY_REF and have the same meanings. The type of these expressions must be an array whose component type is the same as that of the first operand. The range of that array type determines the amount of data these expressions access.
These nodes represent the less than, less than or equal to, greater than, greater than or equal to, equal, and not equal comparison operators. The first and second operands will either both be of integral type or both be of floating type. The result type of these expressions will always be of integral or boolean type.
These nodes represent assignment. The left-hand side is the first operand; the right-hand side is the second operand. The left-hand side will be a VAR_DECL, INDIRECT_REF, COMPONENT_REF, or other lvalue.

These nodes are used to represent not only assignment with = but also compound assignments (like +=), by reduction to = assignment. In other words, the representation for i += 3 looks just like that for i = i + 3.

These nodes are just like MODIFY_EXPR, but are used only when a variable is initialized, rather than assigned to subsequently.
These nodes represent non-static data member accesses. The first operand is the object (rather than a pointer to it); the second operand is the FIELD_DECL for the data member.
These nodes represent comma-expressions. The first operand is an expression whose value is computed and thrown away prior to the evaluation of the second operand. The value of the entire expression is the value of the second operand.
These nodes represent ?: expressions. The first operand is of boolean or integral type. If it evaluates to a nonzero value, the second operand should be evaluated, and returned as the value of the expression. Otherwise, the third operand is evaluated, and returned as the value of the expression. As a GNU extension, the middle operand of the ?: operator may be omitted in the source, like this:
          x ? : 3

which is equivalent to

          x ? x : 3

assuming that x is an expression without side-effects. However, in the case that the first operand causes side-effects, the side-effects occur only once. Consumers of the internal representation do not need to worry about this oddity; the second operand will always be present in the internal representation.
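
The "evaluated only once" behavior of the omitted-operand form can be modeled in standard C by saving the value of the first operand in a temporary. This is a sketch of the semantics, not of how the front end builds the tree; all toy_ names are hypothetical.

```c
/* Number of times a first operand has been evaluated. */
int toy_calls = 0;

/* Two sample "first operands" with a visible side effect. */
int toy_counted_five(void) { toy_calls++; return 5; }
int toy_counted_zero(void) { toy_calls++; return 0; }

/* Models "first () ? : fallback": FIRST is evaluated exactly once, and
   its saved value is reused as the result when it is nonzero. */
int toy_cond_omitted(int (*first)(void), int fallback)
{
  int saved = first();
  return saved ? saved : fallback;
}
```

Writing first () ? first () : fallback instead would run the side effect twice whenever the condition is nonzero, which is exactly the difference the text describes.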

These nodes are used to represent calls to functions, including non-static member functions. The first operand is a pointer to the function to call; it is always an expression whose type is a POINTER_TYPE. The second operand is a TREE_LIST. The arguments to the call appear left-to-right in the list. The TREE_VALUE of each list node contains the expression corresponding to that argument. (The value of TREE_PURPOSE for these nodes is unspecified, and should be ignored.) For non-static member functions, there will be an operand corresponding to the this pointer. There will always be expressions corresponding to all of the arguments, even if the function is declared with default arguments and some arguments are not explicitly provided at the call sites.
These nodes are used to represent GCC's statement-expression extension. The statement-expression extension allows code like this:
          int f() { return ({ int j; j = 3; j + 7; }); }
In other words, a sequence of statements may occur where a single expression would normally appear. The STMT_EXPR node represents such an expression. The STMT_EXPR_STMT gives the statement contained in the expression; this is always a COMPOUND_STMT. The value of the expression is the value of the last sub-statement in the COMPOUND_STMT. More precisely, the value is the value computed by the last EXPR_STMT in the outermost scope of the COMPOUND_STMT. For example, in:
          ({ 3; })
the value is 3 while in:
          ({ if (x) { 3; } })
(represented by a nested COMPOUND_STMT), there is no value. If the STMT_EXPR does not yield a value, its type will be void.
These nodes represent local blocks. The first operand is a list of temporary variables, connected via their TREE_CHAIN field. These will never require cleanups. The scope of these variables is just the body of the BIND_EXPR. The body of the BIND_EXPR is the second operand.
These nodes represent "infinite" loops. The LOOP_EXPR_BODY represents the body of the loop. It should be executed forever, unless an EXIT_EXPR is encountered.
These nodes represent conditional exits from the nearest enclosing LOOP_EXPR. The single operand is the condition; if it is nonzero, then the loop should be exited. An EXIT_EXPR will only appear within a LOOP_EXPR.
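
In source terms, a LOOP_EXPR with an EXIT_EXPR corresponds roughly to an unconditional loop containing a conditional break. The function below is a hypothetical illustration of that shape, not something taken from GCC.

```c
/* Count how many times N can be halved before it reaches zero. */
unsigned toy_halvings(unsigned n)
{
  unsigned steps = 0;
  for (;;)            /* like LOOP_EXPR: the body repeats forever ... */
    {
      if (n == 0)     /* ... unless an exit condition is nonzero, */
        break;        /* like EXIT_EXPR leaving the nearest loop */
      n /= 2;
      steps++;
    }
  return steps;
}
```

The break leaves only the innermost loop, mirroring the rule that an EXIT_EXPR exits the nearest enclosing LOOP_EXPR.
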
These nodes represent full-expressions. The single operand is an expression to evaluate. Any destructor calls engendered by the creation of temporaries during the evaluation of that expression should be performed immediately after the expression is evaluated.
These nodes represent the brace-enclosed initializers for a structure or array. The first operand is reserved for use by the back end. The second operand is a TREE_LIST. If the TREE_TYPE of the CONSTRUCTOR is a RECORD_TYPE or UNION_TYPE, then the TREE_PURPOSE of each node in the TREE_LIST will be a FIELD_DECL and the TREE_VALUE of each node will be the expression used to initialize that field. You should not depend on the fields appearing in any particular order, nor should you assume that all fields will be represented. Unrepresented fields may be assigned any value.

If the TREE_TYPE of the CONSTRUCTOR is an ARRAY_TYPE, then the TREE_PURPOSE of each element in the TREE_LIST will be an INTEGER_CST. This constant indicates which element of the array (indexed from zero) is being assigned to; again, the TREE_VALUE is the corresponding initializer. If the TREE_PURPOSE is NULL_TREE, then the initializer is for the next available array element.

Conceptually, before any initialization is done, the entire area of storage is initialized to zero.
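
The array case can be sketched as follows: zero the storage, then walk the initializer list, using the explicit index when one is present and the next available element otherwise. The struct toy_init and the function toy_apply_inits are hypothetical stand-ins for the TREE_LIST and its consumer. Treating "next available" as the element after the one most recently initialized is an assumption made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one TREE_LIST node of an array CONSTRUCTOR.
   has_index == 0 models a TREE_PURPOSE of NULL_TREE. */
struct toy_init {
  int has_index;
  size_t index;
  int value;
};

/* Fill OUT (LEN elements) from N initializers.  Returns 0 on success,
   or -1 if an initializer falls outside the array. */
int toy_apply_inits(int *out, size_t len,
                    const struct toy_init *inits, size_t n)
{
  size_t next = 0;                      /* assumed "next available" slot */
  memset(out, 0, len * sizeof *out);    /* conceptually zero everything first */
  for (size_t i = 0; i < n; i++)
    {
      size_t at = inits[i].has_index ? inits[i].index : next;
      if (at >= len)
        return -1;
      out[at] = inits[i].value;
      next = at + 1;
    }
  return 0;
}
```

Elements that no initializer mentions keep the zero written by the initial clearing step, and the initializers may arrive in any index order.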

These nodes represent ISO C99 compound literals. The COMPOUND_LITERAL_EXPR_DECL_STMT is a DECL_STMT containing an anonymous VAR_DECL for the unnamed object represented by the compound literal; the DECL_INITIAL of that VAR_DECL is a CONSTRUCTOR representing the brace-enclosed list of initializers in the compound literal. That anonymous VAR_DECL can also be accessed directly by the COMPOUND_LITERAL_EXPR_DECL macro.
A SAVE_EXPR represents an expression (possibly involving side-effects) that is used more than once. The side-effects should occur only the first time the expression is evaluated. Subsequent uses should just reuse the computed value. The first operand to the SAVE_EXPR is the expression to evaluate. The side-effects should be executed where the SAVE_EXPR is first encountered in a depth-first preorder traversal of the expression tree.
A TARGET_EXPR represents a temporary object. The first operand is a VAR_DECL for the temporary variable. The second operand is the initializer for the temporary. The initializer is evaluated, and copied (bitwise) into the temporary.

Often, a TARGET_EXPR occurs on the right-hand side of an assignment, or as the second operand to a comma-expression which is itself the right-hand side of an assignment, etc. In this case, we say that the TARGET_EXPR is "normal"; otherwise, we say it is "orphaned". For a normal TARGET_EXPR the temporary variable should be treated as an alias for the left-hand side of the assignment, rather than as a new temporary variable.

The third operand to the TARGET_EXPR, if present, is a cleanup-expression (i.e., destructor call) for the temporary. If this expression is orphaned, then this expression must be executed when the statement containing this expression is complete. These cleanups must always be executed in the order opposite to that in which they were encountered. Note that if a temporary is created on one branch of a conditional operator (i.e., in the second or third operand to a COND_EXPR), the cleanup must be run only if that branch is actually executed.

See STMT_IS_FULL_EXPR_P for more information about running these cleanups.

An AGGR_INIT_EXPR represents the initialization as the return value of a function call, or as the result of a constructor. An AGGR_INIT_EXPR will only appear as the second operand of a TARGET_EXPR. The first operand to the AGGR_INIT_EXPR is the address of a function to call, just as in a CALL_EXPR. The second operand is the list of arguments to pass to that function, as a TREE_LIST, again in a manner similar to that of a CALL_EXPR. The value of the expression is that returned by the function.

If AGGR_INIT_VIA_CTOR_P holds of the AGGR_INIT_EXPR, then the initialization is via a constructor call. The address of the third operand of the AGGR_INIT_EXPR, which is always a VAR_DECL, is taken, and this value replaces the first argument in the argument list. In this case, the value of the expression is the VAR_DECL given by the third operand to the AGGR_INIT_EXPR; constructors do not return a value.

A VTABLE_REF indicates that the interior expression computes a value that is a vtable entry. It is used with -fvtable-gc to track the reference from the front end through to the middle end, at which point we transform this to a REG_VTABLE_REF note, which survives the balance of code generation.

The first operand is the expression that computes the vtable reference. The second operand is the VAR_DECL of the vtable. The third operand is an INTEGER_CST of the byte offset into the vtable.