= Simplifying the generation of GIMPLE IL = Original proposal and discussion: http://gcc.gnu.org/ml/gcc/2012-11/msg00223.html <> == Problem == Generating GIMPLE and tree expressions require lots of detail, which is hard to remember and easy to get wrong. There is some amount of boilerplate code that can, in most cases, be reduced and managed automatically. We will add a set of helper classes to be used as local variables to manage the details of handling the existing types. That is, a layer over {{{gimple_build_*}}}. We intend to provide helpers for those facilities that are both commonly used and have room for significant simplification. == Generating an Expression == Suppose one wants to generate the expression {{{(shadow != 0) & (((base_addr & 7) + offset) >= shadow)}}}, where offset is a value and the other identifiers are variables. The current code to generate this expression is as follows: {{{ /* t = shadow != 0 */ g = gimple_build_assign_with_ops (NE_EXPR, make_ssa_name (boolean_type_node, NULL), shadow, build_int_cst (shadow_type, 0)); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); t = gimple_assign_lhs (g); /* a = base_addr & 7 */ g = gimple_build_assign_with_ops (BIT_AND_EXPR, make_ssa_name (uintptr_type, NULL), base_addr, build_int_cst (uintptr_type, 7)); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); /* b = (shadow_type)a */ g = gimple_build_assign_with_ops (NOP_EXPR, make_ssa_name (shadow_type, NULL), gimple_assign_lhs (g), NULL_TREE); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); /* c = b + offset */ g = gimple_build_assign_with_ops (PLUS_EXPR, make_ssa_name (shadow_type, NULL), gimple_assign_lhs (g), build_int_cst (shadow_type, offset)); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); /* d = c >= shadow */ g = gimple_build_assign_with_ops (GE_EXPR, make_ssa_name (boolean_type_node, NULL), gimple_assign_lhs (g), shadow); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); /* r = t & d */ g = gimple_build_assign_with_ops (BIT_AND_EXPR, make_ssa_name (boolean_type_node, NULL), t, gimple_assign_lhs (g)); gimple_set_location (g, location); gsi_insert_after (&gsi, g, GSI_NEW_STMT); r = gimple_assign_lhs (g); }}} We propose a simplified form using a new build helper class {{{gimple_stmt}}} and an extension to {{{gimple_seq}}} that would allow the above code to be written as follows: {{{ gimple_seq q; gimple_stmt t = q.add_ssa_stmt (NE_EXPR, shadow, 0); gimple_stmt a = q.add_ssa_stmt (BIT_AND_EXPR, base_addr, 7); gimple_stmt b = q.add_ssa_stmt (NOP_EXPR, shadow_type, a); gimple_stmt c = q.add_ssa_stmt (PLUS_EXPR, b, offset); gimple_stmt d = q.add_ssa_stmt (GE_EXPR, c, shadow); gimple_stmt e = q.add_ssa_stmt (BIT_AND_EXPR, t, d); q.set_location (location); r = e.lhs (); }}} There are a few important things to note about this example. * We extend the type {{{gimple_seq}}} to know how to sequence statements automatically and can build expressions out of types. * Every statement created produces an SSA name, if generated with {{{gimple_seq::add_ssa_stmt}}}. Referencing a {{{gimple_stmt}}} instance in an argument to {{{gimple_seq::add_ssa_stmt}}} retrieves the SSA name generated by that statement. Likewise, if the statement was generated with {{{gimple_seq::gimple_stmt}}}, the new statement will produce a GIMPLE temporary as its LHS. * The statement result type is that of the arguments. * The type of integral constant arguments is that of the other argument. (Which implies that only one argument can be constant.) * The {{{gimple_seq::add_ssa_stmt}}} method handles linking the statement into the sequence. * The {{{gimple_seq::set_location}}} method iterates over all statements. There will be another class of builders for generating GIMPLE in normal form ({{{gimple_seq::gimple_stmt}}}). We expect that this will mostly affect all transformations that need to generate new expressions and statements, for example: * The vectorizer passes that generate SIMD expressions and statements with semantics equivalent to the scalar instructions that are being vectorized. * Expression lowering passes, e.g. the {{{pass_convert_switch}}} pass that lowers {{{GIMPLE_SWITCH}}} statements to sequences of branch-and-compare, and the {{{pass_lower_complex*}}} passes that lower {{{complex}}} expressions to scalar expressions. * Instrumentation passes, e.g. the counters instrumentation generated by {{{pass_profile}}}, and the {{{pass_asan}}} and {{{pass_tsan}}} sanitizer instrumentation passes. * Complex code transformations that rewrite into and out of different intermediate representations, e.g. GRAPHITE. We also expect to reduce calls to tree expression builders by allowing the use of numeric and string constants to be converted to the appropriate {{{tree *_CST}}} node. This will only work when the type of the constant can be deduced from the other argument in some expressions, of course. In addition to the new builder methods, we will provide access to all the {{{gimple_*}}} functions from the {{{gimple_seq}}} class. This will facilitate turning {{{gimple}}} into a proper class hierarchy. == Generating statements that affect control flow == Statements that affect the CFG (e.g., conditional branches) require accompanying code that creates new basic blocks, edges, etc. We will provide helpers that will automatically take care of the boilerplate code. For example, if we wanted to generate the following code: {{{ if (cond) { s1; s2; } else { r1; r2; } }}} Currently, the user needs to create the basic blocks needed to hold the predicate and the two branches of the above conditional. We could shorten that with: {{{ gimple_seq then_body(s1); then_body.add_ssa_stmt(s2); gimple_seq else_body(r1); else_body.add_ssa_stmt(r2); gimple_stmt if_then_else(cond, then_body, else_body); }}} Inserting {{{if_then_else}}} in a basic block or an edge would then split the container and create the corresponding connections. The interface would also allow the user to specify branch probabilities. == Generating a Type == Consider the generation of the following type: {{{ struct __asan_global { const_ptr_type_node __beg; inttype __size; inttype __size_with_redzone; const_ptr_type_node __name; inttype __has_dynamic_init; }; }}} The current code to generate it is as follows: {{{ tree inttype = build_nonstandard_integer_type (POINTER_SIZE, 1); tree ret = make_node (RECORD_TYPE); TYPE_NAME (ret) = get_identifier ("__asan_global"); tree beg = build_decl (UNKNOWN_LOCATION, FIELD_DECL, get_identifier ("__beg"), const_ptr_type_node); DECL_CONTEXT (beg) = ret; TYPE_FIELDS (ret) = beg; tree size = build_decl (UNKNOWN_LOCATION, FIELD_DECL, get_identifier ("__size"), inttype); DECL_CONTEXT (size) = ret; DECL_CHAIN (beg) = size; tree red = build_decl (UNKNOWN_LOCATION, FIELD_DECL, get_identifier ("__size_with_redzone"), inttype); DECL_CONTEXT (red) = ret; DECL_CHAIN (size) = red; tree name = build_decl (UNKNOWN_LOCATION, FIELD_DECL, get_identifier ("__name"), const_ptr_type_node); DECL_CONTEXT (name) = ret; DECL_CHAIN (red) = name; tree init = build_decl (UNKNOWN_LOCATION, FIELD_DECL, get_identifier ("__has_dynamic_init"), inttype); DECL_CONTEXT (init) = ret; DECL_CHAIN (name) = init; layout_type (ret); }}} We propose a form as follows: {{{ tree inttype = build_nonstandard_integer_type (POINTER_SIZE, 1); record_builder rec ("__asan_global"); rec.field ("__beg", const_ptr_type_node); rec.field ("__size", inttype); rec.field ("__size_with_redzone", inttype); rec.field ("__name", const_ptr_type_node); rec.field ("__has_dynamic_init", inttype); rec.finish (); tree ret = rec.as_tree (); }}} There are a few important things to note about this example. * The {{{record_builder::field}}} method will add context and chain links. * The {{{record_builder::field}}} method is overloaded on both strings and identifiers. * The {{{record_builder::finish}}} method lays out the type. Optionally this method could take flags to direct the lay-out of the fields, e.g. to sort the fields by alignment requirements. We expect that this interface can be used by transformations that need to generate new types. These transformations currently include at least: * The instrumentation passes mentioned above (ASAN/TSAN/coverage/profiling). * Lowering of {{{setjmp/longjmp}}} exception handing in {{{gcc/except.c:init_eh()}}}. * Lowering of nested functions in {{{gcc/tree-nested.c}}}. * Lowering of TLS variables for emulated TLS (e.g. {{{gcc/tree-emutls.c:default_emutls_var_fields()}}} and other implementations of the {{{targetm.emutls.var_fields}}} hook). == Proposal == Create a set of IL builder classes that provide a simplified IL building interface. Essentially, these builder classes will abstract most of the bookkeeping code required by the current interfaces. These classes will not replace the existing interfaces. We do not expect that all the IL generation done in current transformations will be able to use the simplified interfaces. The goal is to simplify most of them, however.