Debug Support for LTO
With Link-Time Optimization (LTO) enabled, the compiler stores an intermediate representation (IR) of the code in the object file rather than compiled object code (the LGEN phase). The IR is then combined with IR from other object files at link time, and the actual compilation then takes place (the LTRANS phase). This approach naturally divides the compilation process between its front-end and its back-end, and the design of the IR is such that it contains only the language-independent information that would normally be needed by the back-end.
In gcc, the symbolic debug information is generated from the tree representation late in the compilation process, and it assumes that all of the original information is still present in the trees. Much of the information that has been discarded in the process of storing the IR at the end of the LGEN phase and reading it back in at the beginning of the LTRANS phase is needed in order to produce the symbolic debug information, even though it is not otherwise needed by the back-end of the compiler.
This proposal presents a design for augmenting the IR with the additional information necessary for the generation of symbolic debug information during the LTRANS phase.
One simple approach is to preserve all of the front-end information in the IR when streaming it to the object files, so that it can all be reconstructed for the LTRANS phase. This approach would significantly increase the size of the IR, so for practical purposes, the compiler would need to arrange to preserve the additional information only when the -g option is used, and the IR for debug and non-debug compilations would differ. This could result in subtle bugs or differences in code generation. In addition, some of the front-end information is language-specific, and since the LTRANS phase may be combining IR from more than one language, langhooks still need to be removed from the back-end.
Another approach is to generate the debug information earlier -- in the front-end. This approach would significantly alter the structure of the compiler and would be a major undertaking. In addition, many back-end transformations affect the debug information, so the back-end would then need an infrastructure for decoding the debug information, modifying it, then re-encoding it. Such an approach might be practical for a single debug format, but in order to support the several formats that gcc currently supports, it would also become a major undertaking.
An improvement would be to partition the debug information in such a way that the information generated early is not subject to back-end transformations, and that the information preserved for the LTRANS phase is sufficient for generating the remainder of the debug information. In a sense, the design presented here does this, but the early generation does not commit to a specific debug format. Instead, it stores the early information in a higher-level data structure that is written separately to the object file.
Near the end of the LGEN phase of the compilation, gcc runs the free_lang_specifics pass, which removes all of the information from the trees that will not be needed by the LTRANS phase. During this pass, just before discarding the information, if debug information has been requested, we will call a new set of debug-related APIs to record debug-related information that is about to be discarded.
The debug information for a given tree node will be stored in one of two separate global hash tables (one for decls, one for types), indexed by the UID, as a list of properties. Each separate fact that we need to record will be stored as a property, represented as a (key, value) pair. The property keys are simple small integers that identify the kind of property being recorded (e.g., Context, Base Classes, Member Methods, ...). The property values may be references to another tree, a list of trees, or simple integer or string values.
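The table layout described above can be pictured with a minimal sketch in C. It is not the actual gcc implementation: the type names, the bucket count, the helpers debug_add_prop and debug_find_prop, and the two key constants are all hypothetical, and `tree` is only a stand-in for gcc's opaque tree pointer. The sketch assumes a chained hash table indexed by UID, with each entry holding a linked list of (key, value) properties whose value is a tagged union.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical property keys; the real key set is defined by the design. */
enum debug_prop_key { DEBUG_PROP_CONTEXT = 1, DEBUG_PROP_BASE_CLASSES = 2 };

/* The kinds of value a property may hold. */
enum debug_prop_kind { PROP_TREE_REF, PROP_TREE_LIST, PROP_INT, PROP_STRING };

typedef struct tree_node *tree;   /* stand-in for gcc's opaque tree type */

/* One recorded fact: a small integer key plus a tagged value. */
struct debug_prop {
  int key;
  enum debug_prop_kind kind;
  union {
    tree ref;                               /* reference to another tree */
    struct { tree *elts; size_t n; } list;  /* list of trees */
    long ival;                              /* simple integer value */
    const char *str;                        /* simple string value */
  } u;
  struct debug_prop *next;
};

/* All properties recorded for one decl or type, found by its UID. */
struct debug_entry {
  unsigned uid;
  struct debug_prop *props;
  struct debug_entry *next;                 /* bucket chain */
};

#define DEBUG_TBL_BUCKETS 64
struct debug_table { struct debug_entry *buckets[DEBUG_TBL_BUCKETS]; };

/* Look up the entry for UID, creating it when CREATE is nonzero. */
struct debug_entry *
debug_entry_for (struct debug_table *tbl, unsigned uid, int create)
{
  assert (tbl != NULL);
  struct debug_entry **slot = &tbl->buckets[uid % DEBUG_TBL_BUCKETS];
  for (struct debug_entry *e = *slot; e; e = e->next)
    if (e->uid == uid)
      return e;
  if (!create)
    return NULL;
  struct debug_entry *e = calloc (1, sizeof *e);
  if (!e)
    abort ();
  e->uid = uid;
  e->next = *slot;
  *slot = e;
  return e;
}

/* Prepend a new, zero-initialized property; the caller fills in the value. */
struct debug_prop *
debug_add_prop (struct debug_table *tbl, unsigned uid, int key,
                enum debug_prop_kind kind)
{
  struct debug_entry *e = debug_entry_for (tbl, uid, 1);
  struct debug_prop *p = calloc (1, sizeof *p);
  if (!p)
    abort ();
  p->key = key;
  p->kind = kind;
  p->next = e->props;
  e->props = p;
  return p;
}

/* Return the property stored under (UID, KEY), or NULL if none exists. */
struct debug_prop *
debug_find_prop (struct debug_table *tbl, unsigned uid, int key)
{
  struct debug_entry *e = debug_entry_for (tbl, uid, 0);
  for (struct debug_prop *p = e ? e->props : NULL; p; p = p->next)
    if (p->key == key)
      return p;
  return NULL;
}
```

Under these assumptions, one such table would exist for decls and one for types. Keeping the value as a tagged union lets a single list hold tree references, tree lists, integers, and strings side by side.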
When the IR is streamed out to the object file, we will stream the contents of the two debug hash tables out to a new section, .gnu.lto_debug. Some of the properties to be streamed out will refer to other trees, some of which may not have been streamed out to the main part of the IR. For references to trees that have already been streamed, we will simply use the "pickle" that was already generated for those trees. For references to trees that were not already streamed out, we will stream those trees out to a second new section, .gnu.lto_debug_trees, generating a new "pickle" for each one.
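The choice between reusing an existing pickle and emitting a tree to the second section can be sketched as follows. This is only an illustration under stated assumptions: the structures tree_stream_state, pickle_ref, and debug_tree_writer, and the function debug_pickle_for, are hypothetical, and a pickle is modeled as nothing more than a section tag plus an index.

```c
#include <assert.h>
#include <stddef.h>

/* Which section a pickle lives in (modeled, not gcc's real encoding). */
enum pickle_section { SEC_MAIN_IR, SEC_DEBUG_TREES };

struct pickle_ref { enum pickle_section section; unsigned index; };

/* Hypothetical per-tree streaming state. */
struct tree_stream_state {
  int streamed;              /* nonzero once a pickle exists for this tree */
  struct pickle_ref pickle;  /* valid only when STREAMED is nonzero */
};

/* Hypothetical writer state for the .gnu.lto_debug_trees section. */
struct debug_tree_writer { unsigned next_index; };

/* Return the pickle to record for a tree referenced by a debug property.
   Trees already streamed keep their pickle; all others are assigned a
   fresh pickle in the debug-trees section, exactly once.  */
struct pickle_ref
debug_pickle_for (struct debug_tree_writer *w, struct tree_stream_state *t)
{
  assert (w != NULL && t != NULL);
  if (!t->streamed)
    {
      t->pickle.section = SEC_DEBUG_TREES;
      t->pickle.index = w->next_index++;
      t->streamed = 1;
    }
  return t->pickle;
}
```

Marking the tree as streamed after the first call means a tree referenced by several properties is written to .gnu.lto_debug_trees only once in this model.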
When reading an IR file at the beginning of the LTRANS phase, if generation of debug information has been requested, we will read in the two debug streams and reconstruct the additional trees and the debug hash tables.
When generating debug information, each individual debug info generator (starting with dwarf2out.c) will use the new debug-related APIs to obtain any debug-related information that was discarded from the trees.
The design is such that, eventually, gcc should be able to run the free_lang_specifics pass for non-LTO compilations as well as LTO compilations. The APIs for recording and retrieving debug-related information are orthogonal to the process of streaming the IR to an object file and reading it back in.
The design also allows the extra debug information to be easily stripped from the object file. Alternatively, the debug sections can simply be ignored when reading an object file that was compiled with -g during the LGEN phase but without -g during the LTRANS phase.
APIs for Storing and Retrieving Debug Information
In the following APIs, TBL is one of the two global hash tables, debug_decl_info or debug_type_info, UID is the identifier of the decl or type node to which the information belongs, and KEY is the specific property being stored or retrieved.
debug_store_tree_ref (tbl, uid, key, tree);
tree = debug_retrieve_tree_ref (tbl, uid, key);
- Store and retrieve a reference to a tree.
debug_store_tree_list (tbl, uid, key, VEC(tree)*);
VEC(tree)* debug_retrieve_tree_list (tbl, uid, key);
- Store and retrieve a list of references to trees.
debug_store_intval (tbl, uid, key, val);
val = debug_retrieve_intval (tbl, uid, key);
- Store and retrieve an integer value.
debug_store_string (tbl, uid, key, val);
val = debug_retrieve_string (tbl, uid, key);
- Store and retrieve a string value. (Not currently needed.)
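To show how the store and retrieve calls are meant to pair up across the two phases, here is a small self-contained sketch of the integer-value API. The argument order follows the signatures listed above, but everything else is an assumption made for illustration: the toy array-backed table, the toy_decl structure, the key TOY_KEY_BIT_SIZE, the function toy_free_lang_specifics, and the DEBUG_NO_INTVAL sentinel returned when nothing was recorded (the proposal does not say how a missing property is reported).

```c
#include <assert.h>
#include <stddef.h>

#define DEBUG_NO_INTVAL (-1L)    /* hypothetical "not recorded" sentinel */
#define TOY_MAX_PROPS 32

enum { TOY_KEY_BIT_SIZE = 1 };   /* hypothetical property key */

/* Toy table: a flat array stands in for the real hash table. */
struct toy_prop { unsigned uid; int key; long val; };
struct toy_table { struct toy_prop props[TOY_MAX_PROPS]; size_t n; };

/* Record an integer property for the node identified by UID. */
void
debug_store_intval (struct toy_table *tbl, unsigned uid, int key, long val)
{
  assert (tbl != NULL && tbl->n < TOY_MAX_PROPS);
  tbl->props[tbl->n].uid = uid;
  tbl->props[tbl->n].key = key;
  tbl->props[tbl->n].val = val;
  tbl->n++;
}

/* Return the recorded value, or DEBUG_NO_INTVAL if none was stored. */
long
debug_retrieve_intval (const struct toy_table *tbl, unsigned uid, int key)
{
  for (size_t i = 0; i < tbl->n; i++)
    if (tbl->props[i].uid == uid && tbl->props[i].key == key)
      return tbl->props[i].val;
  return DEBUG_NO_INTVAL;
}

/* Toy decl carrying one piece of front-end-only information. */
struct toy_decl { unsigned uid; long bit_size; };

/* LGEN side: record the value just before it is discarded, and only
   when debug information has been requested.  */
void
toy_free_lang_specifics (struct toy_table *tbl, struct toy_decl *d,
                         int want_debug)
{
  if (want_debug)
    debug_store_intval (tbl, d->uid, TOY_KEY_BIT_SIZE, d->bit_size);
  d->bit_size = 0;               /* information is now gone from the decl */
}
```

A debug info generator would then call debug_retrieve_intval with the same table, UID, and key rather than reading the field from the decl, which is the pattern the proposal describes for dwarf2out.c.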
Property Keys for Decls
- Integer value for the size of a field (in bits or bytes, respectively). (If these properties aren't integer constants, they do not need to be stored at all, since the debug info cannot handle anything but integer constants.)
- Reference to a tree that provides the enclosing context for the decl.
- Reference to a list of trees that are declared as friends of a type. (Not currently needed for DWARF.)
Property Keys for Types
- Integer value identifying a record type as struct, class, or interface.
- Reference to a tree that contains array descriptor info.
- List of trees with default argument types.
- Integer value for the size of a field (in bits or bytes, respectively). (If this property isn't an integer constant, it does not need to be stored at all, since the debug info cannot handle anything but integer constants.)
- List of trees containing the members of a record type that are not FIELD_DECLs. (This may need to provide all fields in order to list the fields in the correct order. Not sure yet if non-FIELD_DECLs even need to be stored at all.)
- List of trees containing the method members of a record type. (Methods with DECL_ABSTRACT_ORIGIN set can be excluded.)
- List of trees that refer to the base classes of a type.
- Integer values containing the max and min value, respectively, for a subrange type.
- Reference to a tree that provides the enclosing context for the type.