This is the mail archive of the mailing list for the GCC project.

[lto] [RFC] Design proposal for debug support in LTO

LTO currently doesn't support the generation of debug info very well,
as we discard much of the front-end information that is needed for
debug info before streaming the IR to the intermediate file. I've
written up the following proposal to fix this, and have also posted it
on the gcc wiki:

A goal of this design is that it can ultimately enable the use of the
free_lang_specifics pass in all compilations, even when not doing LTO.


Debug Support for LTO


With Link-Time Optimization (LTO) enabled, the compiler stores an
intermediate representation (IR) of the code in the object file rather
than compiled object code (the LGEN phase). The IR is then combined
with IR from other object files at link time, and the actual
compilation then takes place (the LTRANS phase). This approach
naturally divides the compilation process between its front-end and
its back-end, and the design of the IR is such that it contains only
the language-independent information that would normally be needed by
the back-end.

In gcc, the symbolic debug information is generated from the tree
representation late in the compilation process, and it assumes that
all of the original information is still present in the trees. Much of
the information that has been discarded in the process of storing the
IR at the end of the LGEN phase and reading it back in at the
beginning of the LTRANS phase is needed in order to produce the
symbolic debug information, even though it is not otherwise needed by
the back-end of the compiler.

This proposal presents a design for augmenting the IR with the
additional information necessary for the generation of symbolic debug
information during the LTRANS phase.


One simple approach is to preserve all of the front-end information in
the IR when streaming it to the object files, so that it can all be
reconstructed for the LTRANS phase. This approach would significantly
increase the size of the IR, so for practical purposes, the compiler
would need to arrange to preserve the additional information only when
the -g option is used, and the IR for debug and non-debug compilations
would differ. This could result in subtle bugs or differences in code
generation. In addition, some of the front-end information is
language-specific, and since the LTRANS phase may be combining IR from
more than one language, langhooks still need to be removed from the

Another approach is to generate the debug information earlier -- in
the front-end. This approach would significantly alter the structure
of the compiler and would be a major undertaking. In addition, many
back-end transformations affect the debug information, so the back-end
would then need an infrastructure for decoding the debug information,
modifying it, then re-encoding it. Such an approach might be practical
for a single debug format, but in order to support the several formats
that gcc currently supports, it would also become a major undertaking.

An improvement would be to partition the debug information in such a
way that the information generated early is not subject to back-end
transformations, and that the information preserved for the LTRANS
phase is sufficient for generating the remainder of the debug
information. In a sense, the design presented here does this, but the
early generation does not commit to a specific debug format. Instead,
it stores the early information in a higher-level data structure that
is written separately to the object file.


Near the end of the LGEN phase of the compilation, gcc runs the
free_lang_specifics pass, which removes all of the information from
the trees that will not be needed by the LTRANS phase. During this
pass, just before discarding the information, if debug information has
been requested, we will call a new set of debug-related APIs to record
debug-related information that is about to be discarded.

The debug information for a given tree node will be stored in one of
two separate global hash tables (one for decls, one for types),
indexed by the UID, as a list of properties. Each separate fact that
we need to record will be stored as a property, represented as a (key,
value) pair. The property keys are simple small integers that identify
the kind of property being recorded (e.g., Context, Base Classes,
Member Methods, ...). The property values may be references to another
tree, a list of trees, or simple integer or string values.
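
As a rough illustration of the storage described above, here is a
minimal sketch of one such table in C. Everything in it is an
assumption made for illustration: the type names, the key numbering,
the fixed bucket count, and the stand-in "tree" typedef are not part of
the proposal, and a real implementation would presumably use GCC's own
hash table and allocation facilities.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for GCC's tree pointer; opaque in this sketch.  */
typedef struct tree_node *tree;

/* Hypothetical key numbering; the proposal only says that keys are
   simple small integers.  */
enum debug_prop_key
{
  DEBUG_KEY_CONTEXT = 1,
  DEBUG_KEY_BASE_CLASSES,
  DEBUG_KEY_MEMBER_METHODS
};

/* The four value shapes the proposal allows.  */
enum debug_val_kind
{
  DEBUG_VAL_TREE,
  DEBUG_VAL_TREE_LIST,
  DEBUG_VAL_INT,
  DEBUG_VAL_STRING
};

/* One (key, value) pair in a node's property list.  */
struct debug_prop
{
  int key;
  enum debug_val_kind kind;
  union
  {
    tree ref;
    struct { tree *elts; size_t len; } list;
    long intval;
    const char *str;
  } u;
  struct debug_prop *next;
};

/* All properties recorded for one decl or type, keyed by its UID.  */
struct debug_entry
{
  unsigned uid;
  struct debug_prop *props;
  struct debug_entry *chain;	/* Next entry in the same bucket.  */
};

#define DEBUG_TBL_BUCKETS 64

struct debug_table
{
  struct debug_entry *buckets[DEBUG_TBL_BUCKETS];
};

/* Return the entry for UID in TBL.  If it is absent, create it when
   CREATE is nonzero, otherwise return NULL.  */
struct debug_entry *
debug_table_entry (struct debug_table *tbl, unsigned uid, int create)
{
  struct debug_entry **slot = &tbl->buckets[uid % DEBUG_TBL_BUCKETS];
  struct debug_entry *e;

  for (e = *slot; e != NULL; e = e->chain)
    if (e->uid == uid)
      return e;
  if (!create)
    return NULL;
  e = calloc (1, sizeof *e);
  if (e == NULL)
    return NULL;
  e->uid = uid;
  e->chain = *slot;
  *slot = e;
  return e;
}

/* Return the property KEY of entry E.  If it is absent, add an empty
   one when CREATE is nonzero, otherwise return NULL.  */
struct debug_prop *
debug_entry_prop (struct debug_entry *e, int key, int create)
{
  struct debug_prop *p;

  assert (e != NULL);
  for (p = e->props; p != NULL; p = p->next)
    if (p->key == key)
      return p;
  if (!create)
    return NULL;
  p = calloc (1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->key = key;
  p->next = e->props;
  e->props = p;
  return p;
}
```

Two instances of struct debug_table, one for decls and one for types,
would correspond to the two global tables.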

When the IR is streamed out to the object file, we will stream the
contents of both debug hash tables out to a new section, .gnu.lto_debug.
Some of the properties to be streamed out will refer to other trees,
some of which may not have been streamed out to the main part of the
IR. For references to trees that have already been streamed, we will
simply use the "pickle" that was already generated for those trees.
For references to trees that were not already streamed out, we will
stream those trees out to a second new section, .gnu.lto_debug_trees,
generating new "pickles" for each one.
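
The choice between reusing an existing pickle and generating a new one
could look roughly like the sketch below. It assumes, purely for
illustration, that a pickle can be modeled as an index into the section
a tree was written to; the struct and function names, the fixed-size
maps, and the linear search are all hypothetical and do not describe
the real streamer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree node so that the sketch is self-contained.  */
struct tree_node { int code; };
typedef struct tree_node *tree;

/* Which section a reference points into.  */
enum debug_ref_section
{
  DEBUG_REF_MAIN_IR,		/* Tree already streamed with the main IR.  */
  DEBUG_REF_DEBUG_TREES		/* Tree streamed to .gnu.lto_debug_trees.  */
};

/* Hypothetical model of a pickle: a section plus an index into it.  */
struct debug_tree_ref
{
  enum debug_ref_section section;
  unsigned index;
};

#define MAX_PICKLES 256

/* Records which trees were written to one section, in output order.  */
struct pickle_map
{
  tree trees[MAX_PICKLES];
  unsigned count;
};

/* Look up T in M.  On success store its index in *INDEX and return 1.  */
static int
pickle_find (const struct pickle_map *m, tree t, unsigned *index)
{
  unsigned i;

  for (i = 0; i < m->count; i++)
    if (m->trees[i] == t)
      {
	*index = i;
	return 1;
      }
  return 0;
}

/* Compute the reference to emit for T in the debug stream.  Reuse the
   main-IR pickle if T was already streamed there; otherwise assign T a
   slot in the debug-trees map, adding it on first use.  Return 0 on
   success, -1 if the debug-trees map is full.  */
int
debug_reference_tree (const struct pickle_map *main_ir,
		      struct pickle_map *debug_trees,
		      tree t, struct debug_tree_ref *out)
{
  unsigned idx;

  assert (out != NULL);
  if (pickle_find (main_ir, t, &idx))
    {
      out->section = DEBUG_REF_MAIN_IR;
      out->index = idx;
      return 0;
    }
  if (!pickle_find (debug_trees, t, &idx))
    {
      if (debug_trees->count == MAX_PICKLES)
	return -1;
      idx = debug_trees->count++;
      debug_trees->trees[idx] = t;
      /* A real streamer would write the tree body to the
	 .gnu.lto_debug_trees section at this point.  */
    }
  out->section = DEBUG_REF_DEBUG_TREES;
  out->index = idx;
  return 0;
}
```

Referring to the same extra tree twice yields the same reference, so
each such tree is written to the second section only once.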

When reading an IR file at the beginning of the LTRANS phase, if
generation of debug information has been requested, we will read in
the two debug streams and reconstruct the additional trees and the
debug hash tables.

When generating debug information, each individual debug info
generator (starting with dwarf2out.c) will use the new debug-related
APIs to obtain any debug-related information that was discarded from
the trees.

The design is such that, eventually, gcc should be able to run the
free_lang_specifics pass for non-LTO compilations as well as LTO
compilations. The APIs for recording and retrieving debug-related
information are orthogonal to the process of streaming the IR to an
object file and reading it back in.

The design also allows the extra debug information to be easily
stripped from the object file, or the debug sections can simply be
ignored when reading an object file that was compiled with -g during
the LGEN phase, but without -g during the LTRANS phase.
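
A reader that ignores the debug sections when -g was not given could be
as simple as the following sketch. The two section names come from this
proposal; the function names, the exact-match comparison, and the
debug_requested flag are assumptions for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Return nonzero if NAME is one of the two proposed debug sections.
   Exact-match comparison is an assumption of this sketch.  */
int
lto_section_is_debug (const char *name)
{
  assert (name != NULL);
  return (strcmp (name, ".gnu.lto_debug") == 0
	  || strcmp (name, ".gnu.lto_debug_trees") == 0);
}

/* Decide whether the LTRANS reader should load section NAME.  The
   debug sections are skipped unless debug info was requested; every
   other section is always read.  */
int
lto_should_read_section (const char *name, int debug_requested)
{
  return debug_requested || !lto_section_is_debug (name);
}
```

Since nothing in the main IR refers to the debug sections, skipping
them leaves the rest of the reader unaffected.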

APIs for Storing and Retrieving Debug Information

In the following APIs, TBL is one of the two global hash tables,
debug_decl_info or debug_type_info, UID is the identifier of the decl
or type node to which the information belongs, and KEY is the specific
property being stored or retrieved.

debug_store_tree_ref (tbl, uid, key, tree);
tree = debug_retrieve_tree_ref (tbl, uid, key);
  Store and retrieve a reference to a tree.

debug_store_tree_list (tbl, uid, key, VEC(tree)*);
VEC(tree)* debug_retrieve_tree_list (tbl, uid, key);
  Store and retrieve a list of references to trees.

debug_store_intval (tbl, uid, key, val);
val = debug_retrieve_intval (tbl, uid, key);
  Store and retrieve an integer value.

debug_store_string (tbl, uid, key, val);
val = debug_retrieve_string (tbl, uid, key);
  Store and retrieve a string value. (Not currently needed.)
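
To show how the integer API might be used in the record-then-discard
step of free_lang_specifics, here is a small self-contained sketch. The
two function names follow the API above, but the argument types, the
flat-array table, the DEBUG_NO_VALUE sentinel for a missing property,
the key KEY_SIZE_IN_BYTES, and the toy_decl type are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROPS 128
#define DEBUG_NO_VALUE (-1L)	/* Assumed "not recorded" sentinel.  */

enum { KEY_SIZE_IN_BYTES = 1 };	/* Hypothetical property key.  */

struct int_prop { unsigned uid; int key; long val; };

/* Toy table: a flat array instead of a real hash table.  */
struct debug_table
{
  struct int_prop props[MAX_PROPS];
  size_t count;
};

/* Return the slot holding (UID, KEY) in TBL, or NULL if absent.  */
static struct int_prop *
find_prop (struct debug_table *tbl, unsigned uid, int key)
{
  size_t i;

  for (i = 0; i < tbl->count; i++)
    if (tbl->props[i].uid == uid && tbl->props[i].key == key)
      return &tbl->props[i];
  return NULL;
}

/* Record VAL as property KEY of node UID, overwriting any old value.  */
void
debug_store_intval (struct debug_table *tbl, unsigned uid, int key, long val)
{
  struct int_prop *p = find_prop (tbl, uid, key);

  if (p == NULL)
    {
      assert (tbl->count < MAX_PROPS);
      p = &tbl->props[tbl->count++];
      p->uid = uid;
      p->key = key;
    }
  p->val = val;
}

/* Return property KEY of node UID, or DEBUG_NO_VALUE if not recorded.  */
long
debug_retrieve_intval (struct debug_table *tbl, unsigned uid, int key)
{
  struct int_prop *p = find_prop (tbl, uid, key);

  return p != NULL ? p->val : DEBUG_NO_VALUE;
}

/* Toy decl whose size is either a known integer constant or not.  */
struct toy_decl
{
  unsigned uid;
  int size_is_constant;
  long size_bytes;
};

/* Mimic the record-then-discard step: save the size only when it is an
   integer constant, then clear it from the node.  */
void
record_and_free_size (struct debug_table *tbl, struct toy_decl *d)
{
  if (d->size_is_constant)
    debug_store_intval (tbl, d->uid, KEY_SIZE_IN_BYTES, d->size_bytes);
  d->size_is_constant = 0;
  d->size_bytes = 0;
}
```

A debug info generator would later call debug_retrieve_intval with the
same UID and key, and treat a missing property as "size unknown".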

Property Keys for Decls

  Integer value for the size of a field (in bits or bytes,
  respectively). (If these properties aren't integer
  constants, they do not need to be stored at all, since the
  debug info cannot handle anything but integer constants.)

  Reference to a tree that provides the enclosing context
  for the decl.

  Reference to a list of trees that are declared as friends
  of a type. (Not currently needed for DWARF.)

Property Keys for Types

  Integer value identifying a record type as struct, class, or interface.

  Reference to a tree that contains array descriptor info.

  List of trees with default argument types.

  Integer value for the size of a field (in bits or bytes,
  respectively). (If this property isn't an integer
  constant, it does not need to be stored at all, since the
  debug info cannot handle anything but integer constants.)

  List of trees containing the members of a record type
  that are not FIELD_DECLs. (This may need to provide all
  fields in order to list the fields in the correct
  order. Not sure yet if non-FIELD_DECLs even need to be
  stored at all.)

  List of trees containing the method members of a record
  type. (Methods with DECL_ABSTRACT_ORIGIN set can be

  List of trees that refer to the base classes of a type.

  Integer values containing the max and min value,
  respectively, for a subrange type.

  Reference to a tree that provides the enclosing context
  for the type.
