Early Generation of Debug Information

The final goals of this project are to:

- generate better debug information with LTO
- make it possible to strip down trees in GIMPLE

With LTO, the issue is that by the time we generate debug information the language frontends are no longer available to generate language-specific debug information (via debug langhooks), nor is all the information needed for elaborate debug information retained.

There is a project to use a different representation for what we currently use trees for in GIMPLE. With that representation we still need to emit proper debug information, so it runs into the same issues as LTO: information embedded in frontend-dependent parts of trees has to be transitioned to the new representation (see the GCC Re-Architecture I + II BoFs at the GNU Cauldrons 2013 and 2014).

Patch prototype against debug-early branch here: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00807.html

General non-LTO Overview

The main idea is simply to emit debug information for source entities (declarations and types) from the language frontends, or at least very early in the compilation process. At that point all information is still available and the frontends can still be queried.

This "early debug information" is amended at the time we emit assembler for functions, adding details such as location information for the entities the declarations describe.

LTO Overview

For LTO operation we need to be able to amend, at the ltrans phase where we emit assembler, the debug information generated early during the compile phase. This requires tracking the association between decls and early debug DIEs throughout the LTO process, so that annotation can happen by referring to the early debug DIEs via abstract origins.

Overview of the compilation flow

For example

int main()
{
  int ret = 0;
  return ret;
}

would generate the following early debug information during the compile phase

 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x0): GNU C
    <10>   DW_AT_language    : 1        (ANSI C)
    <11>   DW_AT_name        : t.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x56): /tmp
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : (indirect string, offset: 0x74): main
    <32>   DW_AT_decl_file   : 1
    <33>   DW_AT_decl_line   : 1
    <34>   DW_AT_type        : <0x5d>
    <4a>   DW_AT_sibling     : <0x5d>
 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
    <4f>   DW_AT_name        : ret
    <53>   DW_AT_decl_file   : 1
    <54>   DW_AT_decl_line   : 3
    <55>   DW_AT_type        : <0x5d>
 <2><5c>: Abbrev Number: 0
 <1><5d>: Abbrev Number: 4 (DW_TAG_base_type)
    <5e>   DW_AT_byte_size   : 4
    <5f>   DW_AT_encoding    : 5        (signed)
    <60>   DW_AT_name        : int

and during the ltrans phase we'd generate the following annotation

 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x3b): GNU GIMPLE
    <10>   DW_AT_language    : 1        (ANSI C)
    <19>   DW_AT_low_pc      : 0x0
    <21>   DW_AT_high_pc     : 0x10
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
           DW_AT_abstract_origin
    <38>   DW_AT_low_pc      : 0x0
    <40>   DW_AT_high_pc     : 0x10
    <48>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <4a>   DW_AT_GNU_all_call_sites: 1
    <4a>   DW_AT_sibling     : <0x5d>
 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
           DW_AT_abstract_origin
    <59>   DW_AT_location    : 2 byte block: 91 6c      (DW_OP_fbreg: -20)

where the DW_AT_abstract_origin attributes should refer to the early debug DIEs for the subprogram and the variable. During the final link we have to link the early debug information object file and the LTRANS object, which contains the actual code and the debug information annotating the early debug information, into the final linker target.
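To illustrate how the two halves are meant to combine, here is a minimal sketch in C. The `struct die` type and both function names are hypothetical and are not GCC's dwarf2out representation. The assumption is that a consumer looking at a late DIE takes whatever attributes it finds there (here a location) and follows DW_AT_abstract_origin to the early DIE for the rest (here the name).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified DIE; not GCC's dwarf2out representation.  */
struct die {
  const char *name;                  /* DW_AT_name, or NULL if absent */
  int has_location;                  /* nonzero if DW_AT_location is present */
  int fbreg_offset;                  /* DW_OP_fbreg operand, valid if has_location */
  const struct die *abstract_origin; /* DW_AT_abstract_origin target, or NULL */
};

/* Build a late DIE that carries only a location and points back at EARLY.  */
struct die
make_late_annotation (const struct die *early, int fbreg_offset)
{
  assert (early != NULL);
  struct die late = { NULL, 1, fbreg_offset, early };
  return late;
}

/* Return the name of D, following abstract origin links while D itself
   has no name.  Returns NULL if no DIE in the chain has one.  */
const char *
die_effective_name (const struct die *d)
{
  for (; d != NULL; d = d->abstract_origin)
    if (d->name != NULL)
      return d->name;
  return NULL;
}
```

For the variable `ret` in the example, the early DIE would carry the name and the late DIE would be `make_late_annotation (&early_ret, -20)`. `die_effective_name` on the late DIE then yields the early DIE's name.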

LTO Details

Compile-time

- write out early DWARF, at the latest at the start of ipa_write_summaries
  - conveniently, that computes DIE offsets (relative to DW_TAG_compile_unit)
  - emit a global symbol before each DW_TAG_compile_unit we emit (must be unique, so eventually hash the DWARF tree in some way)
- when writing LTO bytecode for any decl, call lookup_decl_die and stream its DW_TAG_compile_unit symbol and offset
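The offset computation mentioned above can be sketched as a pre-order walk over the DIE tree. This is an illustrative model only: `struct die` and `assign_offsets` are hypothetical names, each DIE's encoded size is given as an input instead of being derived from its attributes, and any DIE with children is assumed to be followed by a one-byte null entry that ends its child list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified DIE; not GCC's dwarf2out representation.  */
struct die {
  const char *tag;      /* tag name, for readability only */
  unsigned size;        /* encoded size of abbrev code plus attributes, in bytes */
  unsigned offset;      /* filled in by assign_offsets */
  struct die *child;    /* first child, or NULL */
  struct die *sibling;  /* next sibling, or NULL */
};

/* Assign offsets to D and its subtree in pre-order, starting at START.
   Returns the first offset after the subtree.  */
unsigned
assign_offsets (struct die *d, unsigned start)
{
  assert (d != NULL);
  d->offset = start;
  unsigned next = start + d->size;
  if (d->child != NULL)
    {
      for (struct die *c = d->child; c != NULL; c = c->sibling)
        next = assign_offsets (c, next);
      next += 1;  /* null entry terminating the child list */
    }
  return next;
}
```

With sizes read off the example dump by subtracting adjacent offsets (34, 33, 14 and 7 bytes for the compile unit, subprogram, variable and base type) and a start offset of 11, which matches the `<b>` in the dump, this walk reproduces the offsets 0xb, 0x2d, 0x4e and 0x5d shown there.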

WPA phase

nothing: the DW_TAG_compile_unit symbol and offset info simply passes through to the LTRANS phase

LTRANS phase

- when reading LTO bytecode for any decl, we get its DW_TAG_compile_unit symbol and offset
- build up a shadow early debug DWARF tree from that info at this point by creating an empty DIE for each such decl with just a DW_AT_abstract_origin referring to the early DIE via DW_FORM_ref_addr symbol + offset (causing a relocation to be emitted by the assembler); the parent/child relationship is given by the decl's DECL_CONTEXT
- compile
- dwarf2out_finish will amend the shadow early debug DWARF tree and emit it
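The shadow tree construction above can be sketched as follows. All type and function names here are hypothetical, as is the symbol name used in the usage note. The sketch assumes each decl read from LTO bytecode arrives as a record holding the index of its enclosing decl (standing in for DECL_CONTEXT), the compile unit symbol and the DIE offset. The "symbol+offset" string stands for the kind of operand that would cause the assembler to emit a relocation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-decl record as it might be read back from LTO bytecode.  */
struct decl_ref {
  int context;            /* index of the enclosing decl, -1 at top level */
  const char *cu_symbol;  /* symbol emitted before the early DW_TAG_compile_unit */
  unsigned offset;        /* offset of the early DIE relative to that symbol */
};

/* Hypothetical shadow DIE: empty apart from its abstract origin reference.  */
struct shadow_die {
  int parent;             /* index of the parent shadow DIE, -1 under the CU */
  const char *cu_symbol;  /* abstract origin: symbol part */
  unsigned offset;        /* abstract origin: offset part */
};

/* Create one shadow DIE per record, taking the parent from the context.  */
void
build_shadow_tree (const struct decl_ref *refs, int n, struct shadow_die *out)
{
  assert (refs != NULL && out != NULL);
  for (int i = 0; i < n; i++)
    {
      assert (refs[i].context >= -1 && refs[i].context < n);
      out[i].parent = refs[i].context;
      out[i].cu_symbol = refs[i].cu_symbol;
      out[i].offset = refs[i].offset;
    }
}

/* Write the "symbol+offset" operand for D's abstract origin into BUF.
   Returns the length written, or -1 if BUF is too small.  */
int
format_origin_ref (const struct shadow_die *d, char *buf, size_t size)
{
  assert (d != NULL && buf != NULL);
  int len = snprintf (buf, size, "%s+0x%x", d->cu_symbol, d->offset);
  return (len < 0 || (size_t) len >= size) ? -1 : len;
}
```

For the example, two records `{ -1, "t_c_early", 0x2d }` and `{ 0, "t_c_early", 0x4e }` would give a shadow subprogram DIE with a shadow variable DIE as its child, whose origin operands format as `t_c_early+0x2d` and `t_c_early+0x4e`.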

LINK time

The linker will resolve the data relocations in the (merged) .debug_info sections when linking the LTRANS-produced objects, which contain code and late debug info, with the object files produced during compilation, which contain the early debug info. Tools like dwz could optimize the result and "inline" the abstract origins into single uses.

In theory all of the above can also be done when not doing LTO. We'd emit assembler for the early DWARF together with (now local) symbols for the DW_TAG_compile_unit and strip down the DIEs in the dwarf2out internal representation, adding the DW_AT_abstract_origin as we'd do at the LTRANS stage. dwarf2out_finish will then emit a second "half" of the debug info. That means the assembler file would start with the early debug .debug_info sections, then the assembler for .text, and then the late debug .debug_info sections.

Challenges for trunk adoption

With fat LTO objects we need to put the LTO early debug info in LTO-specific sections (also with slim objects, if we want to mark them as slim). This means that at the WPA phase we need to strip it out and get it to the final link in some way. The current idea is to emit the early LTO debug info in debug sections (.debug_info, .debug_abbrev, .debug_str, .debug_line) that are .gnu.lto_ prefixed and hash suffixed. We'll also have .rela.gnu.lto_.debug_info. More interestingly (and problematically), we have the single hidden symbol (or several, for multiple translation units) that identifies the early debug piece for a TU. That symbol is in .symtab and thus visible to the fat part (and in turn hard to split out). The symbol and its handling in particular might pose an issue, so we probably have to delay creating it until the WPA phase in some way (it always points to the start of a .debug_info section), especially as simple-object doesn't seem to contain any symbol handling.
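As a rough illustration of the section naming idea, the following sketch composes a prefixed and hash-suffixed name. The function name is hypothetical, and the exact layout (a dot followed by the hash in lowercase hex) is an assumption; the text above only says the sections are .gnu.lto_ prefixed and hash suffixed.

```c
#include <assert.h>
#include <stdio.h>

/* Compose ".gnu.lto_<section>.<hash>" into BUF, e.g. for ".debug_info".
   The ".<hash in hex>" suffix layout is an assumption for illustration.
   Returns the length written, or -1 if BUF is too small.  */
int
lto_debug_section_name (const char *section, unsigned long hash,
                        char *buf, size_t size)
{
  assert (section != NULL && section[0] == '.');
  assert (buf != NULL);
  int len = snprintf (buf, size, ".gnu.lto_%s.%lx", section, hash);
  return (len < 0 || (size_t) len >= size) ? -1 : len;
}
```

Under these assumptions, `lto_debug_section_name (".debug_info", 0xabcUL, buf, sizeof buf)` would produce `.gnu.lto_.debug_info.abc`.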

So I wonder if both GNU ld and gold handle all of the extraction and symbol adding via a linker script...