DWARF Package File Format
This document describes the file format for DWARF package files, version 2. A DWARF package file is an ELF-format file that collects the contents from the separate DWARF object (.dwo) files produced during the compilation of an application. DWARF object (.dwo) files are produced when compiling with GCC's -gsplit-dwarf option (see DWARF Extensions for Separate Debug Info Files).
DWARF package files can be built with the dwp utility, which is built along with the gold linker in binutils. Its usage is:
Usage: dwp [options] [file...]
  -h, --help               Print this help message
  -e EXE, --exec EXE       Get list of dwo files from EXE (defaults output to EXE.dwp)
  -o FILE, --output FILE   Set output dwp file name
  -v, --verbose            Verbose output
  -V, --version            Print version number
If given a list of .dwo files, it will build a package file with the name given by the -o option. If given an executable or shared library with the -e option, it will read the executable to obtain the list of .dwo files, and build a package file with the extension .dwp appended to the name of the executable file (unless the -o option is also given).
Differences from Version 1
Version 1 was an experimental version that is no longer supported.
In the first version of this format, each compilation unit had a separate set of sections for .debug_info and associated sections. In version 2, the .debug_info sections from each compilation unit are combined into a single .debug_info section in the .dwp file. Likewise, the .debug_types, .debug_abbrev, .debug_line, .debug_loc, .debug_str_offsets, .debug_macinfo, and .debug_macro sections are combined, and the CU and TU index sections now record the offsets into those sections where the contributions for each CU or TU begin.
Version 2 adds additional fast lookup tables to the package file. These tables are described in New DWARF Fast Lookup Tables.
Version 2 also adds the concept of a “thin” package file, which contains an index of debug information without copying the actual information from the .dwo files.
Design Goals
The design of the DWARF package (.dwp) file is guided by the following goals:
- It must provide quick and efficient access to individual compilation units, keyed by a compilation unit signature, which is a 64-bit signature that uniquely identifies a DWARF compilation unit. The signature is given by the DW_AT_GNU_dwo_id attribute in a skeleton compilation unit in the application binary.
- Likewise, it must provide quick and efficient access to individual type units, keyed by a type signature, which is also a 64-bit signature that uniquely identifies the debug information for a type definition. The type signature is typically obtained from the DW_AT_type attribute in variable and subprogram DIEs, and may also come from a skeleton type unit in the application binary.
- It must allow for the removal of duplicate type units. Each .dwo file will contain a set of type units defining types that are referenced by that compilation unit. Many such type units will appear in more than one .dwo file, and the .dwp file should contain only one copy of each.
- It must allow for the removal of duplicate strings. Each .dwo file will contain a .debug_str.dwo section with all the strings referenced by that unit. The .dwp file should contain a single combined string table where all duplicate strings have been coalesced.
- The package format must be re-combinable; that is, it must be possible to combine one set of .dwo files into a .dwp file, combine another set of .dwo files into another .dwp file, then combine the two .dwp files into a new .dwp file that is equivalent to one created in one step.
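To make the string-coalescing goal concrete, here is a minimal Python sketch, assuming each unit's .debug_str.dwo is available as raw bytes (a run of NUL-terminated strings). The function name coalesce_strings and the unit names are hypothetical, and this is not presented as how dwp itself is implemented.

```python
def coalesce_strings(units):
    """Merge per-unit .debug_str.dwo tables into one table without duplicates.

    `units` maps a (hypothetical) unit name to that unit's raw string table:
    a run of NUL-terminated strings. Returns the combined table and, for each
    unit, a dict mapping old string offsets to new ones.
    """
    combined = bytearray()
    seen = {}   # string bytes -> offset in the combined table
    remap = {}
    for name, table in units.items():
        mapping = {}
        pos = 0
        while pos < len(table):
            end = table.index(b"\0", pos)   # end of the current string
            s = bytes(table[pos:end])
            if s not in seen:               # first occurrence: append it
                seen[s] = len(combined)
                combined += s + b"\0"
            mapping[pos] = seen[s]          # duplicates reuse the first copy
            pos = end + 1
        remap[name] = mapping
    return bytes(combined), remap
```

After coalescing, each unit's .debug_str_offsets.dwo entries would be rewritten through that unit's mapping. A packager could also share common string suffixes; this sketch removes only exact duplicates.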
High-Level File Structure
The .dwp file is an ELF-format file, using the same byte order and word size as the corresponding application binary. It consists only of an ELF file header, a section header table, a number of DWARF debug information sections, two index sections, and (optionally) the new fast lookup table sections.
Each .dwp file will contain no more than one of each of the following sections:
- .debug_info.dwo
- .debug_types.dwo
- .debug_abbrev.dwo
- .debug_line.dwo
- .debug_loc.dwo
- .debug_str_offsets.dwo
- .debug_str.dwo
- .debug_macinfo.dwo
- .debug_macro.dwo
The string table section in .debug_str.dwo contains all the strings referenced from DWARF attributes using the form DW_FORM_str_index. Any attribute in a compilation unit or a type unit using this form will refer to an entry in that unit's contribution to the .debug_str_offsets.dwo section, which in turn will provide the offset of a string in the .debug_str.dwo section.
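Resolving a DW_FORM_str_index value is therefore a two-step indirection. The Python sketch below assumes 32-bit DWARF (4-byte entries in .debug_str_offsets.dwo) and assumes the unit's contribution to that section has no header, as in the pre-DWARF 5 GNU split-DWARF extension; `base` is the unit's base offset into .debug_str_offsets.dwo, obtained from the index section. The helper name read_str_index is hypothetical.

```python
import struct

def read_str_index(str_offsets, str_table, base, index, byteorder="<"):
    """Resolve a DW_FORM_str_index value to a string (hypothetical helper).

    Assumes 32-bit DWARF: each .debug_str_offsets.dwo entry is a 4-byte
    offset, and the unit's contribution (starting at `base`) has no header.
    """
    # Step 1: the index selects an entry in the unit's offsets contribution.
    (str_off,) = struct.unpack_from(byteorder + "I", str_offsets, base + 4 * index)
    # Step 2: that entry is the offset of a NUL-terminated string in .debug_str.dwo.
    end = str_table.index(b"\0", str_off)
    return str_table[str_off:end].decode("utf-8")
```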
For the purposes of recording offsets to DIEs in a compilation unit or in a type unit, we define a virtual debug address space consisting of the .debug_info.dwo section combined with the .debug_types.dwo section. The .debug_info.dwo section begins at offset 0 within this address space, and the .debug_types.dwo section begins immediately following the end of the .debug_info.dwo section. In the fast lookup table sections, references to a CU, a TU, or a DIE within a CU or TU use offsets within the virtual debug address space.
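A minimal Python sketch of the mapping between section offsets and the virtual debug address space, assuming the caller knows the size of the .debug_info.dwo section. The function names to_virtual and from_virtual are hypothetical.

```python
INFO = ".debug_info.dwo"
TYPES = ".debug_types.dwo"

def to_virtual(section, offset, info_size):
    """Map an offset within INFO or TYPES to the virtual debug address space."""
    if section == INFO:
        if not 0 <= offset < info_size:
            raise ValueError("offset outside .debug_info.dwo")
        return offset                 # .debug_info.dwo starts at address 0
    if section == TYPES:
        if offset < 0:
            raise ValueError("negative offset")
        return info_size + offset     # .debug_types.dwo follows immediately
    raise ValueError(f"{section} is not part of the virtual address space")

def from_virtual(address, info_size):
    """Inverse of to_virtual: return (section name, offset within it)."""
    if address < info_size:
        return INFO, address
    return TYPES, address - info_size
```

For example, with a 0x400-byte .debug_info.dwo, offset 0x10 in .debug_types.dwo corresponds to virtual address 0x410.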
Package files may be “thin” or self-contained. A thin package file contains index information only, and refers to the debug information in the original .dwo files. A self-contained file contains index information as well as all the debugging information. A self-contained file must contain .debug_info.dwo and .debug_abbrev.dwo sections; all others are present only if needed.
The CU Index Section
The first index section is a compilation unit index that maps a compilation unit signature to the offset of a CU within the virtual debug address space, and to a set of offsets into the various other debug information sections. This section is named .debug_cu_index.
Each compilation unit set must contain a contribution from each of the following sections:
- .debug_info.dwo
- .debug_abbrev.dwo
Each compilation unit set may also contain a contribution from each of the following sections:
- .debug_line.dwo
- .debug_loc.dwo
- .debug_str_offsets.dwo
- .debug_macinfo.dwo
- .debug_macro.dwo
(Note that a set should not contain both .debug_macinfo.dwo and .debug_macro.dwo. The latter is an extension that is intended to replace the former in a future version of DWARF.)
The TU Index Section
The second index section is a type unit index that maps a type signature to the offset of a TU within the virtual debug address space, and to a set of offsets into the various other debug information sections. This section is named .debug_tu_index.
Each type unit set must contain a contribution from each of the following sections:
- .debug_types.dwo
- .debug_abbrev.dwo
Each type unit set may also contain a contribution from each of the following sections:
- .debug_line.dwo
- .debug_str_offsets.dwo
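The membership rules for CU sets and TU sets can be checked mechanically. This Python sketch encodes the required and optional sections listed for each index and flags a set that has both .debug_macinfo.dwo and .debug_macro.dwo; the names RULES and check_set are hypothetical.

```python
# Required and optional contributions per set, from the CU and TU index descriptions.
RULES = {
    "cu": ({".debug_info.dwo", ".debug_abbrev.dwo"},
           {".debug_line.dwo", ".debug_loc.dwo", ".debug_str_offsets.dwo",
            ".debug_macinfo.dwo", ".debug_macro.dwo"}),
    "tu": ({".debug_types.dwo", ".debug_abbrev.dwo"},
           {".debug_line.dwo", ".debug_str_offsets.dwo"}),
}

def check_set(kind, sections):
    """Return a list of problems with a CU ("cu") or TU ("tu") set; empty if OK."""
    required, optional = RULES[kind]
    present = set(sections)
    problems = [f"missing {s}" for s in sorted(required - present)]
    problems += [f"unexpected {s}" for s in sorted(present - required - optional)]
    # "Should not" in the text, so a tool might treat this as a warning.
    if {".debug_macinfo.dwo", ".debug_macro.dwo"} <= present:
        problems.append("both .debug_macinfo.dwo and .debug_macro.dwo present")
    return problems
```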
The Fast Lookup Table Sections
The following fast lookup table sections may also be present in the .dwp file:
- .debug_names
- .debug_typenames
- .debug_namespaces
The index entries in these sections refer to DIEs in the .debug_info.dwo or .debug_types.dwo sections by their offset in the virtual debug address space.
The fast lookup table sections are a separate DWARF proposal, currently still under development.
The File Directory Section
A thin package file requires one additional section that provides a directory of the referenced .dwo files.
Format of the CU and TU Index Sections
Both index sections have the same format, and serve to map a 64-bit signature to a set of contributions to the debug sections. Each section begins with a header, followed by a hash table of signatures, a parallel table of indexes, a table of offsets, and a table of sizes. The index sections will be aligned at 8-byte boundaries in the file.
The index section header contains four unsigned 32-bit values (using the byte order of the application binary):
- The version number of the format of this index (currently 2)
- L, the number of columns in the table of section offsets
- N, the number of compilation units or type units in the index
- M, the number of slots in the hash table
(We assume that N and M will not exceed $2^{32}$.)
The size of the hash table, M, must be $2^k$ such that $2^k > 3N/2$.
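The sizing rule and the header layout can be sketched in Python as follows. The check $M > 3N/2$ is done as $2M > 3N$ to stay in integer arithmetic, and the byte order is passed in as a struct prefix ("<" or ">") matching the application binary. The function names are hypothetical.

```python
import struct

def hash_table_size(n):
    """Smallest M = 2**k with M > 3*N/2, tested as 2*M > 3*N to stay in integers."""
    m = 1
    while 2 * m <= 3 * n:
        m *= 2
    return m

def read_index_header(data, byteorder="<"):
    """Unpack (version, L, N, M) from the start of an index section.

    `byteorder` is "<" or ">" to match the application binary.
    """
    version, ncols, nunits, nslots = struct.unpack_from(byteorder + "4I", data, 0)
    if nslots == 0 or nslots & (nslots - 1):
        raise ValueError("M must be a power of two")
    if 2 * nslots <= 3 * nunits:
        raise ValueError("hash table too small for N units")
    return version, ncols, nunits, nslots
```

For example, an index with 5 units needs M = 8, since 8 > 7.5 while 4 is not.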
The hash table begins at offset 16 in the section, and consists of an array of M 64-bit slots. Each slot contains a 64-bit signature (using the byte order of the application binary).
The parallel table begins immediately after the hash table (at offset 16 + 8 * M from the beginning of the section), and consists of an array of M 32-bit slots (using the byte order of the application binary), corresponding 1-1 with slots in the hash table. Each entry in the parallel table contains a row index into the pool of offsets.
Unused slots in the hash table will have 0 in both the hash table entry and the parallel table entry. Although 0 is a valid signature, the row index in a used slot is always non-zero, so an unused slot can be recognized by a 0 in its parallel table entry.
Given a 64-bit compilation unit signature or a type signature S, an entry in the hash table is located as follows:
1. Calculate a primary hash H = S & MASK(k), where MASK(k) is a mask with the low-order k bits all set to 1.
2. Calculate a secondary hash H' = (((S >> 32) & MASK(k)) | 1).
3. If the hash table entry at index H matches the signature, use that entry. If the hash table entry at index H is unused (all zeroes), terminate the search: the signature is not present in the table.
4. Let H = (H + H') modulo M. Repeat at Step 3.
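Because M > N, the table always contains at least one unused slot. Because H' is always odd and M is a power of two, H' and M are relatively prime, so the probe sequence visits every slot before repeating. The search is therefore guaranteed either to find the match or to stop at an unused slot.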
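The probing procedure can be sketched in Python with the hash table and the parallel table held as two lists of length M. The sketch covers both insertion (as a hypothetical packager might do it) and lookup. Because 0 is a valid signature, it tests the parallel table entry, not the hash table entry, to recognize an unused slot. The function names are hypothetical.

```python
def _probe_start(sig, m):
    """Primary hash H and secondary hash H' for a table of M = 2**k slots."""
    mask = m - 1                      # MASK(k): low-order k bits set
    h = sig & mask
    h2 = ((sig >> 32) & mask) | 1     # odd, hence coprime with M
    return h, h2

def insert(hashes, rows, sig, row):
    """Place `sig` with its non-zero row index into the first free slot."""
    m = len(hashes)
    h, h2 = _probe_start(sig, m)
    while rows[h] != 0:               # slot in use: keep probing
        if hashes[h] == sig:
            raise ValueError("duplicate signature")
        h = (h + h2) % m
    hashes[h], rows[h] = sig, row

def lookup(hashes, rows, sig):
    """Return the row index for `sig`, or None if it is not in the table."""
    m = len(hashes)
    h, h2 = _probe_start(sig, m)
    while rows[h] != 0:               # row 0 marks an unused slot
        if hashes[h] == sig:
            return rows[h]
        h = (h + h2) % m
    return None
```

With M = 8, the signatures 0x1234567800000003 and 0x0000000500000003 share primary hash 3; the second has H' = 5, so it lands in slot (3 + 5) mod 8 = 0.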
The table of offsets begins immediately following the parallel table (at offset 16 + 12 * M from the beginning of the section). The table is a two-dimensional array of 32-bit words (using the byte order of the application binary), with L columns and N+1 rows, in row-major order. Each row in the array is indexed starting from 0. The first row provides a key to the remaining rows: each column in this row provides an identifier for a debug section, and the offsets in the same column of subsequent rows refer to that section. The section identifiers are:
| Identifier | Value | Section |
|---|---|---|
| DW_SECT_INFO | 1 | .debug_info.dwo |
| DW_SECT_TYPES | 2 | .debug_types.dwo |
| DW_SECT_ABBREV | 3 | .debug_abbrev.dwo |
| DW_SECT_LINE | 4 | .debug_line.dwo |
| DW_SECT_LOC | 5 | .debug_loc.dwo |
| DW_SECT_STR_OFFSETS | 6 | .debug_str_offsets.dwo |
| DW_SECT_MACINFO | 7 | .debug_macinfo.dwo |
| DW_SECT_MACRO | 8 | .debug_macro.dwo |
The offsets provided by the CU and TU index sections are the base offsets for the contributions made by each CU or TU to the corresponding section in the package file. Each CU and TU header contains an abbrev_offset field that locates the unit's abbreviations table within its contribution to the .debug_abbrev.dwo section; this field should be interpreted as relative to the base offset given in the index section. Likewise, offsets into .debug_line.dwo from DW_AT_stmt_list attributes should be interpreted as relative to the base offset for .debug_line.dwo, and offsets into other debug sections obtained from DWARF attributes should also be interpreted as relative to the corresponding base offset.
The table of sizes begins immediately following the table of offsets, and provides the sizes of the contributions made by each CU or TU to the corresponding section in the package file. Like the table of offsets, it is a two-dimensional array of 32-bit words, with L columns and N rows, in row-major order. Each row in the array is indexed starting from 1 (row 0 is shared by the two tables).
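Putting the pieces together, the following Python sketch decodes a whole version 2 index section into a dictionary keyed by signature, then turns a unit-relative offset (such as abbrev_offset) into a section offset by adding the base offset. It walks every slot of the hash table rather than probing for one signature, and the function names parse_index and resolve are hypothetical.

```python
import struct

# Section identifiers used in row 0 of the table of offsets.
DW_SECT = {1: ".debug_info.dwo", 2: ".debug_types.dwo", 3: ".debug_abbrev.dwo",
           4: ".debug_line.dwo", 5: ".debug_loc.dwo", 6: ".debug_str_offsets.dwo",
           7: ".debug_macinfo.dwo", 8: ".debug_macro.dwo"}

def parse_index(data, byteorder="<"):
    """Decode a version 2 .debug_cu_index or .debug_tu_index section.

    Returns {signature: {section name: (base offset, size)}}.
    """
    version, l, n, m = struct.unpack_from(byteorder + "4I", data, 0)
    if version != 2:
        raise ValueError(f"unsupported index version {version}")
    hashes = struct.unpack_from(f"{byteorder}{m}Q", data, 16)
    rows = struct.unpack_from(f"{byteorder}{m}I", data, 16 + 8 * m)
    off_base = 16 + 12 * m                   # table of offsets, N+1 rows
    size_base = off_base + 4 * l * (n + 1)   # table of sizes, N rows
    cols = struct.unpack_from(f"{byteorder}{l}I", data, off_base)  # row 0
    result = {}
    for sig, row in zip(hashes, rows):
        if row == 0:
            continue                         # unused slot
        offs = struct.unpack_from(f"{byteorder}{l}I", data, off_base + 4 * l * row)
        # Sizes rows are numbered from 1, so row r sits at index r - 1.
        sizes = struct.unpack_from(f"{byteorder}{l}I", data, size_base + 4 * l * (row - 1))
        result[sig] = {DW_SECT.get(c, f"unknown({c})"): (o, s)
                       for c, o, s in zip(cols, offs, sizes)}
    return result

def resolve(contribs, section, relative):
    """Convert a unit-relative offset into an offset within the package section."""
    base, size = contribs[section]
    if relative >= size:
        raise ValueError("offset beyond the unit's contribution")
    return base + relative
```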
Format of the File Directory Section
[TBD. For thin package files only.]
Notes
Eventually, we plan to incorporate the functionality of Jakub Jelinek’s DWARF compression tool, dwz, into the packaging tool.