This is the mail archive of the mailing list for the GCC project.


Re: [debug-early] LTO streaming of on-the-side dwarf data structures

On Tue, Oct 14, 2014 at 10:07 PM, Aldy Hernandez <> wrote:
> On 10/14/14 06:21, Richard Biener wrote:
>> On Tue, Oct 14, 2014 at 2:48 AM, Aldy Hernandez <> wrote:
>>> Gentlemen, your feedback would be greatly appreciated!
>>>
>>> I was investigating why locals were not being early dumped, and realized
>>> Michael's patch was skipping decls_for_scope() unless
>>> DECL_STRUCT_FUNCTION->gimple_df was set.  I assume this was to wait until
>>> location information was available.  That design caused locals to be
>>> dumped LATE, which defeats the whole purpose of this exercise.
>>>
>>> Since we want the local DECL DIEs to be generated early as well, we'd
>>> want the location information to be amended in the second dwarf pass.
>>> This got me thinking about deferred_locations, and all the on-the-side
>>> data structures that a second dwarf pass would depend on.  Unless I'm
>>> misunderstanding something, we need a plan...
>>>
>>> Basically, any early collected data that dwarf2out_finish() and
>>> dwarf2out_function_decl() need would have to be LTO streamed out after
>>> early dwarf generation and streamed back in before the second dwarf pass.
>>> For instance, I see at least the following that need to be streamed in/out:
>>>          file_table
>>>          deferred_locations
>>>          limbo_die_list (and deferred_asm_name)
>>>          decl_die_table
>>>          pubname_table
>>>          pubtype_table
>> I think that all but decl_die_table should not be needed (which may
>> need implementation changes in dwarf2out.c of course).  Maybe you
>> can explain why you think they are needed late.
> I see what you mean.  With some minor surgery I was able to remove all
> references to deferred_locations.  I assume limbo_die_list and
> deferred_asm_name can be submitted to similar surgery.
> How about file_table?  In dwarf2out_finish() we need file_table to decide
> whether to emit DW_AT_comp_dir, which is needed when relative file names
> are used.  I suppose we could determine that information by traversing the
> DIE table and scanning all DW_AT_decl_file attributes instead, albeit more
> slowly.  Would this be acceptable?

How can this all not be part of the early debug we emit?
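To make Aldy's alternative concrete: instead of consulting file_table, the
finish code could decide whether DW_AT_comp_dir is needed by walking the DIE
tree. This is a toy model, not dwarf2out.c code: struct toy_die and
uses_relative_file() are hypothetical, and a toy DIE stores the decl file's
name directly, whereas a real DW_AT_decl_file is an index into the line
table's file list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy DIE, not GCC's dw_die_struct.  */
struct toy_die {
  const char *decl_file;        /* resolved DW_AT_decl_file name, or NULL */
  struct toy_die *first_child;
  struct toy_die *next_sibling;
};

/* Return true if any DIE in this sibling chain (or below it) names a file
   by a relative path.  Only Unix-style absolute paths are recognized.  */
static bool
uses_relative_file (const struct toy_die *die)
{
  for (; die != NULL; die = die->next_sibling)
    {
      if (die->decl_file != NULL && die->decl_file[0] != '/')
        return true;
      if (uses_relative_file (die->first_child))
        return true;
    }
  return false;
}
```

The walk is linear in the number of DIEs, which is the "albeit slower" cost
compared with iterating over the much smaller file table.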

> How about pubname_table (and pubtype_table)?  It looks like we need a list
> of all publicly accessible names, but output_pubnames() ends up writing
> directly to the assembly file, and this can only happen at the very end of
> dwarf generation.  I suppose we could also traverse the DIE table and pick
> out the publicly accessible names (direct children of the
> DW_TAG_compile_unit, and/or DIEs with some static/extern flag?).  Am I
> missing something?

Sounds like this information is also complete at early debug time?
That we write this at the very end is an implementation detail(?)
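If the names really are complete at early debug time, a walk like the one
Aldy suggests could recover them from the DIE tree alone. A minimal sketch,
assuming a toy DIE where a bool stands in for DW_AT_external; enum toy_tag,
struct toy_die and collect_pubnames() are hypothetical, and like Aldy's
description it only looks at direct children of the compile unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tag { TOY_COMPILE_UNIT, TOY_SUBPROGRAM, TOY_VARIABLE };

/* Hypothetical toy DIE.  */
struct toy_die {
  enum toy_tag tag;
  const char *name;
  bool external;                /* models DW_AT_external */
  struct toy_die *first_child;
  struct toy_die *next_sibling;
};

/* Store the names of external direct children of compile unit CU into OUT
   (at most MAX entries) and return how many were stored.  */
static size_t
collect_pubnames (const struct toy_die *cu, const char **out, size_t max)
{
  size_t n = 0;
  assert (cu->tag == TOY_COMPILE_UNIT);
  for (const struct toy_die *c = cu->first_child; c != NULL;
       c = c->next_sibling)
    if (c->external && c->name != NULL && n < max)
      out[n++] = c->name;
  return n;
}
```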

>> For decl_die_table the idea was to be able to create references to
>> the early output DIEs via decl->die_offset (to be added and LTO streamed)
>> and the translation unit decl's symbol of the dwarf tree root.
> How so?  Do you mean storing the DECL's DECL_UID in the corresponding
> die_offset, since die_offset will be zero (and unused) after early dwarf
> dumping?  If so, that's kinda neat.  We could recreate the hash from that.

No, I'd really store the byte offset from the start of the early dwarf tree
here (and the early dwarf tree should be referable to by a label).  So
each DIE can be referenced via a symbol+offset relocation
(well, of course a dwarf expert has to think about how to actually
do the dwarf here).
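One way to picture the symbol+offset scheme: lay the early DIE tree out in
pre-order, record each DIE's byte offset from the start of the tree, and
express a reference as a (label, offset) pair that the assembler would turn
into a relocation. Toy model only: the struct names, the label string, and
the fixed per-DIE sizes are assumptions; in real .debug_info, sizes follow
from the abbreviation and attribute encodings, and a unit header precedes
the first DIE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy DIE with a precomputed encoded size.  */
struct toy_die {
  size_t size;                  /* bytes for this DIE's own entry */
  size_t offset;                /* assigned: bytes from start of early tree */
  struct toy_die *first_child;
  struct toy_die *next_sibling;
};

/* A reference into the early tree: a label plus a byte offset.  */
struct toy_ref {
  const char *label;
  size_t offset;
};

/* Assign offsets in pre-order (parent, then its children), starting at
   START.  Returns the offset just past this sibling chain.  */
static size_t
assign_offsets (struct toy_die *die, size_t start)
{
  for (; die != NULL; die = die->next_sibling)
    {
      die->offset = start;
      start += die->size;
      if (die->first_child != NULL)
        /* Children follow the parent; a one-byte null entry ends them.  */
        start = assign_offsets (die->first_child, start) + 1;
    }
  return start;
}

static struct toy_ref
make_ref (const char *label, const struct toy_die *die)
{
  struct toy_ref r = { label, die->offset };
  return r;
}
```

A compile unit of 11 bytes with children of 5 and 7 bytes gives the children
offsets 11 and 16, and the whole tree occupies 24 bytes including the null
entry that closes the child list.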

> Another similar issue I've seen is handling DW_TAG_lexical_block
> (gen_lexical_block_die).  Ideally we should generate the
> DW_TAG_lexical_block and the corresponding locals in early dumping, and then
> fill in the high/low attributes of the lexical block the second time around.
> We would need a hash similar to decl_die_table to get from
> BLOCK->DW_TAG_lexical_block, similar to die_table_offset.  For that matter,
> we could store the relationship in die_table_offset, or in the die_offset if
> I understood things correctly.

Yes, we'd create the DW_TAG_lexical_block early, store and stream a
reference to the early DIE in the BLOCK and annotate that DIE
late by means of some clever dwarf tricks.
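For the non-LTO case the early/late split for lexical blocks can be modeled
directly: the early pass creates the DIE and records it in the BLOCK, and
the late pass fills in the address range once code has been laid out. A
hypothetical sketch: struct toy_block stands in for a BLOCK, and
low_pc/high_pc model DW_AT_low_pc/DW_AT_high_pc; none of these names are
GCC's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy DW_TAG_lexical_block DIE.  */
struct toy_die {
  bool has_pc_range;
  uint64_t low_pc, high_pc;
};

/* Stand-in for a BLOCK carrying a reference to its early DIE.  */
struct toy_block {
  struct toy_die *die;
};

/* Early pass: create the DIE without addresses and remember it.  */
static void
early_lexical_block (struct toy_block *b)
{
  b->die = calloc (1, sizeof *b->die);
}

/* Late pass: code is laid out, so annotate the existing DIE in place.  */
static void
late_lexical_block (struct toy_block *b, uint64_t lo, uint64_t hi)
{
  assert (b->die != NULL && lo <= hi);
  b->die->low_pc = lo;
  b->die->high_pc = hi;
  b->die->has_pc_range = true;
}
```

Under LTO the pointer in toy_block would have to become an offset into the
early tree, which is what the dw_ref idea below addresses.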

For non-LTO operation we can of course continue to modify the
dwarf2out DIE tree in-place and emit that late.  Thus the "reference to
the early DIE" may just be a pointer to the DIE in the dwarf2out DIE
tree in that case.

So we'd have to add a

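union dw_ref {
   dw_die_ref direct_ref;   /* non-LTO: pointer into the dwarf2out DIE tree */
   off_t indirect_ref;      /* LTO: byte offset into the early dwarf tree */
};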

in each DECL and in each BLOCK, which means the dwarf2out.c
code could get rid of its decl -> DIE hash.

Or we don't add this to the nodes directly but only construct it on-the-fly
when streaming LTO.
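A bare union does not record which member is live, so a sketch of the
reference might carry a discriminator, with the indirect form built on the
fly at LTO streaming time as suggested above. Assumptions: enum ref_kind,
struct toy_dw_ref and to_indirect() are hypothetical; int64_t replaces off_t
to stay within ISO C; and each toy DIE is assumed to already know its offset
from the early tree layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy DIE that knows its offset in the early tree.  */
struct toy_die {
  int64_t offset;
};

enum ref_kind { REF_NONE, REF_DIRECT, REF_INDIRECT };

/* Tagged variant of dw_ref: KIND says which union member is valid.  */
struct toy_dw_ref {
  enum ref_kind kind;
  union {
    struct toy_die *direct;     /* non-LTO: pointer into the DIE tree */
    int64_t indirect;           /* LTO: offset from the early-tree label */
  } u;
};

/* At streaming time, replace a direct pointer with its byte offset.
   Other kinds are returned unchanged.  */
static struct toy_dw_ref
to_indirect (struct toy_dw_ref r)
{
  if (r.kind == REF_DIRECT)
    {
      int64_t off = r.u.direct->offset;
      r.kind = REF_INDIRECT;
      r.u.indirect = off;
    }
  return r;
}
```

Doing the conversion only while streaming, rather than storing both forms
in the tree nodes, keeps the in-memory DECLs and BLOCKs unchanged for
non-LTO compiles.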

>>> We could either stream the hash tables and/or data structures above (and
>>> merge them from different compilation units upon stream-in), or we could
>>> come up with some way of annotating existing dwarf (to be read/merged
>>> back in and annotated).  For instance, deferred_locations, decl_die_table,
>>> and limbo_die_list need to associate a DIE with a TREE.  We could tag the
>>> DIE with an artificial DW_AT_* that has some byte representation of the
>>> TREE and recreate the hash at LTO stream-in time.  For other data
>>> structures (perhaps file_table and pub*_table), we could come up with yet
>>> another way of representing the data in Dwarf.
>> Why do we end up with any deferred stuff after early dwarf?  Similarly
>> nothing should be on the limbo list (instead we should properly construct
>> early dwarf!).
>>> However... I don't know if this is worth the trouble, or if we should
>>> just stream the individual hash tables and data structures and not bother
>>> with these dwarf gymnastics.
>>> Did anybody have a plan for this?  Am I misunderstanding something, or do
>>> we need to stream a lot of these on-the-side structures?
>> I indeed hope we don't need to stream all this, but it may need more
>> "structured" generation of early dwarf (so we always have an origin
>> and thus do not need the limbo list for example).
> What's this "so we always have an origin..." bit?  I'm not following.

IIRC DIEs end up on the limbo list if they do not have a "parent" where
the DIE can be hooked into.  That happens when we create a DIE
for an entity where for some reason we don't know a context, or we
don't want to generate a DIE for the context at that very moment.
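A toy model of the limbo mechanism described above: a DIE created without a
known parent is parked on a list, and at finish time this sketch attaches it
to its context if one has turned up, otherwise to the compile unit. The
names are hypothetical and this is not dwarf2out.c's implementation; with
the "structured" early generation Richard suggests, add_die() would always
receive a parent and the list would stay empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy DIE.  */
struct toy_die {
  const char *name;
  struct toy_die *parent;
  struct toy_die *context;      /* desired parent, possibly still unknown */
  struct toy_die *next_limbo;
};

static struct toy_die *limbo_list;

/* Create time: hook under PARENT, or defer to the limbo list if none.  */
static void
add_die (struct toy_die *die, struct toy_die *parent)
{
  die->parent = parent;
  if (parent == NULL)
    {
      die->next_limbo = limbo_list;
      limbo_list = die;
    }
}

/* Finish time: give each limbo DIE its context, or CU as a fallback.  */
static void
flush_limbo (struct toy_die *cu)
{
  while (limbo_list != NULL)
    {
      struct toy_die *d = limbo_list;
      limbo_list = d->next_limbo;
      d->next_limbo = NULL;
      d->parent = d->context != NULL ? d->context : cu;
    }
}
```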

> Thanks for great insight.  The code looks much cleaner now without
> deferred_locations (and soon with many more deletions :)).



> Aldy
