Bug 86654 - [9 Regression] ICE in gen_member_die, at dwarf2out.c:24933
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: lto
Version: 9.0
Importance: P3 normal
Target Milestone: 9.0
Assignee: Richard Biener
Keywords: ice-on-valid-code
 
Reported: 2018-07-24 08:18 UTC by Martin Liška
Modified: 2018-08-01 06:56 UTC
Last reconfirmed: 2018-07-24 00:00:00


Attachments
test-case 1/4 (766.57 KB, application/x-bzip)
2018-07-24 08:20 UTC, Martin Liška
test-case 2/4 (754.18 KB, application/x-bzip)
2018-07-24 08:20 UTC, Martin Liška
test-case 3/4 (220.19 KB, application/x-bzip)
2018-07-24 08:23 UTC, Martin Liška
test-case 4/4 (235.02 KB, application/x-bzip)
2018-07-24 08:23 UTC, Martin Liška

Description Martin Liška 2018-07-24 08:18:50 UTC
I have a slightly reduced test-case from Firefox:

$ g++ -flto=8  -shared -O2 [1234].ii -fPIC -g
...
lto1: internal compiler error: in gen_member_die, at dwarf2out.c:24933
0x5c8117 gen_member_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:24933
0x5c8117 gen_struct_or_union_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25128
0x871ebf gen_tagged_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25329
0x88bc0f gen_typedef_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25243
0x86fc0a gen_decl_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:26229
0x8727bc gen_type_die_with_usage
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25394
0x873416 gen_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25578
0x86fef2 gen_decl_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:26297
0x8719e2 gen_member_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25032
0x8719e2 gen_struct_or_union_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25128
0x871ebf gen_tagged_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25329
0x872d37 gen_type_die_with_usage
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25524
0x871f49 gen_tagged_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25298
0x872d37 gen_type_die_with_usage
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25524
0x872d63 gen_type_die_with_usage
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25482
0x873416 gen_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25578
0x86fef2 gen_decl_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:26297
0x8719e2 gen_member_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25032
0x8719e2 gen_struct_or_union_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25128
0x871ebf gen_tagged_type_die
	/home/marxin/Programming/gcc/gcc/dwarf2out.c:25329
Comment 1 Martin Liška 2018-07-24 08:20:20 UTC
Created attachment 44426
test-case 1/4
Comment 2 Martin Liška 2018-07-24 08:20:43 UTC
Created attachment 44427
test-case 2/4
Comment 3 Martin Liška 2018-07-24 08:23:00 UTC
Created attachment 44428
test-case 3/4
Comment 4 Martin Liška 2018-07-24 08:23:47 UTC
Created attachment 44429
test-case 4/4
Comment 5 Richard Biener 2018-07-24 08:28:08 UTC
I'll try to investigate.
Comment 6 Richard Biener 2018-07-24 09:59:43 UTC
So it doesn't seem to be the same issue as the last one but with SCC size != 1
since the following doesn't make it ICE for me:

Index: lto/lto.c
===================================================================
--- lto/lto.c   (revision 262940)
+++ lto/lto.c   (working copy)
@@ -1670,6 +1670,24 @@ unify_scc (struct data_in *data_in, unsi
                {
                  lto_maybe_register_decl (data_in, map[2*i],
                                           (uintptr_t)map2[2*i]);
+                 tree prevail = map[2*i];
+                 if (dref_queue.length () != 0
+                     && ((DECL_P (prevail)
+                          && TREE_CODE (prevail) != FIELD_DECL
+                          && TREE_CODE (prevail) != DEBUG_EXPR_DECL
+                          && TREE_CODE (prevail) != TYPE_DECL)
+                         || TREE_CODE (prevail) == BLOCK))
+                   {
+                     tree nonprevail = streamer_tree_cache_get_tree (cache, (uintptr_t)map2[2*i]);
+                     const char *sym;
+                     unsigned HOST_WIDE_INT off;
+                     if (!debug_hooks->die_ref_for_decl (prevail, &sym, &off))
+                       {
+                         for (unsigned k = 0; k < dref_queue.length (); ++k)
+                           if (dref_queue[k].decl == nonprevail)
+                             gcc_unreachable ();
+                       }
+                   }
                  streamer_tree_cache_replace_tree (cache, map[2*i],
                                                    (uintptr_t)map2[2*i]);
                }


For the testcase we are missing a DIE for the context of operator().constprop,
whose non-type context is $2 = <function_decl 0x7ffff6696700 FilterMatches>

Ah, we create operator().constprop late where we _do_ have the DIE for
FilterMatches available but we do not look at DECL_ABSTRACT_ORIGIN
when setting a context die.  Instead we start with comp_unit_die ()
and run into

static void
dwarf2out_decl (tree decl)
{
  dw_die_ref context_die = comp_unit_die ();

  switch (TREE_CODE (decl))
    {
...
    case FUNCTION_DECL:
      /* If we're a nested function, initially use a parent of NULL; if we're
         a plain function, this will be fixed up in decls_for_scope.  If
         we're a method, it will be ignored, since we already have a DIE.  */
      if (decl_function_context (decl)
          /* But if we're in terse mode, we don't care about scope.  */
          && debug_info_level > DINFO_LEVEL_TERSE)
        context_die = NULL;
      break;

My gut feeling would be to guard the above with early_dwarf ...

The other option would be to assign a more appropriate DECL_CONTEXT to
clones rather than simply copying the DECL_CONTEXT of the origin.

So the following otherwise untested patch fixes the testcase:

Index: dwarf2out.c
===================================================================
--- dwarf2out.c (revision 262940)
+++ dwarf2out.c (working copy)
@@ -26703,7 +26703,8 @@ dwarf2out_decl (tree decl)
       /* If we're a nested function, initially use a parent of NULL; if we're
         a plain function, this will be fixed up in decls_for_scope.  If
         we're a method, it will be ignored, since we already have a DIE.  */
-      if (decl_function_context (decl)
+      if (early_dwarf
+         && decl_function_context (decl)
          /* But if we're in terse mode, we don't care about scope.  */
          && debug_info_level > DINFO_LEVEL_TERSE)
        context_die = NULL;
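
To make the effect of the one-line change easier to see, here is a minimal standalone sketch of the context selection it touches. It is not GCC code: `Decl`, `Die`, `DebugLevel` and `initial_context_die` are hypothetical stand-ins for `tree`, `dw_die_ref`, `debug_info_level` and the `FUNCTION_DECL` case of `dwarf2out_decl`, and the `nested_in_function` flag is assumed to model a non-NULL `decl_function_context (decl)`.

```cpp
#include <string>

// Toy stand-ins for GCC's DIE and decl types (hypothetical, illustration only).
struct Die {
  std::string name;
};

struct Decl {
  std::string name;
  bool nested_in_function;  // models decl_function_context (decl) != NULL
};

// Ordered like GCC's debug info levels so that '>' comparisons make sense.
enum class DebugLevel { None, Terse, Normal, Verbose };

// Pick the starting context DIE for a function declaration.
// A null result means "no parent yet"; otherwise the compile unit DIE is used.
// With the patched condition the null case can only happen during early dwarf.
const Die *initial_context_die(const Decl &decl, const Die &comp_unit,
                               bool early_dwarf, DebugLevel level) {
  if (early_dwarf && decl.nested_in_function && level > DebugLevel::Terse)
    return nullptr;
  return &comp_unit;
}
```

In this model, a late-created nested clone (`early_dwarf` false) starts from the compile unit DIE rather than from a null context, which is the behaviour the patch above asks for.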
Comment 7 Martin Liška 2018-07-24 12:14:22 UTC
With the dwarf2out.c patch applied, the library now builds. But linking took me ~30 minutes; perf top shows:

    36.96%  lto1           [.] lookup_external_ref
    18.60%  lto1           [.] hash_table<external_ref_hasher, xcallocator>::find_empty_slot_for_expand
     4.68%  as             [.] hash_lookup.isra.0
     1.92%  as             [.] resolve_symbol_value
     0.74%  lto1           [.] mark_used_flags
     0.72%  as             [.] relax_segment

and debug info of the shared library looks huge:

bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
     VM SIZE                       FILE SIZE
 --------------                 --------------
   0.0%       0 .debug_info       937Mi  52.7%
   0.0%       0 .debug_loc        339Mi  19.1%
   0.0%       0 .debug_str        159Mi   9.0%
   0.0%       0 .debug_ranges     110Mi   6.2%
   0.0%       0 .debug_line      69.0Mi   3.9%
  68.3%  65.3Mi .text            65.3Mi   3.7%
   0.0%       0 .strtab          33.1Mi   1.9%
   0.0%       0 .symtab          24.4Mi   1.4%
   0.0%       0 .debug_abbrev    9.99Mi   0.6%
   8.3%  7.91Mi .rela.dyn        7.91Mi   0.4%
   8.0%  7.67Mi .rodata          7.67Mi   0.4%
   6.2%  5.89Mi .eh_frame        5.89Mi   0.3%
   4.1%  3.90Mi .data.rel.ro     3.90Mi   0.2%
   1.7%  1.59Mi .dynstr          1.59Mi   0.1%
   1.4%  1.35Mi .eh_frame_hdr    1.35Mi   0.1%
   1.0%   990Ki [Other]          1003Ki   0.1%
   0.6%   616Ki .bss                  0   0.0%
   0.4%   398Ki .dynsym           398Ki   0.0%
   0.0%       0 .debug_pubtypes   349Ki   0.0%
   0.0%       0 .debug_pubnames   285Ki   0.0%
   0.0%      23 [None]                0   0.0%
 100.0%  95.6Mi TOTAL            1.74Gi 100.0%
Comment 8 rguenther@suse.de 2018-07-24 13:23:49 UTC
On Tue, 24 Jul 2018, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86654
> 
> --- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
> With the dwarf2out.c patch applied, the library now builds. But linking took me ~30
> minutes; perf top shows:
> 
>     36.96%  lto1           [.] lookup_external_ref
>     18.60%  lto1           [.] hash_table<external_ref_hasher,
> xcallocator>::find_empty_slot_for_expand
>      4.68%  as             [.] hash_lookup.isra.0
>      1.92%  as             [.] resolve_symbol_value
>      0.74%  lto1           [.] mark_used_flags
>      0.72%  as             [.] relax_segment

So you applied the first patch as well?  That was for debugging.  And
it didn't fire?  That's very good ;)

> and debug info of the shared library looks huge:
> 
> bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
>      VM SIZE                       FILE SIZE
>  --------------                 --------------
>    0.0%       0 .debug_info       937Mi  52.7%
>    0.0%       0 .debug_loc        339Mi  19.1%
>    0.0%       0 .debug_str        159Mi   9.0%
>    0.0%       0 .debug_ranges     110Mi   6.2%
>    0.0%       0 .debug_line      69.0Mi   3.9%
>   68.3%  65.3Mi .text            65.3Mi   3.7%
>    0.0%       0 .strtab          33.1Mi   1.9%
>    0.0%       0 .symtab          24.4Mi   1.4%
>    0.0%       0 .debug_abbrev    9.99Mi   0.6%
>    8.3%  7.91Mi .rela.dyn        7.91Mi   0.4%
>    8.0%  7.67Mi .rodata          7.67Mi   0.4%
>    6.2%  5.89Mi .eh_frame        5.89Mi   0.3%
>    4.1%  3.90Mi .data.rel.ro     3.90Mi   0.2%
>    1.7%  1.59Mi .dynstr          1.59Mi   0.1%
>    1.4%  1.35Mi .eh_frame_hdr    1.35Mi   0.1%
>    1.0%   990Ki [Other]          1003Ki   0.1%
>    0.6%   616Ki .bss                  0   0.0%
>    0.4%   398Ki .dynsym           398Ki   0.0%
>    0.0%       0 .debug_pubtypes   349Ki   0.0%
>    0.0%       0 .debug_pubnames   285Ki   0.0%
>    0.0%      23 [None]                0   0.0%
>  100.0%  95.6Mi TOTAL            1.74Gi 100.0%

Not so bad I think.  How's its size without LTO?

   0.0%       0 .debug_info     67.1Mi  52.8%
  58.2%  22.1Mi .text           22.1Mi  17.4%

but yes, PR83941 could be a reason for some bloat.  You could try
"counting" the number of DIEs that just contain a single 
DW_AT_abstract_origin attribute and no children.
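
The counting suggested here could look roughly like the sketch below. It assumes the DIE tree has already been loaded into a simple in-memory structure; the `Die` type and `count_origin_only_dies` are hypothetical names for illustration, not part of GCC or of any DWARF library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory DIE: attribute names only, plus child DIEs.
struct Die {
  std::vector<std::string> attributes;
  std::vector<Die> children;
};

// Count DIEs whose only attribute is DW_AT_abstract_origin and which have
// no children, visiting the whole subtree rooted at 'die'.
std::size_t count_origin_only_dies(const Die &die) {
  std::size_t count = 0;
  if (die.children.empty() && die.attributes.size() == 1 &&
      die.attributes[0] == "DW_AT_abstract_origin")
    count = 1;
  for (const Die &child : die.children)
    count += count_origin_only_dies(child);
  return count;
}
```

Calling `count_origin_only_dies` on each compile unit root and summing the results would give the number of such stub DIEs in the library.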
Comment 9 Martin Liška 2018-07-24 14:38:49 UTC
(In reply to rguenther@suse.de from comment #8)
> On Tue, 24 Jul 2018, marxin at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86654
> > 
> > --- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
> > With the dwarf2out.c patch applied, the library now builds. But linking took me ~30
> > minutes; perf top shows:
> > 
> >     36.96%  lto1           [.] lookup_external_ref
> >     18.60%  lto1           [.] hash_table<external_ref_hasher,
> > xcallocator>::find_empty_slot_for_expand
> >      4.68%  as             [.] hash_lookup.isra.0
> >      1.92%  as             [.] resolve_symbol_value
> >      0.74%  lto1           [.] mark_used_flags
> >      0.72%  as             [.] relax_segment
> 
> So you applied the first patch as well?  That was for debugging.  And
> it didn't fire?  That's very good ;)

No, no, only the one-liner in dwarf2out.c.

> 
> > and debug info of the shared library looks huge:
> > 
> > bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
> >      VM SIZE                       FILE SIZE
> >  --------------                 --------------
> >    0.0%       0 .debug_info       937Mi  52.7%
> >    0.0%       0 .debug_loc        339Mi  19.1%
> >    0.0%       0 .debug_str        159Mi   9.0%
> >    0.0%       0 .debug_ranges     110Mi   6.2%
> >    0.0%       0 .debug_line      69.0Mi   3.9%
> >   68.3%  65.3Mi .text            65.3Mi   3.7%
> >    0.0%       0 .strtab          33.1Mi   1.9%
> >    0.0%       0 .symtab          24.4Mi   1.4%
> >    0.0%       0 .debug_abbrev    9.99Mi   0.6%
> >    8.3%  7.91Mi .rela.dyn        7.91Mi   0.4%
> >    8.0%  7.67Mi .rodata          7.67Mi   0.4%
> >    6.2%  5.89Mi .eh_frame        5.89Mi   0.3%
> >    4.1%  3.90Mi .data.rel.ro     3.90Mi   0.2%
> >    1.7%  1.59Mi .dynstr          1.59Mi   0.1%
> >    1.4%  1.35Mi .eh_frame_hdr    1.35Mi   0.1%
> >    1.0%   990Ki [Other]          1003Ki   0.1%
> >    0.6%   616Ki .bss                  0   0.0%
> >    0.4%   398Ki .dynsym           398Ki   0.0%
> >    0.0%       0 .debug_pubtypes   349Ki   0.0%
> >    0.0%       0 .debug_pubnames   285Ki   0.0%
> >    0.0%      23 [None]                0   0.0%
> >  100.0%  95.6Mi TOTAL            1.74Gi 100.0%
> 
> Not so bad I think.  How's its size without LTO?

Oh, you were right, it's really an improvement:

bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
     VM SIZE                       FILE SIZE
 --------------                 --------------
   0.0%       0 .debug_info       979Mi  48.6%
   0.0%       0 .debug_loc        458Mi  22.8%
   0.0%       0 .debug_str        158Mi   7.9%
   0.0%       0 .debug_ranges     132Mi   6.6%
   0.0%       0 .debug_line       112Mi   5.6%
  67.6%  74.6Mi .text            74.6Mi   3.7%
   0.0%       0 .strtab          37.8Mi   1.9%
   0.0%       0 .symtab          14.0Mi   0.7%
   0.0%       0 .debug_abbrev    11.4Mi   0.6%
   7.9%  8.74Mi .eh_frame        8.74Mi   0.4%
   7.7%  8.49Mi .rodata          8.49Mi   0.4%
   7.7%  8.47Mi .rela.dyn        8.47Mi   0.4%
   3.8%  4.20Mi .data.rel.ro     4.20Mi   0.2%
   1.9%  2.05Mi .eh_frame_hdr    2.05Mi   0.1%
   1.5%  1.65Mi .dynstr          1.65Mi   0.1%
   0.9%  1.04Mi [Other]          1.32Mi   0.1%
   0.0%       0 .debug_aranges   1.29Mi   0.1%
   0.6%   650Ki .bss                  0   0.0%
   0.4%   413Ki .dynsym           413Ki   0.0%
   0.0%       0 .debug_pubtypes   349Ki   0.0%
   0.0%      15 [None]                0   0.0%
 100.0%   110Mi TOTAL            1.97Gi 100.0%

diff:

./obj-x86_64-pc-linux-gnu2/toolkit/library/libxul.so -- ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
     VM SIZE                      FILE SIZE
 ++++++++++++++ GROWING        ++++++++++++++
  [ = ]       0 .symtab        +10.3Mi   +74%
  [ = ]       0 .debug_str      +500Ki  +0.3%
  +1.0%     +16 .gnu.version_r     +16  +1.0%
   +53%      +8 [None]               0  [ = ]

 -------------- SHRINKING      --------------
  [ = ]       0 .debug_loc      -119Mi -26.0%
  [ = ]       0 .debug_line    -43.1Mi -38.4%
  [ = ]       0 .debug_info    -42.1Mi  -4.3%
  [ = ]       0 .debug_ranges  -22.8Mi -17.2%
 -12.4% -9.24Mi .text          -9.24Mi -12.4%
  [ = ]       0 .strtab        -4.73Mi -12.5%
 -32.7% -2.86Mi .eh_frame      -2.86Mi -32.7%
  [ = ]       0 .debug_abbrev  -1.46Mi -12.8%
  [ = ]       0 .debug_aranges -1.28Mi -99.7%
  -9.6%  -830Ki .rodata         -830Ki  -9.6%
 -34.1%  -716Ki .eh_frame_hdr   -716Ki -34.1%
  -6.7%  -578Ki .rela.dyn       -578Ki  -6.7%
  -7.1%  -304Ki .data.rel.ro    -304Ki  -7.1%
  -3.6% -61.3Ki .dynstr        -61.3Ki  -3.6%
  -5.3% -34.6Ki .bss                 0  [ = ]
 -13.4% -32.9Ki .data          -32.9Ki -13.4%
  -3.7% -15.4Ki .dynsym        -15.4Ki  -3.7%
  -5.2% -13.4Ki .rela.plt      -13.4Ki  -5.2%
  -3.8% -11.3Ki [Other]        -11.8Ki  -3.9%
  -5.2% -8.92Ki .plt           -8.92Ki  -5.2%
  -5.2% -4.46Ki .got.plt       -4.46Ki  -5.2%

 -+-+-+-+-+-+-+ MIXED          +-+-+-+-+-+-+-
 -68.2%    -161 [Unmapped]        +531   +21%

 -13.3% -14.7Mi TOTAL           -238Mi -11.8%


> 
>    0.0%       0 .debug_info     67.1Mi  52.8%
>   58.2%  22.1Mi .text           22.1Mi  17.4%
> 
> but yes, PR83941 could be a reason for some bloat.  You could try
> "counting" the number of DIEs that just contain a single 
> DW_AT_abstract_origin attribute and no children.

Can you please prepare a patch for that? Looks like the non-LTO speed is slightly
faster, but still not by much. Thus the patch looks promising.
Comment 10 rguenther@suse.de 2018-07-24 14:57:16 UTC
On Tue, 24 Jul 2018, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86654
> 
> --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #8)
> > On Tue, 24 Jul 2018, marxin at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86654
> > > 
> > > --- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
> > > With the dwarf2out.c patch applied, the library now builds. But linking took me ~30
> > > minutes; perf top shows:
> > > 
> > >     36.96%  lto1           [.] lookup_external_ref
> > >     18.60%  lto1           [.] hash_table<external_ref_hasher,
> > > xcallocator>::find_empty_slot_for_expand
> > >      4.68%  as             [.] hash_lookup.isra.0
> > >      1.92%  as             [.] resolve_symbol_value
> > >      0.74%  lto1           [.] mark_used_flags
> > >      0.72%  as             [.] relax_segment
> > 
> > So you applied the first patch as well?  That was for debugging.  And
> > it didn't fire?  That's very good ;)
> 
> No, no, only the one-liner in dwarf2out.c.

Ok, so optimize_external_refs is somehow expensive.  Note it
won't actually do anything but it still builds a map of
all external debug refs ...

I guess I should try to optimize this.  Can you open a PR for this?

> > 
> > > and debug info of the shared library looks huge:
> > > 
> > > bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
> > >      VM SIZE                       FILE SIZE
> > >  --------------                 --------------
> > >    0.0%       0 .debug_info       937Mi  52.7%
> > >    0.0%       0 .debug_loc        339Mi  19.1%
> > >    0.0%       0 .debug_str        159Mi   9.0%
> > >    0.0%       0 .debug_ranges     110Mi   6.2%
> > >    0.0%       0 .debug_line      69.0Mi   3.9%
> > >   68.3%  65.3Mi .text            65.3Mi   3.7%
> > >    0.0%       0 .strtab          33.1Mi   1.9%
> > >    0.0%       0 .symtab          24.4Mi   1.4%
> > >    0.0%       0 .debug_abbrev    9.99Mi   0.6%
> > >    8.3%  7.91Mi .rela.dyn        7.91Mi   0.4%
> > >    8.0%  7.67Mi .rodata          7.67Mi   0.4%
> > >    6.2%  5.89Mi .eh_frame        5.89Mi   0.3%
> > >    4.1%  3.90Mi .data.rel.ro     3.90Mi   0.2%
> > >    1.7%  1.59Mi .dynstr          1.59Mi   0.1%
> > >    1.4%  1.35Mi .eh_frame_hdr    1.35Mi   0.1%
> > >    1.0%   990Ki [Other]          1003Ki   0.1%
> > >    0.6%   616Ki .bss                  0   0.0%
> > >    0.4%   398Ki .dynsym           398Ki   0.0%
> > >    0.0%       0 .debug_pubtypes   349Ki   0.0%
> > >    0.0%       0 .debug_pubnames   285Ki   0.0%
> > >    0.0%      23 [None]                0   0.0%
> > >  100.0%  95.6Mi TOTAL            1.74Gi 100.0%
> > 
> > Not so bad I think.  How's its size without LTO?
> 
> Oh, you were right, it's really an improvement:
> 
> bloaty ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
>      VM SIZE                       FILE SIZE
>  --------------                 --------------
>    0.0%       0 .debug_info       979Mi  48.6%
>    0.0%       0 .debug_loc        458Mi  22.8%
>    0.0%       0 .debug_str        158Mi   7.9%
>    0.0%       0 .debug_ranges     132Mi   6.6%
>    0.0%       0 .debug_line       112Mi   5.6%
>   67.6%  74.6Mi .text            74.6Mi   3.7%
>    0.0%       0 .strtab          37.8Mi   1.9%
>    0.0%       0 .symtab          14.0Mi   0.7%
>    0.0%       0 .debug_abbrev    11.4Mi   0.6%
>    7.9%  8.74Mi .eh_frame        8.74Mi   0.4%
>    7.7%  8.49Mi .rodata          8.49Mi   0.4%
>    7.7%  8.47Mi .rela.dyn        8.47Mi   0.4%
>    3.8%  4.20Mi .data.rel.ro     4.20Mi   0.2%
>    1.9%  2.05Mi .eh_frame_hdr    2.05Mi   0.1%
>    1.5%  1.65Mi .dynstr          1.65Mi   0.1%
>    0.9%  1.04Mi [Other]          1.32Mi   0.1%
>    0.0%       0 .debug_aranges   1.29Mi   0.1%
>    0.6%   650Ki .bss                  0   0.0%
>    0.4%   413Ki .dynsym           413Ki   0.0%
>    0.0%       0 .debug_pubtypes   349Ki   0.0%
>    0.0%      15 [None]                0   0.0%
>  100.0%   110Mi TOTAL            1.97Gi 100.0%
> 
> diff:
> 
> ./obj-x86_64-pc-linux-gnu2/toolkit/library/libxul.so --
> ./obj-x86_64-pc-linux-gnu/toolkit/library/libxul.so
>      VM SIZE                      FILE SIZE
>  ++++++++++++++ GROWING        ++++++++++++++
>   [ = ]       0 .symtab        +10.3Mi   +74%
>   [ = ]       0 .debug_str      +500Ki  +0.3%
>   +1.0%     +16 .gnu.version_r     +16  +1.0%
>    +53%      +8 [None]               0  [ = ]
> 
>  -------------- SHRINKING      --------------
>   [ = ]       0 .debug_loc      -119Mi -26.0%
>   [ = ]       0 .debug_line    -43.1Mi -38.4%
>   [ = ]       0 .debug_info    -42.1Mi  -4.3%
>   [ = ]       0 .debug_ranges  -22.8Mi -17.2%
>  -12.4% -9.24Mi .text          -9.24Mi -12.4%
>   [ = ]       0 .strtab        -4.73Mi -12.5%
>  -32.7% -2.86Mi .eh_frame      -2.86Mi -32.7%
>   [ = ]       0 .debug_abbrev  -1.46Mi -12.8%
>   [ = ]       0 .debug_aranges -1.28Mi -99.7%
>   -9.6%  -830Ki .rodata         -830Ki  -9.6%
>  -34.1%  -716Ki .eh_frame_hdr   -716Ki -34.1%
>   -6.7%  -578Ki .rela.dyn       -578Ki  -6.7%
>   -7.1%  -304Ki .data.rel.ro    -304Ki  -7.1%
>   -3.6% -61.3Ki .dynstr        -61.3Ki  -3.6%
>   -5.3% -34.6Ki .bss                 0  [ = ]
>  -13.4% -32.9Ki .data          -32.9Ki -13.4%
>   -3.7% -15.4Ki .dynsym        -15.4Ki  -3.7%
>   -5.2% -13.4Ki .rela.plt      -13.4Ki  -5.2%
>   -3.8% -11.3Ki [Other]        -11.8Ki  -3.9%
>   -5.2% -8.92Ki .plt           -8.92Ki  -5.2%
>   -5.2% -4.46Ki .got.plt       -4.46Ki  -5.2%
> 
>  -+-+-+-+-+-+-+ MIXED          +-+-+-+-+-+-+-
>  -68.2%    -161 [Unmapped]        +531   +21%
> 
>  -13.3% -14.7Mi TOTAL           -238Mi -11.8%

Heh, any "improvement" here is of course a possible loss
in debug info precision...

> >    0.0%       0 .debug_info     67.1Mi  52.8%
> >   58.2%  22.1Mi .text           22.1Mi  17.4%
> > 
> > but yes, PR83941 could be a reason for some bloat.  You could try
> > "counting" the number of DIEs that just contain a single 
> > DW_AT_abstract_origin attribute and no children.
> 
> Can you please prepare a patch for that? Looks like the non-LTO speed is slightly
> faster, but still not by much. Thus the patch looks promising.

I don't have a good handle on that issue yet but I'll "test" the
patch shortly so firefox is fixed.

I suspect the best approach is to generate the ref DIEs lazily
on lookup... (this could get really messy though).
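
As a rough illustration of "generate lazily on lookup", the sketch below creates a reference DIE only the first time a given symbol and offset pair is asked for, instead of materialising every reference up front. It is only a model of the idea: `RefDie` and `LazyRefDies` are hypothetical names, and the string key is an assumption made to keep the example short, not how dwarf2out.c identifies DIEs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical reference DIE, identified by a symbol and an offset.
struct RefDie {
  std::string symbol;
  unsigned long offset;
};

// Creates reference DIEs on first lookup and hands out the cached one after.
class LazyRefDies {
 public:
  const RefDie &lookup(const std::string &symbol, unsigned long offset) {
    const std::string key = symbol + "+" + std::to_string(offset);
    auto it = cache_.find(key);
    if (it == cache_.end())
      it = cache_.emplace(key, RefDie{symbol, offset}).first;
    return it->second;  // references into unordered_map survive rehashing
  }

  // Number of reference DIEs actually created so far.
  std::size_t created() const { return cache_.size(); }

 private:
  std::unordered_map<std::string, RefDie> cache_;
};
```

With this shape, references that are never looked up cost nothing, which is the saving being hoped for; the messy part alluded to above (keeping parents and ordering consistent) is not modelled here.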
Comment 11 Richard Biener 2018-07-24 17:34:55 UTC
OK, so the patch at least regresses

FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++11  scan-assembler-times DW_TAG_variable[^.]*.ascii "this.0" 2
FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++14  scan-assembler-times DW_TAG_variable[^.]*.ascii "this.0" 2

but otherwise passes LTO bootstrap.  Have to investigate the above tomorrow.
Comment 12 Richard Biener 2018-07-25 07:57:00 UTC
(In reply to Richard Biener from comment #11)
> OK, so the patch at least regresses
> 
> FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++11  scan-assembler-times
> DW_TAG_variable[^.]*.ascii "this.0" 2
> FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++14  scan-assembler-times
> DW_TAG_variable[^.]*.ascii "this.0" 2
> 
> but otherwise passes LTO bootstrap.  Have to investigate the above tomorrow.

The reason is that in the late phase we fail DIE re-use and instead generate
a specification DIE because

      if (((is_unit_die (old_die->die_parent)
            /* This condition fixes the inconsistency/ICE with the
               following Fortran test (or some derivative thereof) while
               building libgfortran:

                  module some_m
                  contains
                     logical function funky (FLAG)
                       funky = .true.
                    end function
                  end module
             */
            || (old_die->die_parent
                && old_die->die_parent->die_tag == DW_TAG_module)
            || context_die == NULL)
...
          subr_die = old_die;

triggers only because of context_die == NULL.  This is a funky area,
but the comment "For local class methods, this doesn't apply; we just use the old DIE." suggests adding

            || local_scope_p (old_die->die_parent)

The code before the early-debug merge already had the context_die == NULL check, which
_possibly_ was supposed to match that comment.
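
The visible part of the old-DIE reuse condition, with and without the extra local-scope clause, can be modelled as below. This is a sketch under assumptions: the `Tag` and `Die` types are hypothetical, `in_local_scope` assumes that local_scope_p walks the parent chain looking for a subprogram or inlined subroutine DIE, and the elided remainder of the real condition is not modelled at all.

```cpp
// Toy DIE with just a tag and a parent link (hypothetical, illustration only).
enum class Tag { CompileUnit, Module, Subprogram, InlinedSubroutine, Structure };

struct Die {
  Tag tag;
  const Die *parent;
};

bool is_unit(const Die *die) {
  return die != nullptr && die->tag == Tag::CompileUnit;
}

// Assumed behaviour of local_scope_p: true if the DIE or any ancestor is a
// subprogram or an inlined subroutine.
bool in_local_scope(const Die *die) {
  for (; die != nullptr; die = die->parent)
    if (die->tag == Tag::Subprogram || die->tag == Tag::InlinedSubroutine)
      return true;
  return false;
}

// Models only the disjunction quoted above. 'allow_local_scope' switches the
// suggested extra clause on or off so both variants can be compared.
bool can_reuse_old_die(const Die &old_die, const Die *context_die,
                       bool allow_local_scope) {
  const Die *parent = old_die.parent;
  return is_unit(parent)
      || (parent != nullptr && parent->tag == Tag::Module)
      || context_die == nullptr
      || (allow_local_scope && in_local_scope(parent));
}
```

For a DIE whose parent sits inside a function (loosely modelled on the lambda in lambda1.C), reuse in this model depends on `context_die == nullptr` unless the local-scope clause is enabled, which matches the explanation given for the new FAILs.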
Comment 13 Martin Liška 2018-07-25 09:06:11 UTC
Just for the record, building Firefox w/ GCC 8.1 w/o LTO produces:

     VM SIZE                       FILE SIZE
 --------------                 --------------
   0.0%       0 .debug_info       978Mi  48.5%
   0.0%       0 .debug_loc        460Mi  22.8%
   0.0%       0 .debug_str        158Mi   7.9%
   0.0%       0 .debug_ranges     132Mi   6.6%
   0.0%       0 .debug_line       111Mi   5.5%
  67.7%  74.9Mi .text            74.9Mi   3.7%
   0.0%       0 .strtab          37.8Mi   1.9%
   0.0%       0 .symtab          14.1Mi   0.7%
   0.0%       0 .debug_abbrev    11.4Mi   0.6%
   7.9%  8.75Mi .eh_frame        8.75Mi   0.4%
   7.7%  8.47Mi .rela.dyn        8.47Mi   0.4%
   7.7%  8.47Mi .rodata          8.47Mi   0.4%
   3.8%  4.20Mi .data.rel.ro     4.20Mi   0.2%
   1.9%  2.05Mi .eh_frame_hdr    2.05Mi   0.1%
   1.5%  1.65Mi .dynstr          1.65Mi   0.1%
   0.9%  1.04Mi [Other]          1.33Mi   0.1%
   0.0%       0 .debug_aranges   1.29Mi   0.1%
   0.6%   650Ki .bss                  0   0.0%
   0.4%   413Ki .dynsym           413Ki   0.0%
   0.0%       0 .debug_pubtypes   349Ki   0.0%
   0.0%      15 [None]                0   0.0%
 100.0%   110Mi TOTAL            1.97Gi 100.0%
Comment 14 rguenther@suse.de 2018-07-25 09:48:24 UTC
On Wed, 25 Jul 2018, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86654
> 
> --- Comment #13 from Martin Liška <marxin at gcc dot gnu.org> ---
> Just for the record, building Firefox w/ GCC 8.1 w/o LTO produces:
> 
>      VM SIZE                       FILE SIZE
>  --------------                 --------------
>    0.0%       0 .debug_info       978Mi  48.5%
>    0.0%       0 .debug_loc        460Mi  22.8%
>    0.0%       0 .debug_str        158Mi   7.9%
>    0.0%       0 .debug_ranges     132Mi   6.6%
>    0.0%       0 .debug_line       111Mi   5.5%
>   67.7%  74.9Mi .text            74.9Mi   3.7%
>    0.0%       0 .strtab          37.8Mi   1.9%
>    0.0%       0 .symtab          14.1Mi   0.7%
>    0.0%       0 .debug_abbrev    11.4Mi   0.6%
>    7.9%  8.75Mi .eh_frame        8.75Mi   0.4%
>    7.7%  8.47Mi .rela.dyn        8.47Mi   0.4%
>    7.7%  8.47Mi .rodata          8.47Mi   0.4%
>    3.8%  4.20Mi .data.rel.ro     4.20Mi   0.2%
>    1.9%  2.05Mi .eh_frame_hdr    2.05Mi   0.1%
>    1.5%  1.65Mi .dynstr          1.65Mi   0.1%
>    0.9%  1.04Mi [Other]          1.33Mi   0.1%
>    0.0%       0 .debug_aranges   1.29Mi   0.1%
>    0.6%   650Ki .bss                  0   0.0%
>    0.4%   413Ki .dynsym           413Ki   0.0%
>    0.0%       0 .debug_pubtypes   349Ki   0.0%
>    0.0%      15 [None]                0   0.0%
>  100.0%   110Mi TOTAL            1.97Gi 100.0%

What if you throw dwz on it?  It should be able to compress the
early debug quite well (header file stuff).
Comment 15 Richard Biener 2018-07-25 12:10:44 UTC
Author: rguenth
Date: Wed Jul 25 12:10:13 2018
New Revision: 262965

URL: https://gcc.gnu.org/viewcvs?rev=262965&root=gcc&view=rev
Log:
2018-07-25  Richard Biener  <rguenther@suse.de>

	PR debug/86654
	* dwarf2out.c (dwarf2out_decl): Do not handle nested functions
	special wrt context_die late.
	(gen_subprogram_die): Re-use DIEs in local scope.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dwarf2out.c
Comment 16 Martin Liška 2018-07-25 12:39:32 UTC
(In reply to Martin Liška from comment #13)
> Just for the record, building Firefox w/ GCC 8.1 w/o LTO produces:
> 
>      VM SIZE                       FILE SIZE
>  --------------                 --------------
>    0.0%       0 .debug_info       978Mi  48.5%
>    0.0%       0 .debug_loc        460Mi  22.8%
>    0.0%       0 .debug_str        158Mi   7.9%
>    0.0%       0 .debug_ranges     132Mi   6.6%
>    0.0%       0 .debug_line       111Mi   5.5%
>   67.7%  74.9Mi .text            74.9Mi   3.7%
>    0.0%       0 .strtab          37.8Mi   1.9%
>    0.0%       0 .symtab          14.1Mi   0.7%
>    0.0%       0 .debug_abbrev    11.4Mi   0.6%
>    7.9%  8.75Mi .eh_frame        8.75Mi   0.4%
>    7.7%  8.47Mi .rela.dyn        8.47Mi   0.4%
>    7.7%  8.47Mi .rodata          8.47Mi   0.4%
>    3.8%  4.20Mi .data.rel.ro     4.20Mi   0.2%
>    1.9%  2.05Mi .eh_frame_hdr    2.05Mi   0.1%
>    1.5%  1.65Mi .dynstr          1.65Mi   0.1%
>    0.9%  1.04Mi [Other]          1.33Mi   0.1%
>    0.0%       0 .debug_aranges   1.29Mi   0.1%
>    0.6%   650Ki .bss                  0   0.0%
>    0.4%   413Ki .dynsym           413Ki   0.0%
>    0.0%       0 .debug_pubtypes   349Ki   0.0%
>    0.0%      15 [None]                0   0.0%
>  100.0%   110Mi TOTAL            1.97Gi 100.0%

When running dwz on that:

bloaty libxul.so
     VM SIZE                      FILE SIZE
 --------------                --------------
   0.0%       0 .debug_info      629Mi  37.8%
   0.0%       0 .debug_loc       460Mi  27.6%
   0.0%       0 .debug_str       158Mi   9.5%
   0.0%       0 .debug_ranges    132Mi   8.0%
   0.0%       0 .debug_line      111Mi   6.7%
  67.7%  74.9Mi .text           74.9Mi   4.5%
   0.0%       0 .strtab         37.8Mi   2.3%
   0.0%       0 .symtab         14.1Mi   0.8%
   0.0%       0 .debug_abbrev   10.3Mi   0.6%
   7.9%  8.75Mi .eh_frame       8.75Mi   0.5%
   7.7%  8.47Mi .rela.dyn       8.47Mi   0.5%
   7.7%  8.47Mi .rodata         8.47Mi   0.5%
   3.8%  4.20Mi .data.rel.ro    4.20Mi   0.3%
   1.9%  2.05Mi .eh_frame_hdr   2.05Mi   0.1%
   1.5%  1.65Mi .dynstr         1.65Mi   0.1%
   0.0%       0 .debug_aranges  1.29Mi   0.1%
   0.7%   805Ki [Other]          815Ki   0.0%
   0.6%   650Ki .bss                 0   0.0%
   0.4%   413Ki .dynsym          413Ki   0.0%
   0.2%   255Ki .rela.plt        255Ki   0.0%
   0.0%      15 [None]               0   0.0%
 100.0%   110Mi TOTAL           1.63Gi 100.0%

I'll test the LTO build as well.
Comment 17 Richard Biener 2018-08-01 06:56:02 UTC
Fixed btw.