Bug 94495 - [10 Regression] Debug info size growth since r10-7515-g2c0fa3ecf70d199a
Summary: [10 Regression] Debug info size growth since r10-7515-g2c0fa3ecf70d199a
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: debug
Version: 10.0
Importance: P3 normal
Target Milestone: 10.0
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks: spec
Reported: 2020-04-06 07:44 UTC by Martin Liška
Modified: 2020-04-15 05:41 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work: 9.3.0
Known to fail: 10.0
Last reconfirmed: 2020-04-06 00:00:00


Attachments
gcc10-pr94495.patch (1.67 KB, patch)
2020-04-09 09:33 UTC, Jakub Jelinek

Comment 1 Jakub Jelinek 2020-04-06 08:01:30 UTC
The numbers I got show that some sections grew a little and others shrank a little: sometimes .debug_info grew and .debug_loc shrank, sometimes the other way around, but in general the change wasn't significant.
Debug info size growth is not bad in itself if it means better variable location coverage; on the other side, shrinking is bad if it means fewer variables are covered, or that they are covered on a smaller set of instructions.
Comment 2 Richard Biener 2020-04-06 11:58:12 UTC
Looks like ~20% for the first case, so possibly worth investigating.  I can easily imagine that we now run into some cut-offs less often and generate debug info where we previously gave up.
Comment 3 Martin Liška 2020-04-06 13:14:40 UTC
One TU difference from SPEC2006 454.calculix:

$ gfortran -c -o restarts.o -ISPOOLES -Ofast -g -std=legacy restarts.f
...
$ ~/Programming/bloaty/bloaty restarts.o -- /tmp/before.o
     VM SIZE                        FILE SIZE
 --------------                  --------------
  [ = ]       0 .rela.debug_info +7.22Ki  +162%
  [ = ]       0 .debug_loc       +1.07Ki  +615%
  [ = ]       0 .debug_line           +3  +1.5%
  [ = ]       0 [Unmapped]            +2  +7.1%
  [ = ]       0 .debug_abbrev         -3  -1.5%
  [ = ]       0 .debug_info         -272 -31.4%
  [ = ]       0 TOTAL            +8.02Ki   +65%
Comment 4 Martin Liška 2020-04-06 13:34:21 UTC
There's a bigger object file:

$ /Programming/bloaty/bloaty nonlingeo.after.o
     VM SIZE                           FILE SIZE
 --------------                     --------------
  84.0%  32.7Ki .text                32.7Ki  33.9%
   0.0%       0 .rela.debug_info     24.7Ki  25.6%
   0.0%       0 .debug_loc           10.3Ki  10.7%
   0.0%       0 .rela.text           8.16Ki   8.5%
   0.0%       0 .debug_line          6.35Ki   6.6%
  13.4%  5.23Ki .eh_frame            5.23Ki   5.4%
   0.0%       0 .debug_info          3.22Ki   3.3%
   0.0%       0 [ELF Headers]        1.62Ki   1.7%
   0.0%       0 .symtab              1.15Ki   1.2%
   0.0%       0 .debug_str           1.02Ki   1.1%
   1.7%     681 .rodata.str1.8          681   0.7%
   0.6%     250 .rodata.str1.1          250   0.3%
   0.0%       0 .strtab                 238   0.2%
   0.0%       0 .shstrtab               236   0.2%
   0.0%       0 .debug_abbrev           218   0.2%
   0.1%      32 [3 Others]               80   0.1%
   0.0%       0 [Unmapped]               50   0.1%
   0.0%       0 .debug_aranges           48   0.0%
   0.0%       0 .rela.debug_aranges      48   0.0%
   0.1%      48 .rodata.cst8             48   0.0%
   0.0%       0 .comment                 43   0.0%
 100.0%  38.9Ki TOTAL                96.4Ki 100.0%

$ ~/Programming/bloaty/bloaty nonlingeo.after.o -- nonlingeo.before.o
     VM SIZE                        FILE SIZE
 --------------                  --------------
  [ = ]       0 .rela.debug_info +6.05Ki   +32%
  [ = ]       0 .debug_loc       +1.61Ki   +19%
  [ = ]       0 [Unmapped]            +6   +14%
  [ = ]       0 .debug_line           +2  +0.0%
  [ = ]       0 .debug_abbrev         -3  -1.4%
  [ = ]       0 .debug_info         -192  -5.5%
  [ = ]       0 TOTAL            +7.48Ki  +8.4%
Comment 5 Jakub Jelinek 2020-04-06 17:07:48 UTC
Can also be reproduced with
void bar (int, int, int, int, int, int, int, int, int, int, int, int *);

int
foo (int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int k)
{
  int z[64];
  if (a > 37)
    bar (a, b, c, d, e, f, g, h, i, j, k, z);
  if (__builtin_expect (b, 0))
    bar (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0);
  return a + b + c;
}

The difference at -O2 -g -dA in the assembly is (when ignoring debug info):
        # DEBUG g => [argp]
        # DEBUG k => [argp+0x20]
        # DEBUG j => [argp+0x18]
        # DEBUG a => di
        # DEBUG b => si
        # DEBUG c => dx
        # DEBUG d => cx
        # DEBUG h => [argp+0x8]
        # DEBUG e => r8
        # DEBUG i => [argp+0x10]
        # DEBUG f => r9
...
 .LVL4:
+       # DEBUG h => [sp+0x10]
+       # DEBUG i => [sp+0x18]
+       # DEBUG j => [sp+0x20]
+       # DEBUG k => [sp+0x28]
        # DEBUG c => entry_value
 # SUCC: EXIT [always]  count:1073741824 (estimated locally)
        ret
 .LVL5:
+       # DEBUG k => [argp+0x20]
        # DEBUG a => bx
        # DEBUG b => si
        # DEBUG c => dx
        # DEBUG d => cx
        # DEBUG e => r8
        # DEBUG f => r9
+       # DEBUG h => [argp+0x8]
+       # DEBUG i => [argp+0x10]
+       # DEBUG j => [argp+0x18]
i.e. at the end of the epilogue we don't find the argp equivalence for some reason, which means we unnecessarily use a location list for something that can be expressed directly in the DIEs.  Will have a look tomorrow.
Comment 6 Jakub Jelinek 2020-04-08 15:25:33 UTC
Seems I forgot most of how var-tracking.c works :(

Before the return we have two pop instructions and the second one increments the stack pointer to the value it had at the start of the function.

For the pop, add_stores is called on loc (reg sp) and expr (set (reg sp) (plus (reg argp) (const_int -8))).
Now, before my cselib.c sp-derived value changes, the cselib lookup of the sp value at that point returned a fresh VALUE that wasn't really used by much, but with those changes cselib returns the SP_DERIVED_VALUE_P VALUE, which is used very often and has cfa_base_val - 8 as one of its locations.
Now, when processing the MO_VAL_SET created by that add_stores call, the VALUE is marked as changed (hey, we have a nice location for this VALUE: %rsp!), and everything related to that VALUE is marked as changed too and gets new notes emitted.
Except that %rsp isn't really a good location when we can express the VALUE as argp + constant (where argp is the CFA value), because then we express it using DW_OP_fbreg and it can stay that way through the whole function.
So it isn't beneficial to change all VALUEs/decls related to that VALUE when it lives in the stack pointer for just a few instructions.
I thought var-tracking had code to check whether the cur_loc is still usable and to change cur_loc only if it isn't, but it seems it doesn't; this SP_DERIVED_VALUE_P has cur_loc NULL all the way until the pop in the epilogue, and before that we instead query cselib for the location and that way find the cfa_base_rtx + constant.
So, should var-tracking itself (other than the vt_initialize phase, which does that already) special-case the cselib_sp_based_value_p VALUEs if they can somehow be expressed as cfa_base_rtx or cfa_base_rtx + constant, and ignore any changes to them?  Or should what vt_initialize calls special-case those?

Alex, any insights on this?
Comment 7 Jakub Jelinek 2020-04-09 09:33:27 UTC
Created attachment 48246 [details]
gcc10-pr94495.patch

Untested fix.  This does two things during var-tracking.  One is to try to reuse the SP_DERIVED_VALUE_P and the VALUEs equated to it + CONST_INT even more in !frame_pointer_needed functions (ideally having just a single SP_DERIVED_VALUE_P), and through that make sure they can all be expressed using cfa_base_rtx or cfa_base_rtx + CONST_INT (and thus in DWARF using DW_OP_fbreg).
The second change is that for VALUEs that can be expressed that way, it throws away all MO_VAL_SETs: we already have the best expression for them (DW_OP_fbreg), which is constant throughout the function, so it is not beneficial to express them using that in one part of the function, using the stack pointer in another part, and using some other register in yet another part.
Comment 8 GCC Commits 2020-04-09 19:21:46 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:33c45e51b4914008064d9b77f2c1fc0eea1ad060

commit r10-7665-g33c45e51b4914008064d9b77f2c1fc0eea1ad060
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Apr 9 21:21:24 2020 +0200

    cselib, var-tracking: Improve debug info after the cselib sp tracking changes [PR94495]
    
    On the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94495#c5
    testcase GCC emits worse debug info after the PR92264 cselib.c
    changes.
    The difference at -O2 -g -dA in the assembly is (when ignoring debug info):
            # DEBUG g => [argp]
            # DEBUG k => [argp+0x20]
            # DEBUG j => [argp+0x18]
            # DEBUG a => di
            # DEBUG b => si
            # DEBUG c => dx
            # DEBUG d => cx
            # DEBUG h => [argp+0x8]
            # DEBUG e => r8
            # DEBUG i => [argp+0x10]
            # DEBUG f => r9
    ...
     .LVL4:
    +       # DEBUG h => [sp+0x10]
    +       # DEBUG i => [sp+0x18]
    +       # DEBUG j => [sp+0x20]
    +       # DEBUG k => [sp+0x28]
            # DEBUG c => entry_value
     # SUCC: EXIT [always]  count:1073741824 (estimated locally)
            ret
     .LVL5:
    +       # DEBUG k => [argp+0x20]
            # DEBUG a => bx
            # DEBUG b => si
            # DEBUG c => dx
            # DEBUG d => cx
            # DEBUG e => r8
            # DEBUG f => r9
    +       # DEBUG h => [argp+0x8]
    +       # DEBUG i => [argp+0x10]
    +       # DEBUG j => [argp+0x18]
    This means that before the changes, h, i, j, k could be all expressed
    in DW_AT_location directly with DW_OP_fbreg <some_offset>, but now we need
    to use a location list, where in the first part of the function and last
    part of the function (everything except the ret instruction) we use that
    DW_OP_fbreg <some_offset>, but for the single ret instruction we instead
    say those values live in something pointed by stack pointer + offset.
    It is true, but only because stack pointer + offset is equal to DW_OP_fbreg
    <some_offset> at that point.
    
    The var-tracking pass has code for !frame_pointer_needed functions that
    canonicalizes stack pointer uses in the insns to cfa_base_rtx + offset,
    depending on the stack depth at each point, before it hands them over to
    cselib.  The problem is that on the last epilogue pop insn (the one right
    before ret) the canonicalization is sp = argp - 8, and add_stores records
    a MO_VAL_SET operation for that argp - 8 value (which is the
    SP_DERIVED_VALUE_P VALUE on which the cselib changes canonicalize
    sp-based accesses), so from that point onwards var-tracking tracks that
    this VALUE (2:2) now lives in sp.  At the end of the function it of
    course needs to forget that again (as it would need to on any change to
    sp).  But when processing that uop, we note that the VALUE has changed
    and that anything based on it changed too, so we emit changes for
    everything.  Before that point var-tracking itself doesn't track the
    VALUE in any register, so it uses cselib, and cselib knows through the
    permanent equivs how to compute it using argp (i.e. what will become
    DW_OP_fbreg).
    
    The following fix has two parts.  One is that it detects if cselib can
    compute a certain VALUE using the cfa_base_rtx and, for such VALUEs,
    doesn't add the MO_VAL_SET operation, as it is better to express them
    using cfa_base_rtx rather than temporarily through something else.  The
    other is to make sure that in !frame_pointer_needed functions we reuse
    the single SP_DERIVED_VALUE_P VALUE (and other VALUEs) in other extended
    basic blocks too.  This can be done because we have computed the stack
    depths at the start of each basic block in vt_stack_adjustments, and
    although cselib_reset_table, which is called at the end of each extended
    bb, throws away all hard registers (but the magic cfa_base_rtx), we can
    hint cselib.c at the start of the ebb what VALUE the sp hard reg has.
    That means fewer VALUEs during var-tracking and, more importantly, that
    they will all have the cfa_base_rtx + offset equivalence.
    
    I have performed 4 bootstraps+regtests (x86_64-linux and i686-linux,
    each with this patch (that is the new cselib + var-tracking variant) and
    once with that patch reverted as well as all other cselib.c changes from
    this month; once that bootstrapped, I've reapplied the cselib.c changes and
    this patch and rebuilt cc1plus, so that the content is comparable, but built
    with the pre-Apr 2 cselib.c+var-tracking.c (that is the old cselib one)).
    
    Below are readelf -WS cc1plus | grep debug_ filtered to only have debug
    sections whose size actually changed, followed by dwlocstat results on
    cc1plus.  This shows that there was about 3% shrink in those .debug*
    sections for 32-bit and 1% shrink for 64-bit, with minor variable coverage
    changes one or the other way that are IMHO insignificant.
    
    32-bit old cselib
      [33] .debug_info       PROGBITS        00000000 29139c0 710e5fa 00      0   0  1
      [34] .debug_abbrev     PROGBITS        00000000 9a21fba 21ad6d 00      0   0  1
      [35] .debug_line       PROGBITS        00000000 9c3cd27 1a05e56 00      0   0  1
      [36] .debug_str        PROGBITS        00000000 b642b7d 7cad09 01  MS  0   0  1
      [37] .debug_loc        PROGBITS        00000000 be0d886 5792627 00      0   0  1
      [38] .debug_ranges     PROGBITS        00000000 1159fead e57218 00      0   0  1
    sum 263075589B
    32-bit new cselib + var-tracking
      [33] .debug_info       PROGBITS        00000000 29129c0 71065d1 00      0   0  1
      [34] .debug_abbrev     PROGBITS        00000000 9a18f91 21af28 00      0   0  1
      [35] .debug_line       PROGBITS        00000000 9c33eb9 195dffc 00      0   0  1
      [36] .debug_str        PROGBITS        00000000 b591eb5 7cace0 01  MS  0   0  1
      [37] .debug_loc        PROGBITS        00000000 bd5cb95 50185bf 00      0   0  1
      [38] .debug_ranges     PROGBITS        00000000 10d75154 e57068 00      0   0  1
    sum 254515196B (8560393B smaller)
    64-bit old cselib
      [33] .debug_info       PROGBITS        0000000000000000 25e64b0 84d7cc9 00      0   0  1
      [34] .debug_abbrev     PROGBITS        0000000000000000 aabe179 225e2d 00      0   0  1
      [35] .debug_line       PROGBITS        0000000000000000 ace3fa6 19a3505 00      0   0  1
      [37] .debug_loc        PROGBITS        0000000000000000 ce6e960 89707bc 00      0   0  1
      [38] .debug_ranges     PROGBITS        0000000000000000 157df11c 1c59a70 00      0   0  1
    sum 342274599B
    64-bit new cselib + var-tracking
      [33] .debug_info       PROGBITS        0000000000000000 25e64b0 84d8e86 00      0   0  1
      [34] .debug_abbrev     PROGBITS        0000000000000000 aabf336 225e8d 00      0   0  1
      [35] .debug_line       PROGBITS        0000000000000000 ace51c3 199ded5 00      0   0  1
      [37] .debug_loc        PROGBITS        0000000000000000 ce6a54d 85f62da 00      0   0  1
      [38] .debug_ranges     PROGBITS        0000000000000000 15460827 1c59a20 00      0   0  1
    sum 338610402B (3664197B smaller)
    32-bit old cselib
    cov%    samples cumul
    0..10   1231599/48%     1231599/48%
    11..20  31017/1%        1262616/49%
    21..30  36495/1%        1299111/51%
    31..40  35846/1%        1334957/52%
    41..50  47179/1%        1382136/54%
    51..60  41203/1%        1423339/56%
    61..70  65504/2%        1488843/58%
    71..80  59656/2%        1548499/61%
    81..90  104399/4%       1652898/65%
    91..100 882231/34%      2535129/100%
    32-bit new cselib + var-tracking
    cov%    samples cumul
    0..10   1230542/48%     1230542/48%
    11..20  30385/1%        1260927/49%
    21..30  36393/1%        1297320/51%
    31..40  36053/1%        1333373/52%
    41..50  47670/1%        1381043/54%
    51..60  41599/1%        1422642/56%
    61..70  65902/2%        1488544/58%
    71..80  59911/2%        1548455/61%
    81..90  104607/4%       1653062/65%
    91..100 882067/34%      2535129/100%
    64-bit old cselib
    cov%    samples cumul
    0..10   1233211/48%     1233211/48%
    11..20  31120/1%        1264331/49%
    21..30  39230/1%        1303561/51%
    31..40  38887/1%        1342448/52%
    41..50  47519/1%        1389967/54%
    51..60  45264/1%        1435231/56%
    61..70  69431/2%        1504662/59%
    71..80  62114/2%        1566776/61%
    81..90  104587/4%       1671363/65%
    91..100 876085/34%      2547448/100%
    64-bit new cselib + var-tracking
    cov%    samples cumul
    0..10   1233471/48%     1233471/48%
    11..20  31093/1%        1264564/49%
    21..30  39217/1%        1303781/51%
    31..40  38851/1%        1342632/52%
    41..50  47488/1%        1390120/54%
    51..60  45224/1%        1435344/56%
    61..70  69409/2%        1504753/59%
    71..80  62140/2%        1566893/61%
    81..90  104616/4%       1671509/65%
    91..100 875939/34%      2547448/100%
    
    2020-04-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR debug/94495
            * cselib.h (cselib_record_sp_cfa_base_equiv,
            cselib_sp_derived_value_p): Declare.
            * cselib.c (cselib_record_sp_cfa_base_equiv,
            cselib_sp_derived_value_p): New functions.
            * var-tracking.c (add_stores): Don't record MO_VAL_SET for
            cselib_sp_derived_value_p values.
            (vt_initialize): Call cselib_record_sp_cfa_base_equiv at the
            start of extended basic blocks other than the first one
            for !frame_pointer_needed functions.
Comment 9 Andreas Schwab 2020-04-10 08:46:42 UTC
This breaks aarch64 -mabi=ilp32.

during RTL pass: vartrack
In file included from ../../../../../../libstdc++-v3/src/c++98/pool_allocator.cc:31:
/opt/gcc/gcc-20200410/Build/aarch64-suse-linux/ilp32/libstdc++-v3/include/ext/pool_allocator.h: In member function '_Tp* __gnu_cxx::__pool_alloc<_Tp>::allocate(__gnu_cxx::__pool_alloc<_Tp>::size_type, const void*) [with _Tp = wchar_t]':
/opt/gcc/gcc-20200410/Build/aarch64-suse-linux/ilp32/libstdc++-v3/include/ext/pool_allocator.h:262:5: internal compiler error: in vt_expand_var_loc_chain, at var-tracking.c:8355
Comment 10 Martin Liška 2020-04-10 10:27:02 UTC
(In reply to Andreas Schwab from comment #9)
> This breaks aarch64 -mabi=ilp32.
> 
> during RTL pass: vartrack
> In file included from
> ../../../../../../libstdc++-v3/src/c++98/pool_allocator.cc:31:
> /opt/gcc/gcc-20200410/Build/aarch64-suse-linux/ilp32/libstdc++-v3/include/
> ext/pool_allocator.h: In member function '_Tp*
> __gnu_cxx::__pool_alloc<_Tp>::allocate(__gnu_cxx::__pool_alloc<_Tp>::
> size_type, const void*) [with _Tp = wchar_t]':
> /opt/gcc/gcc-20200410/Build/aarch64-suse-linux/ilp32/libstdc++-v3/include/
> ext/pool_allocator.h:262:5: internal compiler error: in
> vt_expand_var_loc_chain, at var-tracking.c:8355

Can you please provide a pre-processed source file?
Comment 11 Jakub Jelinek 2020-04-10 10:33:56 UTC
(In reply to Andreas Schwab from comment #9)
> This breaks aarch64 -mabi=ilp32.

Does https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543702.html fix that?
Comment 12 Andreas Schwab 2020-04-10 11:08:33 UTC
Yes, it does.
Comment 13 GCC Commits 2020-04-11 05:34:57 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a615ea71bc8fbf31b9bc71cb373a7ca5b9cca44a

commit r10-7685-ga615ea71bc8fbf31b9bc71cb373a7ca5b9cca44a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Apr 11 07:32:12 2020 +0200

    cselib: Mark the cselib_record_sp_cfa_base_equiv VALUE as preserved [PR94551]
    
    Sometimes the cselib_record_sp_cfa_base_equiv makes it into the var-tracking
    used VALUEs and needs to be preserved.
    
    2020-04-11  Jakub Jelinek  <jakub@redhat.com>
    
            PR debug/94495
            PR target/94551
            * cselib.c (cselib_record_sp_cfa_base_equiv): Set PRESERVED_VALUE_P on
            val->val_rtx.
Comment 14 Jakub Jelinek 2020-04-14 09:29:14 UTC
Should be fixed now.
Comment 15 Martin Liška 2020-04-15 05:41:08 UTC
Just for the record, I believe the same revision significantly improved the mcf benchmark in various configurations:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=232.347.0&plot.1=18.347.0&plot.2=287.347.0