Type representation in CTF and DWARF

Fri Oct 25 03:43:00 GMT 2019


On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
>>> (coreutils-0.22)
>>>        | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>>> groups |            10706 |                811 |           10249 |               13378 | 0.80
>>>
>>> (emacs-26.3)
>>>              | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>>>
>>> I chose to account for 50% of .debug_str because at this point, it will be
>>> unfair to not account for them. Actually, one could even argue that up to 70%
>>> of the .debug_str are names of entities. CTF section sizes do include the CTF
>>> string tables.
>>>
>>> Across coreutils, I see a geomean of 0.73 (ratio of
>>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
>>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
>>> footprint than CTF (with 50% of .debug_str accounted for).
>> I'm not convinced this "improvement" in size is worth maintaining another
>> debug-info format much less since it lacks desirable features right now
>> and thus evaluation is tricky.
>>
>> At least you can improve dwarf size considerably with a low amount of work.
>>
>> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
>> is recording typedef names as well as qualified type variants.  But maybe
>> CTF just has a more compact representation for the bits it actually implements.
> Does CTF record automatic variables in functions, or just global variables?
> If only the latter, it would be fair to also disable addition of local
> variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
> not, it would be fair to not emit that either in .debug_info.
> -gno-record-gcc-switches so that the compiler command line is not encoded in
> the debug info (unless it is in CTF).

CTF includes file-scope and global-scope entities. So CTF for a function
defined or declared at these scopes is available in the .ctf section, even if
the function is inlined.

To avoid generating DWARF for function-local entities, I tweaked gen_decl_die
to exit early when TREE_CODE (DECL_CONTEXT (decl)) is FUNCTION_DECL.

@@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
    if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
      return NULL;
  
+  /* Do not generate info for function local decl when -gdwarf-like-ctf is
+     enabled.  */
+  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
+      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
+    return NULL;
+
    switch (TREE_CODE (decl_or_origin))
      {
      case ERROR_MARK:


For the numbers in today's email:
1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is run on
    the generated binaries.
2. This time, I wanted to account for .debug_str entities more accurately
    (rather than the flat 50% used previously). I used a small script to count
    the characters in "path-like" strings, specifically those strings that
    start with a ".", and report that count in the column named D5.
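
A minimal sketch of the kind of counting script described in item 2 (this is
not the script actually used). It assumes the section contents were first
dumped to a raw file, for example with
objcopy --dump-section .debug_str=debug_str.bin, and that "path-like" means
any string whose first character is ".". The function name and the
include_nul option are hypothetical.

```python
import sys


def path_string_bytes(debug_str: bytes, include_nul: bool = False) -> int:
    """Sum the lengths of the NUL-terminated strings that start with '.'."""
    total = 0
    # .debug_str is a sequence of NUL-terminated strings, so splitting on
    # NUL yields one entry per string (plus an empty tail, which is skipped
    # because it does not start with '.').
    for entry in debug_str.split(b"\0"):
        if entry.startswith(b"."):
            # Assumption: the terminator is not counted unless requested.
            total += len(entry) + (1 if include_nul else 0)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_paths.py debug_str.bin
    with open(sys.argv[1], "rb") as f:
        print(path_string_bytes(f.read()))
```

Whether the terminating NUL bytes were counted for D5 is not stated, so a
count produced this way may differ slightly from the D5 values reported.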

(coreutils-0.22)
       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
ls     |            14100 |                994 |           16945 |              1328 |               26240 | 0.85
pwd    |             6341 |                632 |            9311 |               596 |               13929 | 0.88
groups |             6410 |                714 |            9218 |               667 |               13378 | 0.85
Geomean across coreutils = 0.84

(emacs-26.3)
             | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
emacs-26.3.1 |           373678 |               3794 |          219048 |              3842 |              273910 | 0.46
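
The ratio column can be rechecked from the section sizes. The following is a
small sketch, with hypothetical function names, that applies
.ctf/(D1+D2+D4-D5) to the rows listed in this email and provides a geometric
mean helper for a list of such ratios:

```python
import math


def ctf_ratio(d1: int, d2: int, d4: int, d5: int, ctf: int) -> float:
    """Return .ctf / (D1 + D2 + D4 - D5) for one binary."""
    return ctf / (d1 + d2 + d4 - d5)


def geomean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (D1, D2, D4, D5, .ctf) section sizes in bytes, copied from the tables.
rows = {
    "ls": (14100, 994, 16945, 1328, 26240),
    "pwd": (6341, 632, 9311, 596, 13929),
    "groups": (6410, 714, 9218, 667, 13378),
    "emacs-26.3.1": (373678, 3794, 219048, 3842, 273910),
}

for name, sizes in rows.items():
    print(f"{name:12s} {ctf_ratio(*sizes):.3f}")
```

The computed values agree with the ratio column to within 0.01. The 0.84
geomean is taken over all coreutils binaries, so applying geomean to the
three coreutils rows shown here is not expected to reproduce it.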

> DWARF is highly extensible format, what exactly is and is not emitted is
> something that consumers can choose.
> Yes, DWARF can be large, but mainly because it provides a lot of
> information, the actual representation has been designed with size concerns
> in mind and newer versions of the standard keep improving that too.
>
> 	Jakub

Yes.

I set out to provide some numbers on the size impact of CTF vs DWARF, since
it is a legitimate curiosity many of us have had. Comparing compactness or
feature matrices is only one dimension of evaluating the utility of supporting
CTF in the toolchain (including GCC; Binutils and GDB have already accepted
initial CTF support). The other dimension is a user-friendly workflow that
supports current users and eases further adoption and growth.

Indu