This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Type representation in CTF and DWARF




On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
(coreutils-0.22)
binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
ls     | 30616            | 1136               | 21098           | 26240               | 0.62
pwd    | 10734            | 788                | 10433           | 13929               | 0.83
groups | 10706            | 811                | 10249           | 13378               | 0.80

(emacs-26.3)
binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
emacs-26.3.1 | 674657           | 6402               | 273963          | 273910              | 0.33

I chose to count 50% of .debug_str because, at this point, it would be
unfair to leave it out entirely. One could even argue that up to 70% of
.debug_str consists of names of entities. The CTF section sizes do include
the CTF string tables.
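
The ratio column can be reproduced with a few lines of Python. This is only a
sketch: the function name ctf_ratio and its weight parameter are illustrative
and not part of any tool, and the sizes are copied from the tables above.

```python
def ctf_ratio(debug_info, debug_abbrev, debug_str, ctf, weight=0.5):
    """Return .ctf / (D1 + D2 + weight * D4); all sizes are in bytes."""
    return ctf / (debug_info + debug_abbrev + weight * debug_str)


# (D1, D2, D4, .ctf) copied from the tables above.
rows = {
    "ls": (30616, 1136, 21098, 26240),
    "pwd": (10734, 788, 10433, 13929),
    "groups": (10706, 811, 10249, 13378),
    "emacs-26.3.1": (674657, 6402, 273963, 273910),
}

for name, (d1, d2, d4, ctf) in rows.items():
    print(f"{name:14s} {ctf_ratio(d1, d2, d4, ctf):.2f}")
```

Changing weight to 0.7 shows how the ratio moves if a larger share of
.debug_str is attributed to entity names.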

Across coreutils, I see a geomean of 0.73 (ratio of
.ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
"-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
footprint than CTF (with 50% of .debug_str accounted for).
I'm not convinced this "improvement" in size is worth maintaining another
debug-info format, especially since it lacks desirable features right now
and thus evaluation is tricky.

At least you can improve dwarf size considerably with a low amount of work.

I suspect another factor where dwarf is bigger compared to CTF is that dwarf
is recording typedef names as well as qualified type variants.  But maybe
CTF just has a more compact representation for the bits it actually implements.
Does CTF record automatic variables in functions, or just global variables?
If only the latter, it would be fair to also disable addition of local
variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
not, it would be fair to not emit that either in .debug_info.
Also use -gno-record-gcc-switches so that the compiler command line is not
encoded in the debug info (unless it is in CTF).

CTF includes file-scope and global-scope entities. So, CTF for a function
defined/declared at these scopes is available in .ctf section, even if it is
inlined.

To not generate DWARF for function-local entities, I made a tweak in the
gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
is FUNCTION_DECL.

@@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
   if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
     return NULL;
+  /* Do not generate info for function local decl when -gdwarf-like-ctf is
+     enabled.  */
+  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
+      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
+    return NULL;
+
   switch (TREE_CODE (decl_or_origin))
     {
     case ERROR_MARK:


For the numbers in the email today:
1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
   generated binaries.
2. This time, I wanted to account for .debug_str entities more accurately
   (rather than the flat 50% used previously). Using a small script that
   counts the characters in "path-like" strings, specifically those that
   start with a ".", I gathered the data in the column named D5.
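
The script itself is not included in this message. A minimal sketch of such
a count could look like the following, assuming the raw contents of .debug_str
are already available as bytes (for instance dumped with objcopy
--dump-section). The function name path_string_bytes is hypothetical, and
whether the terminating NUL of each string is counted is a guess, so it is
left as a parameter.

```python
def path_string_bytes(debug_str, count_nul=True):
    """Sum the sizes of NUL-terminated strings that start with '.'.

    debug_str is the raw section contents; count_nul chooses whether each
    matching string's terminating NUL byte is included (an assumption).
    """
    total = 0
    for s in debug_str.split(b"\0"):
        if s.startswith(b"."):
            total += len(s) + (1 if count_nul else 0)
    return total


# Tiny made-up section: two path-like strings and two ordinary names.
sample = b"int\0./src/ls.c\0../lib\0main\0"
print(path_string_bytes(sample))
print(path_string_bytes(sample, count_nul=False))
```

The result plays the role of D5 in the tables below and is subtracted from
the .debug_str size (D4).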

(coreutils-0.22)
binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
ls     | 14100            | 994                | 16945           | 1328              | 26240               | 0.85
pwd    | 6341             | 632                | 9311            | 596               | 13929               | 0.88
groups | 6410             | 714                | 9218            | 667               | 13378               | 0.85
Geomean across coreutils = 0.84

(emacs-26.3)
binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
emacs-26.3.1 | 373678           | 3794               | 219048          | 3842              | 273910              | 0.46
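
For anyone who wants to redo the arithmetic, here is a rough sketch of the
revised ratio and of a geometric mean over per-binary ratios. The helper names
are illustrative only. Only the three coreutils binaries listed above are
included, so the mean computed here is not expected to match the figure quoted
for all of coreutils.

```python
import math


def ctf_ratio_excluding_paths(d1, d2, d4, d5, ctf):
    """Return .ctf / (D1 + D2 + D4 - D5); all sizes are in bytes."""
    return ctf / (d1 + d2 + d4 - d5)


def geomean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (D1, D2, D4, D5, .ctf) copied from the coreutils table above.
coreutils = {
    "ls": (14100, 994, 16945, 1328, 26240),
    "pwd": (6341, 632, 9311, 596, 13929),
    "groups": (6410, 714, 9218, 667, 13378),
}

ratios = {name: ctf_ratio_excluding_paths(*row) for name, row in coreutils.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.3f}")
print(f"geomean over the listed binaries: {geomean(ratios.values()):.3f}")
```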

DWARF is a highly extensible format; what exactly is and is not emitted is
something that consumers can choose.
Yes, DWARF can be large, but mainly because it provides a lot of
information. The actual representation has been designed with size concerns
in mind, and newer versions of the standard keep improving that too.

	Jakub

Yes.

I started out to provide some numbers around the size impact of CTF vs DWARF
as it was a legitimate curiosity many of us have had. Comparing compactness or
feature matrices is only one dimension of evaluating the utility of supporting
CTF in the toolchain (including GCC; Binutils and GDB have already accepted
initial CTF support). The other dimension is a user-friendly workflow which
supports current users and eases further adoption and growth.

Indu

