This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!
- From: Mark Mitchell <mark at codesourcery dot com>
- To: Kenneth Zadeck <zadeck at naturalbridge dot com>
- Cc: GCC <gcc at gcc dot gnu dot org>, "Berlin, Daniel" <dberlin at dberlin dot org>, "Hubicha, Jan" <jh at suse dot cz>, "Novillo, Diego" <dnovillo at redhat dot com>, Ian Lance Taylor <ian at airs dot com>, "Edelsohn, David" <dje at watson dot ibm dot com>
- Date: Wed, 30 Aug 2006 16:44:45 -0500
- Subject: Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!
- References: <44F2F642.email@example.com>
Kenneth Zadeck wrote:
> This posting is a progress report of my task of encoding and decoding
> the GIMPLE stream into LTO. Included in this posting is a patch that
> encodes functions and dumps the result to files.
[I'm sorry for not replying to this sooner. I've been on a plane or in
a meeting virtually all of my waking hours since you wrote this...]
> 2) To have a discussion about the use of DWARF3. I am now against the
> use of DWARF3 for encoding the GIMPLE.
As the person who suggested this to you, I'm sorry it doesn't seem to
meet your needs. I'll make a few more specific comments below, but, to
echo what I said at the time: I think it was a good idea to try it, but
if it's not the right tool for the job, then so be it. My opinion has
always been that, for the bodies of functions, DWARF is just an
encoding: if it's a decent encoding, then, it's nice that it's a
standard; if it's a bad encoding, then, by all means, let's not use it!
I do think DWARF is a good choice for the non-executable information,
for the same reasons I did initially:
(a) for debugging, we need to generate most of that information anyhow,
so we're piggy-backing on existing code -- and not making object files
bigger by encoding the same information twice in the case of "-O2 -g",
which is the default way that many GNU applications are built.
(b) it's a standard, and we already have tools for reading DWARF, so it
saves the trouble of coming up with a new encoding; and
(c) because bugs in the DWARF emission may not result in problems for
LTO, we'll be validating our LTO information every time we use GDB, and,
similarly, improving the GDB experience every time we fix an LTO bug in
that shared emission code.
I understand that some of these benefits do not apply to the executable
code, and that, even to the extent they may apply, the tradeoffs are
different. The comments I've made below about specific issues should
therefore be considered as academic responses, not an attempt to argue
the decision you have made.
> 3) To get someone to tell me what option we are going to add to the
> compiler to tell it to write this information.
I think a reasonable spelling is probably "-flto". It should not be a
-m option since it is not machine-specific. I don't think it should be
a -O option either, since writing out LTO information isn't really
supposed to affect optimization per se.
> 2) The code is, by design, fragile. It takes nothing for granted.
> Every case statement has gcc_unreachable as its default case.
That's the same way I've approached the DWARF reading code, and for the
same reason. I think that's exactly the right decision.
> 1) ABBREV TABLES ARE BAD FOR LTO.
> However, this mechanism is only self descriptive if you do not extend
> the set of tags. That is not an option for LTO.
Definitely true. When we talked on the phone, we talked about creating
a tag corresponding to each GIMPLE tree code. However, you could also
create a numeric attribute giving the GIMPLE tree code. If you did
that, you might find that the abbreviation table became extremely small
-- because almost all interior nodes would be DW_TAG_GIMPLE_EXPR nodes.
The downside, of course, is that the storage required to store the
nodes would be greater, as it would now contain the expression code
(e.g., PLUS_EXPR), rather than having a DW_TAG_GIMPLE_PLUS_EXPR.
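To make the tradeoff concrete, here is an illustrative sketch (not GCC's actual format; the codes and sizes are invented) comparing the two strategies: one abbreviation per GIMPLE code versus a single generic tag plus a numeric code attribute, using DWARF-style ULEB128 encoding:

```python
# Hypothetical sketch comparing the two abbreviation strategies.
# Strategy A: one DWARF tag per GIMPLE code -> each node is just an
# abbrev number, but the abbrev table has one entry per distinct code.
# Strategy B: one generic DW_TAG_GIMPLE_EXPR -> a one-entry abbrev
# table, but each node also carries the expression code as an attribute.

def uleb128(n):
    """Encode n as a ULEB128 byte string, as DWARF does."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

# A pretend stream of GIMPLE expression codes (numbers are made up).
codes = [59, 59, 60, 59, 61] * 100

# Strategy A: node = abbrev number only; table needs an entry per code.
a_nodes = b"".join(uleb128(c) for c in codes)
a_table_entries = len(set(codes))

# Strategy B: node = generic abbrev number + code attribute; one entry.
GENERIC_ABBREV = 1
b_nodes = b"".join(uleb128(GENERIC_ABBREV) + uleb128(c) for c in codes)
b_table_entries = 1

print(len(a_nodes), a_table_entries)  # smaller nodes, bigger table
print(len(b_nodes), b_table_entries)  # bigger nodes, one-entry table
```

With these made-up numbers, strategy B doubles the per-node cost in exchange for an essentially constant-size abbreviation table, which is the tradeoff described above.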
> I strongly believe that for LTO to work, we are going to have to
> implement some mechanism where the function bodies are loaded into the
> compiler on demand (possibly kept in cache, but maybe not). This
> will be more cumbersome if we have to keep reloading each object
> file's abbrev table just to tear apart a single function in that .o
> file. While the abbrev sections average slightly less than 2% of
> the size of the GIMPLE encoding for an entire file, each abbrev table
> averages about the same size as a single function.
(Implied, but not stated, in your mail is the fact that the abbreviation
table cannot be indexed directly. If it could be, then you wouldn't
have to read the entire abbreviation table for each function; you would
just read the referenced abbreviations. Because the abbreviation table
records are of variable length, it is indeed true that you cannot make
random accesses to the table. So, this paragraph is just fleshing out
that implicit point.)
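A toy illustration of why the table cannot be indexed directly (a simplified layout, not the exact DWARF abbreviation format): because every field is variable-length, the byte offset of record k is unknown until records 1..k-1 have been decoded.

```python
# Each simplified record is (abbrev code, tag, attribute count), all
# ULEB128-encoded; finding any record requires a sequential scan.

def uleb128(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def read_uleb128(buf, pos):
    """Decode one ULEB128 value starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos

records = [(1, 0x11, 2), (2, 0x2E, 300), (3, 0x24, 1)]
table = b"".join(uleb128(field) for rec in records for field in rec)

# To reach record 3 we must walk records 1 and 2 first: no random access.
pos, decoded = 0, []
while pos < len(table):
    code, pos = read_uleb128(table, pos)
    tag, pos = read_uleb128(table, pos)
    nattrs, pos = read_uleb128(table, pos)
    decoded.append((code, tag, nattrs))

assert decoded == records  # round-trips, but only via a sequential scan
```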
I think the conclusion that you reach (that the size of the tables is a
problem) depends on how you expect the compiler to process functions at
link-time. My expectation was that you would form a global control-flow
graph for the entire program (by reading CFG data encoded in each .o
file), eliminate unreachable functions, and then inline/optimize the
remaining functions.
If you sort the function-reading so that you prefer to read functions
from the same object file in order, then I would expect that you would
considerably reduce the impact of reading the abbreviation tables. I'm
making the assumption that if f calls N functions, then they probably
come from < N object files. I have no data to back up that assumption.
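The read-ordering idea can be sketched as follows (illustrative Python; the names and the worklist are invented, not from the patch): sort the demand list so functions from the same object file are read together, so each file's abbreviation table is loaded once per group rather than once per function.

```python
# Group the function-reading worklist by object file so each
# abbreviation table is loaded once per file, not once per function.
from itertools import groupby

# (function, object file) pairs in arbitrary demand order.
worklist = [("f", "a.o"), ("g", "b.o"), ("h", "a.o"),
            ("k", "c.o"), ("m", "b.o")]

worklist.sort(key=lambda item: item[1])
loads = 0
for objfile, funcs in groupby(worklist, key=lambda item: item[1]):
    loads += 1               # one abbrev-table read per object file
    for func, _ in funcs:
        pass                 # decode func using the already-loaded table

print(loads)  # 3 table loads instead of 5
```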
(There is nothing that says that you can only have one abbreviation
table for all functions. You can equally well have one abbreviation
table per function. In that mode, you trade space (more abbreviation
tables, and the same abbreviation appearing in multiple tables) against
the fact that you now only need to read the abbreviation tables you
need. I'm not claiming this is a good idea.)
I don't find this particular argument (that the abbreviation tables will
double file I/O) very convincing. I don't think it's likely that the
problem we're going to have with LTO is running out of *virtual* memory,
especially as 64-bit hardware becomes nearly universal. The problem is
going to be running out of physical memory, and thereby paging like
crazy, or thrashing the D-cache. So, I'd assume you'd just read the
tables as-needed, and never bother discarding them. As long as there is
reasonable locality of reference to abbreviation tables (i.e., you can
arrange to hit object files in groups), then the cost here doesn't seem
like it would be very big.
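The "read as-needed, never discard" policy amounts to a simple memo cache keyed by object file; a minimal sketch, with `parse_abbrev_table` standing in for the real DWARF reader:

```python
# Per-object-file cache of abbreviation tables: each file is read at
# most once, no matter how many of its functions are materialized.

def parse_abbrev_table(objfile):
    return {"source": objfile}   # placeholder for the decoded abbrevs

class AbbrevCache:
    def __init__(self):
        self._tables = {}
        self.reads = 0

    def get(self, objfile):
        if objfile not in self._tables:
            self.reads += 1      # touches the file only on first use
            self._tables[objfile] = parse_abbrev_table(objfile)
        return self._tables[objfile]

cache = AbbrevCache()
for obj in ["a.o", "b.o", "a.o", "a.o", "b.o"]:
    cache.get(obj)
print(cache.reads)  # 2 reads despite 5 lookups
```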
> 2) I PROMISED TO USE THE DWARF3 STACK MACHINE AND I DID NOT.
I never imagined you doing this; as per above, I always expected that
you would use DWARF tags for the expression nodes. I agree that the
stack-machine is ill-suited.
> 3) THERE IS NO COMPRESSION IN DWARF3.
> In 1-file mode, zlib -9 compression is almost 6:1. In
> 1-function-per-file mode, zlib -9 compression averages about 3:1.
In my opinion, if you considered DWARF + zlib to be satisfactory, then I
think that would be fine. For LTO, we're allowed to do whatever we
want. I feel the same about your confession that you invented a new
record form; if DWARF + extensions is a suitable format, that's fine.
In other words, in principle, using a somewhat non-standard variant of
DWARF for LTO doesn't seem evil to me -- if that met our needs.
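For reference, the kind of measurement being quoted is easy to reproduce in miniature (the input below is synthetic; actual ratios depend entirely on the data, and the 6:1 / 3:1 figures in the mail are for real GIMPLE sections):

```python
# Compress a repetitive byte stream at zlib level 9, as "-9" does.
import zlib

data = b"PLUS_EXPR MINUS_EXPR MULT_EXPR " * 1000  # repetitive, IR-like
packed = zlib.compress(data, 9)
ratio = len(data) / len(packed)
assert ratio > 3  # highly repetitive input compresses well
```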
> 2) LOCAL DECLARATIONS
> Mark was going to do all of the types and all of the declarations.
> His plan was to use the existing DWARF3 and enhance it where it was
> necessary, eventually replacing the GCC type trees with direct
> references to the DWARF3 symbol table.
> > The types and global variables are likely OK, or at least Mark
> > should be able to add any missing info.
Yes, I agree that if you're not using DWARF for the function bodies, you
probably want your own encoding for the local variables.
> We will also need to add other structures to the object files. We
> will need to have a version of the cgraph, in a separate section, that
> is in a form so that all of the cgraphs from all of the object files
> can be read and processed without looking at the actual function
> bodies. A function is pure, for instance, only if it calls nothing
> but other pure functions, and so on... If we simply
> label the call graph with the locally pure and locally constant
> attributes, the closure phase can be done for all of the functions in
> the LTO compilation without having to reprocess their bodies.
> Virtually all interprocedural optimizations, including aliasing, can
> and must be structured this way.
You could also label the function declarations. There's a decision to
make here as to whether the nodes of the call graph are the same as the
DWARF nodes for the functions themselves, or are instead separate
entities (which, of course, point to those DWARF nodes). It would be
nice, a priori, to have this information in the DWARF nodes because it
would allow the debugger to show this information to users and to view
it via DWARF readers. However, I can also imagine that it needs to be
in the separate call graph.
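The closure phase being described can be sketched as a fixpoint over the call graph alone (illustrative Python; the function and attribute names are invented, not the actual LTO code): a function stays pure only if it is locally pure and everything it calls remains pure.

```python
# Fixpoint purity propagation over a call graph: needs only the
# per-function "locally pure" flags and edges, never the bodies.

def pure_closure(locally_pure, calls):
    """locally_pure: set of names; calls: name -> list of callee names."""
    pure = set(locally_pure)
    changed = True
    while changed:
        changed = False
        for fn in list(pure):
            if any(callee not in pure for callee in calls.get(fn, ())):
                pure.discard(fn)   # calls something impure: demote
                changed = True
    return pure

calls = {"a": ["b"], "b": ["c"], "c": [], "d": ["x"]}
result = pure_closure({"a", "b", "c", "d"}, calls)
print(sorted(result))  # 'd' is demoted because it calls unknown 'x'
```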
> I have not done this because I do not rule the earth. That was not
> what I was assigned to do, and I agreed that DWARF3 sounded like a
> reasonable way to go. Now that I understand the details of DWARF3, I
> have changed my mind about the correct direction. Now is the time to
> make that change before there is a lot of infrastructure built that
> assumes the DWARF3 encoding.
I think it's great that you're asking for feedback. My only feedback is
that you may not need to make this decision *now*. We could conceivably
wire this up, work on the other things (CFG, etc.) and return to the
encoding issue. I'm vaguely in favor of that plan, just in that I'm
eager to actually see us make something work. On the other hand,
building up DWARF reading for this code only to chuck it later does seem
wasteful. But, the DWARF reader is already there; it's mostly filling
in some blanks. But, filling in blanks is always harder than one
expects. So, I think this should really be your call: rework the format
now, or later, as you think best.
Mark Mitchell
CodeSourcery
mark at codesourcery dot com
(650) 331-3385 x713