First cut on outputting gimple for LTO using DWARF3. Discussion invited!!!!

Thu Aug 31 03:34:00 GMT 2006

Kenneth Zadeck wrote:

> Even if we decide that we are going to process all of the functions in
> one file at one time, we still have to have access to the functions that
> are going to be inlined into the function being compiled.  Getting at
> those functions that are going to be inlined is where the "double the
> I/O" argument comes from. 

I understand -- but it's natural to expect that those functions will be 
clumped together.  In a gigantic program, I expect there are going to be 
clumps of tightly connected object files, with relatively few 
connections between the clumps.  So, you're likely to get good cache 
behavior for any per-object-file specific data that you need to access.

> I have never depended on the kindness of strangers or the virtues of
> virtual memory.  I fear the size of the virtual memory when we go to
> compile really large programs. 

I don't think we're going to blow out a 64-bit address space any time 
soon.  Disks are big, but they are nowhere near *that* big, so it's 
going to be pretty hard for anyone to hand us that many .o files.  And, 
there's no point manually reading/writing stuff (as opposed to mapping 
it into memory), unless we actually run out of address space.

In fact, if you're going to design your own encoding formats, I would 
consider a format with self-relative pointers (or, offsets from some 
fixed base) that you could just map into memory.  It wouldn't be as 
compact as using compression, so the total number of bytes written when 
generating the object files would be bigger.  But, it will be very quick 
to load it into memory.
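
A minimal sketch of the self-relative idea, not anything GCC actually 
does.  The three-field record, the CONST/PLUS kinds, and the names 
write_expr, evaluate, and load_and_evaluate are all hypothetical.  Each 
operand field holds a byte offset relative to the record that contains 
it, so the bytes can be mapped at any address and walked in place:

```python
import mmap
import struct

# Hypothetical record layout (an assumption, not a GCC format):
# three little-endian int32 fields: kind, a, b.
#   CONST: a is the value, b is unused.
#   PLUS:  a and b are byte offsets to the operands, measured from the
#          start of this record (self-relative), so no fix-ups are
#          needed wherever the data ends up in memory.
NODE = struct.Struct("<iii")
CONST, PLUS = 0, 1


def write_expr(expr, out):
    """Append expr to the bytearray `out` and return its byte position.

    An int is a constant; a 2-tuple (lhs, rhs) is an addition.
    """
    if isinstance(expr, int):
        pos = len(out)
        out.extend(NODE.pack(CONST, expr, 0))
        return pos
    lhs, rhs = expr
    lhs_pos = write_expr(lhs, out)
    rhs_pos = write_expr(rhs, out)
    pos = len(out)
    # Store distances from this record, not absolute positions.
    out.extend(NODE.pack(PLUS, lhs_pos - pos, rhs_pos - pos))
    return pos


def evaluate(buf, pos):
    """Walk the records in place; works on any buffer, including an mmap."""
    kind, a, b = NODE.unpack_from(buf, pos)
    if kind == CONST:
        return a
    return evaluate(buf, pos + a) + evaluate(buf, pos + b)


def load_and_evaluate(path, root):
    """Map the file read-only and evaluate without a separate parse step."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return evaluate(mapped, root)
```

Because every reference is relative to its own record, the same bytes 
evaluate identically if they are embedded at a different position in a 
larger file; only the root position changes.  The cost, as noted above, 
is size: fixed-width fields are larger than a compressed encoding.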

I guess my overriding concern is that we're focusing heavily on the data 
format here (DWARF?  Something else?  Memory-mappable?  What compression 
scheme?) and we may not have enough data.  I guess we just have to pick 
something and run with it.  I think we should try to keep that code as 
separate as possible so that we can recover easily if whatever we 
pick turns out to be (another) bad choice. :-)

> One of the comments that was made by a person on the dwarf committee is
> that the abbrev tables really can be used for compression.  If you have
> information that is really common to a bunch of records, you can build
> an abbrev entry with the common info in it. 

Yes.  I was a little bit surprised that you don't seem to have seen much 
commonality.  If you recorded most of the tree flags, and treated them 
as DWARF attributes, I'd expect you would see relatively many 
expressions of a fixed form.  Like, there must be a lot of PLUS_EXPRs 
with TREE_USED set on them.  But, I gather that you're trying to avoid 
recording some of these flags, hoping either that (a) they won't be 
needed, or (b) you can recreate them when reading the file.  I think 
both (a) and (b) hold in many cases, so I think it's reasonable to 
assume we're writing out very few attributes.
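
To make the commonality argument concrete, here is a rough sketch of 
the abbrev idea in simplified form.  It is not the real DWARF encoding, 
and build_abbrevs, decode, and the record shape are hypothetical.  Every 
distinct (tree code, flag set) pair gets one table entry, and each 
record then carries only the entry number plus its varying operands:

```python
def build_abbrevs(records):
    """Factor the shared (tree code, flags) part of each record into a table.

    `records` is a list of (tree_code, flags, operands) tuples.
    Returns (abbrevs, encoded): abbrevs maps (tree_code, frozenset(flags))
    to an entry number starting at 1, and encoded is a list of
    (entry_number, operands).
    """
    abbrevs = {}
    encoded = []
    for tree_code, flags, operands in records:
        key = (tree_code, frozenset(flags))
        # Reuse the entry if this combination was seen before.
        number = abbrevs.setdefault(key, len(abbrevs) + 1)
        encoded.append((number, operands))
    return abbrevs, encoded


def decode(abbrevs, encoded):
    """Rebuild the full records from the table and the compact stream."""
    by_number = {number: key for key, number in abbrevs.items()}
    return [by_number[number] + (operands,) for number, operands in encoded]
```

With many PLUS_EXPRs that all have TREE_USED set, the table gains a 
single entry and each occurrence costs one small number plus its 
operands.  The fewer flags are written at all, the fewer distinct 
entries there are to share.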

> I had a discussion on chat today with drow and he indicated that you
> were busily adding all of the missing stuff here.

"All" is an overstatement. :-) Sandra is busily adding missing stuff and 
I'll be working on the new APIs you need.

> I told him that I
> thought this was fine as long as there is not a temporal drift in
> information encoded for the types and decls between the time I write my
> stuff and when the types and decls are written.

I'm not sure what this means.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713