This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Pre-compiled headers
- To: per at bothner dot com, zack at wolery dot cumb dot org
- Subject: Re: Pre-compiled headers
- From: Mike Stump <mrs at windriver dot com>
- Date: Thu, 13 Jan 2000 16:47:10 -0800 (PST)
- Cc: gcc at gcc dot gnu dot org
> From: Per Bothner <per@bothner.com>
> Date: 12 Jan 2000 10:36:19 -0800
> #2a: Instead of just tokenizing each header file, we store
> a pre-compiled version that actually contains tree nodes
> in the binary format of the host system.
I did this once with gcc. The idea is simple: attach a named database
to the compiler for a given a.out, then slurp stuff up into that
persistent database (trees, RTL, global variables, the symbol table and
so on). You can then do semantic checks across files and inline across
files, and you can check whether the MD5 of the input stream matches a
saved cache entry; when it does, you replay the results of seeing the
source. One can also give the database finer resolution and replay
chunks based on small units (individual function definitions).
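A minimal sketch of the cache check described above, written in Python for brevity. It assumes the cache is keyed on the MD5 of the whole input stream. `ReplayCache` and `compile_fn` are hypothetical names, with `compile_fn` standing in for running the compiler. A real implementation would save trees, RTL and the symbol table rather than a string, and would persist them on disk rather than in a dict.

```python
import hashlib
from typing import Callable, Dict


class ReplayCache:
    """Toy compile cache keyed on the MD5 of the input stream.

    `compile_fn` is a hypothetical stand-in for the real compiler; the
    cached "result" is whatever it returns (here, just a string).
    """

    def __init__(self, compile_fn: Callable[[bytes], str]) -> None:
        self.compile_fn = compile_fn
        self.entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def compile(self, source: bytes) -> str:
        key = hashlib.md5(source).hexdigest()
        if key in self.entries:
            # Same input as a previous run: replay the saved result.
            self.hits += 1
            return self.entries[key]
        # New input: do the real work, then remember the answer.
        self.misses += 1
        result = self.compile_fn(source)
        self.entries[key] = result
        return result


cache = ReplayCache(lambda src: f"obj({len(src)} bytes)")
cache.compile(b"int f(void) { return 1; }")
cache.compile(b"int f(void) { return 1; }")  # served from the cache
print(cache.hits, cache.misses)  # 1 1
```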
The idea is that in a large company that uses gcc for development,
quite a few cycles are spent seeing the same source code over and over
again. If you have a 13 gig cache for the compiler to play with, you
can store enough in it that a typical edit-run-debug cycle will fly:
90% of the functions are the same, with the same answer as last time,
so the compiler sees and recompiles only the one function that was
changed. The rest come from the cache in a snarf-and-barf fashion.
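The per-function version can be sketched the same way. This assumes the source has already been split into function definitions keyed by name, which is a hypothetical input format. The "compiled output" is only a placeholder string, and `incremental_build` is an illustrative name.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def incremental_build(
    functions: Dict[str, str],
    previous: Dict[str, Tuple[str, str]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Recompile only the functions whose source hash changed.

    `functions` maps name -> source text for this run; `previous` maps
    name -> (hash, compiled output) saved from the last run.
    """
    current: Dict[str, Tuple[str, str]] = {}
    recompiled: List[str] = []
    for name, body in functions.items():
        h = digest(body)
        saved = previous.get(name)
        if saved is not None and saved[0] == h:
            current[name] = saved  # unchanged: reuse the cached output
        else:
            recompiled.append(name)
            current[name] = (h, f"<code for {name}>")  # placeholder codegen
    return current, recompiled


run1, _ = incremental_build({"f": "return 1;", "g": "return 2;"}, {})
run2, redone = incremental_build({"f": "return 1;", "g": "return 3;"}, run1)
print(redone)  # ['g']
```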
In C++, it isn't uncommon to have 90k lines of dense header material
that takes massive amounts of time (templates and whatnot) and 30 lines
of program. My goal was to turn the first 89,970 lines into an mmap,
snarf, MD5 check and barf. You then take that state, run the 30 lines
that remain through the compiler, and compile them. This should yield
a three-orders-of-magnitude improvement in compile times (a 20% faster
compile isn't interesting). This style of technology also eliminates
the exponential nature of C++ include files, thus making C++ realistic
for large-scale software.
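The prefix idea can also be shown as a hypothetical sketch. A `Snapshot` records how many leading lines it covers and their MD5. If a new translation unit starts with exactly those lines, the saved state is restored and only the tail goes through the compiler. `Snapshot`, `prefix_md5` and `lines_to_compile` are illustrative names, and `state` stands in for the mmap'd compiler state.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Saved compiler state after the first `n_lines` lines of input.

    `state` is a placeholder for the mmap'd trees and symbol table.
    """
    n_lines: int
    prefix_md5: str
    state: object


def prefix_md5(lines: List[str], n: int) -> str:
    """MD5 over the first n lines, each terminated by a newline."""
    h = hashlib.md5()
    for line in lines[:n]:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def lines_to_compile(lines: List[str], snap: Optional[Snapshot]) -> List[str]:
    """Return the lines that still need real compilation.

    If the input begins with exactly the prefix the snapshot was taken
    from, the snapshot's state can be restored and only the tail compiled.
    """
    if (snap is not None and len(lines) >= snap.n_lines
            and prefix_md5(lines, snap.n_lines) == snap.prefix_md5):
        return lines[snap.n_lines:]
    return lines  # no usable snapshot: compile everything


src = [f"header line {i}" for i in range(89_970)] + [f"body {i}" for i in range(30)]
snap = Snapshot(89_970, prefix_md5(src, 89_970), state=None)
todo = lines_to_compile(src, snap)
print(len(todo), len(src) // len(todo))  # 30 3000
```

If every line cost the same to compile, skipping 89,970 of 90,000 lines would be a 3,000x reduction, roughly 10^3.5. That ratio ignores the (much cheaper) cost of hashing the prefix and the fact that template-heavy header lines are not uniform in cost.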
I don't see that this scheme is any harder to implement than the
simplistic schemes of other methods. In fact, as I recall, in about a
week I had a prototype that could save and load the state via mmap,
and that did semantic checking and inlining across translation units
without any special effort to make them work. The one drawback is that
additional code would need to be added to the compiler to handle seeing
things twice (int i; extern int i). This work was based in spirit (no
code) on a much older feature in G++ where one could unexec the
compiler after it had seen all the header fodder, and then run the
unexec'd compiler, skipping the header stuff... That feature was
removed from the compiler long ago (#pragma unexec or #pragma undump
or something like that).
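Handling "seeing things twice" amounts to letting a replayed declaration merge with the same declaration when it reappears in the source. Here is a toy sketch, assuming types can be compared as plain strings; real C compatibility rules, initializers and so on are not modeled. `SymbolTable` and `Decl` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Decl:
    ctype: str
    is_extern: bool  # True for `extern int i;`, False for `int i;`


class SymbolTable:
    """Toy file-scope symbol table that tolerates seeing a declaration twice.

    Types are compared as plain strings, a simplifying assumption.
    """

    def __init__(self) -> None:
        self.decls: Dict[str, Decl] = {}

    def declare(self, name: str, ctype: str, is_extern: bool) -> None:
        old = self.decls.get(name)
        if old is None:
            self.decls[name] = Decl(ctype, is_extern)
            return
        if old.ctype != ctype:
            raise TypeError(f"conflicting types for '{name}'")
        # Compatible redeclaration: merge it, keeping the non-extern
        # (tentative definition) form if either declaration was one.
        old.is_extern = old.is_extern and is_extern


st = SymbolTable()
st.declare("i", "int", is_extern=False)  # replayed from the cache: int i;
st.declare("i", "int", is_extern=True)   # seen again in source: extern int i;
print(st.decls["i"])  # Decl(ctype='int', is_extern=False)
```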
Anyway, I could write more if you feel like pursuing this approach.