Precompiled Headers: A proposal


Proposal for initial implementation of Precompiled Headers for GCC
------------------------------------------------------------------

I intend to implement PCH along the lines detailed at KAI's website; I
think this is a clear description of an ideal PCH implementation:-

http://www.kai.com/C_plus_plus/v4.0/doc/UserGuide/precompiled-headers.html

In particular, I would support the command line options --pch,
--pch_headers, automatic PCH creation, and #pragma hdrstop.  In my
experience, features similar to these are common across commercial
compilers, and I think this is the optimal PCH implementation for GCC.

I prefer this implementation over jar files of system headers, and
similar methods that involve producing system-wide precompiled headers
on a per-header basis.  To me such methods suffer from the
disadvantage that, if they go any further than simply being an
already-tokenised header (and we most certainly want them to), they
become almost useless, because they depend far too much on the state
that existed when they were created.  For example:

o compiler version
o macro definitions in force at the time of header entry
o command line options

In particular, I cannot see how you could reasonably cope with simply
reversing the order of

#include <stdio.h>
#include <ctype.h>

unless your idea of a PCH were merely a list of all the tokens in
the file.

So, I envisage a regime where the user of precompiled headers
specifies a project-local PCH directory on the command line, and an
optional PCH extension.  The user would try to group his source files
so that each group tends to have a common base set of includes, done
in the same order.  Members of a group would be free to add to this
base set, preferably in the same way as any other group members that
also want those additional headers.

For a pattern of header files not seen before, integrated CPP would
create a new PCH file in the PCH directory.  Each PCH file would have
a "header" containing the state in which that PCH is valid - namely
the sequence of includes up to that point, the timestamps and
locations of the header files, the command line options, the compiler
version and creation time, etc.  Each header would also contain some
kind of checksum of the header information.
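As a sketch, such a header block might look like the following; the
struct and its field names are illustrative only, not a final
format:-

#include <time.h>

/* Sketch only; names and fields are illustrative.  */
struct pch_header
{
  unsigned int checksum;       /* Checksum of everything below.  */
  char version[32];            /* Compiler version that wrote it.  */
  time_t creation_time;        /* When the PCH was created.  */
  unsigned int n_options;      /* Command line options in force.  */
  unsigned int n_includes;     /* Length of the include sequence.  */
  /* Followed by the option strings, then the path and timestamp of
     each header file, in the order they were included.  */
};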

An additional index file would be maintained by integrated CPP in that
directory (with a command-line option to refresh it if it gets out of
sync).  There would be one index file per extension, to enable you to
use different extensions for different compiler versions, say.  The
index file would simply be a list of checksums and files, for fast
lookup.
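Concretely, an index entry might be nothing more than a checksum
paired with a file name, one per line (values and file names here are
invented, purely for illustration):-

0x3a91c2d7  foo.pch/00001.pch
0x8be40f12  foo.pch/00002.pch
0x17c45ea9  foo.pch/00003.pch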

For each compilation, integrated CPP would lex until the first
non-preprocessing token in the main file, looking up the header
files of any #include directives encountered along the way.  Just
after each #include directive, it would calculate a checksum of the
information to that point.  For example, if (modulo whitespace) your
file is:-

#include <a.h>  // Checksum1 here
#include <b.h>  // Checksum2 here
#include <c.h>  // Checksum3 here
int foo;

Upon encountering the "int" token, integrated CPP would look up
Checksum3, then Checksum2 and Checksum1 in the index file to see if a
potential PCH to that point exists.  If one matches, it would
mmap/read that file and do a full comparison of the header information
in case the checksums match unluckily.
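In outline, the lookup might look like the sketch below; the helper
functions are hypothetical stand-ins, not existing cpplib routines:-

/* Sketch only; the helpers declared here are hypothetical.  */
extern const char *index_lookup (unsigned int checksum);
extern void *map_pch (const char *file);
extern void unmap_pch (void *pch);
extern int header_matches (void *pch, unsigned int n_includes);

/* Find the deepest applicable PCH for the N_INCLUDES includes whose
   cumulative checksums are in CHECKSUMS.  */
static void *
find_pch (const unsigned int *checksums, unsigned int n_includes)
{
  unsigned int i;

  /* Try Checksum3 before Checksum2 before Checksum1.  */
  for (i = n_includes; i-- > 0;)
    {
      const char *file = index_lookup (checksums[i]);
      void *pch;

      if (file == 0)
	continue;
      pch = map_pch (file);
      /* Compare the full header information in case the checksums
	 matched unluckily.  */
      if (pch && header_matches (pch, i + 1))
	return pch;
      if (pch)
	unmap_pch (pch);
    }

  return 0;  /* Nothing applies; lex from scratch.  */
}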

I have an initial implementation pretty much worked out in my mind; I
think I know what's needed and where CPP needs to go to achieve it.
This initial implementation would focus on the preprocessing stage
only; the PCH file would contain:-

1) the header block
2) the macro / assertion / identifier hash table
3) the macro and assertion definitions
4) the post-preprocessing token stream (i.e. after conditional
compilation and macro expansion)
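Laid end to end, the PCH file would then look roughly like this (a
sketch of the layout only, not a committed format):-

+------------------------------------------+
| 1) header block (validity information)   |
| 2) macro / assertion / identifier table  |
| 3) macro and assertion definitions       |
| 4) post-preprocessing token stream       |
+------------------------------------------+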

Integrated CPP would simply feed the token stream 4) to the front
ends.  I would hope this would give a noticeable performance
improvement by itself.  After we have a reliable implementation in
this form, we could then move to a second stage by

5) Not storing the token stream 4) above at all, but instead storing
the front-end state it implies, like trees and structure layout
information.  Later we could maybe even go further, down to RTX form.

Of course, this is where a real speed-up would be noticed,
particularly for C++-intensive code.  It would require me to provide
some kind of clean virtual interface where front-ends can store
information.  I imagine it would also entail a significant re-work of
front-ends to make the work they do more modular (this is probably a
good thing, anyway).
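That interface might amount to little more than a table of hooks that
each front end fills in; a rough sketch, with hypothetical names:-

#include <stdio.h>

/* Sketch only; the hook names are hypothetical.  */
struct pch_state_hooks
{
  /* Write out the front-end state (trees, structure layout
     information) implied by the headers processed so far.  */
  void (*save_state) (FILE *pch);

  /* Read that state back in, instead of re-parsing the headers.  */
  void (*restore_state) (FILE *pch);
};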

I hope to start on this early next year, probably around the time GCC
3.0 is released.  Before doing this, I want to finish off the rough
edges on the existing integrated CPP, tidy up the header files, fix a
couple of bugs, and improve the dependency generator.  To implement
PCH, various bits of cpplib will need to be tweaked, mostly in memory
management.

Comments?

Neil.
