This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Faster compilation speed
- From: "Timothy J. Wood" <tjw at omnigroup dot com>
- To: Mike Stump <mrs at apple dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 9 Aug 2002 14:58:58 -0700
- Subject: Re: Faster compilation speed
On Friday, August 9, 2002, at 12:17 PM, Mike Stump wrote:
> I'd like to introduce lots of various changes to improve compiler
> speed. I thought I should send out an email and see if others think
> this would be good to have in the tree. Also, if it is, I'd like to
> solicit any ideas others have for me to pursue. I'd be happy to do all
> the hard work, if you come up with the ideas! The target is to be 6x
> faster.
Go Mike!
Having ported code from Windows to Mac OS X and having to live with the
speed difference between gcc and VC++, I can heartily approve of these
efforts :)
My suggestions would be these (they get increasingly crazy .... :)
1) Go stand over the shoulders of the ProjectBuilder folks until they
enable parallel builds. My SMP machine only runs at about 110%
utilization right now when doing a build. It is NOT I/O bound since I
can start two builds in two different projects and max out both CPUs.
This should get you nearly a 2x speedup in perceived performance. Yeah,
only ~3x left to go! (even on uniprocessor machines you'll probably get
30% or so).
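Just to make the parallel-build point concrete, the driver only has to
farm independent translation units out to one worker per CPU, something
along these lines (the file names, flags and round-robin split are all
made up for illustration):

  // Hypothetical sketch only: compile independent translation units in
  // parallel instead of one after another.
  #include <algorithm>
  #include <cstddef>
  #include <cstdlib>
  #include <string>
  #include <thread>
  #include <vector>

  int main() {
      const std::vector<std::string> files = {"a.cc", "b.cc", "c.cc", "d.cc"};
      const unsigned jobs =
          std::max(1u, std::thread::hardware_concurrency());  // e.g. 2 on SMP

      std::vector<std::thread> workers;
      for (unsigned w = 0; w < jobs; ++w) {
          workers.emplace_back([&files, jobs, w] {
              // static round-robin split: worker w takes files w, w+jobs, ...
              for (std::size_t i = w; i < files.size(); i += jobs) {
                  const std::string cmd = "gcc -c " + files[i];
                  std::system(cmd.c_str());  // a real driver would check this
              }
          });
      }
      for (std::thread& t : workers) t.join();
      return 0;
  }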
2) This one is rather crazy and would involve huge amounts of work
probably....
a) Toss some or all of your PFE code in the bin (yikes!)
b) Build a precompile server that the compiler can attach to and
request precompiled headers (give a path and set of -D flags or whatever
other state is needed to uniquely identify the precompile output).
Requests would be satisfied via shared memory (yes, non-portable, so
this whole mechanism will only work on modern machines).
c) Inside the server, keep parsed representations of all headers that
have been imported and the -D state used when parsing the headers. As
new headers are parsed, they should be able to **layer** on top of
existing parsed headers (so there should only be one parsed version of
std::string). This avoids the confining requirement that you have one
big master precompiled header.
d) Details about concurrency, security, locating the server, and so on
left as an exercise for the reader.
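To make (b) and (c) a little more concrete, the server-side cache could
be keyed on the header path plus a canonical form of the -D state, so
every client with the same macro environment shares one parsed copy.
Something along these lines (all names, including ParsedHeader, are
invented, and the shared memory handoff is left out):

  // Hypothetical sketch of the server-side cache: parsed headers are
  // keyed by path + canonicalized -D state so the same parse is shared.
  #include <map>
  #include <memory>
  #include <set>
  #include <string>
  #include <tuple>

  struct ParsedHeader {           // stand-in for the real parsed form
      std::string path;
      // ... declarations, macros defined, headers it layers on top of ...
  };

  struct HeaderKey {
      std::string path;
      std::set<std::string> defines;   // sorted, so -DA -DB == -DB -DA
      bool operator<(const HeaderKey& o) const {
          return std::tie(path, defines) < std::tie(o.path, o.defines);
      }
  };

  class PrecompileServer {
  public:
      // Return the cached parse for this (path, -D state) pair, parsing
      // the header once if no client has asked for it before.
      std::shared_ptr<const ParsedHeader>
      lookup(const std::string& path, const std::set<std::string>& defines) {
          HeaderKey key{path, defines};
          auto it = cache_.find(key);
          if (it != cache_.end())
              return it->second;            // already parsed once, share it
          auto parsed = std::make_shared<ParsedHeader>();
          parsed->path = path;
          // ... run the real parser here and record what it layered on ...
          cache_.emplace(std::move(key), parsed);
          return parsed;
      }
  private:
      std::map<HeaderKey, std::shared_ptr<const ParsedHeader>> cache_;
  };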
The main advantage here is that people would get fast compiles WITHOUT
having to tune their single PFE header. Additionally, more headers
would get precompiled than would otherwise, yielding faster builds. If
the layering is done correctly, the memory usage of the entire system
could be lower (since if you have two projects to build, both of which
import STL, there would be only one precompiled version of STL).
At the start of a build, a special 'check filesystem' command could be
sent to the server to have it do a one-time check of timestamps of
header files. Assuming the timestamps haven't changed, the precompiled
headers could be kept across builds!
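The 'check filesystem' command could be as dumb as recording each
header's mtime when it was parsed and evicting any entry whose file has
changed on disk since. Roughly (invented names, plain POSIX stat, so
again non-portable):

  // Hypothetical sketch of the one-time 'check filesystem' pass: drop
  // cache entries whose header changed so the rest survives across builds.
  #include <sys/stat.h>
  #include <cstddef>
  #include <ctime>
  #include <map>
  #include <string>

  struct CacheEntry {
      time_t mtime_when_parsed;   // recorded when the header was parsed
      // ... parsed representation would live here ...
  };

  // Returns how many entries were evicted because the header changed
  // (or disappeared) since it was precompiled.
  std::size_t revalidate(std::map<std::string, CacheEntry>& cache) {
      std::size_t evicted = 0;
      for (auto it = cache.begin(); it != cache.end(); ) {
          struct stat st;
          const bool still_valid =
              stat(it->first.c_str(), &st) == 0 &&
              st.st_mtime == it->second.mtime_when_parsed;
          if (still_valid) {
              ++it;
          } else {
              it = cache.erase(it);   // header changed: reparse on demand
              ++evicted;
          }
      }
      return evicted;
  }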
Naturally, a 'clean' build option in the IDE would need to be able to
flush and probably shut down the server, since it is inevitable that
there will be bugs that corrupt the precomp database :(
#2 could really take many forms. The key idea is that having a single
PFE file is non-optimal. Developers should not have to spend time
tuning such a file to get the best compile time. The compiler and IDE
should handle all these details by default. Having the developer
involved here just leads to extra (ongoing!) work for the developer and
a sub-optimal set of precompiled headers.
Your goal should be to have the developer open their project and have
it build 6x faster (instead of requiring the developer to spend several
hours tweaking their PFE file to get the best performance -- and
then having to keep it up to date over the life of their project).
3) This is possibly even harder... Keep track of what facts in a header
each source file cared about (macro values defined or undefined,
structure layout, function signature, etc, etc, etc). If a header
changes, have the precompile server keep track of the facts that have
changed and then only rebuild source files that care about those changes
(assuming the source file itself hasn't changed). This could get
really ugly since you'd potentially have to keep track of multiple fact
timestamps (consider if a build fails or is aborted so some files got
updated for the current state of a header and some didn't).
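The bookkeeping might look something like this: record the set of facts
each source file consumed during its last compile, diff a header's old
and new fact sets when it changes, and rebuild only the files that
touched a changed fact. (Everything below is invented for illustration;
real facts would be much richer than strings.)

  // Hypothetical sketch of fact-based recompilation.
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  using Fact = std::string;   // e.g. "macro:FOO=1", "layout:struct bar", ...

  struct SourceInfo {
      std::set<Fact> facts_consumed;   // gathered during the last compile
  };

  // Given the facts a header used to export and the facts it exports now,
  // return the source files that depend on something that actually changed.
  std::vector<std::string>
  files_to_rebuild(const std::set<Fact>& old_facts,
                   const std::set<Fact>& new_facts,
                   const std::map<std::string, SourceInfo>& sources) {
      // a fact "changed" if it appears in one set but not the other
      std::set<Fact> changed;
      for (const Fact& f : old_facts)
          if (!new_facts.count(f)) changed.insert(f);
      for (const Fact& f : new_facts)
          if (!old_facts.count(f)) changed.insert(f);

      std::vector<std::string> rebuild;
      for (const auto& [file, info] : sources)
          for (const Fact& f : changed)
              if (info.facts_consumed.count(f)) {
                  rebuild.push_back(file);
                  break;
              }
      return rebuild;
  }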
Extra bonus points for doing this at a finer granularity (i.e.,
don't recompile a function if it wouldn't produce different output).
This would clearly be very hard and a large departure from the current
state of affairs :)
Anyway, I think the biggest improvements lie in moving away from the
current batch compile philosophy mandated by the command line tools.
Instead, the command line tools should be a front end onto a much more
powerful persistent compile server.
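In that world the command line 'compiler' could be little more than a
stub that hands its arguments to the persistent server and relays the
exit status back, e.g. (the socket path and wire format are completely
made up):

  // Hypothetical sketch of a thin front end: forward argv to a
  // long-lived compile server over a Unix-domain socket.
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>
  #include <cstdio>
  #include <cstring>
  #include <string>

  int main(int argc, char** argv) {
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);
      if (fd < 0) { perror("socket"); return 1; }

      sockaddr_un addr{};
      addr.sun_family = AF_UNIX;
      std::strncpy(addr.sun_path, "/tmp/compile-server.sock",
                   sizeof(addr.sun_path) - 1);
      if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
          perror("connect");      // a real front end might fall back to a
          return 1;               // plain batch compile here
      }

      // send the whole command line, one argument per line
      std::string request;
      for (int i = 1; i < argc; ++i) {
          request += argv[i];
          request += '\n';
      }
      if (write(fd, request.data(), request.size()) < 0) {
          perror("write");
          return 1;
      }
      shutdown(fd, SHUT_WR);      // tell the server the request is complete

      // the server replies with a single byte: the compile's exit status
      char status = 1;
      if (read(fd, &status, 1) != 1)
          status = 1;             // no reply: treat it as a failure
      close(fd);
      return status;
  }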
(Hey, you asked for ideas and said it was OK if they were hard :)
-tim