This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed


On Friday, August 9, 2002, at 12:17 PM, Mike Stump wrote:

> I'd like to introduce lots of various changes to improve compiler speed. I thought I should send out an email and see if others think this would be good to have in the tree. Also, if it is, I'd like to solicit any ideas others have for me to pursue. I'd be happy to do all the hard work, if you come up with the ideas! The target is to be 6x faster.
Go Mike!

Having ported code from Windows to Mac OS X and lived with the speed difference between gcc and VC++, I heartily approve of these efforts :)

My suggestions are these (they get increasingly crazy... :)

1) Go stand over the shoulders of the ProjectBuilder folks until they enable parallel builds. My SMP machine only runs at about 110% utilization right now when doing a build. It is NOT I/O bound, since I can start two builds in two different projects and max out both CPUs. This should get you nearly a 2x speedup in perceived performance. Yeah, only ~3x left to go! (Even on uniprocessor machines you'll probably get 30% or so.)

2) This one is rather crazy and would probably involve huge amounts of work...

a) Toss some or all of your PFE code in the bin (yikes!)
b) Build a precompile server that the compiler can attach to and request precompiled headers from (given a path and the set of -D flags, or whatever other state is needed to uniquely identify the precompiled output). Requests would be satisfied via shared memory (yes, non-portable, so this whole mechanism will only work on modern machines).
c) Inside the server, keep parsed representations of all headers that have been imported and the -D state used when parsing them. As new headers are parsed, they should be able to *layer* on top of existing parsed headers (so there would only be one parsed version of std::string). This avoids the confining requirement that you have one big master precompiled header. (A rough sketch follows this list.)
d) Details about concurrency, security, locating the server, and so on left as an exercise for the reader.
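
To make (b) and (c) a bit more concrete, here is a rough C++ sketch of the cache the server might keep. Everything in it is invented for illustration -- a real server would hand back shared-memory handles and hold real parsed ASTs rather than strings, and the layering itself is waved away -- but it shows how two clients asking for the same header under the same -D state end up sharing one parse:

    // All names here are made up; 'ast' stands in for the real parsed form.
    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    struct ParsedHeader {
        std::string path;
        std::string ast;
    };

    // Key: header path plus the -D state it was parsed under.
    typedef std::pair<std::string, std::string> CacheKey;

    class PrecompileServer {
    public:
        // Return a cached parse if one exists, otherwise parse and cache it.
        // Because every client shares this cache, <string> gets parsed once
        // no matter how many projects import it.
        const ParsedHeader* request(const std::string& path,
                                    const std::string& dflags) {
            CacheKey key(path, dflags);
            std::map<CacheKey, ParsedHeader>::iterator it = cache_.find(key);
            if (it != cache_.end())
                return &it->second;
            ParsedHeader& entry = cache_[key];
            entry.path = path;
            entry.ast = "parsed:" + path;   // placeholder for real parsing
            return &entry;
        }
    private:
        std::map<CacheKey, ParsedHeader> cache_;
    };

    int main() {
        PrecompileServer server;
        // Two requests with the same path and -D state share one parse.
        const ParsedHeader* a = server.request("/usr/include/c++/string", "-DDEBUG=0");
        const ParsedHeader* b = server.request("/usr/include/c++/string", "-DDEBUG=0");
        std::cout << (a == b ? "shared" : "reparsed") << std::endl;
        return 0;
    }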

The main advantage here is that people would get fast compiles WITHOUT having to tune their single PFE header. Additionally, more headers would get precompiled than would otherwise, yielding faster builds. If the layering is done correctly, the memory usage of the entire system could be lower (since if you have two projects to build, both of which import STL, there would be only one precompiled version of STL).

At the start of a build, a special 'check filesystem' command could be sent to the server to have it do a one-time check of the timestamps of header files. Assuming the timestamps haven't changed, the precompiled headers could be kept across builds!
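
The 'check filesystem' command itself could be as dumb as recording mtimes and comparing them later. A toy version (class and method names invented, just stat()):

    #include <sys/stat.h>
    #include <ctime>
    #include <map>
    #include <string>

    class HeaderTimestamps {
    public:
        // Called when a header is precompiled.
        void remember(const std::string& path) {
            struct stat st;
            if (stat(path.c_str(), &st) == 0)
                mtimes_[path] = st.st_mtime;
        }

        // Called at the start of the next build; true means the cached
        // precompiled form can be kept.
        bool still_valid(const std::string& path) const {
            std::map<std::string, std::time_t>::const_iterator it =
                mtimes_.find(path);
            if (it == mtimes_.end())
                return false;
            struct stat st;
            return stat(path.c_str(), &st) == 0 && st.st_mtime == it->second;
        }

    private:
        std::map<std::string, std::time_t> mtimes_;
    };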

Naturally, doing a 'clean' build from the IDE would need to flush and probably shut down the server, since it is inevitable that there will be bugs that corrupt the precomp database :(


#2 could really take many forms. The key idea is that having a single PFE file is non-optimal. Developers should not have to spend time tuning such a file to get the best compile time. The compiler and IDE should handle all these details by default. Having the developer involved here just leads to extra (ongoing!) work and a sub-optimal set of precompiled headers.

Your goal should be to have the developer open their project and have it build 6x faster (instead of requiring the developer to spend several hours tweaking their PFE file to get the best performance -- and then having to keep it up to date over the life of their project).

3) This is possibly even harder... Keep track of which facts in a header each source file cared about (macro values defined or undefined, structure layout, function signatures, etc., etc., etc.). If a header changes, have the precompile server keep track of the facts that have changed and then only rebuild the source files that care about those changes (assuming the source file itself hasn't changed). This could get really ugly, since you'd potentially have to keep track of multiple fact timestamps (consider what happens if a build fails or is aborted, so that some files got updated for the current state of a header and some didn't). A rough sketch of the bookkeeping follows below.

Extra bonus points for doing this at a finer granularity (i.e., don't recompile a function if it wouldn't produce different output). This would clearly be very hard and a large departure from the current state of affairs :)
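
For what it's worth, the bookkeeping for the basic (per-file) version of #3 might look roughly like this. The fact encoding is made up, and the genuinely hard part -- computing which facts actually changed when a header is re-parsed -- is waved away:

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // A fact might be "macro:DEBUG=1", "layout:struct Foo", or
    // "signature:printf"; the encoding is invented for this sketch.
    typedef std::string Fact;

    class FactDependencies {
    public:
        // Record that 'source' looked at 'fact' while being compiled.
        void record(const std::string& source, const Fact& fact) {
            dependents_[fact].insert(source);
        }

        // Given the facts that changed when a header was re-parsed, return
        // the source files that need recompiling; files that never looked
        // at the changed facts are left alone.
        std::set<std::string> needs_rebuild(const std::vector<Fact>& changed) const {
            std::set<std::string> out;
            for (size_t i = 0; i < changed.size(); ++i) {
                std::map<Fact, std::set<std::string> >::const_iterator it =
                    dependents_.find(changed[i]);
                if (it != dependents_.end())
                    out.insert(it->second.begin(), it->second.end());
            }
            return out;
        }

    private:
        std::map<Fact, std::set<std::string> > dependents_;
    };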

Anyway, I think the biggest improvements lie in moving away from the current batch compile philosophy mandated by the command line tools. Instead, the command line tools should be a front end onto a much more powerful persistent compile server.
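
As a sketch of what I mean by "front end", the command line tool could be little more than this -- the socket path and request format are invented, and a real tool would obviously fall back to a normal batch compile when no server is running:

    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>
    #include <string>

    int main(int argc, char** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <source-file> [flags...]\n", argv[0]);
            return 1;
        }

        // Hypothetical rendezvous point for the persistent compile server.
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        sockaddr_un addr;
        std::memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        std::strncpy(addr.sun_path, "/tmp/compile-server.sock",
                     sizeof addr.sun_path - 1);

        if (fd < 0 || connect(fd, (sockaddr*)&addr, sizeof addr) != 0) {
            std::fprintf(stderr, "no compile server; do a batch compile instead\n");
            return 1;
        }

        // Ship the whole command line to the server, which compiles the file
        // against its already-parsed headers and replies with an exit status.
        std::string request;
        for (int i = 1; i < argc; ++i) {
            request += argv[i];
            request += '\n';
        }
        write(fd, request.c_str(), request.size());

        char status = 1;
        read(fd, &status, 1);
        close(fd);
        return status;
    }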


(Hey, you asked for ideas and said it was OK if they were hard :)

-tim

