An application for Google Summer of Code 2006, GCC project
Garbage collection tuning
The run-time performance of GCC and its memory usage is widely recognized as a problematic area. Times of garbage collection consistently appear on the top of GCC profiles. As a side effect of its rapid advances in the recent years, GCC has undergone several changes in memory management infrastructure and ended up with the current garbage collector "ggc-page" as an interim measure. The goal of this project is to investigate, what is a suitable garbage collection method for GCC and implement it.
Implemented suitable GC scheme will result in consistent and significant GCC speedup. This improvement is very important for GCC end-users. Moreover, implementation of such GC will result in improvements from GCC maintenance point of view. First of all, one of the two currently present GCC garbage collectors will be removed. Additionally, better internal code self-documentation will result from segregating allocated objects according to their lifetime.
- A tuned garbage collector, integrated with the rest of GCC. Since nature of this project is exploratory, a final delivered collector will depend on the outcome of several investigations and how far I will be able to proceed with implementation. The details are outlined in the plan below.
- Complete removal of one of the two currently present garbage collectors.
- An expanded and elaborated chapter "Memory Management and Type Information" in GCC Internals manual.
- Investigate reuse of external GC, namely Boehm's GC. Issues to be considered:
- Boehm's GC should be portable to all supported GCC host platforms. If some of the hosts do not support all of the features required for GC, graceful degradation should be present.
- Boehm's GC should allow for integration with GCC precompiled header support. Most probably it would require factoring out PCH support from current GCC garbage collectors. It is likely, that this point will result in rejection of Boehm's GC.
- Boehm's GC should support run-time registration of additional roots. Current GCC memory management is not exclusively GC and Boehm's GC knows nothing about roots living in dynamic memory that was not allocated by it.
- Boehm's GC should provide means to specify the desired allocation zone for the object, in a sense that objects with similar known lifetimes should go to the same zone.
- Boehm's GC should outperform ggc-zone by a significant (about 10%) margin.
This investigation will be done by quick prototyping of <code>ggc-boehm.c</code> with little attention to build machinery details. The point here is to find out as soon as possible if Boehm's GC is suitable for GCC or not.
- If Boehms's GC turns out to be suitable for GCC, integrate it.
- Integrate Boehm's GC build system. Since Boehm's GC does not have autotools-style build system for all platforms, write the missing bits for GCC host platforms. As Boehm's GC is already used in libgcj runtime, it can be reasonably expected that there will be few such platforms.
- Integrate PCH, as decided by outcome of 1.2.
Write proper <code>ggc-boehm.c</code>, throw away most of the <code>ggc.h</code> and <code>ggc-*.c</code>.
- If Boehms's GC turns out to be unsuitable: work on existing zone collector.
Work around Cygwin's <code>mmap()</code> deficiencies so that current ggc-zone could be enabled on Cygwin host.
- Remove ggc-page collector.
- Investigate possibility of performing collection on allocation calls (i.e. more often) in order to remove at least some of the explicit ggc_collect() calls.
- Develop a copying collector.
- Find a solution to register root pointers that are residing in local variables. There are several options:
- Do not use local variables for holding such pointers -- this option might harm modularity in GCC.
- From the Boehm's GC investigation results decide if stack walking is viable option for finding root pointers in local variables, if yes, implement it. This is a preferred option because no modifications are required all over GCC.
- There are several places in GCC where copying GC can be called without considering root pointers in local variables. It is possible to use copying GC in such places and fall back to non-copying GC everywhere else.
Implement a fall-back in a case system does not support <code>mmap()</code> or <code>mprotect()</code>.
- Develop a generational collector.
- Preliminary design is to have 3 generations. Objects would be promoted from the 1st to 2nd generation after surviving a single collection, from 2nd to 3rd after surviving additional few (more than one) collections. Object with a known lifetime will be put straight into 2nd or 3rd zone.
The project goals, desirability for GCC, its plan have been discussed on the GCC mailing list, where they received support from GCC maintainers. The discussion can be found at thread starting at http://gcc.gnu.org/ml/gcc/2006-05/msg00041.html. Additionally I've already done some initial technical investigation into Boehm's GC and current zone collector. The results were reported to the same thread.
Note that steps 3-5 are incremental. Successful implementation of any of these is a measureable milestone and a direct benefit for GCC, valuable even if I don't complete the whole plan on time.
GCC team will be asked to provide SVN branch for development. This would help to keep development public and merge accepted patches into mainline when it is in stage 1. Most likely this merge will be possible to do after the end of Summer of Code and I'm commiting to perform it then. Testing will be carried out on ix86 and possibly on amd-x64 and sparc platforms. Need for additional testing will be discussed on GCC development mailing list as testing progresses. Documentation will be written on GCC Wiki during whole duration of the project and put into GCC Internals manual when it stabilizes. All design issues and decisions will discussed on that list and developed code will conform to GCC coding standards.
I believe that I'm a suitable person to carry out this project. As a long-time GCC user on an old machine I'm well too aware of the fact that GCC needs to be faster. I've done some GCC development before: I was involved with DJGPP port, contributed several bugfixes outside that. So I already know GCC codebase and development tools involved. I found compiler topics fascinating at my undergraduate CS studies, so I took all the available compiler courses, designed and implemented my own toy compiler and wrote my undergraduate thesis about loop optimizations.
I was born in 1983 in Lithuania. I was involved with the DJGPP project from my late high school years up to 2nd university year. In 2005 I've finished Computer Science undergraduate studies at Vilnius University, Lithuania. At present I'm employed at and study in at Aalborg University, Denmark, studying for master's degree in Data Engineering. After DJGPP, my involvement in free software has moved to local issues: LaTeX localization for Lithuanian language, Lithuanian localization of Gaim. Additionaly I have about two year of industrial experience, having worked as a software engineer on several enterprise software products.