C++ Conversion

This project is a continuation of the "build GCC in C++" project. Its goal is to explore re-implementing some existing GCC components in C++.

Background

What matters for GCC going forward is that it continue to be comprehensible and maintainable. That is a struggle that GCC has faced for its entire existence as a free software project. It is certainly true that using C++ unwisely can make that struggle more difficult. But this issue is not qualitatively different from the issues we face today.

Whether we use C or C++, we need to try to ensure that interfaces are easy to understand, that the code is reasonably modular, that the internal documentation corresponds to the code, that it is possible for new developers to write new passes and to fix bugs. Those are the important issues for us to consider. The C++ features which are not present in C -- features which are well documented in many books and many web sites -- are not an important issue.

For additional background information on this effort and its scope, please check out http://airs.com/ian/cxx-slides.pdf .

Rationale

Migrating GCC to C++ as implementation language:

Accessing the Branch

The branch is accessible via SVN at ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion. It is also registered in GCC's Git mirror; see these instructions for details on how to access it.

Additionally, you can view the branch with your browser at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/.

Contributing

This development branch follows the usual GCC maintainership rules.

It is maintained by Diego Novillo (dnovillo@google.com), who will do periodic merges from trunk.

Patches in the branch should be contributed to trunk following the usual contribution procedures.

Patches should follow the new C++ coding conventions. Note: As of 2012-04-11 these conventions are still in draft form. Before patches can be moved out of this branch into trunk, the conventions must be approved and installed in http://gcc.gnu.org/codingconventions.html.

ChangeLog entries should be written in the file ChangeLog.cxx-conversion in the corresponding directories.

All e-mail communication related to the branch should be tagged with [cxx-conversion] in the subject.

Development Strategy

While conversion is essential, it is important to carry it out in a way that limits disruption. To that end, we suggest the following development strategy.

Before implementing a change, identify the benefit. Primarily we expect the benefit to be better code adaptability, code writability, or code readability. However, improvements to memory use, compile time, and run time are also feasible.

Prefer to follow the idioms and APIs of the C++ standard library when implementing new abstractions. This matters most for abstractions that have an equivalent in the standard library, but where using the standard abstraction directly is undesirable. Following the standard's idioms preserves maximum flexibility in how the abstraction is implemented, including switching to the standard one later.

Where reasonable, implement the change behind the existing APIs. For example, replace the body of an existing macro with the new implementation. Test this configuration for both correctness and performance. Send that change as a patch.
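As a minimal sketch of this step, assume a hypothetical macro API `STACK_PUSH`/`STACK_POP`/`STACK_EMPTY_P` (not a real GCC interface) and a hypothetical template `gcc_stack`. The macros keep their names and arguments, but their bodies now forward to the C++ implementation, so code written against the macros compiles unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical new implementation: a small type-safe stack template.
template <typename T>
class gcc_stack
{
public:
  void push (const T &x) { elems_.push_back (x); }
  T pop ()
  {
    T x = elems_.back ();
    elems_.pop_back ();
    return x;
  }
  bool empty () const { return elems_.empty (); }
  std::size_t length () const { return elems_.size (); }

private:
  std::vector<T> elems_;
};

// The existing (hypothetical) macro API keeps its names and arguments;
// only the macro bodies change, so callers need not be touched yet.
#define STACK_PUSH(S, X) ((S).push (X))
#define STACK_POP(S) ((S).pop ())
#define STACK_EMPTY_P(S) ((S).empty ())

// A caller written only against the macros.
int
sum_and_drain (gcc_stack<int> &s)
{
  int total = 0;
  while (!STACK_EMPTY_P (s))
    total += STACK_POP (s);
  return total;
}
```

Once this version is tested, individual callers can switch from `STACK_POP (s)` to `s.pop ()` in small patches.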

Change the uses of the old API to the new API in bite-size patches. A patch that changes every file is more disruptive than ten patches changing ten distinct sets of files.

Conversion Tasks

We will serve no code before its time.

Prerequisite Tasks

Modify the gcc build to build with C++ and test on a sufficient number of targets. We are tracking progress on the C++ Build Status wiki page.

Finish the C++ coding conventions and adopt them by moving them into the GCC Coding Conventions.

Work out the details of using STL containers with GC allocated objects. This means teaching gengtype how to generate code to traverse STL containers, which would then be used during GC. This is not a task for the faint-hearted. But see also Tom Tromey's hint.
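To illustrate the idea only (this is not gengtype output), the sketch below assumes a hypothetical GC object `gc_node` and a hypothetical marking function `gc_mark`. gengtype already generates a marking routine for each GC-managed type; supporting a container amounts to also providing a routine that walks the container and calls the per-element marker, here written once as a template over `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical GC-managed object with a mark bit and one outgoing pointer.
struct gc_node
{
  bool marked = false;
  gc_node *child = nullptr;
};

// Per-type marker, of the kind a generator emits for each GC type:
// mark the object and everything reachable through its pointer field.
inline void
gc_mark (gc_node *n)
{
  while (n && !n->marked)
    {
      n->marked = true;
      n = n->child;
    }
}

// Container marker: visit every element and dispatch to its marker.
template <typename T>
void
gc_mark (const std::vector<T *> &v)
{
  for (std::size_t i = 0; i < v.size (); ++i)
    gc_mark (v[i]);
}
```

The hard part the task refers to is teaching gengtype to emit such routines for every container instantiation that holds GC pointers, and to find those containers as roots.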

Immediate Tasks

Convert VEC to std::vector or some gcc-specific equivalent. This conversion will reduce the need to spell out types at each VEC use, and it will also reduce the syntactic burden.
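A minimal sketch of a gcc-specific equivalent, assuming a hypothetical template `gcc_vec` built on `std::vector` and a stand-in `tree_node` type. The comment shows, roughly, the current macro style, in which the element type and allocation strategy are repeated at every operation; with a template they are stated once, in the declaration.

```cpp
#include <cstddef>
#include <vector>

// Current style, roughly:
//   VEC(tree,heap) *v = NULL;
//   VEC_safe_push (tree, heap, v, t);
//   ... VEC_length (tree, v) ... VEC_index (tree, v, ix) ...

// Hypothetical replacement: the element type appears only in the declaration.
template <typename T>
class gcc_vec
{
public:
  void safe_push (const T &x) { elems_.push_back (x); }
  std::size_t length () const { return elems_.size (); }
  T &operator[] (std::size_t ix) { return elems_[ix]; }
  const T &operator[] (std::size_t ix) const { return elems_[ix]; }

private:
  std::vector<T> elems_;
};

struct tree_node { int code; };  // stand-in for GCC's tree node
typedef tree_node *tree;

// Count the non-null entries; no type arguments are needed at the uses.
std::size_t
count_nonnull (const gcc_vec<tree> &v)
{
  std::size_t n = 0;
  for (std::size_t ix = 0; ix < v.length (); ix++)
    if (v[ix])
      n++;
  return n;
}
```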

Convert uses of htab_t to a type-safe template-based hash table. While std::tr1::unordered_map is technically workable, it is not available in some of the base compilers that must be able to build GCC. See gold/gold.h in the GNU binutils for one approach. This conversion will reduce the need to spell out types at each htab_t use, and it will also reduce the syntactic burden.
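The sketch below does not reproduce gold's approach; it is a minimal illustration under our own assumptions. A hypothetical `hash_table` template takes a "descriptor" class that supplies the element type and its hash and equality functions, so the `void *` casts and untyped callbacks of htab_t become compile-time checked. For brevity it uses separate chaining with a fixed bucket count and never resizes.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical type-safe hash table.  Descriptor supplies value_type,
// a static hash function and a static equality function.
template <typename Descriptor>
class hash_table
{
public:
  typedef typename Descriptor::value_type value_type;

  explicit hash_table (std::size_t nbuckets = 31)
    : buckets_ (nbuckets), count_ (0) {}

  // Return the entry equal to X, or null if there is none.
  value_type *find (const value_type &x) const
  {
    const bucket &b = buckets_[index (x)];
    for (std::size_t i = 0; i < b.size (); ++i)
      if (Descriptor::equal (*b[i], x))
        return b[i];
    return 0;
  }

  // Insert P unless an equal entry exists; return the entry in the table.
  value_type *insert (value_type *p)
  {
    if (value_type *old = find (*p))
      return old;
    buckets_[index (*p)].push_back (p);
    ++count_;
    return p;
  }

  std::size_t elements () const { return count_; }

private:
  typedef std::vector<value_type *> bucket;

  std::size_t index (const value_type &x) const
  {
    return Descriptor::hash (x) % buckets_.size ();
  }

  std::vector<bucket> buckets_;  // fixed size: this sketch never rehashes
  std::size_t count_;
};

// Example descriptor: entries are C strings compared by contents.
struct string_entry
{
  const char *str;
};

struct string_descriptor
{
  typedef string_entry value_type;

  static std::size_t hash (const value_type &e)
  {
    std::size_t h = 5381;  // djb2-style string hash
    for (const char *s = e.str; *s; ++s)
      h = h * 33 + static_cast<unsigned char> (*s);
    return h;
  }

  static bool equal (const value_type &a, const value_type &b)
  {
    return std::strcmp (a.str, b.str) == 0;
  }
};
```

A use such as `hash_table<string_descriptor> t;` names the entry type once; `t.find (key)` returns a `string_entry *` with no cast at the call site.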

Early Tasks

Convert numeric types, e.g. double_int, to C++ classes supporting all the normal operators. This conversion will reduce the need to spell out types at uses of these types. It will also reduce the syntactic burden by turning function calls into operators.
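A minimal sketch, assuming a hypothetical class `dint` that, like double_int, stores a value as a low and a high word. Only addition, negation, subtraction and equality are shown; the point is that an expression like `a - b` replaces a nest of calls such as `double_int_add (a, double_int_neg (b))`.

```cpp
#include <cstdint>

// Hypothetical two-word integer in the spirit of double_int, with
// wrap-around (modulo 2^128) arithmetic.
class dint
{
public:
  dint (std::uint64_t l = 0, std::uint64_t h = 0) : low_ (l), high_ (h) {}

  std::uint64_t low () const { return low_; }
  std::uint64_t high () const { return high_; }

  dint operator+ (const dint &b) const
  {
    std::uint64_t lo = low_ + b.low_;
    std::uint64_t carry = lo < low_;  // 1 if the low word wrapped around
    return dint (lo, high_ + b.high_ + carry);
  }

  // Two's complement negation: invert both words and add one.
  dint operator- () const { return dint (~low_, ~high_) + dint (1); }

  dint operator- (const dint &b) const { return *this + -b; }

  bool operator== (const dint &b) const
  {
    return low_ == b.low_ && high_ == b.high_;
  }
  bool operator!= (const dint &b) const { return !(*this == b); }

private:
  std::uint64_t low_, high_;
};
```

For example, `dint (~std::uint64_t (0), 0) + dint (1)` carries into the high word and yields low 0, high 1.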

Convert calls to qsort into calls to std::sort, which typically leads to code which is larger but runs faster.
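The difference can be seen in a small side-by-side sketch (function names are illustrative). qsort calls its comparison through a function pointer on `void *` arguments; std::sort is instantiated for the element type and comparison object, so the comparison can be inlined. Each instantiation adds code, which is where the size increase comes from.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C style: the comparison is called indirectly and receives void *.
static int
compare_ints (const void *pa, const void *pb)
{
  int a = *static_cast<const int *> (pa);
  int b = *static_cast<const int *> (pb);
  return (a > b) - (a < b);  // avoids the overflow that a - b could cause
}

void
sort_with_qsort (std::vector<int> &v)
{
  if (!v.empty ())
    std::qsort (&v[0], v.size (), sizeof (int), compare_ints);
}

// C++ style: a function object whose call operator the compiler can
// inline into the std::sort instantiation.
struct int_less
{
  bool operator() (int a, int b) const { return a < b; }
};

void
sort_with_std_sort (std::vector<int> &v)
{
  std::sort (v.begin (), v.end (), int_less ());
}
```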

Add a scoped timevar to stop timers automatically. This change removes the need to stop a timer explicitly before each return statement.
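A minimal RAII sketch, assuming hypothetical stand-ins `timer_start`/`timer_stop` in place of GCC's real timer functions (they only record which timers are running). The hypothetical class `scoped_timevar` starts its timer in the constructor and stops it in the destructor, so every exit path stops the timer.

```cpp
#include <vector>

// Hypothetical stand-in timer API: records the currently running timers.
static std::vector<int> running_timers;

static void
timer_start (int id)
{
  running_timers.push_back (id);
}

static void
timer_stop (int id)
{
  // Timers are expected to stop in LIFO order.
  if (!running_timers.empty () && running_timers.back () == id)
    running_timers.pop_back ();
}

// Scoped timer: the destructor runs on every return path.
class scoped_timevar
{
public:
  explicit scoped_timevar (int id) : id_ (id) { timer_start (id_); }
  ~scoped_timevar () { timer_stop (id_); }

private:
  scoped_timevar (const scoped_timevar &);             // non-copyable
  scoped_timevar &operator= (const scoped_timevar &);
  int id_;
};

// A pass with several early returns and no explicit timer_stop calls.
int
example_pass (int x)
{
  scoped_timevar tv (42);
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  return 1;
}
```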

Convert tree_list to something else.

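Convert accessor macros to use inline functions. This task is easier in C++ because C++ functions can return references, so an accessor call can still appear on the left-hand side of an assignment, just as the macro could. The primary benefit is that, unlike macros, the accessors can be called from gdb when debugging GCC.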
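A minimal sketch, assuming a hypothetical `node` struct and hypothetical accessors `NODE_CODE`/`node_code` (not GCC's real tree macros). Because the inline functions return references, they work as lvalues exactly like the macros; a debugger can call them as long as an out-of-line copy of the function is emitted in the binary.

```cpp
// Minimal stand-in for a tree node (hypothetical layout).
struct node
{
  int code;
  node *type;
};

// Macro accessors: usable as lvalues, but not callable from the debugger.
#define NODE_CODE(N) ((N)->code)
#define NODE_TYPE(N) ((N)->type)

// Inline-function accessors returning references, so assignments such as
// node_code (n) = 7 keep working.
inline int &
node_code (node *n)
{
  return n->code;
}

inline node *&
node_type (node *n)
{
  return n->type;
}
```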

Mid-Term Tasks

Convert the various hooks into classes with virtual functions. This conversion will make it easy to interpose monitoring on existing hook implementations.
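A minimal sketch of the interposition this enables, with hypothetical names throughout (`hooks_iface`, `default_hooks`, `counting_hooks`, `type_size`; none of these are GCC's real hook names). A monitor implements the same interface, counts calls, and forwards to whatever implementation it wraps; client code sees only the interface.

```cpp
#include <cstddef>

// Hypothetical hook interface: each hook becomes a virtual member function.
class hooks_iface
{
public:
  virtual ~hooks_iface () {}
  virtual int type_size (int type_code) = 0;
};

// An existing implementation (placeholder logic).
class default_hooks : public hooks_iface
{
public:
  int type_size (int type_code) { return type_code * 4; }
};

// Interposed monitor: wraps any implementation, counts calls, forwards.
class counting_hooks : public hooks_iface
{
public:
  explicit counting_hooks (hooks_iface &inner) : inner_ (inner), calls_ (0) {}
  int type_size (int type_code)
  {
    ++calls_;
    return inner_.type_size (type_code);
  }
  std::size_t calls () const { return calls_; }

private:
  hooks_iface &inner_;
  std::size_t calls_;
};

// Client code is unaware of whether it is being monitored.
int
total_size (hooks_iface &hooks)
{
  return hooks.type_size (1) + hooks.type_size (2);
}
```

With function-pointer hooks, the same monitoring would require saving and replacing each pointer by hand.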

Memory Management Tasks

Convert to scoped deallocation. That is, add destructors to clean up indirect memory. This conversion will reduce peak memory consumption.
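A minimal sketch, assuming a hypothetical class `scratch_buffer` that owns indirect heap memory and a global counter `live_blocks` used only to observe the effect. Each buffer is freed by its destructor at the end of its scope instead of lingering until a later collection, so a loop that creates one buffer per iteration never holds more than one at a time.

```cpp
#include <cstddef>
#include <cstdlib>

// Counts live indirect allocations so the effect can be observed.
static std::size_t live_blocks = 0;

// Hypothetical object owning indirect memory, released by its destructor.
class scratch_buffer
{
public:
  explicit scratch_buffer (std::size_t n)
    : data_ (static_cast<int *> (std::calloc (n, sizeof (int)))), size_ (n)
  {
    if (data_)
      ++live_blocks;
  }
  ~scratch_buffer ()
  {
    if (data_)
      {
        std::free (data_);
        --live_blocks;
      }
  }
  int &operator[] (std::size_t i) { return data_[i]; }
  std::size_t size () const { return size_; }

private:
  scratch_buffer (const scratch_buffer &);             // non-copyable
  scratch_buffer &operator= (const scratch_buffer &);
  int *data_;
  std::size_t size_;
};

// Return the peak number of live buffers while running the loop.
std::size_t
max_live_during_loop (int iterations)
{
  std::size_t peak = 0;
  for (int i = 0; i < iterations; ++i)
    {
      scratch_buffer buf (1024);  // freed at the end of each iteration
      buf[0] = i;
      if (live_blocks > peak)
        peak = live_blocks;
    }
  return peak;
}
```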

Quoting (1): "Longer term, we know that memory usage is an issue in GCC. In the old obstack days, we had a range of obstacks with different lifespans, so we could create RTL with a temporary lifetime which was given a longer lifetime when needed. We got away from that because we spent far too much time chasing bugs in which RTL should have been saved to a longer lifetime but wasn't. However, that model should permit us to run with significantly less memory, which would translate to less compile time. I think we might be able to do it by implementing a custom allocator, such as a pool allocator which permits allocating different sizes of memory, and never frees memory. Then the tree class could take an allocator as a template parameter. Then we would provide convertors which copied the tree class to a different allocation style. Then, for example, fold-const.c could use a temporary pool which lived only for the length of the call to fold. If it returned a new value, the convertor would force a copy out of the temporary pool. If this works out, we can use type safety to enforce memory discipline, use significantly less memory during compilation, and take a big step toward getting rid of the garbage collector."

Add operator new overloads for pool-specific data types.
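A minimal sketch, assuming a hypothetical `bump_pool` that, as in the quoted proposal, allocates objects of different sizes and never frees them individually; everything is released when the pool is destroyed. The hypothetical type `temp_node` declares a class-specific placement `operator new` taking the pool, so it can only be allocated with `new (pool) temp_node (...)`. Since no destructors run, this only suits trivially destructible types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical pool: bump allocation, objects never freed individually.
class bump_pool
{
public:
  bump_pool () : cur_ (0), left_ (0) {}
  ~bump_pool ()
  {
    for (std::size_t i = 0; i < chunks_.size (); ++i)
      std::free (chunks_[i]);
  }

  void *allocate (std::size_t n)
  {
    const std::size_t align = alignof (std::max_align_t);
    n = (n + align - 1) / align * align;  // keep every object aligned
    if (n > left_)
      {
        std::size_t chunk = n > 4096 ? n : 4096;
        cur_ = static_cast<char *> (std::malloc (chunk));
        if (!cur_)
          throw std::bad_alloc ();
        chunks_.push_back (cur_);
        left_ = chunk;
      }
    void *p = cur_;
    cur_ += n;
    left_ -= n;
    return p;
  }

  std::size_t chunks () const { return chunks_.size (); }

private:
  bump_pool (const bump_pool &);             // non-copyable
  bump_pool &operator= (const bump_pool &);
  char *cur_;
  std::size_t left_;
  std::vector<char *> chunks_;
};

// A pool-specific data type.  The matching placement operator delete is
// only used if a constructor throws; it is a no-op because the pool owns
// the memory.
struct temp_node
{
  int value;
  temp_node *next;
  temp_node (int v, temp_node *n) : value (v), next (n) {}

  static void *operator new (std::size_t size, bump_pool &pool)
  {
    return pool.allocate (size);
  }
  static void operator delete (void *, bump_pool &) {}
};

// Build and sum a temporary list; all nodes die with the pool.
int
sum_list (int n)
{
  bump_pool pool;
  temp_node *head = 0;
  for (int i = 1; i <= n; ++i)
    head = new (pool) temp_node (i, head);
  int sum = 0;
  for (temp_node *p = head; p; p = p->next)
    sum += p->value;
  return sum;
}
```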

Convert to a low-maintenance garbage collector, e.g. the Boehm Collector.

Tree Type Hierarchy Tasks

Directly represent the type hierarchy embodied by tree nodes. This representation leads to fewer bugs through increased fidelity in the representation and static typing.

Convert union tree_node and its members into a class hierarchy. The first stage of this conversion will not be virtual, so as to ensure that there is no change in representation size.

Create conversion functions to dynamically convert a pointer to a general type into a pointer to a specific type.
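A minimal sketch of the non-virtual first stage together with a checked conversion function, using hypothetical names (`tree_base`, `tree_int_cst_node`, `tree_decl_node`, `checked_convert`, and the `K_*` codes). The base class keeps an explicit code field, so there is no vtable pointer and the size matches a plain struct; the conversion asserts the code before the `static_cast`, which is only valid when the object really has the derived type.

```cpp
#include <cassert>

enum node_code { K_INT_CST, K_DECL };  // hypothetical node codes

// Non-virtual hierarchy: the kind is stored explicitly in the base.
struct tree_base
{
  node_code code;
};

struct tree_int_cst_node : tree_base
{
  long value;
};

struct tree_decl_node : tree_base
{
  const char *name;
};

// Map each derived class to the code its objects must carry.
template <typename T> struct code_of;
template <> struct code_of<tree_int_cst_node>
{
  static const node_code value = K_INT_CST;
};
template <> struct code_of<tree_decl_node>
{
  static const node_code value = K_DECL;
};

// Checked conversion from a general node pointer to a specific one.
template <typename T>
T *
checked_convert (tree_base *t)
{
  assert (t->code == code_of<T>::value);
  return static_cast<T *> (t);
}
```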

Change the parameters of functions to use the pointer types for the type of tree they handle. This change is essential to effective static type checking.

Convert attribute macros to postfix method calls.

Convert function names with embedded type names to "verb overloads".
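A minimal sketch of both changes, with hypothetical names. An attribute that would be read through a macro becomes a member function such as `d.name ()`, and per-type functions such as a hypothetical `dump_int_cst`/`dump_decl` pair become one verb, `dump`, overloaded on the parameter type.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct int_cst_node
{
  long val;
  long value () const { return val; }  // instead of a macro like VALUE (t)
};

struct decl_node
{
  std::string nm;
  const std::string &name () const { return nm; }  // instead of NAME (t)
};

// Instead of dump_int_cst (os, c) and dump_decl (os, d): one verb,
// overloaded on the type of node it handles.
void
dump (std::ostream &os, const int_cst_node &c)
{
  os << "const " << c.value ();
}

void
dump (std::ostream &os, const decl_node &d)
{
  os << "decl " << d.name ();
}

std::string
dump_both (const int_cst_node &c, const decl_node &d)
{
  std::ostringstream os;
  dump (os, c);
  os << "; ";
  dump (os, d);
  return os.str ();
}
```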

Change type un-safe "if is type" constructs to "if successfully converts to type" constructs. This change reduces the number of dynamic conversions, which aids efficiency.
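A minimal sketch contrasting the two constructs, with hypothetical names (`node_base`, `int_cst`, `is_int_cst`, `try_int_cst`). In the old form the test and the conversion are separate, so a checked conversion would examine the kind a second time and the two can drift apart. In the new form a single attempt returns either a typed pointer or null.

```cpp
#include <cstddef>

enum node_kind { INT_CST_KIND, DECL_KIND };  // hypothetical kinds

struct node_base { node_kind kind; };
struct int_cst : node_base { long value; };
struct decl : node_base { int uid; };

// Old style: "if is type", then convert.
inline bool
is_int_cst (const node_base *t)
{
  return t->kind == INT_CST_KIND;
}

long
value_or_zero_old (node_base *t)
{
  if (is_int_cst (t))
    return static_cast<int_cst *> (t)->value;  // relies on the test above
  return 0;
}

// New style: "if successfully converts to type".
inline int_cst *
try_int_cst (node_base *t)
{
  return t->kind == INT_CST_KIND ? static_cast<int_cst *> (t) : NULL;
}

long
value_or_zero_new (node_base *t)
{
  if (int_cst *c = try_int_cst (t))
    return c->value;
  return 0;
}
```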

Other Tasks

Note: Some of these tasks were briefly suggested and remain unclear.

Shift little-used tree fields to auxiliary tables. Trees are densely packed, but they are not necessarily information-rich. In the PPH branch, we found that nearly half the pointers we were streaming were null, which says that many of the fields are unused. The shift would save memory.
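A minimal sketch of the idea, assuming a hypothetical node type `small_node` and a hypothetical side table `comment_table` keyed by node address. A field that is null in most nodes moves out of the node, and a missing entry reads as null. The trade-off is that each access becomes a lookup, so this only pays off for fields that are rarely set and rarely read.

```cpp
#include <cstddef>
#include <map>

// Hypothetical node after the shift: the rarely-set field is gone.
struct small_node
{
  int code;
  // const char *comment;  // formerly here; null in most nodes
};

// Side table holding the field only for nodes that actually set it.
class comment_table
{
public:
  void set (const small_node *n, const char *c) { map_[n] = c; }

  const char *get (const small_node *n) const
  {
    std::map<const small_node *, const char *>::const_iterator it
      = map_.find (n);
    return it == map_.end () ? 0 : it->second;  // absent means null
  }

  std::size_t size () const { return map_.size (); }

private:
  std::map<const small_node *, const char *> map_;
};
```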

Convert the above hierarchy to polymorphic classes. Move representation information that is currently explicitly stored into implicit information based on the virtual pointer.

Convert cgraph to C++ classes.

Convert the optimization pass manager to C++.

Convert the IR text dumper to C++.

Convert the persistent IR dumper/reader.