C++ Conversion

This project is a continuation of the build GCC in C++ project. Its goal is to explore re-implementing some existing GCC components in C++.

Documentation

Status

2012-08-14: The branch has been merged into trunk. There is no code left in the branch for now. If major re-writes require the use of the branch, we will open it again.

2012-08-12: We have started the initial merge into trunk. The following patches have been sent:

Once the review process is finalized, they will be committed as a single revision.

If you have available cycles, please help us test the branch using the following commands:

$ svn co svn://gcc.gnu.org/gcc/branches/cxx-conversion
$ mkdir bld && cd bld
$ ../cxx-conversion/configure --enable-languages=all,go,ada
$ make && make -k check

Of particular interest are systems whose host C++ compiler is other than g++.

Background

What matters for GCC going forward is that it continue to be comprehensible and maintainable. That is a struggle that GCC has faced for its entire existence as a free software project. It is certainly true that using C++ unwisely can make that struggle more difficult. But this issue is not qualitatively different from the issues we face today.

Whether we use C or C++, we need to try to ensure that interfaces are easy to understand, that the code is reasonably modular, that the internal documentation corresponds to the code, that it is possible for new developers to write new passes and to fix bugs. Those are the important issues for us to consider. The C++ features which are not present in C -- features which are well documented in many books and many web sites -- are not an important issue.

For additional background information on this effort and its scope, please check out http://airs.com/ian/cxx-slides.pdf .

Rationale

Migrating GCC to C++ as implementation language:

Accessing the branch

The branch is accessible via SVN at ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion. It was also registered in GCC's Git mirror, see these instructions for details on how to access it.

Additionally, you can view the branch with your browser at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/.

Contributing

This development branch follows the usual GCC maintainership rules.

It is maintained by Diego Novillo (dnovillo@google.com), who will do periodic merges from trunk.

Patches in the branch should be contributed to trunk following the usual contribution procedures.

Patches should follow the new C++ coding conventions.

The branch does not use ChangeLog files. All the changes should be described in their entirety in the SVN commit message. Each commit should contain the exact same content that you sent in the original patch submission: A description of the patch, how it works, why it works that way and all other relevant details.

All e-mail communication related to the branch should be tagged with [cxx-conversion] in the subject.

Development Strategy

While conversion is essential, doing so in a manner that limits disruption is important. To that end, we suggest the following development strategy.

Before implementing a change, identify the benefit. Primarily we expect the benefit to be better code adaptability, code writability, or code readability. However, improvements to memory use, compile time, and run time are also feasible.

Prefer to follow the idioms and APIs of the C++ standard library when implementing new abstractions. This approach is most important for abstractions that have equivalents in the standard, but for which using the standard abstraction is undesirable. This approach preserves maximum flexibility in implementation.

Where reasonable, implement the change behind the existing APIs. For example, replace the bodies of an existing macro with the new implementation. Test this configuration for both correctness and performance. Send that change as a patch.

Change the uses of the old API to the new API in bite-size patches. A patch that changes every file is more disruptive than ten patches changing ten distinct sets of files.

Conversion Tasks

We will serve no code before its time.

Prerequisite Tasks

Modify the gcc build to build with C++

Test on a sufficient number of targets. We are tracking progress on the C++ Build Status wiki page.

Finish the C++ coding conventions

A draft of the document is at C++ coding conventions. The patch modifying the official GCC Coding Conventions is available at http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01207.html.

Convert gengtype to handle C++ structures

To handle containers and other data structures with no GTY markers, gengtype needs to allow user-provided marker and walker functions (see original discussion at http://gcc.gnu.org/ml/gcc/2010-06/msg00143.html).

To support this, gengtype will proceed as follows

template <typename T>
void mark(const T *n);

Immediate Tasks

Convert VEC

Convert VEC to std::vector or some gcc-specific equivalent. This conversion will reduce the specification of types at VEC uses. It will also reduce the syntactic burden.

Convert hash tables

Convert uses of htab_t to a type-safe template-based hash table. While std::tr1::unordered_map is technically workable, it is not part of some base compilers. See gold/gold.h in the GNU binutils for one approach. This conversion will reduce the specification of types at htab_t uses. It will also reduce the syntactic burden.
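As a rough illustration of what "type-safe template-based" could mean here, the sketch below parameterizes a small chained hash set on a descriptor type that supplies the element type, the hash, and the equality test. The names typed_table and string_hasher are hypothetical and are not taken from GCC or gold; the point is only that callers never pass or cast untyped pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical descriptor: names the element type and how to hash/compare it.
struct string_hasher {
  typedef std::string value_type;
  static std::size_t hash(const value_type &s) {
    std::size_t h = 5381;
    for (std::size_t i = 0; i < s.size(); ++i)
      h = h * 33 + static_cast<unsigned char>(s[i]);
    return h;
  }
  static bool equal(const value_type &a, const value_type &b) { return a == b; }
};

// Minimal chained hash set (illustrative only; fixed bucket count, no resize).
template <typename Descriptor>
class typed_table {
public:
  typedef typename Descriptor::value_type value_type;

  typed_table() : buckets_(16) {}

  // Returns a pointer to the stored element, or a null pointer if absent.
  const value_type *find(const value_type &v) const {
    const std::vector<value_type> &b = bucket_for(v);
    for (std::size_t i = 0; i < b.size(); ++i)
      if (Descriptor::equal(b[i], v))
        return &b[i];
    return 0;
  }

  // Inserts v unless an equal element is already present.
  // Returns true when a new element was added.
  bool insert(const value_type &v) {
    if (find(v))
      return false;
    buckets_[Descriptor::hash(v) % buckets_.size()].push_back(v);
    return true;
  }

private:
  const std::vector<value_type> &bucket_for(const value_type &v) const {
    return buckets_[Descriptor::hash(v) % buckets_.size()];
  }
  std::vector<std::vector<value_type> > buckets_;
};

void check_typed_table() {
  typed_table<string_hasher> names;
  assert(names.insert("main"));
  assert(!names.insert("main"));      // duplicate is rejected
  assert(names.find("main") != 0);
  assert(names.find("missing") == 0);
}
```

Because the element type is part of the table's type, a lookup returns a typed pointer and no cast is needed at the use site.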

Early Tasks

Convert numeric types

Convert numeric types, e.g. double_int, to C++ classes supporting all the normal operators. This conversion will reduce the specification of types at uses of these types. It will also reduce the syntactic burden by turning function calls into operators.
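A minimal sketch of the idea, using a hypothetical two-word unsigned integer rather than the real double_int (whose layout and API are not reproduced here): once the type is a class with overloaded operators, arithmetic on it reads like arithmetic on built-in integers.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical two-word unsigned integer; a stand-in, not GCC's double_int.
struct two_word_int {
  uint64_t low;
  uint64_t high;
  two_word_int(uint64_t l = 0, uint64_t h = 0) : low(l), high(h) {}
};

// Addition with carry propagation from the low word into the high word.
inline two_word_int operator+(two_word_int a, two_word_int b) {
  two_word_int r(a.low + b.low, a.high + b.high);
  if (r.low < a.low)  // unsigned wrap-around means a carry occurred
    ++r.high;
  return r;
}

inline bool operator==(two_word_int a, two_word_int b) {
  return a.low == b.low && a.high == b.high;
}

inline bool operator<(two_word_int a, two_word_int b) {
  return a.high < b.high || (a.high == b.high && a.low < b.low);
}

void check_two_word_int() {
  two_word_int max_low(~uint64_t(0), 0);
  two_word_int one(1, 0);
  // The carry moves into the high word.
  assert(max_low + one == two_word_int(0, 1));
  assert(max_low < max_low + one);
}
```

With a function-call interface the same expression would need a named call for each operation; the operator form keeps the type out of the call site.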

Convert qsort

Convert calls to qsort into calls to std::sort, which typically leads to code which is larger but runs faster.
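The following sketch contrasts the two styles on a plain vector of ints. The function names are made up for illustration. With qsort the comparator works on untyped pointers and is called through a function pointer; with std::sort the comparator is a typed function object that can be inlined into the instantiated sort, which is where both the speed and the extra code size come from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: the comparator receives untyped pointers and must cast them back.
static int compare_ints_c(const void *a, const void *b) {
  int x = *static_cast<const int *>(a);
  int y = *static_cast<const int *>(b);
  return (x > y) - (x < y);
}

void sort_with_qsort(std::vector<int> &v) {
  if (!v.empty())
    std::qsort(&v[0], v.size(), sizeof(int), compare_ints_c);
}

// C++ style: a typed function object; no casts, and the call can be inlined.
struct int_less {
  bool operator()(int a, int b) const { return a < b; }
};

void sort_with_std_sort(std::vector<int> &v) {
  std::sort(v.begin(), v.end(), int_less());
}

void check_sorts_agree() {
  int raw[] = {5, -2, 9, 0, 5};
  std::vector<int> a(raw, raw + 5), b(raw, raw + 5);
  sort_with_qsort(a);
  sort_with_std_sort(b);
  assert(a == b);
  assert(b.front() == -2 && b.back() == 9);
}
```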

Add a scoped timevar

Add a scoped timevar to stop timers automatically. This change avoids repetitive stopping of counters before each return statement.
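One way to picture this is an object whose constructor starts a timer and whose destructor stops it. In the sketch below, timer_start, timer_stop, and auto_timer are hypothetical stand-ins (they count nesting instead of measuring time) and are not the names of GCC's timevar interface.

```cpp
#include <cassert>

// Hypothetical stand-ins for a timer API: they track nesting, not real time.
enum timer_id { TV_EXAMPLE, TV_COUNT };
static int timer_depth[TV_COUNT];
inline void timer_start(timer_id id) { ++timer_depth[id]; }
inline void timer_stop(timer_id id) { --timer_depth[id]; }

// Starts the timer on construction and stops it on destruction.
class auto_timer {
public:
  explicit auto_timer(timer_id id) : id_(id) { timer_start(id_); }
  ~auto_timer() { timer_stop(id_); }

private:
  auto_timer(const auto_timer &);             // not copyable
  auto_timer &operator=(const auto_timer &);  // not assignable
  timer_id id_;
};

// No return path needs an explicit stop; the destructor handles all of them.
int classify(int x) {
  auto_timer t(TV_EXAMPLE);
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  return 1;
}

void check_auto_timer() {
  assert(classify(-5) == -1);
  assert(timer_depth[TV_EXAMPLE] == 0);  // stopped on the early return
  {
    auto_timer t(TV_EXAMPLE);
    assert(timer_depth[TV_EXAMPLE] == 1);  // running inside the scope
  }
  assert(timer_depth[TV_EXAMPLE] == 0);
}
```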

Convert tree_list

Convert tree_list to something else.

Convert accessor macros to use inline functions

This task is easier in C++ because C++ functions can return references. The primary benefit here is to make gcc implementations work with gdb.
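As a small sketch (the struct and accessor names are invented for illustration), an accessor macro that is used on both sides of an assignment can be replaced by an inline function returning a reference, so existing uses of the form ACCESSOR(n) = value keep working after the conversion.

```cpp
#include <cassert>

// Hypothetical node type standing in for a real GCC structure.
struct example_node {
  int code;
  int flags;
};

// Macro style: usable as an l-value, but it is only text substitution.
#define EXAMPLE_FLAGS(n) ((n)->flags)

// Function style: returning a reference keeps the accessor usable as an
// l-value, and the const overload keeps read-only access type-checked.
inline int &example_flags(example_node *n) { return n->flags; }
inline const int &example_flags(const example_node *n) { return n->flags; }

void check_accessors() {
  example_node n = {1, 0};
  EXAMPLE_FLAGS(&n) = 3;    // old spelling
  assert(example_flags(&n) == 3);
  example_flags(&n) = 7;    // new spelling, same l-value behaviour
  assert(n.flags == 7);
  const example_node *cn = &n;
  assert(example_flags(cn) == 7);
}
```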

Mid-Term Tasks

Convert the various hooks into classes with virtual functions

This conversion will enable easily interposing monitoring on existing hook implementations.
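To illustrate the interposition point, the sketch below uses an invented hook interface with a single virtual function (register_cost is a made-up hook, not an actual GCC target hook). A monitoring class implements the same interface, records each call, and forwards to the real implementation.

```cpp
#include <cassert>

// Hypothetical hook interface with one hook and a default implementation.
class example_hooks {
public:
  virtual ~example_hooks() {}
  virtual int register_cost(int /*regno*/) const { return 1; }
};

// A target overrides only the hooks it cares about.
class example_target : public example_hooks {
public:
  virtual int register_cost(int regno) const { return regno < 8 ? 1 : 2; }
};

// Interposer: counts calls, then forwards to the wrapped implementation.
class counting_hooks : public example_hooks {
public:
  explicit counting_hooks(const example_hooks &inner)
      : inner_(inner), calls_(0) {}
  virtual int register_cost(int regno) const {
    ++calls_;
    return inner_.register_cost(regno);
  }
  int calls() const { return calls_; }

private:
  const example_hooks &inner_;
  mutable int calls_;
};

void check_hooks() {
  example_target target;
  counting_hooks monitored(target);
  const example_hooks &hooks = monitored;  // callers see only the interface
  assert(hooks.register_cost(3) == 1);
  assert(hooks.register_cost(12) == 2);
  assert(monitored.calls() == 2);
}
```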

Convert l-value accessors to getters and setters

Long-term, this change will enable us to have get and set computed attributes, rather than stored attributes.
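A minimal sketch of the distinction, with an invented class and invented attribute names: callers use the same get/set spelling whether the attribute is stored in a field or derived from another one, so the representation can change later without touching the call sites.

```cpp
#include <cassert>

// Hypothetical class; the attributes are made up for illustration.
class example_type_info {
public:
  example_type_info() : size_bytes_(0) {}

  // Stored attribute.
  unsigned size_bytes() const { return size_bytes_; }
  void set_size_bytes(unsigned n) { size_bytes_ = n; }

  // Computed attribute: no field of its own, derived from size_bytes_.
  unsigned size_bits() const { return size_bytes_ * 8; }
  void set_size_bits(unsigned bits) { size_bytes_ = bits / 8; }

private:
  unsigned size_bytes_;
};

void check_type_info() {
  example_type_info t;
  t.set_size_bytes(4);
  assert(t.size_bits() == 32);
  t.set_size_bits(64);
  assert(t.size_bytes() == 8);
}
```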

Memory Management Tasks

Convert to scoped deallocation

That is, add destructors to clean up indirect memory. This conversion will reduce peak memory consumption.

Quoting (1): "Longer term, we know that memory usage is an issue in GCC. In the old obstack days, we had a range of obstacks with different lifespans, so we could create RTL with a temporary lifetime which was given a longer lifetime when needed. We got away from that because we spent far too much time chasing bugs in which RTL should have been saved to a longer lifetime but wasn't. However, that model should permit us to run with significantly less memory, which would translate to less compile time. I think we might be able to do it by implementing a custom allocator, such as a pool allocator which permits allocating different sizes of memory, and never frees memory. Then the tree class could take an allocator as a template parameter. Then we would provide convertors which copied the tree class to a different allocation style. Then, for example, fold-const.c could use a temporary pool which lived only for the length of the call to fold. If it returned a new value, the convertor would force a copy out of the temporary pool. If this works out, we can use type safety to enforce memory discipline, use significantly less memory during compilation, and take a big step toward getting rid of the garbage collector."

Add operator new overloads for pool-specific data types
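The sketch below combines the two ideas above under stated assumptions: scoped_pool and temp_expr are hypothetical names, and the pool simply wraps malloc. A class-specific operator new overload takes the pool as an argument, and the pool's destructor releases everything allocated from it when the scope ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical pool: every block is released when the pool goes out of scope.
class scoped_pool {
public:
  scoped_pool() {}
  ~scoped_pool() {
    for (std::size_t i = 0; i < blocks_.size(); ++i)
      std::free(blocks_[i]);
  }
  void *allocate(std::size_t n) {
    blocks_.reserve(blocks_.size() + 1);  // so push_back below cannot throw
    void *p = std::malloc(n);
    if (!p)
      throw std::bad_alloc();
    blocks_.push_back(p);
    return p;
  }
  std::size_t live_blocks() const { return blocks_.size(); }

private:
  scoped_pool(const scoped_pool &);             // not copyable
  scoped_pool &operator=(const scoped_pool &);  // not assignable
  std::vector<void *> blocks_;
};

// Hypothetical pool-allocated type. It is trivially destructible, so the
// pool may free its storage without running a destructor.
struct temp_expr {
  int value;
  explicit temp_expr(int v) : value(v) {}

  // Used as: new (pool) temp_expr(...)
  static void *operator new(std::size_t n, scoped_pool &pool) {
    return pool.allocate(n);
  }
  // Matching form, invoked only if the constructor throws; the pool still
  // owns the block, so there is nothing to do here.
  static void operator delete(void *, scoped_pool &) {}
};

void check_pool() {
  scoped_pool pool;
  temp_expr *a = new (pool) temp_expr(1);
  temp_expr *b = new (pool) temp_expr(2);
  assert(a->value + b->value == 3);
  assert(pool.live_blocks() == 2);
}  // both objects are released here, with no explicit delete
```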

Convert to a low-maintenance garbage collector

e.g. the Boehm Collector.

Tree Type Hierarchy Tasks

Directly represent the type hierarchy embodied by tree nodes

This representation leads to fewer bugs through increased fidelity in the representation and static typing.

Convert union tree_node and its members into a class hierarchy

The first stage of this conversion will not be virtual, so as to ensure that there is no change in representation size.

Create conversion functions

This will allow dynamically converting a pointer to a general type into a pointer to a specific type.

Change the parameters of functions to use tighter types

Change the parameters of functions to use the pointer types for the type of tree they handle. This change is essential to effective static type checking.

Convert attribute macros to postfix method calls

Convert function names with embedded type names to "verb overloads"

Change type un-safe "if is type" constructs

Change type un-safe "if is type" constructs to "if successfully converts to type" constructs. This change reduces the number of dynamic conversions, which aids efficiency.
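The tasks in this section can be sketched together on a toy hierarchy. All names below are invented; the real tree_node hierarchy is far larger. The classes are non-virtual and keep an explicit code field, a conversion function returns a null pointer on mismatch, and the "if successfully converts to type" form does the check and the conversion in one step.

```cpp
#include <cassert>

enum example_code { CODE_DECL, CODE_TYPE };

// Non-virtual base: the code field stays explicit, so no vptr is added.
struct example_base {
  example_code code;
  explicit example_base(example_code c) : code(c) {}
};

struct example_decl : example_base {
  int uid;
  explicit example_decl(int u) : example_base(CODE_DECL), uid(u) {}
};

struct example_type : example_base {
  unsigned precision;
  explicit example_type(unsigned p) : example_base(CODE_TYPE), precision(p) {}
};

// Conversion function: a typed pointer on success, a null pointer otherwise.
inline example_decl *as_decl(example_base *n) {
  return (n && n->code == CODE_DECL) ? static_cast<example_decl *>(n) : 0;
}

// Tighter parameter type: only a decl can be passed, checked at compile time.
inline int decl_uid(const example_decl *d) { return d->uid; }

// "If is type" style: the test and the cast are separate and can drift apart.
int uid_or_minus_one_old(example_base *n) {
  if (n->code == CODE_DECL)
    return static_cast<example_decl *>(n)->uid;
  return -1;
}

// "If successfully converts to type" style: one checked conversion.
int uid_or_minus_one(example_base *n) {
  if (example_decl *d = as_decl(n))
    return decl_uid(d);
  return -1;
}

void check_conversions() {
  example_decl d(7);
  example_type t(32);
  assert(uid_or_minus_one(&d) == 7);
  assert(uid_or_minus_one(&t) == -1);
  assert(uid_or_minus_one_old(&d) == uid_or_minus_one(&d));
  assert(as_decl(&t) == 0);
}
```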

Other Tasks

Note: Some of these tasks were only briefly suggested and remain unclear.

Shift little-used tree fields to auxiliary tables

Trees are densely packed, but they are not necessarily information rich. In the PPH branch, we found that nearly half the pointers we were streaming were null, which suggests that many of the fields are unused. The shift would save memory.

Convert the above hierarchy to polymorphic classes

Move representation information that is currently explicitly stored into implicit information based on the virtual pointer.

Convert cgraph to C++ classes

Convert the optimization pass manager to C++

Convert the IR text dumper to C++

Convert the persistent IR dumper/reader

None: cxx-conversion (last edited 2012-12-11 14:02:39 by DiegoNovillo)