This project is a continuation of the build GCC in C++ project. Its goal is to explore re-implementing some existing GCC components in C++.
2012-08-14: The branch has been merged into trunk. There is no code left in the branch for now. If major re-writes require the use of the branch, we will open it again.
2012-08-12: We have started the initial merge into trunk. The following patches have been sent:
Once the review process is finalized, they will be committed as a single revision.
If you have available cycles, please help us test the branch using the following commands:
$ svn co svn://gcc.gnu.org/gcc/branches/cxx-conversion
$ mkdir bld && cd bld
$ ../cxx-conversion/configure --enable-languages=all,go,ada
$ make && make -k check
Of particular interest are systems whose host C++ compiler is other than g++.
What matters for GCC going forward is that it continue to be comprehensible and maintainable. That is a struggle that GCC has faced for its entire existence as a free software project. It is certainly true that using C++ unwisely can make that struggle more difficult. But this issue is not qualitatively different from the issues we face today.
Whether we use C or C++, we need to try to ensure that interfaces are easy to understand, that the code is reasonably modular, that the internal documentation corresponds to the code, that it is possible for new developers to write new passes and to fix bugs. Those are the important issues for us to consider. The C++ features which are not present in C -- features which are well documented in many books and many web sites -- are not an important issue.
For additional background information on this effort and its scope, please check out http://airs.com/ian/cxx-slides.pdf .
Migrating GCC to C++ as implementation language:
- C++ is a standardized, well known, popular language.
- C++ is nearly a superset of the C90 used in GCC.
- The C subset of C++ is just as efficient as C.
- C++ supports cleaner code in several significant cases.
- C++ makes it easier to write and enforce cleaner interfaces.
- C++ never requires uglier code.
- C++ is not a panacea but it is an improvement.
Accessing the branch
The branch is accessible via SVN at ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion. It was also registered in GCC's Git mirror; see these instructions for details on how to access it.
Additionally, you can view the branch with your browser at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/.
This development branch follows the usual GCC maintainership rules.
It is maintained by Diego Novillo (email@example.com), who will do periodic merges from trunk.
Patches in the branch should be contributed to trunk following the usual contribution procedures.
Patches should follow the new C++ coding conventions.
The branch does not use ChangeLog files. All the changes should be described in their entirety in the SVN commit message. Each commit should contain the exact same content that you sent in the original patch submission: A description of the patch, how it works, why it works that way and all other relevant details.
All e-mail communication related to the branch should be tagged with [cxx-conversion] in the subject.
While conversion is essential, doing so in a manner that limits disruption is important. To that end, we suggest the following development strategy.
Before implementing a change, identify the benefit. Primarily we expect the benefit to be better code adaptability, code writability, or code readability. However, improvements to memory use, compile time, and run time are feasible.
Prefer to follow the idioms and APIs of the C++ standard library when implementing new abstractions. This approach is most important for abstractions that have equivalents in the standard, but for which using the standard abstraction is undesirable. This approach preserves maximum flexibility in implementation.
Where reasonable, implement the change behind the existing APIs. For example, replace the bodies of an existing macro with the new implementation. Test this configuration for both correctness and performance. Send that change as a patch.
Change the uses of the old API to the new API in bite-size patches. A patch that changes every file is more disruptive than ten patches changing ten distinct sets of files.
We will serve no code before its time.
Modify the gcc build to build with C++
Test on a sufficient number of targets. We are tracking progress on the C++ Build Status wiki page.
Finish the C++ coding conventions
A draft of the document is at C++ coding conventions. The patch modifying the official GCC Coding Conventions is available at http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01207.html.
Convert gengtype to handle C++ structures
To handle containers and other data structures with no GTY markers, gengtype needs to allow user-provided marker and walker functions (see original discussion at http://gcc.gnu.org/ml/gcc/2010-06/msg00143.html).
To support this, gengtype will proceed as follows:
- Support "unknown" types by assuming that marking routines are provided by the user.
- Generate calls to marking functions for user-provided types.
- For automatically-generated markers, instead of generating functions mangled with the type name, it will provide a single template function
template <typename T> void mark(const T *n);
- Generate a specialization of the above template for every known type.
- For user-provided types, the user is responsible for providing a suitable instance of mark that processes the type and its fields.
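The scheme above can be sketched roughly as follows. This is only an illustration: the types leaf, pair_node and leaf_list, the helper set_mark, and the marked_objects set are hypothetical stand-ins, not gengtype output or the real collector interface.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Stand-in for the collector's mark bits (an assumption of this sketch).
static std::set<const void *> marked_objects;

// Records P as reachable; returns true only the first time P is seen.
static bool set_mark(const void *p)
{
  return p != 0 && marked_objects.insert(p).second;
}

// The single marking entry point; only specializations are defined.
template <typename T> void mark(const T *n);

// Types known to gengtype: these specializations would be generated.
struct leaf { int value; };
struct pair_node { const leaf *first; const leaf *second; };

template <> void mark<leaf>(const leaf *n)
{
  set_mark(n);
}

template <> void mark<pair_node>(const pair_node *n)
{
  if (!set_mark(n))
    return;
  mark(n->first);
  mark(n->second);
}

// A user-provided type with no GTY markers: the user writes the marker.
struct leaf_list { std::vector<const leaf *> items; };

template <> void mark<leaf_list>(const leaf_list *n)
{
  if (!set_mark(n))
    return;
  for (std::size_t i = 0; i < n->items.size(); ++i)
    mark(n->items[i]);
}
```

Because every call site spells the operation as mark(ptr), generated and hand-written markers are indistinguishable to callers.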
Convert VEC to std::vector or some gcc-specific equivalent. This conversion will reduce the specification of types at VEC uses. It will also reduce the syntactic burden.
- Convert declaration, allocation, and deallocation.
- Convert element indexing.
- Convert size-changing operations.
- Convert to iterator-based loops.
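As a rough picture of where the four steps above lead, assuming std::vector is the chosen replacement (the text leaves open a gcc-specific equivalent), the converted code might look like this. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Declaration and allocation: the element type is stated once, in the
// declaration, and the storage is released by the vector's destructor.
std::vector<int> build_squares(std::size_t count)
{
  std::vector<int> v;
  v.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    v.push_back(static_cast<int>(i * i));  // size-changing operation
  return v;
}

// Iterator-based loop over the elements.
int sum_elements(const std::vector<int> &v)
{
  int total = 0;
  for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
    total += *it;
  return total;
}

// Element indexing uses operator[] without restating the element type.
int last_element(const std::vector<int> &v)
{
  return v[v.size() - 1];  // caller must pass a non-empty vector
}
```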
Convert hash tables
Convert uses of htab_t to a type-safe template-based hash table. While std::tr1::unordered_map is technically workable, it is not part of some base compilers. See gold/gold.h in the GNU binutils for one approach. This conversion will reduce the specification of types at htab_t uses. It will also reduce the syntactic burden.
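One possible shape for such a table is sketched below: a descriptor type supplies the element type, hash function and equality test, so no void pointers or casts appear at the uses. All names here (simple_hash_set, c_string_descriptor) are hypothetical, and the fixed bucket count with no rehashing is a simplification for brevity.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical descriptor: tells the table how to hash and compare.
struct c_string_descriptor
{
  typedef const char *value_type;

  static std::size_t hash(const char *s)
  {
    std::size_t h = 5381;
    for (; *s; ++s)
      h = h * 33 + static_cast<unsigned char>(*s);
    return h;
  }

  static bool equal(const char *a, const char *b)
  {
    return std::strcmp(a, b) == 0;
  }
};

// A chained hash set with a fixed number of buckets (never resized).
template <typename Descriptor>
class simple_hash_set
{
public:
  typedef typename Descriptor::value_type value_type;

  simple_hash_set() : buckets_(16), size_(0) {}

  // Returns a pointer to the stored element equal to KEY, or null.
  const value_type *find(const value_type &key) const
  {
    const bucket &b = buckets_[Descriptor::hash(key) % buckets_.size()];
    for (std::size_t i = 0; i < b.size(); ++i)
      if (Descriptor::equal(b[i], key))
        return &b[i];
    return 0;
  }

  // Inserts V if no equal element is present; returns true if inserted.
  bool insert(const value_type &v)
  {
    if (find(v))
      return false;
    buckets_[Descriptor::hash(v) % buckets_.size()].push_back(v);
    ++size_;
    return true;
  }

  std::size_t size() const { return size_; }

private:
  typedef std::vector<value_type> bucket;
  std::vector<bucket> buckets_;
  std::size_t size_;
};
```

A use such as simple_hash_set<c_string_descriptor> names the element type once, at the declaration, rather than at every lookup.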
Convert numeric types
Convert numeric types, e.g. double_int, to C++ classes supporting all the normal operators. This conversion will reduce the specification of types at uses of these types. It will also reduce the syntactic burden by turning function calls into operators.
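To illustrate the idea, here is a minimal two-word unsigned integer with a few operators. The class wide_uint is a hypothetical stand-in and is not the double_int interface; it only shows how add(a, b) style calls could become a + b.

```cpp
// Hypothetical two-word unsigned integer (low word plus high word).
struct wide_uint
{
  unsigned long long low;
  unsigned long long high;

  wide_uint(unsigned long long l = 0, unsigned long long h = 0)
    : low(l), high(h) {}
};

// Addition with carry from the low word into the high word.
inline wide_uint operator+(const wide_uint &a, const wide_uint &b)
{
  wide_uint r(a.low + b.low, a.high + b.high);
  if (r.low < a.low)  // unsigned wrap-around means a carry occurred
    ++r.high;
  return r;
}

inline bool operator==(const wide_uint &a, const wide_uint &b)
{
  return a.low == b.low && a.high == b.high;
}

inline bool operator<(const wide_uint &a, const wide_uint &b)
{
  return a.high < b.high || (a.high == b.high && a.low < b.low);
}
```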
Convert calls to qsort into calls to std::sort, which typically leads to code which is larger but runs faster.
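The difference can be seen in a small side-by-side sketch. The edge_weight type and both wrapper functions are invented for this example. With std::sort the comparator is a typed function object that the compiler can inline into each instantiation, which is where the larger-but-faster trade-off comes from.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

struct edge_weight { int id; int weight; };

// qsort style: untyped arguments, called through a function pointer.
static int compare_weight_c(const void *a, const void *b)
{
  const edge_weight *x = static_cast<const edge_weight *>(a);
  const edge_weight *y = static_cast<const edge_weight *>(b);
  return (x->weight > y->weight) - (x->weight < y->weight);
}

void sort_with_qsort(std::vector<edge_weight> &v)
{
  if (!v.empty())
    std::qsort(&v[0], v.size(), sizeof(edge_weight), compare_weight_c);
}

// std::sort style: a typed comparator, checked at compile time.
struct weight_less
{
  bool operator()(const edge_weight &a, const edge_weight &b) const
  {
    return a.weight < b.weight;
  }
};

void sort_with_std_sort(std::vector<edge_weight> &v)
{
  std::sort(v.begin(), v.end(), weight_less());
}
```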
Add a scoped timevar
Add a scoped timevar to stop timers automatically. This change avoids repetitive stopping of counters before each return statement.
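A minimal sketch of the idea follows. The functions timer_start and timer_stop and the counters are hypothetical stand-ins for the real timevar machinery; the point is only that the destructor runs on every exit path.

```cpp
// Hypothetical stand-ins for the timer machinery.
enum timer_id { TIMER_PARSE, TIMER_OPTIMIZE, TIMER_COUNT };

static int active_depth[TIMER_COUNT];    // timers currently running
static int completed_runs[TIMER_COUNT];  // start/stop pairs finished

void timer_start(timer_id id) { ++active_depth[id]; }
void timer_stop(timer_id id) { --active_depth[id]; ++completed_runs[id]; }

// Starts a timer on construction and stops it on destruction.
class scoped_timer
{
public:
  explicit scoped_timer(timer_id id) : id_(id) { timer_start(id_); }
  ~scoped_timer() { timer_stop(id_); }

private:
  scoped_timer(const scoped_timer &);             // not copyable
  scoped_timer &operator=(const scoped_timer &);  // not assignable
  timer_id id_;
};

// Three return statements, but no explicit timer_stop call at any of them.
int optimize(int n)
{
  scoped_timer timer(TIMER_OPTIMIZE);
  if (n < 0)
    return -1;
  if (n == 0)
    return 0;
  return n * 2;
}
```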
Convert tree_list to something else.
Convert accessor macros to use inline functions
This task is easier in C++ because C++ functions can return references. The primary benefit here is to make gcc implementations work with gdb.
- Source line information in messages still requires a macro.
Add skip commands to the prototypical .gdbinit file.
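As an illustration, an accessor written as a macro and its inline-function replacement might compare as follows. The node type and the names NODE_FLAGS and node_flags are hypothetical. Returning a reference keeps the accessor usable on the left-hand side of an assignment, as the macro was.

```cpp
struct node
{
  int code;
  int flags;
};

// Macro form, for comparison only: a debugger cannot evaluate this.
//   #define NODE_FLAGS(n) ((n)->flags)

// Inline-function form: callable from a debugger, and type-checked.
inline int &node_flags(node *n)
{
  return n->flags;
}

// Read-only overload for const nodes.
inline const int &node_flags(const node *n)
{
  return n->flags;
}
```

Since such functions are trivial, a debugger session would typically want to step over them, which is where the skip commands come in.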
Convert the various hooks into classes with virtual functions
This conversion will enable easily interposing monitoring on existing hook implementations.
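A sketch of what interposition could look like is shown below. The class target_hooks, its single hook register_move_cost as declared here, and the wrapper counting_hooks are all invented for this example and do not reflect the actual hook structures.

```cpp
// Hypothetical hook interface with one default implementation.
class target_hooks
{
public:
  virtual ~target_hooks() {}

  virtual int register_move_cost(int from, int to) const
  {
    return from == to ? 0 : 2;
  }
};

// Monitoring wrapper: counts calls, then forwards to the wrapped hooks.
class counting_hooks : public target_hooks
{
public:
  explicit counting_hooks(const target_hooks &inner)
    : inner_(inner), calls_(0) {}

  virtual int register_move_cost(int from, int to) const
  {
    ++calls_;
    return inner_.register_move_cost(from, to);
  }

  int calls() const { return calls_; }

private:
  const target_hooks &inner_;
  mutable int calls_;
};

// Client code sees only the base interface.
int total_cost(const target_hooks &hooks)
{
  return hooks.register_move_cost(0, 1) + hooks.register_move_cost(1, 1);
}
```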
Convert l-value accessors to getters and setters
Long-term, this change will enable us to have get and set computed attributes, rather than stored attributes.
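For example, once callers go through a getter and a setter, one attribute can be derived from another without any caller noticing. The class decl_info below is hypothetical: it stores a size in bytes and computes the size in bits on demand.

```cpp
class decl_info
{
public:
  explicit decl_info(unsigned size_bytes) : size_bytes_(size_bytes) {}

  // Stored attribute.
  unsigned size_bytes() const { return size_bytes_; }
  void set_size_bytes(unsigned n) { size_bytes_ = n; }

  // Computed attribute: derived from the stored one, so it takes no space.
  unsigned size_bits() const { return size_bytes_ * 8; }
  void set_size_bits(unsigned bits) { size_bytes_ = (bits + 7) / 8; }

private:
  unsigned size_bytes_;
};
```

An l-value accessor could not support size_bits here, because there is no stored field to hand out a reference to.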
Memory Management Tasks
Convert to scoped deallocation
That is, add destructors to clean up indirect memory. This conversion will reduce peak memory consumption.
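A minimal sketch of scoped deallocation is given below. The class scratch_buffer and the live_buffers counter are hypothetical; the counter exists only so the example can show that the memory is released when the scope ends.

```cpp
#include <cstddef>

static int live_buffers = 0;  // number of buffers currently allocated

// Owns a heap array and frees it when the object goes out of scope.
class scratch_buffer
{
public:
  explicit scratch_buffer(std::size_t n)
    : data_(new int[n]()), size_(n)
  {
    ++live_buffers;
  }

  ~scratch_buffer()
  {
    delete[] data_;
    --live_buffers;
  }

  int &operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return size_; }

private:
  scratch_buffer(const scratch_buffer &);             // not copyable
  scratch_buffer &operator=(const scratch_buffer &);  // not assignable
  int *data_;
  std::size_t size_;
};

// The buffer is freed at the return, with no explicit cleanup call.
int sum_first(std::size_t n)
{
  scratch_buffer buf(n);
  for (std::size_t i = 0; i < buf.size(); ++i)
    buf[i] = static_cast<int>(i) + 1;
  int total = 0;
  for (std::size_t i = 0; i < buf.size(); ++i)
    total += buf[i];
  return total;
}
```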
Quoting (1): "Longer term, we know that memory usage is an issue in GCC. In the old obstack days, we had a range of obstacks with different lifespans, so we could create RTL with a temporary lifetime which was given a longer lifetime when needed. We got away from that because we spent far too much time chasing bugs in which RTL should have been saved to a longer lifetime but wasn't. However, that model should permit us to run with significantly less memory, which would translate to less compile time. I think we might be able to do it by implementing a custom allocator, such as a pool allocator which permits allocating different sizes of memory, and never frees memory. Then the tree class could take an allocator as a template parameter. Then we would provide convertors which copied the tree class to a different allocation style. Then, for example, fold-const.c could use a temporary pool which lived only for the length of the call to fold. If it returned a new value, the convertor would force a copy out of the temporary pool. If this works out, we can use type safety to enforce memory discipline, use significantly less memory during compilation, and take a big step toward getting rid of the garbage collector."
Add operator new overloads for pool-specific data types
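One way this could look is sketched below, assuming a simple bump allocator that never frees individual objects. The names bump_pool and pooled_node are hypothetical, and the fixed 4096-byte arena is chosen only to keep the example short.

```cpp
#include <cstddef>
#include <new>

// Hypothetical pool: hands out memory from a fixed arena, never frees.
class bump_pool
{
public:
  bump_pool() : used_(0) {}

  void *allocate(std::size_t n)
  {
    // Round each request up so that every result stays suitably aligned.
    const std::size_t unit = sizeof(align_unit);
    const std::size_t rounded = (n + unit - 1) / unit * unit;
    if (rounded > sizeof(storage_.bytes) - used_)
      throw std::bad_alloc();
    void *p = storage_.bytes + used_;
    used_ += rounded;
    return p;
  }

  std::size_t used() const { return used_; }

private:
  union align_unit { long long ll; long double ld; void *p; };
  union { char bytes[4096]; align_unit align; } storage_;
  std::size_t used_;
};

// A type that may only be allocated from a pool: new (pool) pooled_node(v).
struct pooled_node
{
  int value;
  pooled_node *next;

  explicit pooled_node(int v) : value(v), next(0) {}

  static void *operator new(std::size_t size, bump_pool &pool)
  {
    return pool.allocate(size);
  }

  // Matching form, used only if the constructor throws; nothing to free.
  static void operator delete(void *, bump_pool &) {}
};
```

Because the class declares only the pool form of operator new, a plain new pooled_node(1) does not compile, so the allocation discipline is enforced by the type.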
Convert to a low-maintenance garbage collector
e.g. the Boehm Collector.
Tree Type Hierarchy Tasks
Directly represent the type hierarchy embodied by tree nodes
This representation leads to fewer bugs through increased fidelity in the representation and static typing.
Convert union tree_node and its members into a class hierarchy
The first stage of this conversion will not be virtual, so as to ensure that there is no change in representation size.
Create conversion functions
This will allow dynamically converting a pointer to a general type into a pointer to a specific type.
Change the parameters of functions to use tighter types
Change the parameters of functions to use the pointer types for the type of tree they handle. This change is essential to effective static type checking.
Convert attribute macros to postfix method calls
Convert function names with embedded type names to "verb overloads"
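The two tasks above are stated only briefly, so the following is one possible reading. The types type_node and decl_node and the function size_in_bits are hypothetical. An attribute macro such as a prefix-style PRECISION(t) becomes the postfix call t.precision(), and a family of functions whose names embed a type becomes one verb overloaded on the argument type.

```cpp
// Hypothetical node types.
class type_node
{
public:
  explicit type_node(unsigned precision) : precision_(precision) {}

  // Postfix method call in place of a prefix attribute macro.
  unsigned precision() const { return precision_; }

private:
  unsigned precision_;
};

class decl_node
{
public:
  explicit decl_node(const type_node &type) : type_(type) {}
  const type_node &type() const { return type_; }

private:
  type_node type_;
};

// "Verb overloads": one name, selected by the static argument type,
// instead of names like type_size_in_bits and decl_size_in_bits.
inline unsigned size_in_bits(const type_node &t)
{
  return t.precision();
}

inline unsigned size_in_bits(const decl_node &d)
{
  return size_in_bits(d.type());
}
```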
Change type un-safe "if is type" constructs
Change type un-safe "if is type" constructs to "if successfully converts to type" constructs. This change reduces the number of dynamic conversions, which aids efficiency.
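The hierarchy tasks in this section can be pictured together in a small sketch. Everything here is hypothetical (tree_base, int_cst, var_decl, as_int_cst and the other names); it is not the tree_node layout. The classes have no virtual functions, in keeping with the first stage described earlier, and the conversion functions return a null pointer on a mismatch so that the test and the conversion happen in one step.

```cpp
enum node_code { CODE_INT_CST, CODE_VAR_DECL };

// General type: carries the dynamic code, as the union does today.
struct tree_base
{
  node_code code;
  explicit tree_base(node_code c) : code(c) {}
};

// Specific types derived from the general one.
struct int_cst : tree_base
{
  long value;
  explicit int_cst(long v) : tree_base(CODE_INT_CST), value(v) {}
};

struct var_decl : tree_base
{
  const char *name;
  explicit var_decl(const char *n) : tree_base(CODE_VAR_DECL), name(n) {}
};

// Conversion functions: null unless the dynamic code matches.
inline int_cst *as_int_cst(tree_base *t)
{
  return t && t->code == CODE_INT_CST ? static_cast<int_cst *>(t) : 0;
}

inline var_decl *as_var_decl(tree_base *t)
{
  return t && t->code == CODE_VAR_DECL ? static_cast<var_decl *>(t) : 0;
}

// Tighter parameter type: this cannot be called with a var_decl.
inline long doubled(const int_cst *c)
{
  return c->value * 2;
}

// "If successfully converts to type": one dynamic check, then the
// statically typed pointer is used with no further checks.
long doubled_constant_or(tree_base *t, long fallback)
{
  if (int_cst *c = as_int_cst(t))
    return doubled(c);
  return fallback;
}
```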
Note: Some of these tasks were briefly suggested and remain unclear.
Shift little-used tree fields to auxiliary tables
Trees are densely packed, but they are not necessarily information rich. In the PPH branch, we found that nearly half the pointers we were streaming were null, which says that many of the fields are unused. The shift would save memory.
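Since this task is only briefly suggested, the following is just one way to read it. The names compact_node, rare_info and rare_info_table are hypothetical, and std::map is used purely for brevity. A rarely set field is moved out of the node into a side table keyed by node address, so nodes that never set it pay nothing.

```cpp
#include <cstddef>
#include <map>

// The node keeps only the fields that are almost always used.
struct compact_node
{
  int code;
};

// Rarely used data lives outside the node.
struct rare_info
{
  int alignment;
  rare_info() : alignment(0) {}
};

class rare_info_table
{
public:
  void set_alignment(const compact_node *n, int alignment)
  {
    table_[n].alignment = alignment;
  }

  // Nodes without an entry report the default value.
  int alignment_or(const compact_node *n, int fallback) const
  {
    std::map<const compact_node *, rare_info>::const_iterator it
      = table_.find(n);
    return it == table_.end() ? fallback : it->second.alignment;
  }

  std::size_t size() const { return table_.size(); }

private:
  std::map<const compact_node *, rare_info> table_;
};
```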
Convert the above hierarchy to polymorphic classes
Move representation information that is currently explicitly stored into implicit information based on the virtual pointer.
Convert cgraph to C++ classes
Convert the optimization pass manager to C++
Convert the IR text dumper to C++