Differences between revisions 3 and 4
Revision 3 as of 2012-04-11 18:29:18
Size: 5131
Comment: fix typo
Revision 4 as of 2012-04-13 00:30:14
Size: 9651
Comment: Adding tasks. More headings. Internal formatting change.
Line 5: Line 5:
This project is a continuation to the [[http://gcc.gnu.org/wiki/gcc-in-cxx|build GCC in C++]] project.  Its goal is to
explore re-implementing some existing GCC components in C++.

'''Background:''' What matters for GCC going forward is that it continue to be comprehensible and maintainable. That is a struggle that GCC has faced for its entire existence as a free software project. It is certainly true that using C++ unwisely can make that struggle more difficult.  But this issue is not qualitatively different from the issues we face today.

Whether we use C or C++, we need to try to ensure that interfaces are easy to understand, that the code is reasonably modular, that the internal documentation corresponds to the code, that it is possible for new developers to write new passes and to fix bugs.  Those are the important issues for us to consider.  The C++ features which are not present in C--features which are well documented in many books and many web sites--are not an important issue.

For additional background information on this effort and its scope, please check out http://airs.com/ian/cxx-slides.pdf
This project is a continuation of the
[[http://gcc.gnu.org/wiki/gcc-in-cxx|build GCC in C++]] project.
Its goal is to explore re-implementing some existing GCC components in C++.

== Background ==

What matters for GCC going forward is that
it continue to be comprehensible and maintainable.
That is a struggle that GCC has faced
for its entire existence as a free software project.
It is certainly true that using C++ unwisely
can make that struggle more difficult.
But this issue is not qualitatively different from the issues we face today.

Whether we use C or C++,
we need to try to ensure
that interfaces are easy to understand,
that the code is reasonably modular,
that the internal documentation corresponds to the code,
that it is possible for new developers to write new passes and to fix bugs.
Those are the important issues for us to consider.
The C++ features which are not present in C
-- features which are well documented in many books and many web sites --
are not an important issue.

For additional background information on this effort and its scope,
please check out http://airs.com/ian/cxx-slides.pdf .
Line 16: Line 34:
Line 22: Line 41:
 * C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries.
 * C++ makes it easier to write cleaner interfaces
   by making it harder to break interface boundaries.
Line 29: Line 49:
 * It is maintained by Diego Novillo (dnovillo@google.com), who will do periodic merges from trunk.
 * Patches in the branch should be contributed to trunk following the usual contribution procedures.
 * Patches should follow the new [[http://gcc.gnu.org/wiki/CppConventions|C++ coding conventions]]. '''Note''': As of 2012-04-11 these conventions are still in draft form. Before patches can be moved out of this branch into trunk, we need to wait until they are approved and installed in http://gcc.gnu.org/codingconventions.html.
 * {{{ChangeLog}}} entries should be written in the file {{{ChangeLog.cxx-conversion}}} in the corresponding directories.
 * All e-mail communication related to the branch, should be tagged with {{{[cxx-conversion]}}} in the subject.

== Accessing the branch ==

The branch is accessible via SVN at {{{ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion}}}. It was also registered in GCC's Git mirror, see [[http://gcc.gnu.org/wiki/GitMirror|these instructions]] for details on how to access it.

Additionally, you can view the branch with your browser at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/


== Starting Points ==

 * Convert VEC to {{{std::vector}}} is a good starting point. This is the interface in {{{vec.h}}}.

 * Another easy starting point would be converting uses of {{{htab_t}}} to type safe C++ hash tables, e.g., {{{std::tr1::unordered_map}}}. Here portability suggests the ability to switch to different hash table implementations; see {{{gold/gold.h}}} in the GNU binutils for one way to approach that.

 * Another easy starting point is finding calls to {{{qsort}}} and converting them to {{{std::sort}}}, which typically leads to code which is larger but runs faster.

 * Work out the details of using STL containers with GC allocated objects. This means teaching [[gengtype]] how to generate code to traverse STL containers, which would then be used during GC. This is not a task for the faint-hearted. But see also [[http://gcc.gnu.org/ml/gcc/2010-06/msg00143.html|here]] Tom Tromey's hint.

Quoting ([[http://gcc.gnu.org/ml/gcc/2009-06/msg00630.html|1]]):"Longer term, we know that memory usage is an issue in GCC. In the old obstack days, we had a range of obstacks with different lifespans, so we could create RTL with a temporary lifetime which was given a longer lifetime when needed. We got away from that because we spent far too much time chasing bugs in which RTL should have been saved to a longer
lifetime but wasn't. However, that model should permit us to run with significantly less memory, which would translate to less compile time. I think we might be able to do it by implementing a custom allocator, such as a pool allocator which permits allocating different sizes of memory, and never frees memory. Then the tree class could take an allocator as a template parameter. Then we would provide convertors which copied the tree class to a different allocation style. Then, for example, fold-const.c could use a temporary pool which lived only for the length of the call to fold. If it returned a new value, the convertor would force a copy out of the temporary pool. If this works
out, we can use type safety to enforce memory discipline, use significantly less memory during compilation, and take a big step toward getting rid of the garbage collector."
 * It is maintained by Diego Novillo (dnovillo@google.com),
   who will do periodic merges from trunk.
 * Patches in the branch should be contributed to trunk
   following the usual contribution procedures.
 * Patches should follow the new
   [[http://gcc.gnu.org/wiki/CppConventions|C++ coding conventions]].
   '''Note''': As of 2012-04-11 these conventions are still in draft form.
   Before patches can be moved out of this branch into trunk,
   we need to wait until they are approved
   and installed in http://gcc.gnu.org/codingconventions.html.
 * {{{ChangeLog}}} entries should be written
   in the file {{{ChangeLog.cxx-conversion}}} in the corresponding directories.
 * All e-mail communication related to the branch
   should be tagged with {{{[cxx-conversion]}}} in the subject.

== Accessing the Branch ==

The branch is accessible via SVN
at {{{ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion}}}.
It was also registered in GCC's Git mirror;
see [[http://gcc.gnu.org/wiki/GitMirror|these instructions]]
for details on how to access it.

Additionally, you can view the branch with your browser
at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/.

== Development Strategy ==

While conversion is essential,
doing so in a manner that limits disruption is important.
To that end, we suggest the following development strategy.

Before implementing a change, identify the benefit.
Primarily we expect the benefit to be better
code adaptability, code writability, or code readability.
However, improvements to memory use, compile time, and run time
are feasible.

Prefer to follow the idioms and APIs of the C++ standard library
when implementing new abstractions.
This approach is most important for abstractions
that have equivalents in the standard,
but for which using the standard abstraction is undesirable.
This approach preserves maximum flexibility in implementation.

Where reasonable, implement the change behind the existing APIs.
For example, replace the bodies of an existing macro
with the new implementation.
Test this configuration for both correctness and performance.
Send that change as a patch.

Change the uses of the old API to the new API in bite-size patches.
A patch that changes every file
is more disruptive than ten patches changing ten distinct sets of files.

== Conversion Tasks ==

We will serve no code before its time.

=== Prerequisite Tasks ===

Modify the gcc build to build with C++
and test on a sufficient number of targets.
We are tracking progress on the
[[http://gcc.gnu.org/wiki/CppBuildStatus|C++ Build Status]] wiki page.

Finish the [[http://gcc.gnu.org/wiki/CppConventions|C++ coding conventions]]
and adopt them by moving them into the
[[http://gcc.gnu.org/codingconventions.html|GCC Coding Conventions]].

Work out the details of using STL containers with GC allocated objects.
This means teaching [[gengtype]]
how to generate code to traverse STL containers,
which would then be used during GC.
This is not a task for the faint-hearted.
But see also
[[http://gcc.gnu.org/ml/gcc/2010-06/msg00143.html|Tom Tromey's hint]].

=== Immediate Tasks ===

Convert {{{VEC}}} to {{{std::vector}}} or some gcc-specific equivalent.
This conversion will reduce the specification of types at {{{VEC}}} uses.
It will also reduce the syntactic burden.

 * Convert declaration, allocation, and deallocation.
 * Convert element indexing.
 * Convert size-changing operations.
 * Convert iteration loops.

Convert uses of {{{htab_t}}} to a type-safe template-based hash table.
While {{{std::tr1::unordered_map}}} is technically workable,
it is not part of some base compilers.
See {{{gold/gold.h}}} in the GNU binutils for one approach.
This conversion will reduce the specification of types at {{{htab_t}}} uses.
It will also reduce the syntactic burden.

=== Early Tasks ===

Convert numeric types, e.g. {{{double_int}}},
to C++ classes supporting all the normal operators.
This conversion will reduce the specification of types at uses of these types.
It will also reduce the syntactic burden
by turning function calls into operators.

Convert calls to {{{qsort}}} into calls to {{{std::sort}}},
which typically leads to code which is larger but runs faster.

Add a scoped {{{timevar}}} to stop timers automatically.
This change avoids repetitive stopping of counters
before each return statement.

Convert {{{tree_list}}} to
[[http://gcc.gnu.org/wiki/ImprovementProjects#Remove_TREE_LIST|something else]].

Convert accessor macros to use inline functions.
This task is easier in C++ because C++ functions can return references.
The primary benefit here is to make gcc implementations work with gdb.

 * Adding source line information to messages
   still requires some form of macro.
 * Add corresponding gdb {{{skip}}} commands
   to the prototypical {{{.gdbinit}}} file.

=== Mid-Term Tasks ===

Convert the various hooks into classes with virtual functions.
This conversion would enable easily interposing monitoring
on existing hook implementations.

Convert

Quoting ([[http://gcc.gnu.org/ml/gcc/2009-06/msg00630.html|1]]):
"Longer term, we know that memory usage is an issue in GCC.
In the old obstack days, we had a range of obstacks with different lifespans,
so we could create RTL with a temporary lifetime
which was given a longer lifetime when needed.
We got away from that because
we spent far too much time chasing bugs
in which RTL should have been saved to a longer lifetime but wasn't.
However, that model should permit us to run with significantly less memory,
which would translate to less compile time.
I think we might be able to do it by implementing a custom allocator,
such as a pool allocator which permits allocating different sizes of memory,
and never frees memory.
Then the tree class could take an allocator as a template parameter.
Then we would provide convertors
which copied the tree class to a different allocation style.
Then, for example, {{{fold-const.c}}} could use a temporary pool
which lived only for the length of the call to fold.
If it returned a new value,
the convertor would force a copy out of the temporary pool.
If this works out,
we can use type safety to enforce memory discipline,
use significantly less memory during compilation,
and take a big step toward getting rid of the garbage collector."

=== Longer-Term Tasks ===

Directly represent the type hierarchy embodied by tree nodes.
This representation leads to fewer bugs
through increased fidelity in the representation
and static typing.

 * Convert {{{union tree_node}}} and its members into a class hierarchy.
   The first stage of this conversion would not be virtual,
   so as to ensure that there is no change in representation size.

 * Create conversion functions
   to dynamically convert a pointer to a general type
   into a pointer to a specific type.

 * Change the parameters of functions
   to use the pointer types for the type of tree they handle.
   This change is essential to effective static type checking.

 * Change type un-safe "if is type" constructs
   to "if successfully converts to type" constructs.
   This change reduces the number of dynamic conversions,
   which aids efficiency.

Add destructors to clean up indirect memory.

Shift little-used tree fields to auxiliary tables.
Trees are densely packed, but they are not necessarily information rich.
In the PPH branch,
we found that nearly half the pointers we were streaming were null,
which says that many of the fields are unused.
The shift would save memory.

Convert the above hierarchy to polymorphic classes.
Move representation information that is currently explicitly stored
into implicit information based on the virtual pointer.

C++ Conversion

This project is a continuation of the build GCC in C++ project. Its goal is to explore re-implementing some existing GCC components in C++.

Background

What matters for GCC going forward is that it continue to be comprehensible and maintainable. That is a struggle that GCC has faced for its entire existence as a free software project. It is certainly true that using C++ unwisely can make that struggle more difficult. But this issue is not qualitatively different from the issues we face today.

Whether we use C or C++, we need to try to ensure that interfaces are easy to understand, that the code is reasonably modular, that the internal documentation corresponds to the code, that it is possible for new developers to write new passes and to fix bugs. Those are the important issues for us to consider. The C++ features which are not present in C -- features which are well documented in many books and many web sites -- are not an important issue.

For additional background information on this effort and its scope, please check out http://airs.com/ian/cxx-slides.pdf .

Rationale

Migrating GCC to C++ as implementation language:

  • C++ is a standardized, well known, popular language.
  • C++ is nearly a superset of C90 used in GCC.
  • The C subset of C++ is just as efficient as C.
  • C++ supports cleaner code in several significant cases.
  • C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries.
  • C++ never requires uglier code.
  • C++ is not a panacea but it is an improvement.

Contributing

  • This development branch follows the usual GCC maintainership rules.
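  • It is maintained by Diego Novillo (dnovillo@google.com), who will do periodic merges from trunk.
  • Patches in the branch should be contributed to trunk following the usual contribution procedures.
  • Patches should follow the new C++ coding conventions. Note: As of 2012-04-11 these conventions are still in draft form. Before patches can be moved out of this branch into trunk, we need to wait until they are approved and installed in http://gcc.gnu.org/codingconventions.html.
  • ChangeLog entries should be written in the file ChangeLog.cxx-conversion in the corresponding directories.
  • All e-mail communication related to the branch should be tagged with [cxx-conversion] in the subject.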

Accessing the Branch

The branch is accessible via SVN at ssh://gcc.gnu.org/svn/gcc/branches/cxx-conversion. It was also registered in GCC's Git mirror; see these instructions for details on how to access it.

Additionally, you can view the branch with your browser at http://gcc.gnu.org/viewcvs/branches/cxx-conversion/.

Development Strategy

While conversion is essential, doing so in a manner that limits disruption is important. To that end, we suggest the following development strategy.

Before implementing a change, identify the benefit. Primarily we expect the benefit to be better code adaptability, code writability, or code readability. However, improvements to memory use, compile time, and run time are feasible.

Prefer to follow the idioms and APIs of the C++ standard library when implementing new abstractions. This approach is most important for abstractions that have equivalents in the standard, but for which using the standard abstraction is undesirable. This approach preserves maximum flexibility in implementation.

Where reasonable, implement the change behind the existing APIs. For example, replace the bodies of an existing macro with the new implementation. Test this configuration for both correctness and performance. Send that change as a patch.
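
A minimal sketch of this step, assuming a hypothetical C-style macro API (INTVEC_PUSH, INTVEC_LENGTH and INTVEC_INDEX are illustrative names, not GCC's actual VEC macros). The macro names and call shapes stay as they were; only the bodies now forward to std::vector, so existing callers compile unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy handle type; callers only ever see the typedef.
typedef std::vector<int> intvec;

// Hypothetical legacy macros. Names and call shape are unchanged for
// existing callers; only the bodies now forward to std::vector.
#define INTVEC_PUSH(V, X)  ((V).push_back (X))
#define INTVEC_LENGTH(V)   ((V).size ())
#define INTVEC_INDEX(V, I) ((V)[(I)])

// Existing caller code, untouched by the change of implementation.
inline int
sum_of_first_n (int n)
{
  intvec v;
  for (int i = 1; i <= n; ++i)
    INTVEC_PUSH (v, i);
  assert (INTVEC_LENGTH (v) == static_cast<std::size_t> (n > 0 ? n : 0));
  int total = 0;
  for (std::size_t i = 0; i < INTVEC_LENGTH (v); ++i)
    total += INTVEC_INDEX (v, i);
  return total;
}
```

Once a configuration like this has been tested, the later bite-size patches can replace each macro use with the direct member call.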

Change the uses of the old API to the new API in bite-size patches. A patch that changes every file is more disruptive than ten patches changing ten distinct sets of files.

Conversion Tasks

We will serve no code before its time.

Prerequisite Tasks

Modify the gcc build to build with C++ and test on a sufficient number of targets. We are tracking progress on the C++ Build Status wiki page.

Finish the C++ coding conventions and adopt them by moving them into the GCC Coding Conventions.

Work out the details of using STL containers with GC allocated objects. This means teaching gengtype how to generate code to traverse STL containers, which would then be used during GC. This is not a task for the faint-hearted. But see also Tom Tromey's hint.

Immediate Tasks

Convert VEC to std::vector or some gcc-specific equivalent. This conversion will reduce the specification of types at VEC uses. It will also reduce the syntactic burden.

  • Convert declaration, allocation, and deallocation.
  • Convert element indexing.
  • Convert size-changing operations.
  • Convert iteration loops.
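
The four conversion steps above might look roughly like this once a use has moved to std::vector. This is only a sketch: the functions even_squares and sum are invented for illustration and do not correspond to any GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the squares of the even numbers in [0, limit).
inline std::vector<int>
even_squares (int limit)
{
  // Declaration and allocation: the element type is stated once, and
  // deallocation happens automatically when the vector is destroyed.
  std::vector<int> out;
  out.reserve (static_cast<std::size_t> (limit > 0 ? limit / 2 + 1 : 0));

  // Size-changing operation.
  for (int i = 0; i < limit; i += 2)
    out.push_back (i);

  // Element indexing.
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = out[i] * out[i];

  assert (limit <= 0 || !out.empty ());
  return out;
}

// Iteration loop over the converted container.
inline int
sum (const std::vector<int> &v)
{
  int total = 0;
  for (std::vector<int>::const_iterator it = v.begin (); it != v.end (); ++it)
    total += *it;
  return total;
}
```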

Convert uses of htab_t to a type-safe template-based hash table. While std::tr1::unordered_map is technically workable, it is not part of some base compilers. See gold/gold.h in the GNU binutils for one approach. This conversion will reduce the specification of types at htab_t uses. It will also reduce the syntactic burden.
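
One way to keep the choice of underlying table open is a thin typed wrapper. The sketch below is an assumption-laden illustration: typed_table is a hypothetical name, and std::unordered_map (which needs a C++11 compiler) merely stands in for whichever implementation is eventually chosen. It is not the approach taken in gold/gold.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical wrapper: callers depend on this interface only, so the
// underlying table can be swapped without touching them.
template <typename Key, typename Value>
class typed_table
{
public:
  // Inserts or overwrites the value stored for KEY.
  void set (const Key &key, const Value &value) { map_[key] = value; }

  // Returns a pointer to the stored value, or a null pointer if absent.
  // No void* casts are needed at the call site.
  const Value *find (const Key &key) const
  {
    typename impl_type::const_iterator it = map_.find (key);
    return it == map_.end () ? nullptr : &it->second;
  }

  std::size_t size () const { return map_.size (); }

private:
  typedef std::unordered_map<Key, Value> impl_type;
  impl_type map_;
};

inline int
lookup_demo ()
{
  typed_table<std::string, int> t;
  t.set ("alpha", 1);
  t.set ("beta", 2);
  t.set ("alpha", 3);   // overwrites the first entry
  assert (t.size () == 2);
  assert (t.find ("gamma") == nullptr);
  return *t.find ("alpha") + *t.find ("beta");
}
```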

Early Tasks

Convert numeric types, e.g. double_int, to C++ classes supporting all the normal operators. This conversion will reduce the specification of types at uses of these types. It will also reduce the syntactic burden by turning function calls into operators.
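
As a rough illustration of turning function calls into operators, the sketch below defines a hypothetical two-word unsigned integer. The type wide_uint and its layout are assumptions made for the example; this is not GCC's actual double_int.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical two-word unsigned integer; not GCC's actual double_int.
struct wide_uint
{
  uint64_t low;
  uint64_t high;

  wide_uint (uint64_t l, uint64_t h) : low (l), high (h) {}
};

// Addition with carry from the low word into the high word.
inline wide_uint
operator+ (const wide_uint &a, const wide_uint &b)
{
  uint64_t low = a.low + b.low;          // wraps modulo 2^64
  uint64_t carry = low < a.low ? 1 : 0;  // wrapped iff result is smaller
  return wide_uint (low, a.high + b.high + carry);
}

inline bool
operator== (const wide_uint &a, const wide_uint &b)
{
  return a.low == b.low && a.high == b.high;
}

// With a function-call API this would be a nested call chain; with
// operators it reads like ordinary arithmetic.
inline wide_uint
sum3 (const wide_uint &x, const wide_uint &y, const wide_uint &z)
{
  wide_uint r = x + y + z;
  assert (r == (x + (y + z)));  // addition modulo 2^128 is associative
  return r;
}
```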

Convert calls to qsort into calls to std::sort, which typically leads to code which is larger but runs faster.
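
A small before-and-after sketch of this conversion, using invented helper names (compare_ints, sort_with_qsort, sort_with_std_sort). The qsort version compares through untyped pointers via a function pointer; the std::sort version is instantiated for the element type, which is where the extra code and the extra speed typically come from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Before: qsort comparator working through untyped pointers.
static int
compare_ints (const void *a, const void *b)
{
  int x = *static_cast<const int *> (a);
  int y = *static_cast<const int *> (b);
  return (x > y) - (x < y);
}

inline void
sort_with_qsort (std::vector<int> &v)
{
  if (!v.empty ())
    std::qsort (&v[0], v.size (), sizeof (int), compare_ints);
}

// After: std::sort is typed, and the comparison can be inlined.
inline void
sort_with_std_sort (std::vector<int> &v)
{
  std::sort (v.begin (), v.end ());
}

// Checks that both routines produce the same ordering.
inline bool
sorts_agree (std::vector<int> v)
{
  std::vector<int> w (v);
  sort_with_qsort (v);
  sort_with_std_sort (w);
  assert (v.size () == w.size ());
  return v == w;
}
```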

Add a scoped timevar to stop timers automatically. This change avoids repetitive stopping of counters before each return statement.
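
The idea can be sketched with a small scope guard. Everything here is hypothetical: timer_start, timer_stop and running_timers are stand-ins invented for the example and are not GCC's timevar interface.

```cpp
#include <cassert>

// Hypothetical stand-ins for a start/stop timer API; they only count
// how many timers are currently running.
int running_timers = 0;
inline void timer_start (int /*id*/) { ++running_timers; }
inline void timer_stop (int /*id*/) { --running_timers; }

// Starts the timer on construction and stops it on destruction.
class scoped_timer
{
public:
  explicit scoped_timer (int id) : id_ (id) { timer_start (id_); }
  ~scoped_timer () { timer_stop (id_); }

private:
  // Not copyable: a copy would stop the same timer twice.
  scoped_timer (const scoped_timer &);
  scoped_timer &operator= (const scoped_timer &);

  int id_;
};

// Every return path stops the timer without an explicit call.
inline int
classify (int x)
{
  scoped_timer t (0);
  assert (running_timers > 0);
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  return 1;
}
```

After any call to classify returns, running_timers is back to its previous value regardless of which return statement was taken.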

Convert tree_list to something else.

Convert accessor macros to use inline functions. This task is easier in C++ because C++ functions can return references. The primary benefit here is to make gcc implementations work with gdb.

  • Adding source line information to messages still requires some form of macro.
  • Add corresponding gdb skip commands to the prototypical .gdbinit file.
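
A sketch of such an accessor conversion, assuming a hypothetical node type (not GCC's tree structure). Because the function returns a reference, existing code that assigns through the accessor keeps working.

```cpp
#include <cassert>

// Hypothetical node type; not GCC's actual tree structure.
struct node
{
  int code;
  int value;
};

// Before: a macro accessor, usable as an lvalue but expanded away
// before the debugger ever sees it.
#define NODE_VALUE_MACRO(N) ((N)->value)

// After: an inline function returning a reference keeps lvalue usage
// and exists as a real function.
inline int &
node_value (node *n)
{
  assert (n != 0);
  return n->value;
}

inline const int &
node_value (const node *n)
{
  assert (n != 0);
  return n->value;
}

// Assigns through the returned reference, then reads the result back.
inline int
bump_from (int start)
{
  node n = { 0, start };
  node_value (&n) += 1;
  assert (NODE_VALUE_MACRO (&n) == node_value (&n));
  return node_value (&n);
}
```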

Mid-Term Tasks

Convert the various hooks into classes with virtual functions. This conversion would enable easily interposing monitoring on existing hook implementations.
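
To illustrate the interposition this would allow, here is a sketch with an invented hook interface (cost_hooks and insn_cost are hypothetical names, not GCC target hooks). A monitoring class wraps any existing implementation and forwards to it.

```cpp
#include <cassert>

// Hypothetical hook interface; the names are illustrative only.
class cost_hooks
{
public:
  virtual ~cost_hooks () {}
  virtual int insn_cost (int insn) const = 0;
};

// An existing hook implementation.
class default_cost_hooks : public cost_hooks
{
public:
  virtual int insn_cost (int insn) const { return insn * 2; }
};

// Monitoring interposed on any implementation: counts calls, then forwards.
class counting_cost_hooks : public cost_hooks
{
public:
  explicit counting_cost_hooks (const cost_hooks &inner)
    : inner_ (inner), calls_ (0) {}

  virtual int insn_cost (int insn) const
  {
    ++calls_;
    return inner_.insn_cost (insn);
  }

  int calls () const { return calls_; }

private:
  const cost_hooks &inner_;
  mutable int calls_;
};

// Client code sees only the base interface.
inline int
total_cost (const cost_hooks &hooks, int n)
{
  int total = 0;
  for (int i = 1; i <= n; ++i)
    total += hooks.insn_cost (i);
  return total;
}

// Runs the same client code with and without the monitor in place.
inline int
monitored_calls (int n)
{
  default_cost_hooks base;
  counting_cost_hooks counted (base);
  int direct = total_cost (base, n);
  int via_monitor = total_cost (counted, n);
  assert (direct == via_monitor);
  return counted.calls ();
}
```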

Convert

Quoting (1): "Longer term, we know that memory usage is an issue in GCC. In the old obstack days, we had a range of obstacks with different lifespans, so we could create RTL with a temporary lifetime which was given a longer lifetime when needed. We got away from that because we spent far too much time chasing bugs in which RTL should have been saved to a longer lifetime but wasn't. However, that model should permit us to run with significantly less memory, which would translate to less compile time. I think we might be able to do it by implementing a custom allocator, such as a pool allocator which permits allocating different sizes of memory, and never frees memory. Then the tree class could take an allocator as a template parameter. Then we would provide convertors which copied the tree class to a different allocation style. Then, for example, fold-const.c could use a temporary pool which lived only for the length of the call to fold. If it returned a new value, the convertor would force a copy out of the temporary pool. If this works out, we can use type safety to enforce memory discipline, use significantly less memory during compilation, and take a big step toward getting rid of the garbage collector."

Longer-Term Tasks

Directly represent the type hierarchy embodied by tree nodes. This representation leads to fewer bugs through increased fidelity in the representation and static typing.

  • Convert union tree_node and its members into a class hierarchy. The first stage of this conversion would not be virtual, so as to ensure that there is no change in representation size.
  • Create conversion functions to dynamically convert a pointer to a general type into a pointer to a specific type.
  • Change the parameters of functions to use the pointer types for the type of tree they handle. This change is essential to effective static type checking.
  • Change type un-safe "if is type" constructs to "if successfully converts to type" constructs. This change reduces the number of dynamic conversions, which aids efficiency.
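
The steps above can be sketched on a toy hierarchy. All names here (node_base, const_node, decl_node, as_const_node) are hypothetical and much simpler than the real tree nodes; the point is only the shape of the non-virtual first stage and of the conversion idiom.

```cpp
#include <cassert>

enum node_code { CODE_CONST, CODE_DECL };

// Base class carries the discriminator; no virtual functions, so the
// representation gains no virtual pointer.
struct node_base
{
  explicit node_base (node_code c) : code (c) {}
  node_code code;
};

struct const_node : node_base
{
  explicit const_node (int v) : node_base (CODE_CONST), value (v) {}
  int value;
};

struct decl_node : node_base
{
  explicit decl_node (int u) : node_base (CODE_DECL), uid (u) {}
  int uid;
};

// Conversion function: returns a specific pointer, or null on mismatch.
inline const_node *
as_const_node (node_base *n)
{
  return n->code == CODE_CONST ? static_cast<const_node *> (n) : 0;
}

// Takes the specific pointer type, so passing the wrong kind of node
// is a compile-time error.
inline int
constant_value (const const_node *c)
{
  assert (c->code == CODE_CONST);
  return c->value;
}

// "If successfully converts to type": one test yields the typed pointer.
inline int
value_or_default (node_base *n, int fallback)
{
  if (const_node *c = as_const_node (n))
    return constant_value (c);
  return fallback;
}

inline int
conversion_demo ()
{
  const_node c (7);
  decl_node d (1);
  return value_or_default (&c, -1) + value_or_default (&d, 100);
}
```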

Add destructors to clean up indirect memory.

Shift little-used tree fields to auxiliary tables. Trees are densely packed, but they are not necessarily information rich. In the PPH branch, we found that nearly half the pointers we were streaming were null, which says that many of the fields are unused. The shift would save memory.
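
A sketch of what an auxiliary table could look like, under the assumption that a rarely set field is looked up by node address. The types slim_node and aux_table are invented for the example, and std::map is used only for brevity; a real conversion would pick a table suited to GCC's memory management.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical node with only its commonly used field stored inline.
struct slim_node
{
  int code;
};

// Side table for a rarely set field, keyed by node address. Nodes that
// never set the field have no entry in the table.
class aux_table
{
public:
  void set (const slim_node *n, int value) { table_[n] = value; }

  // Returns the stored value, or FALLBACK when the field was never set.
  int get (const slim_node *n, int fallback) const
  {
    std::map<const slim_node *, int>::const_iterator it = table_.find (n);
    return it == table_.end () ? fallback : it->second;
  }

  std::size_t size () const { return table_.size (); }

private:
  std::map<const slim_node *, int> table_;
};

inline int
aux_demo ()
{
  slim_node a = { 1 };
  slim_node b = { 2 };
  aux_table alignments;
  alignments.set (&a, 16);
  assert (alignments.size () == 1);   // b occupies no table entry
  return alignments.get (&a, 0) + alignments.get (&b, 8);
}
```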

Convert the above hierarchy to polymorphic classes. Move representation information that is currently explicitly stored into implicit information based on the virtual pointer.
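
As a toy illustration of information moving from a stored field into the dynamic type, the sketch below uses invented classes (expr, unary_expr, binary_expr). The operand count is not stored in any object; it follows from which class the object is.

```cpp
#include <cassert>

// The kind of node is implied by the dynamic type; no explicit code
// field is stored in the object.
class expr
{
public:
  virtual ~expr () {}
  virtual int arity () const = 0;
};

class unary_expr : public expr
{
public:
  virtual int arity () const { return 1; }
};

class binary_expr : public expr
{
public:
  virtual int arity () const { return 2; }
};

// Dispatch goes through the virtual pointer rather than a switch on a
// stored code.
inline int
operand_count (const expr &e)
{
  int n = e.arity ();
  assert (n >= 0);
  return n;
}
```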

None: cxx-conversion (last edited 2012-12-11 14:02:39 by DiegoNovillo)