GCC Improvement Projects
This page lists projects related to the re-organization of the code base in accordance with GCC's Architectural Goals. Everyone with wiki access is welcome to add new projects to this page.
Please observe the following conventions:
- The projects listed here are exclusively geared towards improving GCC's code base.
- Add new entries into one of the major categories. Feel free to define a new major category if the project does not fit any of the existing ones. If all else fails, use the Miscellaneous category, but please try not to abuse it.
- Each entry in this page is a link to a separate page dealing with that specific project.
- Some projects are self-contained enough that they can be described here, but if you find yourself writing more than a few paragraphs or lists, please move the project to a separate page and link it from here.
Contents
- GCC Improvement Projects
- Modularity
- Front Ends
- Middle End
- Gimple Front End
- Middle End Array Expressions
- Make C undefined overflow semantics explicit in the IL
- Remove TREE_LIST
- Compress DECL flags
- Tuplify gimple operands: types and decls
- Replace ad-hoc flexible arrays with VEC()
- Stop abusing GCC_VERSION
- Make initial GIMPLE independent of any -f, -m and -O options
- Back End
- Build System
- Testsuite
- Development Tools
- Documentation
- Compile Time
- Miscellaneous
Modularity
Transition to C++
New template-based API for vectors
Unification of debugging dumps
Simplify GIMPLE generation
Alternatives to GC
Make GCC more modular
Front Ends
Make "convert" a langhook
- convert is a legacy magic-name langhook not present in the langhooks structure. Uses in the language-independent compiler should be changed to fold_convert unless they really need language-specific semantics. Once convert is no longer called there, its prototype should move from tree.h to the front ends. It would also be appropriate to eliminate cases where multiple front ends define the same function, as well as the various cases where a langhook's default is a function with a particular name rather than a particular default implementation.
- Probably much of convert.c should move into c-family code (it does checks for invalid conversions and gives errors for them, which is clearly something that belongs in front ends). Non-C-family front ends defining and using their own "convert" functions may well not need semantics from them that fold_convert lacks. Residual generic conversion logic should give ICEs not errors if asked to do a conversion it doesn't know how to do.
Move FE optimizations to middle-end
Various front-end optimizations should move to middle-end code; in general, optimization in the front ends should be kept to a minimum where the same can reasonably be accomplished in language-independent code. Some specific comments on shorten_compare are at http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01308.html.
- For issues around "extended types that behave much like integer and floating-point types", especially for C++, see PR 43622, and the references therein.
(C++FE) Make access-specifier an enumeration
- Currently, special tree nodes denote private/public/protected (as well as TREE_PUBLIC etc.). It would be good to replace them with an enumeration. A side effect of such a change is reduced memory use. (Suggested by Nathan Sidwell)
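As a rough illustration of the idea, the sketch below stores an access specifier as a small enumeration packed into a bit-field. All names here (access_kind, member_flags, get_access, access_name) are hypothetical and are not taken from the C++ front end.

```cpp
#include <cstdint>

// Hypothetical names for illustration; none of these are GCC identifiers.
enum class access_kind : std::uint8_t {
  none,        // no access recorded
  public_,
  protected_,
  private_
};

// Per-member flags: the access specifier occupies two bits instead of
// being represented by a pointer to a dedicated tree node.
struct member_flags {
  unsigned access : 2;
  unsigned is_virtual : 1;

  constexpr member_flags(access_kind a, bool v)
    : access(static_cast<unsigned>(a)), is_virtual(v) {}
};

// Recover the enumerator from the packed field.
constexpr access_kind get_access(member_flags f) {
  return static_cast<access_kind>(f.access);
}

// Printable name, e.g. for debugging dumps.
constexpr const char *access_name(access_kind a) {
  return a == access_kind::public_    ? "public"
       : a == access_kind::protected_ ? "protected"
       : a == access_kind::private_   ? "private"
                                      : "none";
}
```

With four enumerators, two bits are enough, so the field can share a word with other flags.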
(C++FE) Compact tree structures
- The C++ FE's additional tree structures and extensions lay out neatly on a 32-bit host but have extraneous alignment padding on a 64-bit host. It would be good to make the layout 64-bit friendly. Making the access-specifier an enum would be a good first step. (Suggested by Nathan Sidwell)
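The padding effect can be shown with a small sketch. The two structs below are hypothetical and do not correspond to any real front-end node. On a typical 32-bit host both are the same size, while on a typical 64-bit host the first one gains padding after each 32-bit field.

```cpp
#include <cstdint>

// Hypothetical node layouts; field names are illustrative, not GCC's.
struct padded_node {
  std::uint32_t code;    // on a 64-bit host, 4 bytes of padding follow
  void         *type;
  std::uint32_t flags;   // and 4 more bytes of padding follow here
  void         *chain;
};

// Same fields, reordered so the two 32-bit fields sit next to each other.
struct compact_node {
  void         *type;
  void         *chain;
  std::uint32_t code;
  std::uint32_t flags;
};

// Reordering never makes this layout larger.
static_assert(sizeof(compact_node) <= sizeof(padded_node),
              "grouping small fields should not increase the size");
```

On a common LP64 ABI this gives 32 bytes for padded_node and 24 bytes for compact_node; other ABIs may differ.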
Middle End
Gimple Front End
Middle End Array Expressions
Make C undefined overflow semantics explicit in the IL
Remove TREE_LIST
- TREE_LIST should die. TREE_LIST is the part of static typing of trees most accessible to incremental conversion, although identifiers may also be one of the earlier steps.
- More generally, TREE_CHAIN should die. Containers should be used instead.
Compress DECL flags
tree-core defines a number of bit flags (DECL_IS_MALLOC, DECL_IS_OPERATOR_NEW, DECL_CONSTRUCTOR, DECL_STATIC_CONSTRUCTOR, etc.) that are mutually exclusive. It would be better to use some kind of enumeration rather than individual flags, since we have run out of bits. (Suggested by Richard Biener & Nathan Sidwell)
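A minimal sketch of the saving, assuming the flags really are mutually exclusive as stated above: N exclusive one-bit flags plus a "none of them" state need only enough bits to number N+1 states. The names function_kind, bits_for_states, decl_flags and kind_of are hypothetical and do not exist in GCC.

```cpp
#include <cstdint>

// Hypothetical enumeration standing in for several exclusive one-bit flags.
enum class function_kind : std::uint8_t {
  ordinary,
  malloc_like,
  operator_new,
  constructor,
  static_constructor,
  kind_count           // number of enumerators above
};

// Smallest number of bits that can distinguish `states` different values.
constexpr unsigned bits_for_states(unsigned states) {
  return states <= 1u ? 0u : 1u + bits_for_states((states + 1u) / 2u);
}

constexpr unsigned kind_bits =
    bits_for_states(static_cast<unsigned>(function_kind::kind_count));

// One narrow field replaces four separate one-bit flags.
struct decl_flags {
  unsigned kind : kind_bits;

  constexpr explicit decl_flags(function_kind k)
    : kind(static_cast<unsigned>(k)) {}
};

constexpr function_kind kind_of(decl_flags d) {
  return static_cast<function_kind>(d.kind);
}
```

Here five states fit in three bits, and the same three bits leave room for three more kinds before the field has to grow.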
Tuplify gimple operands: types and decls
Replace ad-hoc flexible arrays with VEC()
Stop abusing GCC_VERSION
Scattering GCC_VERSION conditionals across the source tree (a few places also have __GNUC__ conditionals) is bad style. It would be better to define inline functions, or macros describing whether a language feature is supported, in one place (with the appropriate conditionals in their definitions) and use them everywhere. Suitable places for these definitions include system.h and hwint.h.
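The pattern could look roughly like the sketch below: the version test appears once, and callers use a plain function. HAVE_BUILTIN_CLZ, clz_portable and count_leading_zeros are hypothetical names, and the version threshold is an assumption chosen for illustration.

```cpp
#include <climits>

// In one central header: decide once whether the compiler feature exists.
// HAVE_BUILTIN_CLZ is a hypothetical feature macro; the threshold below is
// an assumption for this sketch.
#if defined(__GNUC__) && (__GNUC__ * 1000 + __GNUC_MINOR__ >= 3004)
# define HAVE_BUILTIN_CLZ 1
#else
# define HAVE_BUILTIN_CLZ 0
#endif

constexpr int unsigned_bits = static_cast<int>(sizeof(unsigned) * CHAR_BIT);

// Portable fallback: shift right until the value is zero, counting down
// from the full width.
constexpr int clz_portable(unsigned x, int remaining = unsigned_bits) {
  return x == 0 ? remaining : clz_portable(x >> 1, remaining - 1);
}

// The only function callers use; no version checks at call sites.
constexpr int count_leading_zeros(unsigned x) {
#if HAVE_BUILTIN_CLZ
  // __builtin_clz is undefined for zero, so handle that case explicitly.
  return x == 0 ? unsigned_bits : __builtin_clz(x);
#else
  return clz_portable(x);
#endif
}
```

Call sites then write count_leading_zeros(x) and stay free of compiler-version conditionals.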
Make initial GIMPLE independent of any -f, -m and -O options
Back End
General backend cleanup
Gimple Back End
OpenMP Support
Integer overflow and saturation
See PR 48580 for discussion of possible C-source-level interfaces. See the paper by Bik, Girkar, Grey and Tian <http://saluc.engr.uconn.edu/refs/compiler/bik02idioms.pdf> on how to detect saturating operations. See <http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00846.html> regarding lowering of fixed-point operations to generic types.
- -ftrapv (broken), -fwrapv and -fstrict-overflow relate only to source code, GENERIC and GIMPLE semantics, and do not affect RTL which is always modulo. In future such options should also stop affecting GENERIC and GIMPLE semantics (all semantics should go in the IR, not in global option state); see the no-undefined-overflow branch. PR 30484 discusses the question of semantics for division and modulo operations for INT_MIN and -1.
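To make the difference between modulo and saturating behaviour concrete, here is a small source-level sketch for 32-bit addition. It is not one of the interfaces discussed in the PRs above; wide_sum, wrapping_add, saturating_add and add_overflows are hypothetical helpers that compute in a wider type so that they never overflow themselves.

```cpp
#include <cstdint>
#include <limits>

constexpr std::int64_t int32_min = std::numeric_limits<std::int32_t>::min();
constexpr std::int64_t int32_max = std::numeric_limits<std::int32_t>::max();
constexpr std::int64_t modulus   = int32_max - int32_min + 1;  // 2^32

// Exact mathematical sum, computed in 64 bits.
constexpr std::int64_t wide_sum(std::int32_t a, std::int32_t b) {
  return static_cast<std::int64_t>(a) + b;
}

// True if the exact sum does not fit in 32 bits.
constexpr bool add_overflows(std::int32_t a, std::int32_t b) {
  return wide_sum(a, b) > int32_max || wide_sum(a, b) < int32_min;
}

// Modulo semantics: the result wraps around.
constexpr std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(
      wide_sum(a, b) > int32_max ? wide_sum(a, b) - modulus
    : wide_sum(a, b) < int32_min ? wide_sum(a, b) + modulus
    : wide_sum(a, b));
}

// Saturating semantics: the result is clamped to the representable range.
constexpr std::int32_t saturating_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(
      wide_sum(a, b) > int32_max ? int32_max
    : wide_sum(a, b) < int32_min ? int32_min
    : wide_sum(a, b));
}
```

For the maximum 32-bit value plus one, wrapping_add yields the minimum value while saturating_add stays at the maximum; a trapping variant would instead test add_overflows and abort.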
Profiling options
See http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00937.html regarding issues with profiling options and their effects.
Build System
Top Level Libgcc Migration
Automatic Makefile dependency generation
Toplevel configuration and build system
- libiberty should not be installed unless specifically requested by configure options.
- The config-ml.in special handling of particular targets and configure options for them seems ill-conceived, since the right way to configure multilibs is for the relevant configure options to affect the MULTILIB_* settings used when building GCC, not for config-ml.in to have ad hoc code looking at configure options. (Some of this support is deprecated in GCC 4.6; the rest should be reimplemented inside the gcc/ directory.)
- Toplevel handles unsupported_languages in a suboptimal way. What it should mean is that the languages don't get enabled by default (or by "all" in --enable-languages) but can still be enabled by specifying them manually in --enable-languages - whereas at present it forces the language to be disabled even if the user enables it explicitly.
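The difference between the behaviour described above and the suggested one can be written down as a tiny decision function. This is only a model of the text, not the toplevel configure logic, and request, enabled_current and enabled_proposed are hypothetical names; the language is assumed to be part of the default set.

```cpp
// How a language came to be considered for the build (hypothetical model).
enum class request {
  by_default,  // --enable-languages not given
  by_all,      // selected through "all"
  by_name      // named explicitly in --enable-languages
};

// Behaviour described as current: an unsupported language is always
// disabled, however it was requested.
constexpr bool enabled_current(bool unsupported, request) {
  return !unsupported;
}

// Suggested behaviour: "unsupported" only removes the language from the
// default and "all" sets; an explicit request still enables it.
constexpr bool enabled_proposed(bool unsupported, request how) {
  return how == request::by_name || !unsupported;
}
```

The two functions differ only for an unsupported language that the user names explicitly.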
Macros describing where code in GCC is built
- There are far too many defines used to condition target code in one place or another: USED_FOR_TARGET, IN_LIBGCC2, IN_TARGET_LIBS and IN_RTS. In addition there is IN_GCC, which according to the makefile comment "distinguishes between code compiled into GCC itself and other programs built during a bootstrap" but does nothing of the sort (it is used for other programs such as gcov, for generator programs, and for target code), and GENERATOR_FILE, which actually has a meaningful use.
- This set of defines should be cut down - it should be possible to have just USED_FOR_TARGET and GENERATOR_FILE, plus IN_CONFIGURE_TEST or similar to deal with the IN_GCC conditional in system.h.
- Note that IN_GCC is used in ansidecl.h. Everything in ansidecl.h dealing with compatibility with pre-ISO C should be considered obsolete and removed after removing all uses in the GCC and src repositories; that will allow removing the IN_GCC conditionals. Although the comments in ansidecl.h claim that it comes from the GNU C Library, the glibc copy was removed in 1997, so it is purely a libiberty header now.
Testsuite
Run vectorizer tests multiple times
- We should work out how to get the various vectorizer testsuites to run multiple times, once with each vector ISA variant that's available on the target architecture, in the same way that the torture testsuites run each test multiple times with different options. For example, you'd test SSE, 128-bit AVX, 256-bit AVX and maybe other variants, with each variant using execution testing if there's hardware support and compile testing otherwise. This certainly complicates all the effective-target tests for vectorization support, since the results may depend on the options as well as the target.
Canonicalize test case names
- The set of testcase names - the things after "PASS: " or "FAIL: " or other statuses - should not depend on the results of the tests.
Implement a unit test framework
Development Tools
Scripts for testing compile time and memory consumption
Patch Tracking
Documentation
Internal documentation
Compile Time
Speedup areas
Proper GCC Memory Management
Miscellaneous
Beginner Projects
Finalize Partial Transitions