Modular GCC

GCC has grown too big and intertwined. This has steepened the learning curve, raised the barrier to entry for new developers, and made the compiler harder to test, debug and understand.

We need to make the compiler more modular, introduce strong APIs, reduce global state, build and test components separately, add unit tests for components, and generally make the compiler a collection of distinct components with well-defined interfaces and behaviour.

Include files and Makefile dependencies

One problem with modularizing GCC is the spaghetti web of included files. Many C source and header files include other header files to pick up a single declaration, or for no reason at all. The dependencies between files are most easily identified by looking at the headers included by a .c implementation file. The makefile rules in Makefile.in and in the target make rules are also very helpful.

Very often, the reason a C source file includes a header can be traced back to the initial commit of that file: the file was created as a copy of another file that implemented something else and perhaps did need the header. If the new file implements something for which the included header is not required, this is usually not cleaned up or even noticed.

An example of this is output.h, which used to be included in many front-end files. This is an unwanted dependency, because the functions made available by including output.h are used to write out assembler instructions, which front ends should no longer be doing: a front end should do parsing, semantics, gimplification, and hand off one or more translation units to cgraphunit.c, one unit at a time.

Another example is intl.h, which is included in files like cgraphunit.c but is not actually used by that file.

In other cases, new features are implemented in a way that creates unnecessary dependencies. This happened, for example, with options.h, which was included in tree.h to import cl_target_option. This was resolved by moving cl_target_option to coretypes.h. Another example is location_t from input.h, which probably ought to be declared in coretypes.h as well.
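One common way to break this kind of dependency in C is to put only a forward declaration of a type into a small core header, so that other headers can hold pointers to it without seeing its layout. The following is a minimal sketch of that idea, not GCC's actual declarations: all names ending in `_sketch` and the three header names in the comments are hypothetical, and the "headers" are shown as sections of one file so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* --- coretypes-sketch.h (hypothetical): names only, no layouts --- */
struct target_option_sketch;             /* incomplete type */
typedef unsigned int location_sketch_t;  /* stand-in for a location handle */

/* --- tree-sketch.h (hypothetical): needs the name, not the layout --- */
struct tree_node_sketch {
  location_sketch_t where;
  /* A pointer to an incomplete type is valid C, so this header does
     not have to include the header that defines the struct.  */
  const struct target_option_sketch *opts;
};

/* --- options-sketch.h (hypothetical): the full definition --- */
struct target_option_sketch {
  int optimize_level;
};

/* Only code that reads the fields needs the full definition.  */
int
node_optimize_level (const struct tree_node_sketch *node)
{
  assert (node != NULL);
  return node->opts ? node->opts->optimize_level : 0;
}
```

With this layout, only the files that call something like `node_optimize_level` would need the options header; every other user of the tree header would depend on the core header alone.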

More problematic for a modularization effort, but fortunately slightly less challenging to fix, is header files including other header files. Assume C source file 'A' includes header file 'B', which in turn includes header file 'C'. All declarations imported from header file 'C' are then also available in source file 'A'. There are even cases where no declarations from 'B' are used in 'A', but where removing the include of 'B' uncovers the hidden dependency of 'A' on 'C'. These hidden dependencies make it all too easy for developers to cross modular boundaries, more or less accidentally, without anyone noticing in patch reviews.
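The 'A', 'B', 'C' situation can be stated as a small check. The sketch below assumes that, for each source file, we already know which headers it names in its own #include lines and which headers' declarations it really uses; how to collect that information is not addressed here. The `file_deps` structure and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one source file's relationship to headers.  */
struct file_deps {
  const char *const *included;  /* headers named in its own #include lines */
  size_t n_included;
  const char *const *used;      /* headers whose declarations it uses */
  size_t n_used;
};

/* Return 1 if NAME occurs in NAMES[0..N-1], else 0.  */
static int
contains (const char *const *names, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (names[i], name) == 0)
      return 1;
  return 0;
}

/* Hidden dependencies: headers that are used but not included directly.
   Stores at most MAX names in OUT and returns the total number found.  */
size_t
hidden_dependencies (const struct file_deps *f, const char **out, size_t max)
{
  size_t found = 0;
  assert (f != NULL);
  for (size_t i = 0; i < f->n_used; i++)
    if (!contains (f->included, f->n_included, f->used[i]))
      {
        if (found < max)
          out[found] = f->used[i];
        found++;
      }
  return found;
}

/* Redundant includes: headers that are included directly but never used.
   Stores at most MAX names in OUT and returns the total number found.  */
size_t
redundant_includes (const struct file_deps *f, const char **out, size_t max)
{
  size_t found = 0;
  assert (f != NULL);
  for (size_t i = 0; i < f->n_included; i++)
    if (!contains (f->used, f->n_used, f->included[i]))
      {
        if (found < max)
          out[found] = f->included[i];
        found++;
      }
  return found;
}
```

For a file that includes only 'B' but uses only declarations from 'C', the first function reports 'C' as a hidden dependency and the second reports 'B' as a redundant include, which is exactly the case where dropping 'B' breaks the build.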

The task of removing redundant include files is something that could be added to the list of projects for beginners. The following independent, small projects would greatly improve the include file situation in GCC:

All this is a bit pointless unless it is focused on getting a particular independent module out of it. Any module would do: graphite, GIMPLE core, GIMPLE passes, RTL core, RTL passes, etc. (there is no obvious consensus yet about what the modules would be). Some people believe that physical separation of files and independent libraries are the only way to ensure that dependencies are not added back. What is certain is that with the current setup it is too easy to add undesired dependencies.

Internal interfaces

Another problem with the GCC code base is that there are too many dependencies between individual files. To identify individual files, or groups of files, that form a module, such dependencies have to be identified and either broken if they are unwanted, or documented if they are supposed to exist. For instance, a GIMPLE pass naturally depends on gimple-fold.c but not on expr.h, a header that only deals with RTL.

Before GCC can be modularized properly, it is necessary to make clear what interfaces and declarations are actually needed in each source file. This will, no doubt, be a huge job. It is unclear at the moment whether there are tools available that could help (Dehydra perhaps, or a dedicated plugin, or Ctags? Or turn this patch into a proper plugin? Maybe create a symbols database and identify dependencies to break?).

Front-end modules

Making a single FE an independent library is a major undertaking. All front ends live in their own subdirectory, but there are countless dependencies on other parts of the compiler (target-independent middle end as well as target-specific things) that have to be broken to make a front end a stand-alone module. But starting with one FE would set an example to follow, and help initiate an effort to define and document/implement the interface between the front ends and the rest of the compiler. The Go front end may be just that example. This is a new front end written intentionally to be independent of the GCC code generator.

Middle-end modules

Instead of having a single monolithic binary, it is proposed to separate the major components in libbackend.a into several libraries. These libraries would live in separate sub-directories under gcc/. Although initially they would be built together with the rest of the compiler, the intent is to evolve these libraries into independent modules that could be built separately from the compiler.

The gcc/ directory would be modularized using the different Intermediate Languages (IL) as the main separators. This means the creation of libgeneric, libgimple and librtl. However, there are other major pieces of functionality that should also be moved into their own modules. The various modules will live under a sub-directory of gcc/ (at least initially). The following is the proposed organization of the gcc/ directory (this does not include the existing directories):

* gcc/

Each directory will export a library and a set of include files that define its interface. Every other module that wants to use its services will only be able to talk to it via the published interface.
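As an illustration of what "only via the published interface" can mean in C, here is a minimal sketch of a module that exposes an opaque type plus a handful of functions and keeps its data layout private. The `bitset_sketch` module and all of its function names are hypothetical and are not a proposal for any specific GCC library; the interface and the implementation are shown in one file only so that the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* --- published interface (would be the module's exported header) --- */
struct bitset_sketch;  /* opaque: clients never see the fields */
struct bitset_sketch *bitset_sketch_create (size_t n_bits);
void bitset_sketch_set (struct bitset_sketch *s, size_t bit);
int bitset_sketch_test (const struct bitset_sketch *s, size_t bit);
void bitset_sketch_destroy (struct bitset_sketch *s);

/* --- private implementation (would live inside the module) --- */
struct bitset_sketch {
  size_t n_bits;
  unsigned char *bytes;
};

/* Allocate a set of N_BITS cleared bits; returns NULL on failure.  */
struct bitset_sketch *
bitset_sketch_create (size_t n_bits)
{
  struct bitset_sketch *s = malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->n_bits = n_bits;
  /* Always allocate at least one byte so calloc (0, ...) is avoided.  */
  s->bytes = calloc (n_bits / 8 + 1, 1);
  if (s->bytes == NULL)
    {
      free (s);
      return NULL;
    }
  return s;
}

void
bitset_sketch_set (struct bitset_sketch *s, size_t bit)
{
  assert (s != NULL && bit < s->n_bits);
  s->bytes[bit / 8] |= (unsigned char) (1u << (bit % 8));
}

int
bitset_sketch_test (const struct bitset_sketch *s, size_t bit)
{
  assert (s != NULL && bit < s->n_bits);
  return (s->bytes[bit / 8] >> (bit % 8)) & 1;
}

/* Release the set; passing NULL is allowed.  */
void
bitset_sketch_destroy (struct bitset_sketch *s)
{
  if (s != NULL)
    {
      free (s->bytes);
      free (s);
    }
}
```

Because clients only ever hold a `struct bitset_sketch *`, the module could change its internal representation without forcing any other module to be edited, and a unit test can exercise it through the same five declarations.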

TODO: Propose which files should go in each directory and define the interfaces.

-- ManuelLópezIbáñez 2010-05-15 20:37:04 We should document the desired and the undesired [!] dependencies. The main hurdle for me to help make GCC more modular is autotools. I tried to move line-map out of libcpp. This should be trivial because line-map does not have any dependencies but it was impossible to get the autotools magic right. So I gave up.

-- RichardGuenther 2010-06-06 14:37:00 As several of the desired modules are tied together via dependencies on tree, separating them out into directories should not be a priority. I do agree with c/ and driver/. And I would add lib/ for stuff like sbitmap.

* gcc/

Streamable Intermediate Representations

One important aspect of the modularization proposed in the previous section is going to be the ability to test the different modules independently. As such, I believe that all the intermediate languages in GCC should have a streaming representation. This should make it possible for the compiler pipeline to start and stop at almost any point during compilation.

The main benefit of this feature is in testing. It will be possible to write tests that check a specific analysis or transformation done on a representation, independently of what other transformations may occur before it in the regular pipeline. For instance, it should be possible to generate synthetic gimple, pass it through a single optimization pass and test the resulting gimple.
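The shape of such a test can be sketched with a toy representation. The code below is not GIMPLE and does not use any GCC API: it assumes a made-up one-line text format of the form `x = 2 + 3`, a hypothetical `toy_stmt` structure, and a trivial constant-folding function standing in for "a single optimization pass". It only shows the read, transform, write, compare cycle that a streamable representation would make possible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy statement: DEST = LHS OP RHS with constant operands.  */
struct toy_stmt {
  char dest[16];
  char op;        /* '+' or '*', or 0 when the statement is DEST = LHS */
  long lhs, rhs;
};

/* Read one statement from its text form, e.g. "x = 2 + 3" or "x = 5".
   Returns 1 on success, 0 on malformed or unsupported input.  */
int
toy_read (const char *text, struct toy_stmt *out)
{
  int n = sscanf (text, "%15s = %ld %c %ld",
                  out->dest, &out->lhs, &out->op, &out->rhs);
  if (n == 2)
    {
      /* Plain constant assignment.  */
      out->op = 0;
      out->rhs = 0;
      return 1;
    }
  return n == 4 && (out->op == '+' || out->op == '*');
}

/* Write a statement back to text; returns the result of snprintf.  */
int
toy_write (const struct toy_stmt *s, char *buf, size_t size)
{
  if (s->op == 0)
    return snprintf (buf, size, "%s = %ld", s->dest, s->lhs);
  return snprintf (buf, size, "%s = %ld %c %ld",
                   s->dest, s->lhs, s->op, s->rhs);
}

/* The "pass" under test: fold a constant expression in place.
   Overflow is not handled in this sketch.  */
void
toy_fold (struct toy_stmt *s)
{
  if (s->op == '+')
    s->lhs += s->rhs;
  else if (s->op == '*')
    s->lhs *= s->rhs;
  else
    return;
  s->op = 0;
  s->rhs = 0;
}

/* Read IN, run only the folding pass, and write the result to OUT.
   Returns 1 on success, 0 if IN is malformed or OUT is too small.  */
int
toy_run_fold (const char *in, char *out, size_t size)
{
  struct toy_stmt s;
  int len;
  if (!toy_read (in, &s))
    return 0;
  toy_fold (&s);
  len = toy_write (&s, out, size);
  return len >= 0 && (size_t) len < size;
}
```

A test then consists of an input string and the expected output string for one pass, with nothing else from the pipeline involved: feeding `x = 2 + 3` to `toy_run_fold` yields `x = 5`.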

Additionally, having streamable representations means that the compiler can stop at any time, save its state to some database and resume from that database. This is a feature that is extremely useful for projects like the compiler server or LTO.

None: ModularGCC (last edited 2012-06-29 16:26:29 by StevenBosscher)