
C++ Modules

A module system is coming to C++; this page describes the state of the GCC implementation.

The goal of the module system is to avoid huge header files, thus speeding up compilation. What distinguishes it from things like precompiled headers is:

  • Not solving the 'preprocessor problem'
  • Composability of multiple modules
  • Explicit code annotations to define visible interfaces

Implementation State

Development branch: 'c++-modules' (svn://gcc.gnu.org/svn/gcc/branches/c++-modules).

The branch was created by Nathan Sidwell in January 2017, so it is very early days, and I expect it to be several months before there's something of interest.

  • March 1st 2017 - first executable works!

  • April 26th 2017 - Namespace symbol table handling reworked to be module-compatible (and just generally better).
  • May 3rd 2017 - Symbol table partitioned and module-specific mangling (no back-references)
  • June 15th 2017 - Class & function declarations and definitions

  • July 5th 2017 - Created 'c++-name-lookup' branch to handle changes that are easier to complete on a separate branch
  • July 20th 2017 - First contributor patch applied (Boris Kolpackov)

Random Cleanups

I've been making some random cleanups to the code base. Now that stage 1 is open, I'm pushing these to trunk:

  • Inline namespace handling, PR 79369 (upstreamed)
  • Canonicalize type hashing (upstreamed)
  • g++-dg.exp: find tests simplify (upstreamed)
  • CRC generation optimization (upstreamed)
  • OVERLOAD representation (upstreamed)
  • Name lookup (qualified, unqualified, ADL) (upstreamed)
  • Name insertion (upstreamed)
  • Namespace contents representation (upstreamed)
  • Kill strong using directives (upstreamed)
  • Inline namespace representation (upstreamed)
  • DR2061 (upstreamed)
  • Kill TYPE_METHODS (upstreamed)
  • cdtors have proper names and no magic slots
  • conversion ops have a single name and no magic slots


I aim to reuse (with suitable abstraction) as much LTO machinery as possible. LTO currently writes out both type trees and gimple instructions, encapsulating the information into additional sections of the output files. Modules need to write out both type trees and FE AST (before genericization), but not gimple. They also need to read that information back into the FE. The data will probably be emitted into not-the-object-file, which will be similar to PCH behaviour (PCH machinery will not be used).

It turns out that the overlap with LTO is 'not much'. Just types, and not the same bits of types. So I'm implementing a separate streamer. Oh well.


Name mangling needs to be adjusted to deal with module-linkage. This is a compiler-interoperability and toolchain issue, as we want objects from different compilers to be link-compatible, and the debugger to be able to understand module symbols.

Current thoughts are described in module-abi-2017-03-27.pdf.

Interface Designation

The current specification for modules shows no special syntax for denoting the interface TU of a module. However, implementations need to know immediately after seeing the module declaration whether the TU is the interface or one of the implementation TUs (the latter need to effectively import the interface). It is not until an export declaration is seen that it can be positively determined that the TU is the interface. If there is no export declaration, then the TU is either an implementation, or perhaps an interface that doesn't export anything.

One way of solving this would be a special compilation flag, or file suffix, to denote interface compilation. Either of these approaches would require development changes, such as additional makefile rules and editor mode selection, which are not ideal and detract from a 'just drops in' feature set.

I have taken the approach of requiring an [[interface]] attribute on the module-declaration for the interface TU. C++Kona'17 update: Jason & I are working with Gaby dos Reis on standardizing a way of distinguishing interface from implementation.

I have now implemented 'export module Foo;' as the mechanism to denote the interface TU. The earlier attribute form looked like this:

module foo [[interface]]; // foo's interface TU

module foo; // one of foo's implementation TUs
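The point that the TU kind must be known as soon as the module-declaration has been seen can be sketched as a toy classifier. This is only an illustration under the 'export module Foo;' convention described above; `TuKind` and `classify_module_decl` are hypothetical names, not GCC code, and the matching is plain string handling rather than real tokenization.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a translation unit (TU), decided from the
// text of its module-declaration alone. None of these names come from GCC.
enum class TuKind { Interface, Implementation, NotModule };

// Strip leading spaces and tabs.
static std::string ltrim(const std::string &s) {
  std::size_t i = s.find_first_not_of(" \t");
  return i == std::string::npos ? std::string() : s.substr(i);
}

// True if `s` starts with the word `kw` followed by a space or tab.
static bool starts_with_keyword(const std::string &s, const std::string &kw) {
  return s.size() > kw.size() && s.compare(0, kw.size(), kw) == 0 &&
         (s[kw.size()] == ' ' || s[kw.size()] == '\t');
}

// "export module foo;" -> Interface, "module foo;" -> Implementation.
// Anything else (including "export void Bar ();") is not a module-declaration.
TuKind classify_module_decl(const std::string &line) {
  std::string s = ltrim(line);
  if (starts_with_keyword(s, "export")) {
    s = ltrim(s.substr(6));  // skip past "export"
    return starts_with_keyword(s, "module") ? TuKind::Interface
                                            : TuKind::NotModule;
  }
  return starts_with_keyword(s, "module") ? TuKind::Implementation
                                          : TuKind::NotModule;
}
```

With this convention no later export declaration needs to be seen before deciding whether the interface must be imported.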

Module Linkage

The earliest plan was to use inline namespace capability to wedge an invisible namespace for all things with module linkage (with some internal compiler magicness). This namespace is put just inside the innermost regular namespace, so there's one of these per regular namespace. Exported objects are in the parent namespace. Thus a mangling just drops out.

However, this approach is somewhat awkward. It doesn't scale to 1000's of modules -- you'll end up doing 1000's of name lookups. And it makes other places where we need to know the 'real' context more complicated. Instead I have implemented proper symbol table partitioning. Each loaded module is assigned a unique index, and this is stashed in the DECL. This is a much neater approach, and I found spare space in the existing DECL to put the index. So no structure bloat.
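As a rough illustration of the partitioning idea, the sketch below gives each loaded module an index and stores it in a bit-field of a toy declaration. Everything here is an assumption made for the example: `Decl`, `ModuleTable`, `lookup`, the 14-bit width, and the use of index 0 for the current TU are hypothetical and do not reflect GCC's actual DECL layout or lookup code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy declaration: the owning module's index lives in otherwise spare bits.
// Field names and widths are illustrative, not GCC's actual DECL layout.
struct Decl {
  std::string name;
  unsigned module_index : 14;  // 0 is taken to mean "the current TU"
  unsigned exported : 1;
};

class ModuleTable {
  std::unordered_map<std::string, unsigned> index_of_;
  unsigned next_ = 1;  // index 0 is reserved for the current TU
public:
  // Each loaded module gets one index; loading it again returns the same one.
  unsigned load(const std::string &module_name) {
    auto it = index_of_.find(module_name);
    if (it != index_of_.end()) return it->second;
    assert(next_ < (1u << 14) && "index must fit in the bit-field");
    unsigned index = next_++;
    index_of_.emplace(module_name, index);
    return index;
  }
};

// One binding list per namespace, partitioned by the index in each Decl.
// A lookup makes a single pass and filters on the index and export flag,
// instead of searching one hidden namespace per module.
const Decl *lookup(const std::vector<Decl> &bindings, const std::string &name) {
  for (const Decl &d : bindings)
    if (d.name == name && (d.module_index == 0 || d.exported))
      return &d;
  return nullptr;
}
```

In this toy model a non-exported declaration from another module stays in the table but is simply skipped by the filter.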

Global Module

Declarations before the module-declaration are in the global module. While this is clear enough, it has a complicated interaction with a module interface:

void Foo ();
export module Quux;
export void Bar ();
void Baz ();

module Quux; // implementation of Quux
void Bar () {
  Baz (); // Baz's declaration is visible from the purview of Quux's interface
  Foo (); // ERROR: global module decls from the interface are NOT visible
}

import Quux; // user of Quux
void Baz () {
  Bar (); // Quux's Bar
  Baz (); // ERROR: Quux's non-exported Baz not visible.
  Foo (); // ERROR: Foo not visible from Quux interface
}

I have not yet got a good handle on how to approach this.

Module Files

While the design is not yet fixed, I think it is possible to write the module file at the end of compiling the interface TU; it need not be written incrementally. As the data will necessarily be a self-referential recursive structure, writing at the end is probably the better approach, and the usual hashing techniques can be used to remember back-references.
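The hashing technique for back-references can be sketched as follows. This is a minimal toy writer, not the branch's streamer: `Node`, `Writer`, and the textual output format are all invented for the illustration. The first time a node is reached it is written in full; any later visit emits only its index.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical AST-like node; `left` and `right` may form cycles.
struct Node {
  std::string label;
  Node *left = nullptr;
  Node *right = nullptr;
};

// Writes a graph as text. First visit: "N<label>" followed by both children.
// Repeat visit: "@<index>", a back-reference. Null pointer: "0".
class Writer {
  std::unordered_map<const Node *, std::size_t> seen_;
  std::string out_;
public:
  void write(const Node *n) {
    if (!n) {
      out_ += "0 ";
      return;
    }
    auto it = seen_.find(n);
    if (it != seen_.end()) {
      out_ += "@" + std::to_string(it->second) + " ";
      return;
    }
    // Record the node before walking its children so that cycles terminate.
    std::size_t index = seen_.size();
    seen_.emplace(n, index);
    out_ += "N" + n->label + " ";
    write(n->left);
    write(n->right);
  }
  const std::string &str() const { return out_; }
};
```

Because the whole structure exists before writing starts, a single walk of this kind is enough; a matching reader would rebuild the pointers by keeping a vector of nodes in the order they were first written.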

Theorem: when reading a compiled module, it is always safe to reorder the reading of its imports. Specifically, an interface that imports some modules may be written out such that all the imports are processed before the body of the interface itself:

module bob [[interface]];
... whatever
import baz;
... more stuff
export module bar;
... and finally

is equivalent to:

module bob [[interface]];
import baz;
export module bar;
... whatever
... more stuff
... and finally

Note, we don't have to worry about by-name lookup finding different things in this case -- names are already bound (or will be deferred to instantiation time anyway). This theorem must be correct, otherwise it would not be possible to import two modules that themselves import a third module.

Module re-exporting must be done by reference. Again, this is necessary so that a module may import a module and also import a second module that re-exports that first module.
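The two points above, imports processed before the body and re-exports held by reference, can be modelled with a toy loader. This is a sketch under stated assumptions: `ModuleFile`, `Repository`, `Loader`, and `collect_visible` are hypothetical names, and a module file is reduced to two lists of module names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy compiled module: imports are stored as names (references), never as
// embedded copies. `reexports` is the subset that importers also get to see.
struct ModuleFile {
  std::vector<std::string> imports;
  std::vector<std::string> reexports;
};

using Repository = std::map<std::string, ModuleFile>;

class Loader {
  const Repository &repo_;
  std::set<std::string> loaded_;
  std::vector<std::string> order_;
public:
  explicit Loader(const Repository &repo) : repo_(repo) {}

  // Load `name`, processing all of its imports before its own body.
  // A module reached by several paths is loaded exactly once.
  void load(const std::string &name) {
    if (!loaded_.insert(name).second) return;  // already loaded
    const ModuleFile &file = repo_.at(name);   // throws if the module is unknown
    for (const std::string &dep : file.imports) load(dep);
    order_.push_back(name);  // the body is read after the imports
  }
  const std::vector<std::string> &order() const { return order_; }
};

// Modules visible to an importer of `name`: `name` itself plus whatever it
// re-exports, followed transitively by name.
void collect_visible(const Repository &repo, const std::string &name,
                     std::set<std::string> &out) {
  if (!out.insert(name).second) return;
  for (const std::string &r : repo.at(name).reexports)
    collect_visible(repo, r, out);
}
```

Since a re-export is only a name, a user that imports a module directly and also reaches it through another module's re-export ends up with one loaded instance rather than two copies.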


None: cxx-modules (last edited 2018-08-31 20:09:50 by NathanSidwell)