C++ Modules

A module system is coming to C++. This page describes the state of the GCC implementation.

The goal of the module system is to avoid huge header files, thus speeding up compilation. What distinguishes it from things like precompiled headers are:

Implementation State

Development branch: 'c++-modules' (svn://gcc.gnu.org/svn/gcc/branches/c++-modules). For reporting bugs, see the Bugs section below.

The branch was created by Nathan Sidwell in January 2017. The specification, design and implementation are in flux.

Notable events:

Invoking the Compiler

There are several new options for modules:

More complete documentation is in the GCC manual under 'C++ Modules'.

Bugs

Due to the experimental nature of the implementation, things are fragile. Reports from members of the C++ Standard committee will receive attention.

Here's a list of known not-working significant features:

ATOM

The ATOM proposal deviates from the TS in a few ways. As of October 2018, ATOM is no longer a separate flag, having been merged in p1103r2 (https://wg21.link/p1103). The four distinguishing features of ATOM are:

BMI Location

Compiling a module interface TU generates a Binary Module Interface. This BMI is read in by the module's implementation TUs and each importer of the module. There's clearly a dependency between these things, which is different from header files because we have to invoke the compiler to generate the BMI.

To resolve this dependency, a module mapper is queried whenever a BMI name is needed. Communication is via a text-based protocol, which provides mechanism without policy. As such the compiler itself is completely agnostic about where BMIs are or how they are named. Build systems may provide a build-specific mapper. If no module mapper has been specified, a default is provided. It is expected that the behaviour of the default mapper will mature.

The -fmodule-mapper value may be one of:

The first four may specify an ident to provide to the remote mapper, by affixing a trailing '?ident' to the first component of the argument. For instance '=/tmp/mybuild?shibboleth7', or '|build-mapper?shibboleth8 bob'. If no ident is specified, the name of the main source file is used in its place. It is expected that parallel builds will use the ident to distinguish connections from different compiler instances. The protocol does not provide a general purpose compilation mechanism, is not to be exposed beyond one's local security zone, and is not cryptographically hard.
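The '?ident' convention can be illustrated with a small parser. This is a minimal sketch, not GCC's code: the names MapperArg and split_mapper_arg are hypothetical, and it assumes that the components of the argument are separated by spaces, as the '|build-mapper?shibboleth8 bob' example suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type, not part of GCC: the pieces of a
// -fmodule-mapper value after the ident has been split off.
struct MapperArg
{
  std::string first;  // first component, with any '?ident' removed
  std::string rest;   // remaining components, e.g. program arguments
  std::string ident;  // explicit ident, or the fallback
};

// Assumption: components are separated by a single space.
MapperArg split_mapper_arg (const std::string &value,
                            const std::string &main_source)
{
  MapperArg result;

  // The first component ends at the first space, if there is one.
  std::string::size_type space = value.find (' ');
  result.first = value.substr (0, space);
  if (space != std::string::npos)
    result.rest = value.substr (space + 1);

  // A trailing '?ident' on the first component supplies the ident.
  std::string::size_type query = result.first.rfind ('?');
  if (query != std::string::npos)
    {
      result.ident = result.first.substr (query + 1);
      result.first.erase (query);
    }
  else
    // No ident given: fall back to the main source file name.
    result.ident = main_source;

  return result;
}
```

With this sketch, split_mapper_arg ("|build-mapper?shibboleth8 bob", "main.cc") yields the first component '|build-mapper', the ident 'shibboleth8' and the remainder 'bob'.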

The protocol is documented in the GCC manual and published in p1184 (https://wg21.link/p1184).

The BMI is not a distributable artifact. Think of it as a cache that can be recreated as needed.

Design

There are two main pieces of work: (a) streaming to disk, and (b) name lookup.

The original plan was to try to reuse LTO's streaming technology for the former. But that turned out to be impractical as there is not much overlap. LTO streams GIMPLE and language-agnostic type information. Modules need AST representation and FE type information. So I went the hand-written auto-numbering streaming route.
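The general idea of auto-numbering streaming can be sketched as follows. This is an illustration under assumptions, not the branch's implementation: Node and Writer are hypothetical, and the text output format is invented for readability. Each node implicitly takes the next number the first time it is written, so a reader counting in the same order can resolve later back references without the numbers being stored.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an AST node; real front-end trees are
// far richer than this.
struct Node
{
  std::string name;
  std::vector<const Node *> operands;
};

// Hypothetical writer using an invented text format.
class Writer
{
  std::unordered_map<const Node *, unsigned> numbers_;
  std::string out_;

public:
  void write (const Node *node)
  {
    auto found = numbers_.find (node);
    if (found != numbers_.end ())
      {
        // Already streamed: emit a back reference to its number.
        out_ += "@" + std::to_string (found->second) + " ";
        return;
      }

    // First visit: the node implicitly takes the next number.  It is
    // recorded before the operands are written, so cycles terminate.
    unsigned number = static_cast<unsigned> (numbers_.size ());
    numbers_.emplace (node, number);
    out_ += node->name + "/" + std::to_string (node->operands.size ()) + " ";
    for (const Node *operand : node->operands)
      write (operand);
  }

  const std::string &text () const { return out_; }
};
```

A node shared by two parents is written in full once. Its second occurrence becomes a short back reference such as '@2'.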

Name lookup started by abusing inline namespaces, but that too proved impractical. We'd need the ability to turn these namespaces on and off, and to do that requires changes to name-lookup. Once you're making that kind of change, you may as well do it properly. As a benefit, name-lookup has gotten a lot cleaner.

Mangling

Name mangling needs to be adjusted to deal with module-linkage. This is a compiler-interoperability and toolchain issue, as we want objects from different compilers to be link-compatible, and the debugger able to understand module symbols.

Current thoughts are described in module-abi-2017-09-01.pdf.

Module Linkage

I am not presuming any new linker technology. Module ownership is a new concept, and at least for module-linkage entities, must be reflected in the name mangling. Exported names need not reflect this ownership.

I am working with the Clang developers to define interoperable changes here. To facilitate migration of code, mangling of exported entities does not change from what they would have outside of a module.

Binary Module Interface Files

As mentioned above, a BMI is generated during the compilation of a module interface unit. For GCC I'm generating it as an on-the-side entity, but it could be stashed as a special section in the output assembly file, or even be a new stage of compilation. (Clang is taking this last approach.) The data is encapsulated in an ELF-like file. You can use 'readelf -S' to get at the sections it contains, and 'readelf -p gnu.c++.README' to get at its human-readable section. There are several specially-named sections, which generate the set of namespace-scope bindings. The actual binding values are held in sections named by a decl within them. We support lazy loading via cooperation with the name-lookup machinery. If it finds a lazy binding, it invokes the loader to load that binding. We take care to make sure things are not recursive here (this is non-trivial with C++).
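The cooperation between name lookup and the lazy loader can be modelled roughly as below. This is a sketch under assumptions: Binding, BindingTable and the loader callback are hypothetical, and treating re-entry as an error is only one possible way to guard against recursion, not necessarily what the branch does.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of one namespace-scope binding slot.
struct Binding
{
  enum State { Lazy, Loading, Loaded } state = Lazy;
  unsigned section = 0;  // which BMI section holds the value
  std::string value;     // stand-in for the bound declarations
};

class BindingTable
{
  std::map<std::string, Binding> slots_;
  std::function<std::string (unsigned)> loader_;

public:
  explicit BindingTable (std::function<std::string (unsigned)> loader)
    : loader_ (std::move (loader))
  {}

  // Record that NAME's value lives in SECTION, without loading it.
  void add_lazy (const std::string &name, unsigned section)
  {
    slots_[name].section = section;
  }

  // Look NAME up, invoking the loader only on first use.
  const std::string &lookup (const std::string &name)
  {
    Binding &slot = slots_.at (name);
    if (slot.state == Binding::Loading)
      // Re-entered while loading this same binding.
      throw std::logic_error ("recursive load of " + name);
    if (slot.state == Binding::Lazy)
      {
        slot.state = Binding::Loading;
        slot.value = loader_ (slot.section);
        slot.state = Binding::Loaded;
      }
    return slot.value;
  }
};
```

Looking the same name up twice invokes the loader once. Names that are never looked up cost nothing beyond their table entry.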

The BMI does not contain timestamps. Thus recompiling a TU with exactly the same options will produce an identical BMI -- that's what you want with a cache. It does contain CRCs, which are used to detect corruption. I've not made corruption detection cryptographically strong, as I do not presume an adversarial attacker. If corruption is detected, compilation terminates with a fatal error, because the likelihood of any further diagnostics being meaningful is negligible.
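To illustrate CRC-based corruption detection in general, here is a minimal sketch. It uses the common CRC-32 with reflected polynomial 0xEDB88320, purely as an assumption for the example. The function names are hypothetical, and the actual polynomial and section layout used for BMIs are not described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), chosen for
// brevity.  Assumption: GCC's BMI writer may use a different CRC.
std::uint32_t crc32_of (const unsigned char *data, std::size_t size)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i != size; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit != 8; bit++)
        // Shift right, applying the polynomial when the low bit is set.
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return ~crc;
}

// Hypothetical check a reader might perform before trusting a section.
bool section_intact (const unsigned char *data, std::size_t size,
                     std::uint32_t stored_crc)
{
  return crc32_of (data, size) == stored_crc;
}
```

A check like this catches accidental damage such as truncation or bit flips. It offers no protection against deliberate tampering, which matches the stated non-goal.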

BMIs are relocatable within the file system, or copyable to another machine, which you might want with a distributed build. They refer to their own imports by reference, naming both the import module, and the (relative) location of the BMI that was loaded. If you copy a BMI you must recursively copy all its imports and recreate the same file structure (these can be found by examining the README section).
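Working out which files to copy amounts to a transitive closure over the import references. The sketch below assumes that the direct imports of each BMI have already been gathered into a map, for example by reading the README sections. ImportMap, collect_bmis and the file names in the usage note are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: each BMI's relative path mapped to the relative
// paths of the BMIs it directly imports.
using ImportMap = std::map<std::string, std::vector<std::string>>;

// Add BMI and everything it transitively imports to NEEDED.
void collect_bmis (const ImportMap &imports, const std::string &bmi,
                   std::set<std::string> &needed)
{
  if (!needed.insert (bmi).second)
    return;  // already visited, so shared imports are handled once

  auto found = imports.find (bmi);
  if (found == imports.end ())
    return;  // no recorded imports

  for (const std::string &dep : found->second)
    collect_bmis (imports, dep, needed);
}
```

For a map in which 'app.bmi' imports 'hello.bmi' and 'util.bmi', and 'hello.bmi' also imports 'util.bmi', the resulting set holds all three paths, each of which must be copied to the same relative location.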

Example

Put the following in hello.cc:

module;
#include <stdio.h>
export module hello;
export void greeter (const char *name)
{
  printf ("Hello %s!\n", name);
}

and put the following in main.cc:

import hello;
int main (void)
{
  greeter ("world");
  return 0;
}

Now compile with the interface unit first, so that its BMI exists before main.cc imports it:

   g++ -fmodules-ts hello.cc main.cc

You can run the a.out:

nathans@devvm2186:161>./a.out
Hello world!

Global Module

The Global Module exists to allow legacy C++ code to be used in a module. The Global Module's contents derive from two sources:

Legacy Header Units are traditional header files that are (separately) compiled with the -fmodule-legacy flag. They each produce a BMI, and are imported into user code with an import declaration naming a legacy header unit:

import <stdio.h>;
import "otherheader.h";

The Global Module Fragment's contents are header files that are included before a module declaration:

module;
#include <stdio.h>
#include "otherheader.h"
export module Quux;
...

Legacy Header Units can be imported into both module and non-module code. The Global Module Fragment is only relevant to modules themselves. Regular non-module code is also in the Global Module. One aspect of the Global Module is that entities can be declared, or even defined, in multiple places -- because that's how header files work. The compiler has to merge these declarations, but can rely on the ODR to presume they are 'the same'.

Another thing is that Legacy Header Units export macros. That's unavoidable if one wants to turn a header file into a legacy header unit.

Random Cleanups

I've been making some random cleanups to the code base:

Timeline

Documentation

I wrote some papers:

I also presented work at:

cxx-modules (last edited 2019-02-13 15:50:23 by NathanSidwell)