C++ Modules
A module system is coming to C++. This page describes the state of the GCC implementation.
The goal of the module system is to avoid huge header files, thus speeding up compilation. What distinguishes it from things like precompiled headers is:
- Composability of multiple modules
- Explicit code annotations to define visible interfaces
Implementation State
Development branch: 'c++-modules' (svn://gcc.gnu.org/svn/gcc/branches/c++-modules). Reporting bugs
The branch was created by Nathan Sidwell in January 2017. The specification, design and implementation are in flux.
- March 1st 2017 - first executable works!
- April 26th 2017 - Namespace symbol table handling reworked to be module-compatible (and just generally better).
- May 3rd 2017 - Symbol table partitioned and module-specific mangling (no back-references)
- June 15th 2017 - Class & function declarations and definitions
- July 5th 2017 - Created 'c++-name-lookup' branch to handle changes that are easier to complete on a separate branch
- July 20th 2017 - First contributor patch applied (Boris Kolpackov)
- Sept 4th 2017 - Uploaded new ABI document
- Sept 8th 2017 - Presented at GNU Cauldron.
- Oct 13th 2017 - Function template exported and instantiated.
- Oct 20th 2017 - A class template exported and instantiated.
- Oct 23rd 2017 - Wrapper script technology.
- Nov 16th 2017 - constexpr functions
- Nov 22nd 2017 - Template function & class members of template & non-template classes. Happy birthday, mum!
- Dec 20th 2017 - Enumerated types and static-storage variables
- Jan 2nd 2018 - Typedefs.
- Jan 26th 2018 - Move to ELROND encapsulation
- Mar 15 2018 - Implement p0713 -- 'module;' at start.
- Mar 16 2018 - Add -fmodules-atom (p0947), remove plain -fmodules
- Apr 6th 2018 - Lazy loading (defns clustered with decls)
- May 1st 2018 - Fixes of inter-module references.
- May 10th 2018 - Module server added.
- May 16th 2018 - ATOM preamble rescanning, -fmodule-preamble preprocessing mode.
- June 3rd 2018 - module server renamed to module mapper, 'file' option resurrected (hi Boris!)
Random Cleanups
I've been making some random cleanups to the code base. Now stage 1 is open, I'm pushing these to trunk:
- Inline namespace handling pr 79369 (upstreamed)
- Canonicalize type hashing (upstreamed)
- g++-dg.exp: find tests simplify (upstreamed)
- CRC generation optimization (upstreamed)
- OVERLOAD representation (upstreamed)
- Name lookup (qualified, unqualified, ADL) (upstreamed)
- Name insertion (upstreamed)
- Namespace contents representation. (upstreamed)
- Kill strong using directives (upstreamed)
- Inline namespace representation (upstreamed)
- DR2061 (upstreamed)
- Kill TYPE_METHODS (upstreamed)
- cdtors have proper names and no magic slots (upstreamed)
- conversion ops have a single name and no magic slots (upstreamed)
- Kill CLASSTYPE_SORTED_FIELDS (upstreamed)
Bugs
Due to the experimental nature of the implementation, I'm not very interested in bug reports just yet. If you're directly working with me, you'll already know how to get my attention.
Here's a list of known not-working significant features:
- The known not-working list is incomplete. FIXME
Design
There are two main pieces of work, (a) streaming to disk, (b) name lookup.
The original plan was to try to reuse LTO's streaming technology for the former. But that turned out to be impractical as there is not much overlap. LTO streams GIMPLE and language-agnostic type information. Modules need AST representation and FE type information. So I went the hand-written auto-numbering streaming route.
Name lookup started by abusing inline namespaces, but that too proved impractical. We'd need the ability to turn these namespaces on and off, and doing that requires changes to name lookup. Once you're making that kind of change, you may as well do it properly. As a benefit, name lookup has gotten a lot cleaner.
Mangling
Name mangling needs to be adjusted to deal with module linkage. This is a compiler-interoperability and toolchain issue, as we want objects from different compilers to be link-compatible, and debuggers to be able to understand module symbols.
Current thoughts are described in module-abi-2017-09-01.pdf.
Interface Designation
At the start of implementation, there was no special syntax for denoting the interface TU of a module. But implementations need to know immediately after seeing the module declaration whether the TU is the interface or one of the implementation TUs -- they cannot defer that decision. This has now been resolved with the 'export module foo;' syntax designating the interface TU.
Compiling the interface TU generates a Binary Module Interface. This BMI is read in by each implementation TU and each importer of the module. There's clearly a dependency between these things, which is different from header files because we have to invoke the compiler to generate the BMI. I have now implemented a hook in the compiler that can determine what to do if a BMI is not found. The default implementation of this wrapper script invokes the compiler to generate the BMI.
The BMI is not a distributable artifact. Think of it as a cache, that can be recreated as needed.
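The "BMI as a cache, with a hook for misses" idea can be sketched roughly as follows. This is a minimal illustration, not GCC's code: `BmiCache`, `MissHook`, `bmi_for` and `invalidate` are hypothetical names, and the hook here simply returns a file name where the real default would invoke the compiler on the interface TU.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a BMI lookup with a "not found" hook.
class BmiCache {
public:
  // Called when no BMI is known for a module; returns the BMI's file name.
  // In the scheme described above, the default hook would run the compiler.
  using MissHook = std::function<std::string(const std::string &)>;

  explicit BmiCache(MissHook hook) : on_miss_(std::move(hook)) {}

  // Return the BMI for MODULE, creating it through the hook on a miss.
  const std::string &bmi_for(const std::string &module) {
    auto it = known_.find(module);
    if (it == known_.end())
      it = known_.emplace(module, on_miss_(module)).first;
    return it->second;
  }

  // A BMI is only a cache entry: dropping it is safe, it is rebuilt on demand.
  void invalidate(const std::string &module) { known_.erase(module); }

private:
  MissHook on_miss_;
  std::map<std::string, std::string> known_;
};
```

Both an importer and an implementation TU would go through the same lookup, so the interface is compiled at most once per cache lifetime.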
Module Linkage
I am not presuming any new linker technology. Module ownership is a new concept and, at least for module-linkage names, must be reflected in the name mangling. Exported names need not reflect this ownership.
I am working with the Clang developers to define interoperable changes here. To facilitate migration of code, mangling of exported entities does not change.
Invoking the Compiler
There are several new options for modules:
- -fmodules-ts You need this to enable modules. Without it you'll get parse errors. Oftentimes, it'll be the only option you need.
- -fmodules-atom This selects the ATOM scheme (p0947), which is an extension beyond the Modules TS.
- -fmodule-mapper=VALUE Selects the module mapper; the accepted values are listed under BMI Location.
- -fno-module-lazy Disable lazy loading.
BMI Location
When a BMI file is needed a module mapper is queried. If no module mapper has been specified, a default is provided. It is expected that build systems will provide a build-specific mapper. The -fmodule-mapper value may be one of:
* fd or fd,fd File descriptor(s) to read from and write to
* =socket A local socket
* hostname:port or :port An IPv6 socket
* |program args... A program to invoke and communicate over its stdin/stdout
* file A text file of space-separated module-name/file-name tuples, one per line
The first four may specify a cookie to provide to the remote mapper, by affixing a trailing '?cookie' to the first component of the argument. For instance '=/tmp/mybuild?shibboleth7', or '|build-mapper?shibboleth8 bob'. If no cookie is specified, the name of the main source file is used in its place. It is expected that parallel builds will use the cookie to distinguish connections from different compiler instances.
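As an illustration of the value forms and the cookie suffix listed above, here is a small sketch of how a build tool might classify a -fmodule-mapper value. It is not GCC's parser: `MapperKind`, `MapperSpec`, `is_fd_list` and `parse_mapper` are hypothetical names, and the order in which the forms are tried (in particular treating an all-digit value as file descriptors rather than a file name) is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative only: these names are hypothetical, not GCC internals.
enum class MapperKind { FileDescriptors, LocalSocket, NetworkSocket, Program, File };

struct MapperSpec {
  MapperKind kind;
  std::string target;  // first component without prefix character or cookie
  std::string args;    // trailing arguments ('|program args...' only)
  bool has_cookie;     // false: the main source file name would be used instead
  std::string cookie;
};

// True for "N" or "N,M", where N and M are runs of decimal digits.
static bool is_fd_list(const std::string &s) {
  if (s.empty()) return false;
  int commas = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == ',') {
      if (i == 0 || i + 1 == s.size() || ++commas > 1) return false;
    } else if (!std::isdigit(static_cast<unsigned char>(s[i]))) {
      return false;
    }
  }
  return true;
}

MapperSpec parse_mapper(const std::string &value) {
  // The first component runs up to the first space.
  const std::size_t space = value.find(' ');
  const std::string first = value.substr(0, space);
  const std::string rest =
      space == std::string::npos ? std::string() : value.substr(space + 1);

  // Tentatively split a trailing '?cookie' off the first component.
  const std::size_t q = first.rfind('?');
  const bool has_cookie = q != std::string::npos;
  const std::string target = first.substr(0, q);
  const std::string cookie = has_cookie ? first.substr(q + 1) : std::string();

  if (!target.empty() && target[0] == '=')
    return {MapperKind::LocalSocket, target.substr(1), "", has_cookie, cookie};
  if (!target.empty() && target[0] == '|')
    return {MapperKind::Program, target.substr(1), rest, has_cookie, cookie};
  if (is_fd_list(target))
    return {MapperKind::FileDescriptors, target, "", has_cookie, cookie};
  if (target.find(':') != std::string::npos)
    return {MapperKind::NetworkSocket, target, "", has_cookie, cookie};
  // Anything else is taken as a mapping file; cookies do not apply to it.
  return {MapperKind::File, value, "", false, ""};
}
```

With the two examples from the text, '=/tmp/mybuild?shibboleth7' would come out as a local socket '/tmp/mybuild' with cookie 'shibboleth7', and '|build-mapper?shibboleth8 bob' as the program 'build-mapper' with argument 'bob' and cookie 'shibboleth8'.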
Example
Put the following in hello.cc:
module;
#include <stdio.h>
export module hello;
export void greeter (const char *name)
{
  printf ("Hello %s!\n", name);
}
and put the following in main.cc:
import hello;
int main (void)
{
  greeter ("world");
  return 0;
}
Now compile with:
g++ -fmodules-ts main.cc hello.cc
You can run the a.out:
nathans@devvm2186:161>./a.out
Hello world!
Binary Module Interface Files
As mentioned above, a BMI is generated during the compilation of a module interface unit. For GCC I'm generating it as an on-the-side entity, but it could be stashed as a special section in the output assembly file, or even be a new stage of compilation. (Clang is taking this last approach.)
The data is encapsulated in an ELF-like file. You can use 'readelf -S' to list the sections it contains, and 'readelf -p gnu.c++.README' to dump its human-readable section. There are several specially-named sections, which generate the set of namespace-scope bindings. The actual binding values are held in sections named by a decl within them.
We support lazy loading via cooperation with the name-lookup machinery. If it finds a lazy binding, it invokes the loader to load that binding. We take care to make sure things are not recursive here (this is non-trivial with C++).
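The cooperation between name lookup and the loader can be pictured with a toy binding table. This is only a sketch under stated assumptions: `Binding`, `BindingTable`, `add_lazy` and `lookup` are hypothetical names, a declaration is modelled as a plain string, and the recursion guard is a simple assertion rather than whatever the real implementation does.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model: a binding either holds a loaded decl or names the BMI section
// from which it can be loaded later.
struct Binding {
  bool loaded = false;
  bool loading = false;  // set while the loader runs, to catch recursion
  unsigned section = 0;  // hypothetical index of the section holding the decl
  std::string decl;      // stand-in for the real declaration
};

class BindingTable {
public:
  using Loader = std::function<std::string(unsigned)>;

  explicit BindingTable(Loader loader) : loader_(std::move(loader)) {}

  // Register a name whose value still lives in the BMI.
  void add_lazy(const std::string &name, unsigned section) {
    Binding b;
    b.section = section;
    bindings_[name] = b;
  }

  // Name lookup: on meeting a lazy binding, call the loader exactly once.
  const std::string *lookup(const std::string &name) {
    auto it = bindings_.find(name);
    if (it == bindings_.end()) return nullptr;
    Binding &b = it->second;
    if (!b.loaded) {
      assert(!b.loading && "recursive lazy load of the same binding");
      b.loading = true;
      b.decl = loader_(b.section);
      b.loading = false;
      b.loaded = true;
      ++loads_;
    }
    return &b.decl;
  }

  unsigned loads() const { return loads_; }

private:
  Loader loader_;
  std::map<std::string, Binding> bindings_;
  unsigned loads_ = 0;
};
```

Names that are never looked up never trigger the loader, which is the point of clustering definitions with their declarations.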
The BMI does not contain timestamps. Thus recompiling a TU with exactly the same options will produce an identical BMI -- that's what you want with a cache. It does contain CRCs, which are used to detect corruption. I've not made corruption detection cryptographically strong or anything. If we detect corruption, you should get an error and then compilation terminates with a fatal error -- the likelihood of any further diagnostics being meaningful is negligible.
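To make the corruption check concrete, here is a sketch of verifying a section against a stored CRC. The text does not say which CRC GCC uses, so the common CRC-32 (reflected polynomial 0xEDB88320) is used as a stand-in; `crc32` and `section_intact` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320), used as a stand-in
// for whatever CRC the BMI format really uses.
std::uint32_t crc32(const std::string &data) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // The mask is all ones when the low bit is set, zero otherwise.
      const std::uint32_t mask = 0u - (crc & 1u);
      crc = (crc >> 1) ^ (0xEDB88320u & mask);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// A reader would compare the CRC stored alongside a section with the CRC
// of the bytes it actually read, and give up on a mismatch.
bool section_intact(const std::string &bytes, std::uint32_t stored_crc) {
  return crc32(bytes) == stored_crc;
}
```

A check like this catches accidental damage such as truncation or flipped bits; as noted above, it is not meant to resist deliberate tampering.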
Global Module
Declarations before the module-declaration are in the global module. While this is clear enough, it has a complicated interaction with a module interface:
module;
void Foo ();
export module Quux;
export void Bar ();
void Baz ();

module Quux; // implementation of Quux
void Bar () {
  Baz (); // Baz's declaration visible from purview Quux interface
  Foo (); // ERROR global module decls from interface NOT visible
}

import Quux; // user of Quux
void Baz ()
{
  Bar (); // Quux's Bar
  Baz (); // ERROR: Quux's non-exported Baz not visible.
  Foo (); // ERROR: Foo not visible from Quux interface
}

This is not implemented -- simple cases may work 'by accident'.
Documentation
http://wg21.link/n4047 modules rationale
http://wg21.link/n4610 modules-ts Oct 2016
http://wg21.link/n4720 modules-ts Feb 2018
http://wg21.link/p0947r0 ATOM proposal
http://wg21.link/p0713 'module;' at start