C++ Modules
A module system is coming to C++20; this page describes the state of the GCC implementation.
A module system will provide several benefits:
- Reduce build times due to not reparsing large header files
- Proper interface/implementation isolation
- Harder to have ODR violations
Implementation State
The modules branch has been merged to trunk. Please use the usual trunk build procedure and report bugs in Bugzilla.
The implementation has a few known missing features:
- Translation-Unit Local entities, Prague '20, p1815 (includes p2003)
- Partition definition visibility rules
- Private module fragment
- Parser-level merging of already-imported GMF entities (the reverse is implemented)
- Global Module escape mechanism of extern "C/C++"
- Debug generation is somewhat fragile, can ICE
Of course there are bugs in it
Invoking the Compiler
There are several new options for modules:
-fmodules-ts You need this to enable modules; without it you'll get parse errors. Often it is the only option you need.
-fmodule-header Header unit compilation.
-fmodule-mapper=VALUE Module mapper, see CMI Location.
-fno-module-lazy Disable lazy loading
More complete documentation is in the GCC manual under 'C++ Modules'.
CMI Location
Compiling a module interface TU generates a Compiled Module Interface (CMI). This CMI is read in by the module's implementation TUs and by each importer of the module. That creates a dependency which header files do not have: the compiler has to be invoked to generate the CMI before anything can read it.
To resolve this dependency, a module mapper is queried whenever a CMI name is needed. Communication is via a text-based protocol, which provides mechanism without policy. As such the compiler itself is completely agnostic about where CMIs are or how they are named. Build systems may provide a build-specific mapper. If no module mapper has been specified, a default is provided. It is expected that the behaviour of the default mapper will mature.
The -fmodule-mapper value may be one of:
=socket A local socket
hostname:port or :port An IPv6 socket
|program args... A program to invoke and communicate over its stdin/stdout
<>, <>N, <N>M Use stdin/stdout, or specific FD[s]
file A text file of space-separated module-name/file-name tuples, one per line
The first four may specify an ident to provide to the remote mapper, by affixing a trailing ?ident to the first component of the argument. For instance, =/tmp/mybuild?shibboleth7 or |build-mapper?shibboleth8 bob. If no ident is specified, the name of the main source file is used in its place. Parallel builds could use the ident to distinguish connections from different compiler instances. The protocol does not provide a general purpose compilation mechanism, is not to be exposed beyond one's local security zone, and is not cryptographically hard.
The protocol is documented in the GCC manual and was published in p1184 (https://wg21.link/p1184). It will have matured since p1184; the GCC manual is the canonical specification (but may be buggy).
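As a rough illustration of the forms listed above, the following sketch classifies a -fmodule-mapper value and peels off a trailing ?ident, falling back to the main source file name. It is not GCC's own parsing code. All type and function names are hypothetical, and the rule used to tell hostname:port apart from a plain file name (presence of a ':') is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this is not GCC's own parsing code.
enum class MapperKind { LocalSocket, Network, Program, FileDescriptors, File };

struct MapperSpec {
  MapperKind kind;
  std::string target;  // first component (prefix included), without ?ident
  std::string ident;   // explicit ident, or the main source file name
  std::string rest;    // remaining arguments, e.g. for |program args...
};

// Classify a -fmodule-mapper value and split off a trailing ?ident.
inline MapperSpec parse_mapper (const std::string &value,
                                const std::string &main_source)
{
  assert (!value.empty ());
  MapperSpec spec{MapperKind::File, value, main_source, ""};

  // Assumption: the leading character selects the form, and an unprefixed
  // value containing ':' is treated as hostname:port.
  if (value[0] == '=')
    spec.kind = MapperKind::LocalSocket;
  else if (value[0] == '|')
    spec.kind = MapperKind::Program;
  else if (value[0] == '<')
    spec.kind = MapperKind::FileDescriptors;
  else if (value.find (':') != std::string::npos)
    spec.kind = MapperKind::Network;
  else
    return spec;  // plain file name: no ident handling

  // The first component ends at the first space; the rest are arguments.
  std::string::size_type space = value.find (' ');
  std::string first = value.substr (0, space);
  if (space != std::string::npos)
    spec.rest = value.substr (space + 1);

  // A trailing ?ident on the first component overrides the default ident.
  std::string::size_type q = first.rfind ('?');
  if (q != std::string::npos)
    {
      spec.ident = first.substr (q + 1);
      first.erase (q);
    }
  spec.target = first;
  return spec;
}
```

With this sketch, parse_mapper ("|build-mapper?shibboleth8 bob", "main.cc") yields the target |build-mapper, the ident shibboleth8 and the argument bob, while a value without ?ident keeps main.cc as its ident.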
The CMI is not a distributable artifact. Think of it as a cache that can be recreated as needed.
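The file form of -fmodule-mapper is described above as space-separated module-name/file-name tuples, one per line. A minimal sketch of reading such a file into a lookup table might look like the following. The function names are hypothetical, and the choice to silently skip blank or incomplete lines is an assumption of the example, not documented behaviour.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "module-name file-name" pairs, one per line, into a lookup table.
// Hypothetical helper: lines without two fields (including blank lines)
// are skipped, which is an assumption of this sketch.
inline std::map<std::string, std::string> read_module_map (std::istream &in)
{
  std::map<std::string, std::string> table;
  std::string line;
  while (std::getline (in, line))
    {
      std::istringstream fields (line);
      std::string module_name, cmi_path;
      if (fields >> module_name >> cmi_path)
        table[module_name] = cmi_path;
    }
  return table;
}

// Convenience wrapper for mapping text already held in memory.
inline std::map<std::string, std::string>
read_module_map_text (const std::string &text)
{
  std::istringstream in (text);
  return read_module_map (in);
}
```

Given the (made-up) line "hello cmi/hello.cmi", the table maps the module name hello to the file cmi/hello.cmi.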
Design
There are three main pieces of work: (a) streaming to disk, (b) name lookup, and (c) cross-module references.
The original plan was to try to reuse LTO's streaming technology for the first of these. But that turned out to be impractical as there is not much overlap. LTO streams GIMPLE and language-agnostic type information. Modules need AST representation and FE type information. So I went the hand-written auto-numbering streaming route.
Name lookup started by abusing inline namespaces, but that too proved impractical. We'd need the ability to turn these namespaces on and off, and doing that requires changes to name-lookup. Once one is making that kind of change, one may as well do it properly. As a benefit, name-lookup has gotten a lot cleaner.
Cross-module references originally relied on name-lookup and the specialization table. This became impractical as the specification changed, and some C++ features (I'm looking at you, implicit member functions) broke the invariants that it relied upon. These are now done via an indexing table, independent of name-lookup. The last 3 months of 2019 were spent morphing the compiler to that scheme.
ATOM
The ATOM proposal deviated from the TS in a few ways. Now (October 2018) ATOM is no longer a separate flag, having been merged in p1103r2 (https://wg21.link/p1103). The 4 distinguishing features of ATOM are:
- All imports must be a single block at the start of file (just after the module declaration, if there is one) [merged proposal applies this only for modules]
- Module interfaces may be partitioned (replacing proclaimed ownership decls) [merged]
- The global module is replaced by header units and an associated compilation mode for them. [both]
- The type model is simpler. [merged]
Mangling
Name mangling needs to be adjusted to deal with module-linkage. This is a compiler-interoperability and toolchain issue, as we want objects from different compilers to be link-compatible, and the debugger to be able to understand module symbols.
Current thoughts are described in module-abi-2017-09-01.pdf.
Module Linkage
I am not presuming any new linker technology. Module ownership is a new concept, and at least for module-linkage entities, must be reflected in the name mangling. Exported names need not reflect this ownership.
I am working with the Clang developers to define interoperable changes here. To facilitate migration of code, mangling of exported entities does not change from what they would have outside of a module.
Compiled Module Interface Files
As mentioned above, a CMI is generated during the compilation of a module interface unit. For GCC I'm generating it as an on-the-side entity, but it could be stashed as a special section in the output assembly file, or even be a new stage of compilation. (Clang is taking this last approach.) The data is encapsulated in an ELF-like file. You can use 'readelf -S' to get at the sections it contains, and 'readelf -p .gnu.c++.README' to get at its human-readable section. (ELF is used even on non-ELF systems; we do not rely on ELF tools being present.)
There are several specially-named sections, which generate the set of namespace-scope bindings. The actual binding values are held in sections named by a decl within them. We support lazy loading via cooperation with the name-lookup machinery. If it finds a lazy binding, it invokes the loader to load that binding. This loading may recurse, but we take care to make sure loops do not occur (this is non-trivial with C++).
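The interplay between name lookup and the loader can be pictured with a toy table, sketched below. This is purely illustrative and not how GCC is structured internally: the class, its members and the integer "binding values" are all hypothetical. In particular, the sketch only detects a load loop with an assertion; how GCC actually avoids loops is not described here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Illustrative only: the names and data layout here are not GCC's.
class LazyBindings {
public:
  // The loader receives the table itself, so that loading one binding can
  // recursively request others.
  using Loader = std::function<int (LazyBindings &, const std::string &)>;

  explicit LazyBindings (Loader loader) : loader_ (std::move (loader)) {}

  // Record that 'name' exists but has not been read in yet.
  void declare_lazy (const std::string &name) { slots_[name]; }

  // Lookup entry point: load on first use, then return the cached value.
  // Throws std::out_of_range for names that were never declared.
  int lookup (const std::string &name)
  {
    Slot &slot = slots_.at (name);
    if (slot.state == State::Loaded)
      return slot.value;
    // Asking for a binding that is already mid-load would be a loop.
    assert (slot.state != State::Loading && "recursive load of same binding");
    slot.state = State::Loading;
    int value = loader_ (*this, name);
    slot.value = value;
    slot.state = State::Loaded;
    ++loads_;
    return value;
  }

  // Number of bindings actually loaded so far.
  int loads () const { return loads_; }

private:
  enum class State { Lazy, Loading, Loaded };
  struct Slot { State state = State::Lazy; int value = 0; };

  Loader loader_;
  std::map<std::string, Slot> slots_;
  int loads_ = 0;
};
```

A loader that resolves one name by looking up another shows the recursion: the first lookup triggers both loads, and later lookups hit the cached values without invoking the loader again.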
Recompiling a TU with exactly the same options will produce a CMI that is equivalent, which is what you want with a cache. It does contain CRCs, which are used to detect corruption. I've not made corruption detection cryptographically strong; I do not presume an adversarial attacker. If we detect corruption, you should get an error and then compilation terminates with a fatal error -- the likelihood of any further diagnostics being meaningful is negligible.
The README sections of the CMI are not read by the compiler, nor do they contribute to any consistency checks. Thus they can change without invalidating the rest of the CMI. Mutable build information is stored here (build time, host name etc). I may add an option to provide bit-identical CMIs.
CMIs are relocatable within the file system, or copyable to another machine, which you might want with a distributed build. They refer to their own imports by reference, naming both the import module and the (relative) location of the CMI that was loaded. If you copy a CMI you must recursively copy all its imports and recreate the same file structure (these can be found by examining the README section). If you give the compiler an absolute path (for instance your include-path), that absolute path will end up in the CMI. I may add an option to expunge, prefix or relocate file-system-absolute paths.
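To make the corruption check concrete, here is a small sketch of storing a checksum alongside a payload and verifying it on read. Which CRC GCC uses for CMIs is not stated on this page. The common CRC-32 with reflected polynomial 0xEDB88320 is assumed here, and the Section type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320). The choice of this
// particular CRC is an assumption; it may differ from what GCC uses.
inline std::uint32_t crc32_of (const std::string &bytes)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : bytes)
    {
      crc ^= byte;
      for (int bit = 0; bit < 8; ++bit)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical stand-in for one section of a CMI.
struct Section {
  std::string payload;
  std::uint32_t stored_crc;
};

// Writer side: record the checksum next to the data.
inline Section make_section (const std::string &payload)
{
  return Section{payload, crc32_of (payload)};
}

// Reader side: true when the payload still matches its recorded checksum.
inline bool section_intact (const Section &section)
{
  return crc32_of (section.payload) == section.stored_crc;
}
```

A check like this catches accidental damage such as a truncated or bit-flipped file, but, as noted above, it offers no protection against someone who deliberately recomputes the checksum.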
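The "recursively copy all its imports" step amounts to a transitive closure over the import graph. The sketch below computes the set of CMI files to copy from a hand-written import table. Extracting that table from each CMI's README section is not shown, and the type names, function names and file names are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Relative CMI path -> relative paths of the CMIs it imports.
// In practice this would be derived from each CMI's README section; here
// it is supplied by hand.
using ImportTable = std::map<std::string, std::vector<std::string>>;

// Collect 'cmi' and everything it imports, directly or indirectly.
inline void collect_cmis (const ImportTable &imports, const std::string &cmi,
                          std::set<std::string> &out)
{
  if (!out.insert (cmi).second)
    return;  // already visited, e.g. reached via two different importers
  ImportTable::const_iterator it = imports.find (cmi);
  if (it == imports.end ())
    return;  // no entry: this CMI imports nothing
  for (const std::string &dep : it->second)
    collect_cmis (imports, dep, out);
}

// The full set of files that must be copied along with 'root'.
inline std::set<std::string> cmis_to_copy (const ImportTable &imports,
                                           const std::string &root)
{
  std::set<std::string> out;
  collect_cmis (imports, root, out);
  return out;
}
```

Because the recorded locations are relative, the copied files need to keep the same relative layout at the destination.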
Example
Put the following in hello.cc:
    module;
    #include <iostream>
    #include <string_view>
    export module hello;
    export void greeter (std::string_view const &name)
    {
      std::cout << "Hello " << name << "!\n";
    }
and put the following in main.cc:
    import hello;
    int main (void)
    {
      greeter ("world");
      return 0;
    }
Now compile with:
g++ -fmodules-ts hello.cc main.cc
You can run the a.out:
    bester:7>./a.out
    Hello world!
Notice that main.cc did not #include <string_view>; it doesn't need to, because it never names that type. The type itself becomes known through the exported declaration of greeter.
Global Module
The Global Module exists to allow legacy C++ headers to be used in a module. The Global Module's contents derive from two sources:
- Header Units
- The Global Module Fragment
Header Units are traditional header files that are (separately) compiled with the -fmodule-header flag. They each produce a CMI, and are imported into user code with an import declaration naming a header unit:
    import <iostream>;
    import "otherheader.h";
The Global Module Fragment's contents are header files that are included before a module declaration:
    module;
    #include <iostream>
    #include "otherheader.h"
    export module Quux;
    ...
Header Units can be imported into both module and non-module code. The Global Module Fragment is only relevant to modules themselves. Regular non-module code is also in the Global Module. One aspect of the Global Module is that entities can be declared, or even defined, in multiple places -- because that's how header files work. The compiler has to merge these declarations, but can rely on the ODR to presume they are 'the same'.
Another thing is that Header Units export macros. That's unavoidable if one wants to treat a header file as a header unit.
Timeline
- March 1st 2017 - first executable works!
- April 26th 2017 - Namespace symbol table handling reworked to be module-compatible (and just generally better).
- May 3rd 2017 - Symbol table partitioned and module-specific mangling (no back-references)
- June 15th 2017 - Class & function declarations and definitions
- July 5th 2017 - Created 'c++-name-lookup' branch to handle changes that are easier to complete on a separate branch
- Sept 4th 2017 - Uploaded new ABI document
- Sept 8th 2017 - Presented at GNU Cauldron.
- Sept 26th 2017 - Presented at CPPCon
- Oct 13th 2017 - Function template exported and instantiated.
- Oct 20th 2017 - A class template exported and instantiated.
- Oct 23rd 2017 - Wrapper script technology.
- Nov 16th 2017 - constexpr functions
- Nov 22nd 2017 - Template function & class members of template & non-template classes. Happy birthday, mum!
- Dec 20th 2017 - Enumerated types and static-storage variables
- Jan 2nd 2018 - Functions, Classes, Templates, Typedefs. Enough to kick the tyres.
- Jan 26th 2018 - Move to ELROND encapsulation
- Mar 15th 2018 - Implement p0713 -- 'module;' at start.
- Mar 16th 2018 - Add -fmodules-atom (p0947), remove plain -fmodules
- Apr 6th 2018 - Lazy loading (defns clustered with decls)
- May 1st 2018 - Fixes of inter-module references.
- May 10th 2018 - Module server added.
- May 16th 2018 - ATOM preamble rescanning, -fmodule-preamble preprocessing mode.
- June 3rd 2018 - module server renamed to module mapper, 'file' option resurrected (hi Boris!)
- Aug 27th 2018 - Mangling substitutions
- Aug 28th 2018 - Legacy header macros
- Sept 7th 2018 - Presented at GNU Cauldron
- Sept 25th 2018 - Presented at CPPCon
- Oct 7th 2018 - Legacy header deduping
- Oct 19th 2018 - Merged ATOM into TS. Deprecate -fmodules-atom
- Oct 21st 2018 - Added -fno-module-keywords
- Nov 7th 2018 - Implement p0924r1's contextual keywords.
- Dec 13th 2018 - Merging global module entities from legacy imports.
- Jan 15th 2019 - Module partition CMI folding.
- Jan 22nd 2019 - Partitions.
- Jan 30th 2019 - ADL.
- Feb 23rd 2019 - WG21 votes to merge modules into working paper
- Mar 6th 2019 - GMF Pruning
- Mar 13th 2019 - stdio.h can be compiled to a header unit
- Apr 25th 2019 - Instantiations are streamed
- Jul 5th 2019 - iostream can be compiled to a header unit (in c++17 mode)
- Jul 24th 2019 - iostream can be compiled to a header unit in c++2a mode
- Sep 17th 2019 - Presented at CPPCon
- Dec 12th 2019 - Entity index and hash
- Jan 3rd 2020 - Global Module Entity merging working in C++17 mode
- Feb 4th 2020 - Module control-lines (p1857)
- Feb 7th 2020 - -fdirectives-only including raw string literals
- Feb 28th 2020 - Concepts
- Mar 26th 2020 - All STL headers work as header units (except <ranges> in C++20 mode)
- Apr 3rd 2020 - NSDMIs (hey, a bug report!)
- Apr 7th 2020 - C++20 <ranges> as a header unit
- May 13th 2020 - Dynamic initialization order (p1874)
- May 14th 2020 - GMF imports do not leak (p1979)
- Jun 8th 2020 - libcody used for module mapper protocol
- Jun 24th 2020 - Constrained partial specializations
- Jul 1st 2020 - FR39, dependent ADL
- Jul 2nd 2020 - p1779, ABI isolation for member functions (warning, they are not implicitly inline)
Random Cleanups
I've been making some random cleanups to the code base:
- Inline namespace handling pr 79369 (upstreamed)
- Canonicalize type hashing (upstreamed)
- g++-dg.exp: find tests simplify (upstreamed)
- CRC generation optimization (upstreamed)
- OVERLOAD representation (upstreamed)
- Name lookup (qualified, unqualified, ADL) (upstreamed)
- Name insertion (upstreamed)
- Namespace contents representation. (upstreamed)
- Kill strong using directives (upstreamed)
- Inline namespace representation (upstreamed)
- DR2061 (upstreamed)
- Kill TYPE_METHODS (upstreamed)
- cdtors have proper names and no magic slots (upstreamed)
- conversion ops have a single name and no magic slots (upstreamed)
- Kill CLASSTYPE_SORTED_FIELDS (upstreamed)
- cpp_macro cleanup (upstreamed)
- Reimplemented -fdirectives-only preprocessing, adding raw literals. (pending stage 1)
Documentation
https://wg21.link/n4047 modules rationale
https://wg21.link/n4610 modules-ts Oct 2016
https://wg21.link/n4720 modules-ts Feb 2018
https://wg21.link/p0947 ATOM proposal
https://wg21.link/p1103 merging modules
I wrote some papers:
https://wg21.link/p0714 namespace exporting
https://wg21.link/p0715 using directives
https://wg21.link/p0721 using declarations
https://wg21.link/p0731 interface imports
https://wg21.link/p0749 namespace pervasiveness
https://wg21.link/p0774 module declaration
https://wg21.link/p0775 module partitions
https://wg21.link/p0778 module names
https://wg21.link/p0787 proclaiming declarations
https://wg21.link/p0867 interface name
https://wg21.link/p0923 dependent ADL
https://wg21.link/p0924 context-sensitive keyword
https://wg21.link/p0925 unqualified using declarations
https://wg21.link/p1174 legacy macros
https://wg21.link/p1183 legacy header names
https://wg21.link/p1184 module mapper
https://wg21.link/p1213 global module
https://wg21.link/p1299 module preamble
https://wg21.link/p1347 ADL & internal linkage (with Davis Herring)
https://wg21.link/p1395 module partitions not a panacea
https://wg21.link/p1602 make me a module
https://wg21.link/p1884 private module partition: an inconsistent boundary
I also presented work at:
2017 GNU Cauldron https://gcc.gnu.org/wiki/cauldron2017
2017 CPPCon https://www.google.com/search?q=sidwell+cppcon+2017 (because you tube links are verboten)
2018 GNU Cauldron https://gcc.gnu.org/wiki/cauldron2018
2018 CPPCon https://www.google.com/search?q=sidwell+cppcon+2018
2019 CPPCon https://www.google.com/search?q=sidwell+cppcon+2019
2020/10 Overload https://accu.org/journals/overload/28/159/sidwell/