Executive Summary

The C++ memory model is designed to provide predictable results in a parallel environment as well as in a sequential one. This memory model is also being adopted by the C standard. There are 2 primary components:

The data-race component of the C++11 memory model can be implemented by adding compiler flags along the following lines:

These 4 flags will implement the required restrictions in the optimizer, and allow them to be turned on or off as required for testing or by knowledgeable users.

Exposure to the normal users can be provided through something along the lines of:

The memory-model=c++0x option simply disallows the data races that must be prevented for compliance. Architectures which have no hardware support for data race detection only need to disable the store data races; otherwise, all four must be disabled.

The -fmemory-model flag itself doesn't limit what the user can use the program for; it simply tells the optimizers what the limitations are regarding synchronization with, and awareness of, other threads.

Optimizations must be audited for situations which would break compliance, and modified to check these flags. A conformance testsuite is being developed to help find these locations and then ensure they aren't accidentally re-enabled.

GCC has made the decision that optimizations will be allowed to introduce new load data races, as long as the results are thrown away. It will remain this way until it causes an issue with targeted hardware. This is also allowed by the latest draft standard [N3242.1.10.23]:

[ Note: Transformations that introduce a speculative read of a potentially shared memory
location may not preserve the semantics of the C++ program as defined in this standard,
since they potentially introduce a data race. However, they are typically valid in the
context of an optimizing compiler that targets a specific machine with well-defined
semantics for data races. They would be invalid for a hypothetical machine that is not
tolerant of races or provides hardware race detection. — end note ]
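The difference between a store data race and a tolerated speculative load can be pictured with a small hand-written sketch. The names `shared_total`, `add_if` and the two "transformed" variants are hypothetical and only model, at source level, what an optimizer might do internally; they are not taken from GCC. All three functions give the same result in a single thread, which is why such transformations are tempting, but only the store variant writes to memory on a path where the original program did not.

```cpp
#include <cassert>

int shared_total = 0;   // hypothetical global that another thread might also access

// Source form: shared_total is written only when flag is true.
void add_if(bool flag, int x) {
    if (flag)
        shared_total += x;
}

// Picture of a transformation that introduces a store data race:
// shared_total is written even when flag is false, so an update made by
// another thread between the two stores could be overwritten.
void add_if_speculative_store(bool flag, int x) {
    int old = shared_total;
    shared_total = old + x;   // unconditional store, not in the source
    if (!flag)
        shared_total = old;   // second store that restores the old value
}

// Picture of a speculative load: the read happens on every path, but its
// result is thrown away when flag is false and no new store is introduced.
void add_if_speculative_load(bool flag, int x) {
    int tmp = shared_total;       // load on every path
    if (flag)
        shared_total = tmp + x;   // store only where the source had one
}

// In a single thread all three forms are indistinguishable.
void check_single_threaded() {
    shared_total = 5;
    add_if(false, 3);
    assert(shared_total == 5);

    shared_total = 5;
    add_if_speculative_store(false, 3);
    assert(shared_total == 5);

    shared_total = 5;
    add_if_speculative_load(true, 3);
    assert(shared_total == 8);
}
```

Under the policy described above, the second form is the kind of transformation the optimizer must avoid, while the third is the kind that remains permitted.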

The other required aspect for memory model compliance is implementing the atomic types and operations. Atomic types are defined such that no other thread may ever see an “in between” state. That is, if 3 stores are needed to change the value of a class, no thread may read a value from the class in which only a subset of the 3 required stores has been performed. The C++ model also provides a memory ordering parameter which affects what kinds of code motion are valid.
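As a minimal sketch of what this looks like to the user, the following wraps a small multi-field value in `std::atomic` and passes an explicit memory ordering argument to each operation. The `Range` type and the `publish` and `snapshot` helpers are hypothetical names chosen for illustration; the struct is kept to 8 bytes on the assumption that the target handles that size natively.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical two-field value, 8 bytes in total.
struct Range {
    std::int32_t low;
    std::int32_t high;
};

// The whole struct is the atomic object: readers see either the old
// pair or the new pair, never low from one and high from the other.
std::atomic<Range> current_range{Range{0, 0}};

// Replace both fields in one atomic store. The second argument is the
// memory ordering parameter.
void publish(std::int32_t low, std::int32_t high) {
    current_range.store(Range{low, high}, std::memory_order_release);
}

// Read both fields in one atomic load.
Range snapshot() {
    return current_range.load(std::memory_order_acquire);
}
```

A caller would write `publish(1, 10);` in one thread and `Range r = snapshot();` in another; `r` then holds a pair that some call to `publish` (or the initializer) actually stored.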

The simplest mechanism to implement the atomic feature is to use mutual exclusion locks on each type. When a value is being written, the lock is acquired and all reads are held up until the lock is released. This is undesirable in practice, as it creates a bottleneck and performs poorly. There are other options, and the initial implementation uses a more efficient variation of locking.
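The simple per-object locking scheme can be sketched as follows. This is an illustration of the idea only, not the variation GCC's initial implementation uses; `LockedAtomic` and `Triple` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical value that needs three stores to update.
struct Triple {
    int a;
    int b;
    int c;
};

// Every access takes the same lock, so a reader can never observe a
// partially written value. The cost is that all readers and writers
// of this object are serialized.
template <typename T>
class LockedAtomic {
public:
    explicit LockedAtomic(const T& initial) : value_(initial) {}

    void store(const T& v) {
        std::lock_guard<std::mutex> guard(lock_);
        value_ = v;               // all member stores happen under the lock
    }

    T load() const {
        std::lock_guard<std::mutex> guard(lock_);
        return value_;            // copy is taken while the lock is held
    }

private:
    mutable std::mutex lock_;     // mutable so that load() can be const
    T value_;
};
```

Usage looks like `LockedAtomic<Triple> t(Triple{1, 2, 3}); t.store(Triple{4, 5, 6});`. The serialization in `load` and `store` is the bottleneck the paragraph above refers to.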

The ultimate goal is to produce lock-free atomic types. Most modern architectures provide hardware instructions for 1, 2, 4 and 8 byte atomic types. GCC already exposes these basic instructions on targets that have them. The challenge is providing lock-free atomics for types of other sizes, which are not native. This will be covered in a future section.
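Whether a given native size is lock-free on a particular target can be checked with the `ATOMIC_*_LOCK_FREE` macros from the C++11 `<atomic>` header, which evaluate to 0 (never), 1 (sometimes) or 2 (always). The sketch below reports them for `char`, `short`, `int` and `long long`, assuming these are the 1, 2, 4 and 8 byte types on the target; `describe_lock_free` and `lock_free_report` are hypothetical helper names.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Translate the value of an ATOMIC_*_LOCK_FREE macro into words.
std::string describe_lock_free(int level) {
    switch (level) {
        case 0:  return "never lock-free";
        case 1:  return "sometimes lock-free";
        case 2:  return "always lock-free";
        default: return "unknown";
    }
}

// Build a one-line-per-type report for the commonly native sizes.
std::string lock_free_report() {
    std::string out;
    out += "char:      " + describe_lock_free(ATOMIC_CHAR_LOCK_FREE) + "\n";
    out += "short:     " + describe_lock_free(ATOMIC_SHORT_LOCK_FREE) + "\n";
    out += "int:       " + describe_lock_free(ATOMIC_INT_LOCK_FREE) + "\n";
    out += "long long: " + describe_lock_free(ATOMIC_LLONG_LOCK_FREE) + "\n";
    return out;
}
```

Printing `lock_free_report()` on a given machine shows which of the basic sizes can be handled without falling back to a lock.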

Atomic types are also defined to be synchronization points for cross-thread communication. The compiler is being modified to emit these synchronization elements as well; see the Codegen section. This restricts which optimizations can be performed around atomic types, and the optimizer needs to be taught about these restrictions. This is covered in the Optimizations section.
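A small hand-off between two threads illustrates an atomic acting as a synchronization point. In this sketch, which uses the hypothetical names `payload`, `ready`, `producer`, `consumer` and `run_handoff`, the plain store to `payload` must not be moved below the release store, and the plain load of `payload` must not be moved above the acquire load. These are the kinds of code motion restrictions the optimizer has to respect.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // atomic flag used as the synchronization point

void producer() {
    payload = 42;                                  // plain store
    ready.store(true, std::memory_order_release);  // earlier stores stay above this
}

int consumer() {
    // Spin until the flag is set; later loads stay below the acquire load.
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    return payload;                                // sees the producer's store
}

// Run one hand-off and return the value the consumer observed.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();                                 // makes the write to seen visible here
    return seen;
}
```

Because the acquire load that reads `true` synchronizes with the release store, the consumer is guaranteed to read 42 from `payload` rather than the initial 0.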

Once all the atomic types and operations are supported in a lock-free way, the next step is to provide lock-free versions of much of the standard template library.

None: Atomic/GCCMM/ExecutiveSummary (last edited 2011-09-23 07:29:37 by 195)