I'm sure this will be controversial in some quarters, but the memory model synchronization modes may still be confusing, so here's a quick and dirty rundown of how all these things interact and how they would be used in day-to-day programming.
The normal case one thinks of is when there is a system-wide lock, and when the lock is acquired, you expect everything which occurred before the lock was set to be complete. For example:
 - process 1 -          - process 2 -
 otherglob = 2;         wait (atomic_lock() == 1);
 global = 10;           print (global);
 set atomic_lock(1);
you expect 'global' in process 2 to always be 10. You are in effect using the lock as a data-is-ready flag for global.
In order for that to happen in a consistent manner, there is more involved than simply waiting for the lock to be available. If process 1 and 2 are running on different machines, process 1 will have to flush its cache all the way to memory, and process 2 will have to wait for that flush to complete and be visible before it can load the proper value of global. Otherwise the results will not be as expected.
That's the synchronization model which maps to the default, or sequentially consistent, C++ model. The cache flushing and whatever else is required is built into the library routines for performing atomic loads and stores. There is no mechanism to specify that this lock is for the value of 'global', so the standard defines it such that it applies to all shared memory before the atomic lock operation. So adding
 - process 3 -
 wait (atomic_lock() == 1);
 print (otherglob);
will also provide the expected results. This memory model will always involve some form of synchronization instructions, and potentially waiting on other hardware to complete. It keeps all shared memory accesses synchronized at atomic points, so the programmer should always see the "expected" result without knowing much more about multi-threaded design. This mode would be used by novices, for debugging threaded code, or in a complex system that may be hard to fully understand.
The sequential mode has the possibility of being VERY slow if you have a widely distributed system. That's where the release/acquire model comes in. Proper utilization of it can remove many of the waits present in the sequential model since different processes don't have to wait for all cache flushes, just ones directly related (ie, the thread which set the atomic variable synchronizes with just the thread that performs the read.) You should see the behaviour you expect in a specific thread, but you can't depend on the same ordering being seen in another thread. Often this doesn't matter, so this model is provided to allow code to run more efficiently, but it does require a better understanding of the subtleties of multi-processor side effects in the code you write. This would be used by developers who are more experienced in multi-threaded coding.
The consume model refinement will be treated the same as release/acquire. The only real difference as far as the compiler is concerned is that the actual code emitted for synchronization may be less. The optimizers will treat it identically to an acquire.
On the other hand, if you are using an atomic variable simply as a value and don't care about the synchronization aspects (i.e., you just want to always see a valid value for the variable), then that maps to the relaxed mode. There may be some academic babble about certain provisions, but this is effectively what it boils down to. The relaxed mode is what you use when you don't care about all that memory flushing and just want to see the values of the atomic itself. This is the fastest model, but make sure you don't depend on the values of other shared variables in other threads. This is also what you get when you use the basic atomic STORE and LOAD macros in C.
The Compiler Flags
The -fmemory-model optimization flags are orthogonal to all this, even though they use the term memory-model. When a program is written for multi-processing, the programmer usually attempts to write it such that there are no data races, otherwise there may be inconsistencies during execution. The flags are used to guarantee that the compiler does not introduce various types of data races through its transformations.
If you are writing a multithreaded program, you will normally want the -fmemory-model=C++0x option.
If you are utilizing additional data race detection tools, or are unsure of side effects, take the -fmemory-model=safe option.
If you never intend to use more than a single thread, take the -fmemory-model=single option.