It is worth noting that the impact on the optimizer differs slightly between the memory models. Although the presence of an atomic load or store may act as an optimization barrier, some optimizations can still be performed on the atomic operations themselves if they are not in sequentially consistent mode.
For instance, dead store removal can be performed:
  x.store (a, memory_order_release);
  x.store (b, memory_order_release);
It is perfectly safe to note that storing 'a' into 'x' is in fact a dead store, and remove it. It's also possible to move statements around, subject to the various dependencies imposed by the memory model mode.
  w = 5;
  x.store (a, memory_order_release);
  z = 10;
  b = y.load (memory_order_acquire);
  foo();
  x.store (z + 5, memory_order_release);
The first store to 'x' can be moved further down in the program, but 'w = 5' must still happen before it. It can be moved past the load of 'y' since the two are not dependent and acquire/release mode is in use. It cannot, however, be moved down past the call to foo(). So it is acceptable to reorder this example to:
  z = 10;
  w = 5;
  b = y.load (memory_order_acquire);
  x.store (a, memory_order_release);
  foo();
  x.store (z + 5, memory_order_release);
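The two orderings above can be written out as a small compilable sketch, assuming C++11 `<atomic>`. The `State` struct, its member `foo()`, and the helper `check_same_final_state` are hypothetical names introduced only for illustration; a real foo() would be an opaque call. The check only confirms that both orderings leave the same state when run on a single thread. It does not prove the reordering is valid under the memory model; that follows from the reasoning in the text.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical container for the variables used in the example.
struct State {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int w = 0, z = 0, b = 0;
    int foo_calls = 0;
    // Stand-in for an opaque call the optimizer cannot see into.
    void foo() { ++foo_calls; }
};

// Statement order as originally written.
void as_written(State& s, int a) {
    s.w = 5;
    s.x.store(a, std::memory_order_release);
    s.z = 10;
    s.b = s.y.load(std::memory_order_acquire);
    s.foo();
    s.x.store(s.z + 5, std::memory_order_release);
}

// Statement order after the reordering described above:
// 'w = 5' still precedes the first store to x, and that store
// still precedes the call to foo().
void reordered(State& s, int a) {
    s.z = 10;
    s.w = 5;
    s.b = s.y.load(std::memory_order_acquire);
    s.x.store(a, std::memory_order_release);
    s.foo();
    s.x.store(s.z + 5, std::memory_order_release);
}

// Single-threaded sanity check: both orderings end in the same state.
void check_same_final_state(int a, int y_init) {
    State s1, s2;
    s1.y.store(y_init);
    s2.y.store(y_init);
    as_written(s1, a);
    reordered(s2, a);
    assert(s1.x.load() == s2.x.load());
    assert(s1.w == s2.w && s1.z == s2.z && s1.b == s2.b);
    assert(s1.foo_calls == s2.foo_calls);
}
```

Calling `check_same_final_state(1, 7)` exercises both versions with arbitrary inputs.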
So there are still optimization opportunities on atomic operations, just as there are with normal variables. Normal variables are optimized within a framework of data dependencies, and there are restrictions based on those edges (i.e., a store to 'x' cannot be moved before a load from 'x'). Likewise, there is a similar web of dependencies between atomic loads and stores, created by the happens-before and dependent-on relationships specified by each of the modes.
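The dead-store case from earlier in this section can also be written out as a compilable sketch, again assuming C++11 `<atomic>`. The function names `store_twice`, `store_once`, and `check_dead_store` are hypothetical. The sketch shows the before and after forms of the transformation; whether a given compiler actually performs it is a separate question, and this code does not test for that.

```cpp
#include <atomic>
#include <cassert>

// Two back-to-back release stores to the same atomic.
void store_twice(std::atomic<int>& x, int a, int b) {
    x.store(a, std::memory_order_release);  // overwritten immediately below
    x.store(b, std::memory_order_release);
}

// The form an optimizer may reduce it to once the first store is
// recognized as dead. No other thread is guaranteed to observe 'a'.
void store_once(std::atomic<int>& x, int a, int b) {
    (void)a;  // 'a' is no longer stored
    x.store(b, std::memory_order_release);
}

// Single-threaded check that both forms leave the same value in x.
void check_dead_store(int a, int b) {
    std::atomic<int> x1{0}, x2{0};
    store_twice(x1, a, b);
    store_once(x2, a, b);
    assert(x1.load() == x2.load());
    assert(x2.load() == b);
}
```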
There is a newer write-up with all the optimization details. Note that this is still a work in progress, so it is not complete.