volatile access optimization (C++ / x86_64)

Torvald Riegel triegel@redhat.com
Tue Dec 30 11:05:00 GMT 2014

On Fri, 2014-12-26 at 22:26 +0000, Andrew Haley wrote:
> On 26/12/14 20:32, Matt Godbolt wrote:
> > I'm investigating ways to have single-threaded writers write to memory
> > areas which are then (very infrequently) read from another thread for
> > monitoring purposes. Things like "number of units of work done".
> > 
> > I initially modeled this with relaxed atomic operations. This
> > generates a "lock xadd" style instruction, as I can't convey that
> > there are no other writers.
> > 
> > As best I can tell, there's no memory order I can use to explain my
> > usage characteristics.
> >
> > Giving up on the atomics, I tried volatiles.
> > These are less than ideal as they are less expressive, but in my
> > instance I am not trying to fight the ISA's reordering; just prevent
> > the compiler from eliding updates to my shared metrics.
> > 
> > GCC's code generation uses a "load; add; store" for volatiles, instead
> > of a single "add 1, [metric]".
> This is correct.
> > http://goo.gl/dVzRSq has the example (which is also at the bottom of my email).
> > 
> > Is there a reason why (in principle) the volatile increment can't be
> > made into a single add? Clang and ICC both emit the same code for the
> > volatile and non-volatile case.
> Yes.  Volatiles use the "as if" rule, where every memory access is as
> written.  A volatile increment is defined as a load, an increment, and
> a store.  If you want a single atomic increment, atomics are what you
> should use.  If you want an increment to be written to memory, use a
> store barrier after the increment.

I agree with Andrew.  My understanding of volatile is that the generated
code must do exactly what the abstract machine would do.

One can use volatiles for synchronization if one is also manually adding
HW barriers and potentially compiler barriers (depending on whether you
need to mix volatile and non-volatile) -- but volatiles really aim at a
different use case than atomics.

For the single-writer shared-counter case, a load and a store operation
with memory_order_relaxed seem to be the right approach.
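
A minimal sketch of that approach, assuming C++11 <atomic>.  The names
units_done, record_unit and read_units are hypothetical and are not taken
from Matt's example.  Because the increment is written as a separate
relaxed load and relaxed store rather than as a fetch_add, no atomic
read-modify-write is requested, so on x86_64 the compiler typically has
no need for a lock-prefixed instruction.  This is only valid as long as
exactly one thread ever writes the counter.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical metric: number of units of work done.
std::atomic<std::uint64_t> units_done{0};

// Must only ever be called from the single writer thread.
inline void record_unit() {
    // Relaxed load followed by relaxed store. This is NOT an atomic
    // read-modify-write, which is fine only because there is one writer.
    std::uint64_t current = units_done.load(std::memory_order_relaxed);
    units_done.store(current + 1, std::memory_order_relaxed);
}

// May be called from the (infrequent) monitoring thread.
inline std::uint64_t read_units() {
    // A relaxed load gives a tear-free value but no ordering guarantees
    // relative to other memory accesses.
    return units_done.load(std::memory_order_relaxed);
}
```

The monitoring thread may observe a slightly stale count, which should be
acceptable for metrics of this kind.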
