This is the mail archive of the
mailing list for the GCC project.
Re: volatile access optimization (C++ / x86_64)
- From: Matt Godbolt <matt at godbolt dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Andrew Haley <aph at redhat dot com>, GCC Development <gcc at gcc dot gnu dot org>
- Date: Tue, 30 Dec 2014 12:32:34 -0600
- Subject: Re: volatile access optimization (C++ / x86_64)
- References: <CAFWXXN3quEdSnaoWuPcQn2k-F99Yaw+6=NqgFgcu9ABpv5ZD3Q at mail dot gmail dot com> <549DE09B dot 8060502 at redhat dot com> <1419937501 dot 21112 dot 168 dot camel at triegel dot csb>
On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel <email@example.com> wrote:
> I agree with Andrew. My understanding of volatile is that the generated
> code must do exactly what the abstract machine would do.
That makes sense. I suppose I don't understand what the difference is,
in terms of the abstract machine, between "load; add; store" and a
single "load-add-store" instruction. At least on x86, from the
perspective of the memory bus, there's no difference I'm aware of.
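To make the two forms concrete, here is a minimal sketch. The names `counter`, `increment_explicit` and `increment_compound` are made up for illustration and are not from the code under discussion. Both functions ask the abstract machine for one volatile read followed by one volatile write; the open question is whether the compiler may emit a single memory-destination add for them on x86.

```cpp
#include <cstdint>

// Hypothetical counter, named here only for illustration.
volatile std::uint64_t counter = 0;

// Spelled-out form: one volatile read, an add, one volatile write.
void increment_explicit() {
    std::uint64_t tmp = counter;  // volatile load
    tmp += 1;                     // add on a local copy
    counter = tmp;                // volatile store
}

// Compound form: still one volatile read and one volatile write
// as far as the abstract machine is concerned.
void increment_compound() {
    counter = counter + 1;
}
```

Neither function is an atomic read-modify-write, so both are only suitable when a single thread does the writing.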
> One can use volatiles for synchronization if one is also manually adding
> HW barriers and potentially compiler barriers (depending on whether you
> need to mix volatile and non-volatile) -- but volatiles really aim at a
> different use case than atomics.
Again, the processor's reordering and memory barriers are not of huge
concern to me in this instance. I completely agree about volatile
being the wrong use case.
> For the single-writer shared-counter case, a load and a store operation
> with memory_order_relaxed seem to be the right approach.
I agree; this most closely models my intention: a non-atomic increment
that still has the semantics of becoming visible to other threads in a
finite period of time (as per your previous email).
The relaxed-load; add; relaxed-store sequence generates the same code as
the volatile version (that is, three separate instructions), but I
prefer it over volatile as it is more intention-revealing. As to whether
it's valid to peephole-optimize the three instructions into a single
increment on x86 given relaxed memory ordering, I can offer no good
opinion (though my instinct is that it should be).
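For reference, a minimal sketch of the single-writer pattern described above, assuming C++11 `<atomic>`. The names `hits`, `bump`, `bump_rmw` and `read_hits` are hypothetical. `bump` is the relaxed-load; add; relaxed-store sequence, and `bump_rmw` is included only as a contrast: it is a true atomic read-modify-write, which is more than this use case needs.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared counter: one writer thread, any number of readers.
std::atomic<std::uint64_t> hits{0};

// Single-writer increment. The load and the store are each atomic,
// but the increment as a whole is not, so a second writer could
// cause updates to be lost.
void bump() {
    std::uint64_t v = hits.load(std::memory_order_relaxed);
    hits.store(v + 1, std::memory_order_relaxed);
}

// Contrast: an atomic read-modify-write, safe with multiple writers.
// On x86 this is typically a lock-prefixed instruction.
void bump_rmw() {
    hits.fetch_add(1, std::memory_order_relaxed);
}

// Readers on other threads observe the counter with a relaxed load.
std::uint64_t read_hits() {
    return hits.load(std::memory_order_relaxed);
}
```

With exactly one writer, `bump` never loses an update, and readers calling `read_hits` see only values the writer actually stored.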
Thanks all for your help,
Matt