volatile access optimization (C++ / x86_64)

Paul_Koning@Dell.com
Sat Dec 27 16:03:00 GMT 2014

> On Dec 26, 2014, at 6:19 PM, Andrew Haley <aph@redhat.com> wrote:
> On 26/12/14 22:49, Matt Godbolt wrote:
>> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley <aph@redhat.com> wrote:
>>> On 26/12/14 20:32, Matt Godbolt wrote:
>>>> Is there a reason why (in principle) the volatile increment can't be
>>>> made into a single add? Clang and ICC both emit the same code for the
>>>> volatile and non-volatile case.
>>> Yes.  Volatiles use the "as if" rule, where every memory access is as
>>> written.  A volatile increment is defined as a load, an increment, and
>>> a store.
>> That makes sense to me from a logical point of view. My
>> understanding though is the volatile keyword was mainly used when
>> working with memory-mapped devices, where memory loads and stores
>> could not be elided. A single-instruction load-modify-write like
>> "increment [addr]" adheres to these constraints even though it is a
>> single instruction.  I realise my understanding could be wrong here!
>> If not though, both clang and icc are taking a short-cut that may
>> put them into a non-compliant state.
> It's hard to be certain.  The language used by the standard is very
> unhelpful: it requires all accesses to be as written, but does not
> define exactly what constitutes an access.

I would look at this sort of thing with the mindset of a network protocol designer.  If the externally visible actions are correct, the implementation is correct.  Details not visible at the external reference interface are irrelevant.

In the case of volatile variables, the external interface in question is the one at the point where that address is implemented: a memory cell, or a memory-mapped I/O device on a bus.  So the required behavior is that load and store operations (read and write transactions at that interface) occur as written.

If a processor has add instructions that take memory operands (as x86 and VAX do, but MIPS does not), such an instruction will perform a read cycle followed by a write cycle.  So as seen at the critical interface, the behavior is the same as if you were to do an explicit load, register add, store sequence.  Therefore the use of a single add-to-memory instruction is a valid implementation.
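
To make the "same at the interface" argument concrete, here is a rough sketch in C++ that models the bus as a log of read and write transactions.  Everything in it (Bus, Transaction, load_add_store, add_to_memory) is a hypothetical name invented for illustration; it is a toy model of the reasoning, not a description of how any compiler or CPU is actually built.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the external interface: each bus cycle is logged.
enum class Cycle { Read, Write };

struct Transaction {
    Cycle kind;
    std::uint32_t value;
    bool operator==(const Transaction& o) const {
        return kind == o.kind && value == o.value;
    }
};

// A single memory cell (or device register) that records every access.
struct Bus {
    std::uint32_t cell = 0;
    std::vector<Transaction> trace;

    std::uint32_t read() {
        trace.push_back({Cycle::Read, cell});
        return cell;
    }
    void write(std::uint32_t v) {
        cell = v;
        trace.push_back({Cycle::Write, v});
    }
};

// Explicit three-instruction sequence: load, register add, store.
void load_add_store(Bus& bus) {
    std::uint32_t reg = bus.read();  // read cycle
    reg += 1;                        // happens in a register, not on the bus
    bus.write(reg);                  // write cycle
}

// Assumed model of a single add-to-memory instruction:
// internally one read cycle followed by one write cycle.
void add_to_memory(Bus& bus, std::uint32_t imm) {
    bus.write(bus.read() + imm);
}

// True if both implementations of an increment produce the same bus trace.
bool same_trace_for_increment() {
    Bus a, b;
    load_add_store(a);
    add_to_memory(b, 1);
    return a.trace == b.trace;
}
```

In this model both versions log one read followed by one write with the same values, so an observer sitting at the bus cannot tell them apart.  The model deliberately ignores anything else a real bus might expose (for example locked cycles), which is outside what I am arguing here.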

