full memory barrier?

Hei Chan structurechart@yahoo.com
Mon Apr 11 22:11:00 GMT 2011


Thanks for your reply.

You wrote that the statement "is a compiler scheduling barrier for all
expressions that load from or store values to memory".  Does "memory" here
mean main memory only, or does it also include the CPU cache?



----- Original Message ----
From: Ian Lance Taylor <iant@google.com>
To: Hei Chan <structurechart@yahoo.com>
Cc: gcc-help@gcc.gnu.org
Sent: Mon, April 11, 2011 2:42:07 PM
Subject: Re: full memory barrier?

Hei Chan <structurechart@yahoo.com> writes:

> I am a little bit confused about what asm volatile ("" : : : "memory") does.
>
> I searched online; many people said that it creates a "full memory barrier".
>
> I have some test code:
> int main() {
>         bool bar;
>         asm volatile ("" : : : "memory");
>         bar = true;
>         return 1;
> }
>
> Running g++ -c -g -Wa,-a,-ad foo.cpp gives me:
>
>    2:foo.cpp       ****         bool bar;
>    3:foo.cpp       ****         asm volatile ("" : : : "memory");
>   22                            .loc 1 3 0
>    4:foo.cpp       ****         bar = true;
>   23                            .loc 1 4 0
>
> It doesn't involve any fence instruction.
>
> Maybe I completely misunderstand the idea of "full memory barrier".

The definition of "memory barrier" is ambiguous when looking at code
written in a high-level language.

The statement "asm volatile ("" : : : "memory");" is a compiler
scheduling barrier for all expressions that load from or store values to
memory.  That means something like a pointer dereference, an array
index, or an access to a volatile variable.  It may or may not include a
reference to a local variable, as a local variable need not be in
memory.
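
As a rough illustration of the distinction above, here is a minimal sketch,
assuming GCC or a compatible compiler (the asm syntax is a GNU extension).
The names shared_data, shared_flag, compiler_barrier and publish are
hypothetical and not part of the original test case.  The globals must live
in memory, so the barrier constrains them; the local may stay in a register.

```cpp
// Sketch only: assumes GCC or Clang (GNU extended asm).
// All names here are hypothetical, chosen for illustration.
int shared_data = 0;   // globals live in memory, so the barrier applies to them
int shared_flag = 0;

inline void compiler_barrier() {
    // Emits no instruction; only stops the compiler from moving
    // memory loads and stores across this point.
    asm volatile ("" : : : "memory");
}

void publish(int value) {
    int doubled = value * 2;   // a local; it may live only in a register,
                               // in which case the barrier does not affect it
    shared_data = doubled;     // store to memory
    compiler_barrier();        // the compiler keeps the store above before this
    shared_flag = 1;           // and this store after it
}
```

Note that this only constrains the compiler.  No fence instruction is
emitted, which matches the assembler listing in the original question.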

This kind of compiler scheduling barrier can be used in conjunction with
a hardware memory barrier.  The compiler doesn't know that a hardware
memory barrier is special, and it will happily move memory access
instructions across the hardware barrier.  Therefore, if you want to use
a hardware memory barrier in compiled code, you must use it along with a
compiler scheduling barrier.
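
One common way to combine the two is to put the hardware barrier instruction
inside an asm statement that also has the "memory" clobber.  The sketch
below assumes GCC or Clang; the instruction choices (mfence on x86, dmb ish
on AArch64) and all function and variable names are assumptions made for
illustration, not something taken from this thread.

```cpp
// Sketch only: assumes GCC or Clang. The instructions are
// architecture-specific assumptions; the names are hypothetical.
int buffer_value = 0;
int buffer_ready = 0;

inline void hardware_and_compiler_barrier() {
#if defined(__x86_64__) || defined(__i386__)
    // mfence is the hardware barrier; the "memory" clobber is the
    // compiler scheduling barrier.
    asm volatile ("mfence" : : : "memory");
#elif defined(__aarch64__)
    asm volatile ("dmb ish" : : : "memory");
#else
    // Fallback for other targets: let the compiler pick the instruction.
    __sync_synchronize();
#endif
}

void store_then_signal(int value) {
    buffer_value = value;
    hardware_and_compiler_barrier();   // neither the compiler nor the CPU
                                       // moves the stores across this point
    buffer_ready = 1;
}
```

Without the "memory" clobber the fence instruction would still be emitted,
but the compiler would remain free to move the two stores around it.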

On the other hand, a compiler scheduling barrier can be useful even
without a hardware memory barrier.  For example, in a coroutine-based
system with multiple light-weight threads running on a single processor,
you need a compiler scheduling barrier, but you do not need a hardware
memory barrier.

gcc will generate a hardware memory barrier if you use the
__sync_synchronize builtin function.  That function acts as both a
hardware memory barrier and a compiler scheduling barrier.
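
A minimal usage sketch of the builtin, assuming GCC or a compatible
compiler.  The variable and function names are hypothetical.  Which
instruction is emitted depends on the target, so none is claimed here.

```cpp
// Sketch only: assumes a compiler that provides the GCC __sync builtins.
// The names payload, payload_ready and produce are hypothetical.
int payload = 0;
int payload_ready = 0;

void produce(int value) {
    payload = value;
    // Acts as both a hardware memory barrier and a compiler
    // scheduling barrier, so no separate asm statement is needed.
    __sync_synchronize();
    payload_ready = 1;
}
```

Rebuilding the test program from the original question with this call in
place of the asm statement is one way to see what the compiler emits for a
given target.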

Ian


