This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: should sync builtins be full optimization barriers?


On Sep 11, 2011, at 10:12, Andrew MacLeod wrote:

>> To be honest, I can't quite see the use of completely unordered
>> atomic operations, where we not even prohibit compiler optimizations.
>> It would seem if we guarantee that a variable will not be accessed
>> concurrently from any other thread, we wouldn't need the operation
>> to be atomic in the first place. That said, it's quite likely I'm
>> missing something here.
>> 
> There is no guarantee it isn't being accessed concurrently; we are only guaranteeing that if it is accessed from another thread, it won't be a partially written value. If you read a 64-bit value on a 32-bit machine, you need to guarantee that both halves are fully written before any read can happen. That's the bare minimum guarantee of an atomic.

OK, I now see (in §1.10(5) of the n3225 draft) that “relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races. 

However the next paragraph says: 
All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. [...] There is a separate order for each atomic object. There is no requirement that these can be combined into a single total order for all objects. In general this will be impossible since different threads may observe modifications to different objects in inconsistent orders.

So, if I understand correctly, operations using relaxed memory order will still need fences, but indeed do not require any optimization barrier. For memory_order_seq_cst we'll need a full barrier, and for the other orders a partial barrier.

Also, for relaxed order atomic operations we would only need a single fence between two accesses (by a thread) to the same atomic object. 
> 
>> For Ada, all atomic accesses are always memory_order_seq_cst, and we
>> just care about being able to optimize accesses if we know they'll be
>> done from the same processor. For the C++11 model, thinking about
>> the semantics of any memory orders other than memory_order_seq_cst
>> and their interaction with operations with different ordering semantics
>> makes my head hurt.
> I had many headaches over a long period wrapping my head around it, but ultimately it maps pretty closely to various hardware implementations. Best bet? Just use seq-cst until you discover you have a performance problem!! I expect that's why it's the default :-)

We've already discovered that. Atomic types are used quite a bit in Ada code. Unfortunately, many of the uses are just for accesses to memory-mapped I/O devices, often a single write. On many systems I/O locations can't be used for synchronization anyway; only regular cacheable memory can be used for that.

For such operations you don't want the compiler to reorder accesses to different I/O locations, but mutual exclusion wrt. other threads is already taken care of. It seems this is precisely the opposite of what the relaxed memory order provides.

Regards,
  -Geert

