This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: should sync builtins be full optimization barriers?

From: Geert Bosch <bosch at adacore dot com>
To: Paolo Bonzini <bonzini at gnu dot org>
Cc: Andrew MacLeod <amacleod at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>, GCC Mailing List <gcc at gcc dot gnu dot org>, Aldy Hernandez <aldyh at redhat dot com>
Date: Mon, 12 Sep 2011 14:40:07 -0400
Subject: Re: should sync builtins be full optimization barriers?
References: <4E69C942.3090808@gnu.org> <20110909081705.GT2687@tyan-ft48-01.lab.bos.redhat.com> <5F13A1A0-79E5-4733-B543-4A6F6311A247@adacore.com> <4E6CC1E4.5000000@redhat.com> <93C7346D-DC47-4C6B-9755-EF438D82DDEA@adacore.com> <4E6DAE90.3070202@gnu.org>

On Sep 12, 2011, at 03:02, Paolo Bonzini wrote:

> On 09/11/2011 09:00 PM, Geert Bosch wrote:
>> So, if I understand correctly, then operations using relaxed memory
>> order will still need fences, but indeed do not require any
>> optimization barrier. For memory_order_seq_cst we'll need a full
>> barrier, and for the others there is a partial barrier.
> 
> If you do not need an optimization barrier, you do not need a processor barrier either, and vice versa.  Optimizations are just another factor that can lead to reordered loads and stores.

If that statement is true, it implies that even relaxed ordering requires an optimization barrier, because fences clearly need to be used for any atomic access, including those with relaxed memory order.

Consider 4 threads and an atomic int x:

thread 1  thread 2  thread 3  thread 4
--------  --------  --------  --------
  x=1;      r1=x;     x=3;      r3=x;
  x=2;      r2=x;     x=4;      r4=x;
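For concreteness, the scenario might be written in C++11 terms roughly as follows. This is only a sketch: the names `Observed` and `run_once` are made up for illustration, and the initial value 0 for x is an assumption, since the example above does not give one. Whether the generated code needs fences is exactly the question under discussion; the sketch only spells out the accesses.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two reader threads (hypothetical helper type).
struct Observed { int r1, r2, r3, r4; };

// Runs the four threads once. x starts at 0 (an assumption, not stated
// in the example), so a reader may also observe 0.
Observed run_once() {
    std::atomic<int> x{0};
    Observed o{};

    std::thread t1([&] {              // thread 1: x=1; x=2;
        x.store(1, std::memory_order_relaxed);
        x.store(2, std::memory_order_relaxed);
    });
    std::thread t2([&] {              // thread 2: r1=x; r2=x;
        o.r1 = x.load(std::memory_order_relaxed);
        o.r2 = x.load(std::memory_order_relaxed);
    });
    std::thread t3([&] {              // thread 3: x=3; x=4;
        x.store(3, std::memory_order_relaxed);
        x.store(4, std::memory_order_relaxed);
    });
    std::thread t4([&] {              // thread 4: r3=x; r4=x;
        o.r3 = x.load(std::memory_order_relaxed);
        o.r4 = x.load(std::memory_order_relaxed);
    });

    // Each reader writes only its own fields of o, and the joins make
    // those writes visible before o is returned.
    t1.join(); t2.join(); t3.join(); t4.join();
    return o;
}
```

Calling `run_once()` repeatedly and logging the four values is one way to look for surprising combinations on a given machine.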

Even with relaxed memory ordering, all modifications to x have to occur in some particular total order, called the modification order of x.

So, even if each thread preserves its store order, the modification order of x can be any of:
  1,2,3,4
  1,3,2,4
  1,3,4,2
  3,1,2,4
  3,1,4,2
  3,4,1,2
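The list above is just every interleaving of the store sequences (1,2) and (3,4) that keeps each thread's own order. A small sketch that enumerates them, with the hypothetical helper names `merge_orders` and `modification_orders`:

```cpp
#include <cstddef>
#include <vector>

// Recursively merges the two per-thread store sequences, keeping each
// sequence's own order, and collects every complete interleaving.
static void merge_orders(const std::vector<int>& a, std::size_t i,
                         const std::vector<int>& b, std::size_t j,
                         std::vector<int>& cur,
                         std::vector<std::vector<int>>& out) {
    if (i == a.size() && j == b.size()) {
        out.push_back(cur);
        return;
    }
    if (i < a.size()) {               // next store comes from thread 1
        cur.push_back(a[i]);
        merge_orders(a, i + 1, b, j, cur, out);
        cur.pop_back();
    }
    if (j < b.size()) {               // next store comes from thread 3
        cur.push_back(b[j]);
        merge_orders(a, i, b, j + 1, cur, out);
        cur.pop_back();
    }
}

// All candidate modification orders of x for the example above.
std::vector<std::vector<int>> modification_orders() {
    std::vector<std::vector<int>> out;
    std::vector<int> cur;
    merge_orders({1, 2}, 0, {3, 4}, 0, cur, out);
    return out;
}
```

Two sequences of two stores each give C(4,2) = 6 interleavings, which matches the six orders listed.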

Because there is a single modification order for x, it would be an error for thread 2 and thread 4 to see a different update order.

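So, if r1==2, r2==3 and r3==4, r4==1, that would be an error: thread 2 sees 2 before 3, while thread 4 sees 4 before 1, and no single order above allows both. However, without fences, this can easily happen on an SMP machine, even one with a relatively strong memory model such as the x86.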
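To check a given set of observations mechanically, one can test it against each of the six candidate orders. The sketch below uses hypothetical names (`kOrders`, `position`, `consistent`) and assumes every read returns one of the stored values 1..4; anything else is reported as inconsistent.

```cpp
#include <array>

// The six candidate modification orders listed above.
static const std::array<std::array<int, 4>, 6> kOrders = {{
    {{1, 2, 3, 4}}, {{1, 3, 2, 4}}, {{1, 3, 4, 2}},
    {{3, 1, 2, 4}}, {{3, 1, 4, 2}}, {{3, 4, 1, 2}},
}};

// Position of value v within one order, or -1 if v is not in it.
static int position(const std::array<int, 4>& order, int v) {
    for (int i = 0; i < 4; ++i) {
        if (order[i] == v) return i;
    }
    return -1;
}

// True if a single modification order explains both readers: within a
// thread, the second read must not return a value that is earlier in
// the order than the value returned by the first read.
bool consistent(int r1, int r2, int r3, int r4) {
    for (const auto& order : kOrders) {
        const int p1 = position(order, r1), p2 = position(order, r2);
        const int p3 = position(order, r3), p4 = position(order, r4);
        if (p1 < 0 || p2 < 0 || p3 < 0 || p4 < 0) continue;
        if (p1 <= p2 && p3 <= p4) return true;
    }
    return false;
}
```

With this, `consistent(2, 3, 4, 1)` is false, whereas `consistent(1, 2, 3, 4)` is true because the order 1,2,3,4 explains both readers.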

IIUC, relaxed memory order mostly seems to allow movement (by compiler and CPU) of unrelated memory operations, but still requires fences between subsequent atomic operations on the same object.

In other words, while atomic operations with relaxed memory order on some atomic object X cannot be used to synchronize any operations on objects other than X, they themselves cannot cause data races.

  -Geert
