This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][PATCH 0/5] arch: atomic rework


On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel <triegel@redhat.com> wrote:
> >
> > Intuitively, this is wrong because this lets the program take a step
> > the abstract machine wouldn't do.  This is different to the sequential
> > code that Peter posted because it uses atomics, and thus one can't
> > easily assume that the difference is not observable.
> 
> Btw, what is the definition of "observable" for the atomics?
> 
> Because I'm hoping that it's not the same as for volatiles, where
> "observable" is about the virtual machine itself, and as such volatile
> accesses cannot be combined or optimized at all.
> 
> Now, I claim that atomic accesses cannot be done speculatively for
> writes, and not re-done for reads (because the value could change),
> but *combining* them would be possible and good.
> 
> For example, we often have multiple independent atomic accesses that
> could certainly be combined: testing the individual bits of an atomic
> value with helper functions, causing things like "load atomic, test
> bit, load same atomic, test another bit". The two atomic loads could
> be done as a single load without possibly changing semantics on a real
> machine, but if "visibility" is defined in the same way it is for
> "volatile", that wouldn't be a valid transformation. Right now we use
> "volatile" semantics for these kinds of things, and they really can
> hurt.
> 
> Same goes for multiple writes (possibly due to setting bits):
> combining multiple accesses into a single one is generally fine, it's
> *adding* write accesses speculatively that is broken by design..
> 
> At the same time, you can't combine atomic loads or stores infinitely
> - "visibility" on a real machine definitely is about timeliness.
> Removing all but the last write when there are multiple consecutive
> writes is generally fine, even if you unroll a loop to generate those
> writes. But if what remains is a loop, it might be a busy-loop
> basically waiting for something, so it would be wrong ("untimely") to
> hoist a store in a loop entirely past the end of the loop, or hoist a
> load in a loop to before the loop.
> 
> Does the standard allow for that kind of behavior?

You asked!  ;-)

So the current standard allows merging of both loads and stores, unless of
course ordering constraints prevent the merging.  Volatile semantics may be
used to prevent this merging, if desired, for example, for real-time code.
Infinite merging is intended to be prohibited, but I am not certain that
the current wording is bullet-proof (1.10p24 and 1.10p25).
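
To make the merging point concrete, here is a minimal sketch using C11
<stdatomic.h> (the variable and function names are made up for illustration):

	#include <stdatomic.h>

	atomic_int flags;

	int two_bits_set(void)
	{
		/* Two relaxed loads of the same atomic variable.  Absent
		 * ordering constraints, the standard permits the compiler
		 * to merge these into a single load. */
		int a = atomic_load_explicit(&flags, memory_order_relaxed) & 0x1;
		int b = atomic_load_explicit(&flags, memory_order_relaxed) & 0x2;

		return a && b;
	}

Making flags volatile would forbid that merging, which is the guarantee
the real-time folks want, but also exactly the cost Linus is complaining
about.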

The only prohibition against speculative stores that I can see is in a
non-normative note, and it can be argued to apply only to things that are
not atomics (1.10p22).  I don't see any prohibition against reordering
a store to precede a load preceding a conditional branch -- which would
not be speculative if the branch was known to be taken and the load
hit in the store buffer.  In a system where stores could be reordered,
some other CPU might perceive the store as happening before the load
that controlled the conditional branch.  This needs to be addressed.
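
For concreteness, the problematic pattern looks something like this
(hypothetical names, relaxed atomics assumed):

	#include <stdatomic.h>

	atomic_int x, y;

	void writer(void)
	{
		/* The store to y is control-dependent on the load of x. */
		if (atomic_load_explicit(&x, memory_order_relaxed))
			atomic_store_explicit(&y, 1, memory_order_relaxed);
	}

If the store to y is allowed to move above the load of x, some other CPU
can see y == 1 before the load that controlled the branch, even though
the abstract machine performs the store only after that load.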

Why this hole?  At the time, the current formalizations of popular
CPU architectures did not exist, and it was not clear that all popular
hardware avoided speculative stores.  

There is also fun with "out of thin air" values, which everyone agrees
should be prohibited, but where there is not agreement on how to prohibit
them in a mathematically constructive manner.  The current draft contains
a clause simply stating that out-of-thin-air values are prohibited,
which doesn't help someone constructing tools to analyze C++ code.
One proposal requires that subsequent atomic stores never be reordered
before prior atomic loads, which requires useless ordering code to be
emitted on ARM and PowerPC (you may have seen Will Deacon's and Peter
Zijlstra's reaction to this proposal a few days ago).  Note that Itanium
already pays this price in order to provide full single-variable cache
coherence.  This out-of-thin-air discussion is also happening in the
Java community in preparation for a new rev of the Java memory model.
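
For reference, the usual out-of-thin-air litmus test looks something like
the following, with x and y both initially zero and all accesses relaxed:

	#include <stdatomic.h>

	atomic_int x, y;	/* both initially zero */

	void thread1(void)
	{
		int r1 = atomic_load_explicit(&x, memory_order_relaxed);
		atomic_store_explicit(&y, r1, memory_order_relaxed);
	}

	void thread2(void)
	{
		int r2 = atomic_load_explicit(&y, memory_order_relaxed);
		atomic_store_explicit(&x, r2, memory_order_relaxed);
	}

Everyone agrees that r1 == r2 == 42 must not happen, but saying so in a
mathematically constructive way, without also demanding the useless
load-to-store ordering mentioned above, is the hard part.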

There will also be some discussions on memory_order_consume, which is
intended to (eventually) implement rcu_dereference().  The compiler
writers don't like tracking dependencies, but there may be some ways of
constraining optimizations so that the common dependencies are preserved,
while providing some syntax to force preservation of dependencies that
would normally be optimized out.  One example of this is where you have an
RCU-protected array that might sometimes contain only a single element.
In the single-element case, the compiler knows a priori which element
will be used, and will therefore optimize the dependency away, so that
the reader might see pre-initialization state.  But this is rare, so
if syntax needs to be added in this case, I believe we should be
OK with it.  (If syntax is needed for plain old dereferences, it is
thumbs down all the way as far as I am concerned.  Ditto for things
like stripping the bottom bits off of a decorated pointer.)
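
A rough sketch of that single-element-array case, with made-up names and
C11 syntax standing in for rcu_dereference():

	#include <stdatomic.h>

	struct foo { int val; };

	struct foo arr[1];		/* sometimes only one element */
	_Atomic(struct foo *) gp;	/* always points into arr[] */

	int reader(void)
	{
		struct foo *p = atomic_load_explicit(&gp, memory_order_consume);

		/* If the compiler can prove that p must equal &arr[0], it may
		 * rewrite this as arr[0].val, loading from a known address and
		 * thereby discarding the very dependency that consume was
		 * supposed to preserve. */
		return p->val;
	}

That rewrite is what lets the reader see pre-initialization state, and it
is the case where I believe a bit of extra syntax would be acceptable.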

No doubt other memory-model issues will come up, but those are the ones
I know about at the moment.

As I said to begin with, hey, you asked!

That said, I would very much appreciate any thoughts or suggestions on
handling these issues.

							Thanx, Paul

