This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [RFC][PATCH 0/5] arch: atomic rework


On Tue, Feb 11, 2014 at 10:06:34PM -0800, Torvald Riegel wrote:
> On Tue, 2014-02-11 at 07:59 -0800, Paul E. McKenney wrote:
> > On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> > > On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel <triegel@redhat.com> wrote:
> > > >
> > > > Intuitively, this is wrong because this lets the program take a step
> > > > the abstract machine wouldn't do.  This is different to the sequential
> > > > code that Peter posted because it uses atomics, and thus one can't
> > > > easily assume that the difference is not observable.
> > > 
> > > Btw, what is the definition of "observable" for the atomics?
> > > 
> > > Because I'm hoping that it's not the same as for volatiles, where
> > > "observable" is about the virtual machine itself, and as such volatile
> > > accesses cannot be combined or optimized at all.
> > > 
> > > Now, I claim that atomic accesses cannot be done speculatively for
> > > writes, and not re-done for reads (because the value could change),
> > > but *combining* them would be possible and good.
> > > 
> > > For example, we often have multiple independent atomic accesses that
> > > could certainly be combined: testing the individual bits of an atomic
> > > value with helper functions, causing things like "load atomic, test
> > > bit, load same atomic, test another bit". The two atomic loads could
> > > be done as a single load without possibly changing semantics on a real
> > > machine, but if "visibility" is defined in the same way it is for
> > > "volatile", that wouldn't be a valid transformation. Right now we use
> > > "volatile" semantics for these kinds of things, and they really can
> > > hurt.
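
(For concreteness, the load merging described above might look like the
following sketch; the flag word and the helper are made up for
illustration.)

	#include <stdatomic.h>
	#include <stdbool.h>

	atomic_int flags;	/* hypothetical shared flag word */

	static bool bit_is_set(int bit)
	{
		/* One relaxed atomic load per call. */
		return atomic_load_explicit(&flags, memory_order_relaxed)
		       & (1 << bit);
	}

	bool both_bits_set(void)
	{
		/*
		 * Two relaxed loads of the same atomic.  Merging them into
		 * a single load -- "v = flags; (v & 1) && (v & 2)" -- would
		 * not change anything a real machine could observe, but
		 * volatile-style semantics would forbid the merge.
		 */
		return bit_is_set(0) && bit_is_set(1);
	}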
> > > 
> > > Same goes for multiple writes (possibly due to setting bits):
> > > combining multiple accesses into a single one is generally fine; it's
> > > *adding* write accesses speculatively that is broken by design.
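
(Likewise for the writes, something along these lines; again the names
are made up.)

	#include <stdatomic.h>

	atomic_int status;	/* hypothetical shared status word */

	void publish_status(void)
	{
		atomic_store_explicit(&status, 1, memory_order_relaxed);
		atomic_store_explicit(&status, 2, memory_order_relaxed);
		/*
		 * Collapsing these two consecutive stores into the single
		 * final store of 2 is the kind of combining being argued
		 * for; what has to stay off limits is the compiler inventing
		 * a store to status that the abstract machine never does.
		 */
	}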
> > > 
> > > At the same time, you can't combine atomic loads or stores infinitely
> > > - "visibility" on a real machine definitely is about timeliness.
> > > Removing all but the last write when there are multiple consecutive
> > > writes is generally fine, even if you unroll a loop to generate those
> > > writes. But if what remains is a loop, it might be a busy-loop
> > > basically waiting for something, so it would be wrong ("untimely") to
> > > hoist a store in a loop entirely past the end of the loop, or hoist a
> > > load in a loop to before the loop.
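
(And a sketch of the timeliness concern; stop_flag is again made up.)

	#include <stdatomic.h>

	atomic_int stop_flag;

	void wait_for_stop(void)
	{
		/*
		 * Every iteration must really load stop_flag.  Merging a
		 * bounded number of these loads is harmless, but hoisting
		 * the load above the loop and spinning on a value read only
		 * once would turn this into a potential infinite loop,
		 * which is why "infinite merging" has to be ruled out.
		 */
		while (!atomic_load_explicit(&stop_flag, memory_order_relaxed))
			;	/* spin */
	}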
> > > 
> > > Does the standard allow for that kind of behavior?
> > 
> > You asked!  ;-)
> > 
> > So the current standard allows merging of both loads and stores, unless of
> > course ordering constraints prevent the merging.  Volatile semantics may be
> > used to prevent this merging, if desired, for example, for real-time code.
> 
> Agreed.
> 
> > Infinite merging is intended to be prohibited, but I am not certain that
> > the current wording is bullet-proof (1.10p24 and 1.10p25).
> 
> Yeah, maybe not.  But it at least seems to rather clearly indicate the
> intent ;)

That is my hope.  ;-)

> > The only prohibition against speculative stores that I can see is in a
> > non-normative note, and it can be argued to apply only to things that are
> > not atomics (1.10p22).
> 
> I think this one is specifically about speculative stores that would
> affect memory locations that the abstract machine would not write to,
> and that might be observable or create data races.  While a compiler
> could potentially prove that such stores aren't leading to a difference
> in the behavior of the program (e.g., by proving that there are no
> observers anywhere and this isn't overlapping with any volatile
> locations), I think that this is hard in general and most compilers will
> just not do such things.  In GCC, bugs in that category were fixed after
> researchers doing fuzz-testing found them (IIRC, speculative stores by
> loops).

And that is my fear.  ;-)
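
(The class of bug being described looks roughly like this; the variable
names are made up.)

	int shared_count;	/* plain shared variable, not atomic */

	void count_positive(const int *p, int n)
	{
		int i;

		for (i = 0; i < n; i++)
			if (p[i] > 0)
				shared_count++;
		/*
		 * A register-promotion "optimization" that rewrites this as
		 *
		 *	tmp = shared_count;
		 *	for (i = 0; i < n; i++)
		 *		if (p[i] > 0)
		 *			tmp++;
		 *	shared_count = tmp;
		 *
		 * introduces a store to shared_count even when the abstract
		 * machine performs none, which can race with another thread
		 * that writes shared_count concurrently.
		 */
	}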

> > I don't see any prohibition against reordering
> > a store to precede a load preceding a conditional branch -- which would
> > not be speculative if the branch was known to be taken and the load
> > hit in the store buffer.  In a system where stores could be reordered,
> > some other CPU might perceive the store as happening before the load
> > that controlled the conditional branch.  This needs to be addressed.
> 
> I don't know the specifics of your example, but from how I understand
> it, I don't see a problem if the compiler can prove that the store will
> always happen.

The current Documentation/memory-barriers.txt formulation requires
that both the load and the store have volatile semantics.  Does
that help?
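
(A minimal sketch of that formulation, reusing the ACCESS_ONCE() macro
from the example further down, so that both the load and the dependent
store are volatile accesses:)

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

	int x, y;

	void t0(void)
	{
		if (ACCESS_ONCE(x))		/* volatile load */
			ACCESS_ONCE(y) = 1;	/* volatile store */
	}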

> To be more specific, if the compiler can prove that the store will
> happen anyway, and the region of code can be assumed to always run
> atomically (e.g., there's no loop or such in there), then it is known
> that we have one atomic region of code that will always perform the
> store, so we might as well do the stuff in the region in some order.

And it would be very hard to write a program that proved that the
store had been reordered prior to the load in this case.
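
(One shape that case can take, as a sketch with relaxed atomics:)

	#include <stdatomic.h>

	atomic_int x, y;

	void t(void)
	{
		if (atomic_load_explicit(&x, memory_order_relaxed))
			atomic_store_explicit(&y, 1, memory_order_relaxed);
		else
			atomic_store_explicit(&y, 1, memory_order_relaxed);
		/*
		 * The store to y happens on both paths, so a compiler might
		 * issue it before the load of x; the question is whether any
		 * conforming program could even detect that reordering.
		 */
	}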

> Now, if any of the memory accesses are atomic, then the whole region of
> code containing those accesses is often not atomic because other threads
> might observe intermediate results in a data-race-free way.
> 
> (I know that this isn't a very precise formulation, but I hope it brings
> my line of reasoning across.)
> 
> > Why this hole?  At the time, the current formalizations of popular
> > CPU architectures did not exist, and it was not clear that all popular
> > hardware avoided speculative stores.  
> 
> I'm not quite sure which hole you see there.  Can you elaborate?

Here is one attempt, based on Peter's example later in this thread:

	#include <stdatomic.h>

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

	atomic_int x, y;  /* Default initialization to zero. */
	int r1;

	void T0(void)
	{
		if (atomic_load_explicit(&ACCESS_ONCE(x), memory_order_relaxed))
			atomic_store_explicit(&ACCESS_ONCE(y), 1, memory_order_relaxed);
	}

	void T1(void)
	{
		r1 = atomic_load_explicit(&y, memory_order_seq_cst);
		/* Might also need an atomic_thread_fence() here... */
		atomic_store_explicit(&x, 1, memory_order_seq_cst);
	}

	/* Checked after both threads have run to completion: */
	assert(r1 == 0);

Peter and I would like the assertion to never trigger in this case,
but the current C11 does not seem to guarantee this.  I believe that
the volatile casts forbid the compiler from deciding to omit T0's "if"
even in cases where it could prove that x was always zero.  And given
T1's store to x, the compiler should not be able to prove that x is
constant here, right?
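
(To make the concern concrete: the r1 == 1 outcome would require T0 to
behave as if it had been transformed along these lines, using the x and
y declared above, with the volatile casts elided for brevity -- not
something a sane compiler should emit:)

	void T0_speculative(void)	/* hypothetical miscompilation of T0 */
	{
		/* Store hoisted above the load it is control-dependent on. */
		atomic_store_explicit(&y, 1, memory_order_relaxed);
		if (!atomic_load_explicit(&x, memory_order_relaxed)) {
			/*
			 * The store was one the abstract machine never
			 * performed, and T1 may already have loaded y == 1
			 * and stored x = 1, "justifying" the branch after
			 * the fact.
			 */
		}
	}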

Again, I believe that current C11 does not guarantee that the assertion
will never trigger, but it would be good if one of the successors to
C11 did make that guarantee.  ;-)

							Thanx, Paul

