This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][PATCH 0/5] arch: atomic rework
- From: Torvald Riegel <triegel at redhat dot com>
- To: paulmck at linux dot vnet dot ibm dot com
- Cc: Linus Torvalds <torvalds at linux-foundation dot org>, Will Deacon <will dot deacon at arm dot com>, Peter Zijlstra <peterz at infradead dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, David Howells <dhowells at redhat dot com>, "linux-arch at vger dot kernel dot org" <linux-arch at vger dot kernel dot org>, "linux-kernel at vger dot kernel dot org" <linux-kernel at vger dot kernel dot org>, "akpm at linux-foundation dot org" <akpm at linux-foundation dot org>, "mingo at kernel dot org" <mingo at kernel dot org>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Thu, 13 Feb 2014 20:43:01 -0800
- Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> > On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > >
> > > > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > > > combination with C11 atomics. The C11 atomics do cover
> > > > > some of the reasons we currently use ACCESS_ONCE(), but not all of them --
> > > > > in particular, they allow load/store merging.
> > > >
> > > > I really disagree with the "will need to use volatile".
> > > >
> > > > We should never need to use volatile (outside of whatever MMIO we do
> > > > using C) if C11 defines atomics correctly.
> > > >
> > > > Allowing load/store merging is *fine*. All sane CPUs do that anyway -
> > > > it's called a cache - and there's no actual reason to think that
> > > > "ACCESS_ONCE()" has to mean our current "volatile".
> > > >
> > > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > > that they create visible semantics that are different from what a CPU
> > > > cache already does, but that's a plain bug in the standard if so.
> > > >
> > > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > > exactly because CPUs already do it, so claiming that the compiler
> > > > couldn't do it is just insanity.
> > >
> > > Agreed, both CPUs and compilers can merge loads and stores. But CPUs
> > > normally get their stores pushed through the store buffer in reasonable
> > > time, and CPUs also use things like invalidations to ensure that a
> > > store is seen in reasonable time by readers. Compilers don't always
> > > have these two properties, so we do need to be more careful of load
> > > and store merging by compilers.
> >
> > The standard's _wording_ is a little vague about forward-progress
> > guarantees, but I believe the vast majority of the people involved do
> > want compilers to not prevent forward progress. There is of course a
> > difference whether a compiler establishes _eventual_ forward progress in
> > the sense of after 10 years or forward progress in a small bounded
> > interval of time, but this is a QoI issue, and good compilers won't want
> > to introduce unnecessary latencies. I believe that it is fine if the
> > standard merely talks about eventual forward progress.
>
> The compiler will need to earn my trust on this one. ;-)
>
> > > > Now, there are things that are *not* fine, like speculative stores
> > > > that could be visible to other threads. Those are *bugs* (either in
> > > > the compiler or in the standard), and anybody who claims otherwise is
> > > > not worth discussing with.
> > >
> > > And as near as I can tell, volatile semantics are required in C11 to
> > > avoid speculative stores. I might be wrong about this, and hope that
> > > I am wrong. But I do not see it in the current standard.
> > > (Though I expect that most compilers would avoid speculating stores,
> > > especially in the near term.)
> >
> > This really depends on how we define speculative stores. The memory
> > model is absolutely clear that programs have to behave as if executed by
> > the virtual machine, and that rules out speculative stores to volatiles
> > and other locations. Under certain circumstances, there will be
> > "speculative" stores in the sense that they will happen at different
> > times than if you had a trivial implementation of the abstract machine.
> > But to be allowed to do that, the compiler has to prove that such a
> > transformation still fulfills the as-if rule.
>
> Agreed, although the as-if rule would ignore control dependencies, since
> these are not yet part of the standard (as you in fact note below).
> I nevertheless consider myself at least somewhat reassured that current
> C11 won't speculate stores. My remaining concerns involve the compiler
> proving to itself that a given branch is always taken, thus motivating
> it to optimize the branch away -- though this is more properly a
> control-dependency concern.
>
> > IOW, the abstract machine is what currently defines disallowed
> > speculative stores. If you want to put *further* constraints on what
> > implementations are allowed to do, I suppose it is best to talk about
> > those and see how we can add rules that allow programmers to express
> > those constraints. For example, control dependencies might be such a
> > case. I don't have a specific suggestion -- maybe the control
> > dependencies are best tackled similarly to consume dependencies (even
> > though we don't have a good solution for those yet). But using
> > volatile accesses for that seems to be a big hammer, or even the wrong
> > one.
>
> In current compilers, the two hammers we have are volatile and barrier().
> But yes, it would be good to have something more focused. One option
> would be to propose memory_order_control loads to see how loudly the
> committee screams. One use case might be as follows:
>
> 	if (atomic_load(x, memory_order_control))
> 		atomic_store(y, memory_order_relaxed);
>
> This could also be written:
>
> 	r1 = atomic_load(x, memory_order_control);
> 	if (r1)
> 		atomic_store(y, memory_order_relaxed);
>
> A branch depending on the memory_order_control load could not be optimized
> out, though I suppose that the compiler could substitute a memory-barrier
> instruction for the branch. Seems like it would take a very large number
> of branches to equal the overhead of the memory barrier, though.
>
> Another option would be to flag the conditional expression, prohibiting
> the compiler from optimizing out any conditional branches. Perhaps
> something like this:
>
> 	r1 = atomic_load(x, memory_order_control);
> 	if (control_dependency(r1))
> 		atomic_store(y, memory_order_relaxed);

That's the one I had in mind and talked to you about earlier today. My
gut feeling is that this is preferable to the other because it "marks"
the if-statement, so the compiler knows exactly which branches matter.
I'm not sure one would need the other memory order for that, if indeed
all you want is relaxed -> branch -> relaxed. But maybe there are
corner cases (see the weaker-than-relaxed discussion in SG1 today).
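
For illustration, here is a minimal C11 sketch of the relaxed -> branch ->
relaxed pattern discussed above. The control_dependency() macro is
hypothetical: nothing like it exists in C11, and it is defined here as a
plain identity only so that the code compiles. As written, the sketch
therefore marks the branch for a human reader and nothing more; the current
standard gives no guarantee that the compiler keeps the branch or orders the
store after the load. The names x, y, reader_then_writer() and demo() are
made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical marker taken from the discussion; NOT part of C11.
 * Defined as an identity macro, so it documents intent but provides
 * no ordering guarantee whatsoever.
 */
#define control_dependency(v) (v)

static atomic_int x;	/* zero-initialized flag read by the branch */
static atomic_int y;	/* zero-initialized location written under the branch */

/* relaxed load -> marked branch -> relaxed store */
static void reader_then_writer(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_relaxed);

	if (control_dependency(r1))
		atomic_store_explicit(&y, 1, memory_order_relaxed);
}

/*
 * Single-threaded helper: set x, clear y, run the pattern, return y.
 * This exercises the data flow only; it says nothing about ordering
 * as observed by other threads.
 */
static int demo(int x_value)
{
	atomic_store_explicit(&x, x_value, memory_order_relaxed);
	atomic_store_explicit(&y, 0, memory_order_relaxed);
	reader_then_writer();
	return atomic_load_explicit(&y, memory_order_relaxed);
}
```

In a single thread, demo(1) returns 1 and demo(0) returns 0. That checks
only that the store happens under the branch, not the inter-thread ordering
that a real control_dependency() would have to provide.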