Re: [RFC][PATCH 0/5] arch: atomic rework


On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel <triegel@redhat.com> wrote:
> On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote:
>> And exactly because I know enough, I would *really* like atomics to be
>> well-defined, and have very clear - and *local* - rules about how they
>> can be combined and optimized.
>
> "Local"?

Yes.

So I think that one of the big advantages of atomics over volatile is
that they *can* be optimized, and as such I'm not at all against
trying to generate much better code than for volatile accesses.

But at the same time, that can go too far. For example, one of the
things we'd want to use atomics for is page table accesses, where it
is very important that we don't generate multiple accesses to the
values, because parts of the values can be changed *by hardware* (ie
the accessed and dirty bits).
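
(To make that concrete, something like the sketch below - with a
made-up pte layout and bit positions, and written with C11 atomics
rather than our actual accessors - shows why the single load matters:)

#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_ACCESSED	(1ULL << 5)	/* made-up bit positions */
#define PTE_DIRTY	(1ULL << 6)

/*
 * Load the pte exactly once. The hardware can flip the accessed and
 * dirty bits underneath us at any time, so if the compiler turned
 * this into two separate loads, the two bit tests below could see two
 * different values of the pte.
 */
static int pte_young_and_clean(_Atomic pte_t *pte)
{
	pte_t val = atomic_load_explicit(pte, memory_order_relaxed);

	return (val & PTE_ACCESSED) && !(val & PTE_DIRTY);
}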

So imagine that you have some clever global optimizer that sees that
the program never ever actually sets the dirty bit at all in any
thread, and then uses that kind of non-local knowledge to make
optimization decisions. THAT WOULD BE BAD.

Do you see what I'm aiming for? Any optimization that tries to prove
anything from more than local state is by definition broken, because
it assumes that everything is described by the program.

But *local* optimizations are fine, as long as they follow the obvious
rule of not actually making changes that are semantically visible.

(In practice, I'd be impressed as hell by that particular example,
and we actually do end up setting the dirty bit by hand in some
situations, so the example is slightly made up. But there are other
cases that might be more realistic, in that sometimes we do things
that are hidden from the compiler - in assembly etc - and the
compiler might *think* it knows what is going on, but it doesn't
actually see all accesses.)

> Sorry, but the rules *are* very clear.  I *really* suggest to look at
> the formalization by Batty et al.  And in these rules, proving that a
> read will always return value X has a well-defined meaning, and you can
> use it.  That simply follows from how the model is built.

What "model"?

That's the thing. I have tried to figure out whether the model is some
abstract C model, or a model based on the actual hardware that the
compiler is compiling for, and whether the model is one that assumes
the compiler has complete knowledge of the system (see the example
above).

And it seems to be a mixture of it all. The definitions of the various
orderings obviously very much imply that the compiler has to insert
the proper barriers or sequence markers for that architecture, but
then there is no apparent way to depend on any *other* architecture
ordering guarantees. Like our knowledge that all architectures (with
the exception of alpha, which really doesn't end up being something we
worry about any more) end up having the load dependency ordering
guarantee.

> What you seem to want just isn't covered by the model as it is today --
> you can't infer from that that the model itself would be wrong.  The
> dependency chains aren't modeled in the way you envision it (except in
> what consume_mo tries, but that seems to be hard to implement); they are
> there on the level of the program logic as modeled by the abstract
> machine and the respective execution/output rules, but they are not
> built to represent those specific ordering guarantees the HW gives you.

So this is a problem. It basically means that we cannot do the kinds
of things we do now, which very much involve knowing what the memory
ordering of a particular machine is, and combining that knowledge with
our knowledge of code generation.

Now, *most* of what we do is protected by locking and is all fine. But
we do have a few rather subtle places in RCU and in the scheduler
where we depend on the actual dependency chain.
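
Roughly this kind of pattern - written here with C11's
memory_order_consume purely for illustration, not our actual
rcu_dereference() implementation:

#include <stdatomic.h>

struct foo {
	int a;
	int b;
};

/* made-up globally published pointer */
static _Atomic(struct foo *) gp;

/*
 * Writer: initialize the object, then publish the pointer with a
 * release store, so the initialization is ordered before the publish.
 */
void publish(struct foo *p)
{
	p->a = 1;
	p->b = 2;
	atomic_store_explicit(&gp, p, memory_order_release);
}

/*
 * Reader: the pointer load carries a dependency into the dereference.
 * On everything except alpha the address dependency alone orders the
 * pointer load before the loads of p->a and p->b, with no extra
 * barrier - that's the hardware guarantee we rely on.
 */
int reader(void)
{
	struct foo *p = atomic_load_explicit(&gp, memory_order_consume);

	if (p)
		return p->a + p->b;
	return 0;
}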

In *practice*, I seriously doubt any reasonable compiler can actually
make a mess of it. The kinds of optimizations that would actually
defeat the dependency chain are simply not realistic. And I suspect
that will end up being what we rely on - there being no sane sequence
that a compiler would ever actually generate, even if we don't have
guarantees for some of it.

And I suspect I can live with that. We _have_ lived with that for the
longest time, after all. We very much do things that aren't covered by
any existing C standard, and just basically use tricks to coax the
compiler into never generating code that doesn't work (with our inline
asm barriers etc being a prime example).
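
(The prime example being roughly this - a simplified version of our
barrier() macro, which is just an empty asm with a memory clobber: it
generates no instructions at all, but tells the compiler it cannot
cache memory values across it or reorder memory accesses around it:)

#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Without the barrier the compiler could hoist the load of *flag out
 * of the loop and spin forever on a cached value; with it, *flag gets
 * re-read on every iteration.
 */
static inline void wait_for(int *flag)
{
	while (!*flag)
		barrier();
}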

> I would also be cautious claiming that the rules you suggested would be
> very clear and very simple.  I haven't seen a memory model spec from you
> that would be viable as the standard model for C/C++, nor have I seen
> proof that this would actually be easier to understand for programmers
> in general.

So personally, if I were to write the spec, I would have taken a
completely different approach from what the standard obviously does.

I'd have taken the approach of specifying the required semantics of
each atomic op (ie the memory operations would end up having to be
annotated with their ordering constraints), and then said that the
compiler can generate any code that is equivalent to that
_on_the_target_machine_.

Why? Just to avoid the whole "ok, which set of rules applies now" problem.
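
For concreteness: the explicit C11 forms already carry that kind of
per-operation annotation; the question is only what the annotation is
specified against - the abstract model as it stands today, or the
naive translation on the target machine, which is what I'm
describing:

#include <stdatomic.h>

static _Atomic int ready;
static int data;

void producer(void)
{
	data = 42;
	/*
	 * Annotated as a release store. The naive direct translation
	 * is a plain store on x86 and an stlr on arm64 - the
	 * annotation pins down the required semantics, the target
	 * machine pins down the actual code.
	 */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
	/* Annotated as an acquire load. */
	if (atomic_load_explicit(&ready, memory_order_acquire))
		return data;
	return -1;
}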

>> For example, CPU people actually do tend to give guarantees for
>> certain things, like stores that are causally related being visible in
>> a particular order.
>
> Yes, but that's not part of the model so far.  If you want to exploit
> this, please make a suggestion for how to extend the model to cover
> this.

See above. This is exactly why I detest the C "model" thing. Now you
need ways to describe those CPU guarantees, because if you can't
describe them, you can't express them in the model.

I would *much* have preferred the C standard to say that you have to
generate code that is guaranteed to act the same way - on that machine
- as the "naive direct unoptimized translation".

IOW, I would *revel* in the fact that different machines are
different, and basically just describe the "stupid" code generation.
You'd get the guaranteed semantic baseline, but you'd *also* be able
to know that whatever architecture guarantees you have would remain.
Without having to describe those architecture issues.

It would be *so* nice if the C standard had done that for pretty much
everything that is "implementation dependent". Not just atomics.

[ will look at the rest of your email later ]

                 Linus

