This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Implementing atomic load as compare-and-swap for read-only memory

On Fri, 2016-06-03 at 13:46 +0100, Kyrill Tkachov wrote:
> Hi Jakub, Torvald,
> On 03/06/16 13:32, Jakub Jelinek wrote:
> > On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> >> And that would be fine, IMO.  If you can't even load atomically, doing
> >> something useful with this type will be hard except in special cases.
> >> Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> >> the cache line in exclusive mode can be a lot more costly than what
> >> users might expect from a load.  A short critical section might not be
> >> much slower.
> >>
> >> If you only have a CAS as base of the atomic operations on a type, then
> >> a CAS operation exposed to the user will still be a just a single HW
> >> CAS.  But any other operation besides the CAS and a load will need *two*
> >> CAS operations; even an atomic store has to be implemented as a CAS
> >> loop.
> > Would we just stop expanding all those __atomic_*/__sync_* builtins inline
> > then (which would IMHO break tons of stuff), or just some predicate that
> > atomic.h/atomic headers use?
> >
> >>> But doesn't that mean you should fall back to locked operation also for any
> >>> other atomic operation on such types, because otherwise if you atomic_store
> >>> or any other kind of atomic operation, it wouldn't use the locking, while
> >>> for atomic load it would?
> >> I suppose you mean that one must fall back to using locking for all
> >> operations?  If load isn't atomic, then it can't be made atomic using
> >> external locks if the other operations don't use the locks.
> >>
> >>> That would be an ABI change and quite significant
> >>> pessimization in many cases.
> >> A change from wide CAS to locking would be an ABI change I suppose, but
> >> it could also be considered a necessary bugfix if we don't want to write
> >> to read-only memory.  Does this affect anything but i686?
> > Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
> > (judging from who initiated this thread), I bet there are many others.
> I'm looking at pre-LPAE ARMv7-A targets for which the
> ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends:
> "The way to atomically load two 32-bit quantities is to perform a
> LDREXD/STREXD sequence, reading and writing the same value, for which the
> STREXD succeeds, and use the read values."
> Currently we emit just a single load-doubleword-exclusive which, according to the above,
> would not be enough on such targets.

If the store is idempotent, will the hardware actually execute a store
that could conflict with a read-only memory mapping?
In any case, I assume they would have to at least enforce the ordering
constraints that might be caused with such a store, because a program
could use an idempotent intentially just to get ordering.

A similar question applies to archs with CAS instructions, but I'd
assume they'd always create a visible store.  I'm not quite sure whether
current HW would also always put the cacheline into exclusive mode even
when the CAS would fail because the expected value doesn't match the
actual one.  Do the arch maintainers know something about this?

> On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 70814).

Note that the bug reporter also states that llvm recently changed to
calling libatomic functions.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]