This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Implementing atomic load as compare-and-swap for read-only memory
- From: Torvald Riegel <triegel at redhat dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, gcc at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>
- Date: Fri, 03 Jun 2016 15:44:37 +0200
- Subject: Re: Implementing atomic load as compare-and-swap for read-only memory
- Authentication-results: sourceware.org; auth=none
- References: <57514F17 dot 2090007 at foss dot arm dot com> <20160603100300 dot GE7387 at tucnak dot redhat dot com> <1464956769 dot 17104 dot 295 dot camel at localhost dot localdomain> <20160603123226 dot GH7387 at tucnak dot redhat dot com> <57517C1F dot 4030509 at foss dot arm dot com>
On Fri, 2016-06-03 at 13:46 +0100, Kyrill Tkachov wrote:
> Hi Jakub, Torvald,
>
> On 03/06/16 13:32, Jakub Jelinek wrote:
> > On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> >> And that would be fine, IMO. If you can't even load atomically, doing
> >> something useful with this type will be hard except in special cases.
> >> Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> >> the cache line in exclusive mode can be a lot more costly than what
> >> users might expect from a load. A short critical section might not be
> >> much slower.
> >>
> >> If you only have a CAS as base of the atomic operations on a type, then
> >> a CAS operation exposed to the user will still be a just a single HW
> >> CAS. But any other operation besides the CAS and a load will need *two*
> >> CAS operations; even an atomic store has to be implemented as a CAS
> >> loop.
> > Would we just stop expanding all those __atomic_*/__sync_* builtins inline
> > then (which would IMHO break tons of stuff), or just some predicate that
> > atomic.h/atomic headers use?
> >
> >>> But doesn't that mean you should fall back to locked operation also for any
> >>> other atomic operation on such types, because otherwise if you atomic_store
> >>> or any other kind of atomic operation, it wouldn't use the locking, while
> >>> for atomic load it would?
> >> I suppose you mean that one must fall back to using locking for all
> >> operations? If load isn't atomic, then it can't be made atomic using
> >> external locks if the other operations don't use the locks.
> >>
> >>> That would be an ABI change and quite significant
> >>> pessimization in many cases.
> >> A change from wide CAS to locking would be an ABI change I suppose, but
> >> it could also be considered a necessary bugfix if we don't want to write
> >> to read-only memory. Does this affect anything but i686?
> > Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
> > (judging from who initiated this thread), I bet there are many others.
>
> I'm looking at pre-LPAE ARMv7-A targets for which the
> ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends:
> "The way to atomically load two 32-bit quantities is to perform a
> LDREXD/STREXD sequence, reading and writing the same value, for which the
> STREXD succeeds, and use the read values."
>
> Currently we emit just a single load-doubleword-exclusive which, according to the above,
> would not be enough on such targets.
If the store is idempotent, will the hardware actually execute a store
that could conflict with a read-only memory mapping?
In any case, I assume they would have to at least enforce the ordering
constraints that might be caused with such a store, because a program
could use an idempotent intentially just to get ordering.
A similar question applies to archs with CAS instructions, but I'd
assume they'd always create a visible store. I'm not quite sure whether
current HW would also always put the cacheline into exclusive mode even
when the CAS would fail because the expected value doesn't match the
actual one. Do the arch maintainers know something about this?
> On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 70814).
Note that the bug reporter also states that llvm recently changed to
calling libatomic functions.