This is the mail archive of the
mailing list for the GCC project.
Re: GCC libatomic ABI specification draft
- From: Torvald Riegel <triegel at redhat dot com>
- To: Bin Fan <bin dot x dot fan at oracle dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 17 Jan 2017 18:00:05 +0100
- Subject: Re: GCC libatomic ABI specification draft
On Thu, 2016-11-17 at 12:12 -0800, Bin Fan wrote:
> On 11/14/2016 4:34 PM, Bin Fan wrote:
> > Hi All,
> > I have an updated version of libatomic ABI specification draft. Please
> > take a look to see if it matches GCC implementation. The purpose of
> > this document is to establish an official GCC libatomic ABI, and allow
> > compatible compiler and runtime implementations on the affected
> > platforms.
Thanks for the update, and sorry for the late reply. Comments below.
> > - Rewrite section 3 to replace "lock-free" operations with "hardware
> > backed" instructions. The digest of this section is: 1) inlineable
> > atomics must be implemented with the hardware backed atomic
> > instructions. 2) for non-inlineable atomics, the compiler must
> > generate a runtime call, and the runtime support function is free to
> > use any implementation.
I still think that using hardware-backed instructions for a particular
type requires that there is a true atomic load instruction for that
type. Emulating a load with an idempotent store (eg, cmpxchg16b) is not
One could argue that an idempotent atomic HW store such as a cmpxchg16b
in a loop is indeed lock-free. However, IMO the intention behind
"lock-free" atomics in C and C++ is to offer atomics that are both
lock-free *and* as fast as one would assume for a fully HW-backed
solution for atomic accesses. This includes that loads must be cheaper
than stores, in particular under contention / concurrent accesses by
I believe that "fast" is much more often the motivation for
using lock-free atomics than the actual "lock-free" property, ie, the
progress-guarantee aspect (which isn't even lock-free but
obstruction-free, see below). If we do see a sufficiently strong need
for lock-free atomics, we should build something just for that (eg,
if we remove the address-free requirement, we can support lock-free (in
the progress-guarantee sense) operations for a lot more types).
Also, while that previous issue is "just" a performance issue, the fact
that we could issue a store when atomic_load() is called is a
correctness issue, I think.
One example is volatile atomic loads; while C/C++ don't really
constrain what a volatile load needs to be in the underlying
implementation, I think most users would assume that a load really means
a hardware load instruction of some sort, and nothing else. cmpxchg16b
conflicts with such an assumption.
Another example is read-only mapped memory.
Bottom line: we shouldn't rely solely on cmpxchg16b and similar.
(Though this doesn't necessarily mean that there can't be compiler flags
that enable its use.)
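The read-only case can be sketched as follows. This is a minimal illustration, assuming a POSIX system and a target where an 8-byte atomic load compiles to a plain load instruction (eg, x86_64); the helper name load_from_readonly_page is hypothetical. A load emulated with a compare-and-swap would need write access to the page and would fault at the marked line.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes 'value' into a fresh page, makes the page
   read-only, then atomically loads the value back.
   Returns 0 on success (result in *out), -1 if the page setup failed. */
int load_from_readonly_page(uint64_t value, uint64_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, &value, sizeof value);

    /* From here on, any store to the page raises SIGSEGV. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* Fine with a true load instruction; a CAS-emulated load would
       attempt a write here and fault. */
    *out = __atomic_load_n((uint64_t *)p, __ATOMIC_ACQUIRE);

    munmap(p, (size_t)page);
    return 0;
}
```

Calling load_from_readonly_page(42, &v) is expected to return 0 and leave 42 in v under the assumptions above.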
I think the ABI should set a baseline for each architecture, and the
baseline decides whether something is inlinable or not. Thus, the
x86_64 ABI would make __int128 operations not inlinable (because of the
issues with cmpxchg16b, see above).
If users want to use capabilities beyond the baseline, they can choose
to use flags that alter/extend the ABI. For example, if they use a flag
that explicitly enables the use of cmpxchg16b for atomics, they also
need to use a libatomic implementation built in the same way (if
possible). This then creates a new ABI(-variant), basically.
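As a rough sketch of how such an ABI variant could be detected at build time: on x86_64, GCC's -mcx16 option enables cmpxchg16b and defines the macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16. The function names below are hypothetical, and the "runtime" flag merely stands in for whatever a matching libatomic build would report; nothing like this check exists in libatomic as far as this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>

/* True if this translation unit was compiled with 16-byte
   compare-and-swap enabled (eg, with -mcx16 on x86_64). */
bool built_with_cas16(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    return true;
#else
    return false;
#endif
}

/* Hypothetical consistency check: 'runtime_has_cas16' is an assumed
   value describing how the runtime library was built. Mixing the two
   variants means inlined and out-of-line accesses to the same object
   may not agree on the implementation. */
bool abi_variant_matches(bool runtime_has_cas16)
{
    return built_with_cas16() == runtime_has_cas16;
}
```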
I ran a few tests on my x86_64 machine a few weeks ago, and I didn't
see cmpxchg16b being used. IIRC, I also looked at libatomic and didn't
see it (but I don't remember for sure). Either way, if I am
wrong, and we are using cmpxchg16b for loads, this should be fixed.
Ideally, this should be fixed before the stage 3 deadline this Friday.
Such a fix might potentially break existing uses, but the earlier we fix
this, the better.
Section 3 Rationale, alternative 1: I'm wondering if the example is
correct. For a 4-byte-aligned type of size 3, the implementation cannot
simply use 4-byte hardware-backed atomics because this will inevitably
touch the 4th byte I think, and the implementation can't know whether
this is padding or not. Or do we expect that things like packed structs
N3.1: Why do you assume that 8-byte HW atomics are available on i386?
Because cmpxchg8b is available for CPUs that are the lowest i?86 we
still intend to support?
I'd also use "hardware-backed" instead of "hardware backed".
> > - The Rationale section in section 3 is also revised to remove the
> > mentioning of "lock-free", but there is not major change of concept.
> > - Add note N3.1 to emphasize the assumption of general hardware
> > supported atomic instruction
> > - Add note N3.2 to discuss the issues of cmpxchg16b
> > - Add a paragraph in section 4.1 to specify memory_order_consume must
> > be implemented through memory_order_acquire. Section 4.2 emphasizes it
> > again.
> > - The specification of each runtime functions mostly maps to the
> > corresponding generic functions in the C11 standard. Two functions are
> > worth noting:
> > 1) C11 atomic_compare_exchange compares and updates the "value" while
> > __atomic_compare_exchange functions in this ABI compare and update the
> > "memory", which implies the memcmp and memcpy semantics.
In Section 4, parts about atomic_compare_exchange: should there be a
back-reference to the memcmp point made earlier in the document?
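To make the memcmp/memcpy point concrete, here is a non-atomic reference model of those semantics. It is only a sketch: model_compare_exchange, fill_padded, and struct padded are hypothetical names, and the model ignores atomicity and memory orders entirely. It shows that two objects with equal members but different padding bytes do not compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* On common ABIs there are padding bytes between 'c' and 'i'. */
struct padded { char c; int i; };

/* Non-atomic model of "compare and update the memory":
   the comparison covers every byte, including padding. */
bool model_compare_exchange(void *mem, void *expected,
                            const void *desired, size_t size)
{
    if (memcmp(mem, expected, size) == 0) {
        memcpy(mem, desired, size);   /* success: install desired */
        return true;
    }
    memcpy(expected, mem, size);      /* failure: report current bytes */
    return false;
}

/* Builds an object representation of struct padded in 'buf' whose
   padding bytes all hold 'pad'. */
void fill_padded(unsigned char *buf, unsigned char pad, char c, int i)
{
    memset(buf, pad, sizeof(struct padded));
    memcpy(buf + offsetof(struct padded, c), &c, sizeof c);
    memcpy(buf + offsetof(struct padded, i), &i, sizeof i);
}
```

With mem and expected holding the same member values but different padding, the first call fails and copies the bytes of mem into expected; a retry with the updated expected then succeeds.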
> > 2) The specification of __atomic_is_lock_free allows both a per-object
> > result and a per-type result. A per-type implementation could pass
> > NULL, or a faked address as the address of the object. A per-object
> > implementation could pass the actual address of the object.
The __atomic_is_lock_free description should specify that "lock-free"
refers to the definition of "lock-free" in C++14, which includes
"address-free". I'm referring to C++14 specifically because this
contains an update which is relevant for (1) LL/SC-based architectures
(ie, that "lock-free" is actually what is called obstruction-free in the
literature) and (2) for any libatomic implementation that wants to use
HW atomics for things like the example in Section 3's Rationale,
alternative 1 (see above).
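For reference, the two ways of querying can be sketched with the GCC builtin, where a null pointer argument means "typical alignment for this size". The wrapper names and the shared_counter object are hypothetical; the sketch assumes a mainstream target where int is naturally aligned, so that both queries are resolved at compile time without calling into libatomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object used for the per-object query. */
static int shared_counter;

/* Per-type query: no particular object, typical alignment assumed. */
bool int_lock_free_per_type(void)
{
    return __atomic_is_lock_free(sizeof(int), 0);
}

/* Per-object query: the address lets the implementation take the
   actual alignment of this object into account. */
bool int_lock_free_per_object(void)
{
    return __atomic_is_lock_free(sizeof(shared_counter), &shared_counter);
}
```

For a naturally aligned int the two answers should agree; they can differ only for objects whose alignment is weaker than what the type normally has.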
This ABI needs to also specify how hardware-backed atomics are
implemented on a particular architecture. For example, on architectures
where there is more than one choice for how to implement certain memory orders
(eg, ARM), the ABI should pick a certain mapping. I guess this should
be a note in Section 4, maybe as a separate subsection and/or an
additional note around the memory_order enum description; I'd keep the
note about implementing something equivalent to C11/C++11 semantics.
What we would document is something like the possible mappings discussed
There are typos in Section 2.4.