This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC libatomic ABI specification draft
- From: Torvald Riegel <triegel at redhat dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: Bin Fan <bin dot x dot fan at oracle dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Thu, 19 Jan 2017 16:02:49 +0100
- Subject: Re: GCC libatomic ABI specification draft
On Wed, 2017-01-18 at 14:23 -0800, Richard Henderson wrote:
> On 01/17/2017 09:00 AM, Torvald Riegel wrote:
> > I think the ABI should set a baseline for each architecture, and the
> > baseline decides whether something is inlinable or not. Thus, the
> > x86_64 ABI would make __int128 operations not inlinable (because of the
> > issues with cmpxchg16b, see above).
> > If users want to use capabilities beyond the baseline, they can choose
> > to use flags that alter/extend the ABI. For example, if they use a flag
> > that explicitly enables the use of cmpxchg16b for atomics, they also
> > need to use a libatomic implementation built in the same way (if
> > possible). This then creates a new ABI(-variant), basically.
> Yes. Other examples here are power7/power8 and armv6/armv7.
> In both cases, the architecture added double-word load(-locked) and
> store(-conditional) instructions. In order for us to use these new
> instructions inline, libatomic must be updated to use them as well.
> The general principle, in my opinion, is that extensions to the ISA should
> require that libatomic either be re-built, or perform runtime detection in
> order to select the internal algorithm used.
That sounds okay to me. I think we would have to make that clear in
the ABI specification though, because this also includes requirements
for the user of the ABI (e.g., if you compile for power8, you need to use
a suitably built libatomic) and for distributions.
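
To make the runtime-detection option concrete, here is a minimal sketch for
x86, assuming GCC's <cpuid.h> (__get_cpuid and bit_CMPXCHG16B). The names
impl16_t, cpu_has_cmpxchg16b and select_impl16 are hypothetical and are not
taken from libatomic. It only illustrates the detection step; whether
cmpxchg16b should be selected at all is a separate question in this thread.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical tags for the internal algorithm used for 16-byte objects. */
typedef enum { IMPL_LOCK, IMPL_CMPXCHG16B } impl16_t;

/* Report whether the running CPU advertises cmpxchg16b (CPUID leaf 1, ECX). */
static bool cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (ecx & bit_CMPXCHG16B) != 0;
#endif
    return false; /* non-x86 target or CPUID leaf unavailable */
}

/* Pick the algorithm once; a library would cache the result. */
static impl16_t select_impl16(void)
{
    return cpu_has_cmpxchg16b() ? IMPL_CMPXCHG16B : IMPL_LOCK;
}
```

The point of the sketch is that inline code and the library must agree on the
result of such a selection, otherwise a lock-based path and an
instruction-based path could end up operating on the same object.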
> In the case of arm, distributions normally either (1) build for a specific cpu
> revision, (2) build for old-arm + soft-fpu, (3) build for armv7 + hard-fpu. So
> most distributions would not actually require a runtime check for arm.
> In the case of power, I assume it's possible to run ppc64 on power8, but every
> power8 system to which I have access has ppc64le deployed. Certainly ppc64le
> would not need a runtime check, but it would seem prudent for ppc64 to gain a
> runtime check for the power8 insns.
OK. I think it would be good if ARM/Power people could contribute to
the ABI specification and extend it to also cover ARM/Power.
> > I've made a few tests on my x86_64 machine a few weeks ago, and I didn't
> > see cmpxchg16b being used. IIRC, I also looked at libatomic and didn't
> > see it (but I don't remember for sure). Either way, if I should have
> > been wrong, and we are using cmpxchg16b for loads, this should be fixed.
> > Ideally, this should be fixed before the stage 3 deadline this Friday.
> > Such a fix might potentially break existing uses, but the earlier we fix
> > this, the better.
> You needed to use -mcx16, or any other option (such as -march=native) that
> implies that. And, you will find that expand_atomic_load does have a
> larger-than-word-size fallback path that does use expand_atomic_compare_and_swap.
> So, yes, there's something here that needs adjustment.
I'll send a separate email describing the options I see currently.
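
For anyone who wants to check what their compiler treats as inlinable, a
small sketch using the GCC builtin __atomic_always_lock_free follows. The
helper names are hypothetical. The answer for __int128 depends on the target,
the compiler version and flags such as -mcx16, so no particular result is
claimed here.

```c
#include <stdbool.h>

/* True if the compiler will always inline atomics on an int-sized object. */
static bool int_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(int), 0);
}

/* Same query for a 16-byte object; false where __int128 does not exist. */
static bool int128_always_lock_free(void)
{
#ifdef __SIZEOF_INT128__
    return __atomic_always_lock_free(sizeof(__int128), 0);
#else
    return false;
#endif
}
```

Compiling this once with and once without -mcx16 and comparing the value
returned by int128_always_lock_free() shows whether the flag changes the
inlining decision on a given toolchain.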
> > Section 3 Rationale, alternative 1: I'm wondering if the example is
> > correct. For a 4-byte-aligned type of size 3, the implementation cannot
> > simply use 4-byte hardware-backed atomics because this will inevitably
> > touch the 4th byte I think, and the implementation can't know whether
> > this is padding or not. Or do we expect that things like packed structs
> > are disallowed?
> If we atomically store an unchanged value into the 4th byte, can we tell?
Probably not in terms of the value. But race detectors, HW breakpoints
etc. could observe the store. I'm not sure whether potentially having
to adapt these is justified by being able to optimize atomic access to
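
To illustrate the concern, here is a sketch of how a 3-byte store could be
implemented with a 4-byte compare-and-swap. The function store3_via_word is
hypothetical and is not what libatomic does. It assumes the 3-byte object
sits at the start of a 4-byte-aligned word whose 4th byte may belong to
something else.

```c
#include <stdint.h>
#include <string.h>

/* Atomically replace bytes 0..2 of *word with val, keeping byte 3's value.
   Byte 3 is still written by the CAS, just with the value it already had. */
static void store3_via_word(uint32_t *word, const unsigned char val[3])
{
    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        desired = old;             /* start from the current 4 bytes */
        memcpy(&desired, val, 3);  /* overwrite the first 3 bytes in memory */
    } while (!__atomic_compare_exchange_n(word, &old, desired, 0,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
}
```

The value of the 4th byte is unchanged afterwards, but the hardware still
performs a 4-byte write, which is exactly what a race detector or a hardware
watchpoint on that byte could observe.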
> > N3.1: Why do you assume that 8-byte HW atomics are available on i386?
> > Because cmpxchg8b is available for CPUs that are the lowest i?86 we
> > still intend to support?
> For various definitions of "we", I suppose. Red Hat certainly does not support
> anything lower than i686, which does have cmpxchg8b.
> I suspect that the GNU project still supports i486. I do know that glibc has
> dropped support for i386.
> I should note that supporting 64-bit atomics on i686 *is* possible, without the
> CAS problem that you describe for cmpxchg16b, because we *are* guaranteed that
> the FPU supports a 64-bit atomic load/store. And we do already handle this;
> see the atomic_loaddi_fpu and atomic_storedi_fpu patterns.
> I'll also note that, as per above, this implies that if we build for i586-*,
> libatomic should provide runtime paths that detect and use i686 insns, so that
> the library is compatible with what the compiler will generate inline given
> appropriate command-line options.
OK. So these rules should be added to the ABI spec too, I suppose.