This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #28 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #24)
> I think we need to at least clarify the documentation of __atomic, probably
> also of __sync; we might also have to change the implementation of __sync
> builtins on some archs.
> 
> First, I think the __atomic builtins should strictly follow the C11 model --
> one major reason being that it's not worthwhile for us to support a
> GCC-specific memory model, which is complex.  Therefore, I think we should
> clarify in the __atomic docs that C++11 is the reference semantics, and that
> any additional text we provide (e.g., to describe what MEMMODEL_SEQ_CST
> does) is just some hand-wavy attempt of makings things look simpler.  I'd
> choose C++11 over C11 because, although the models are supposed to be equal,
> C++11 has more detailed docs in the area of atomics and synchronization in
> general, IMO.  There are also more people involved in C++ standardization
> than C standardization.

Agreed.

> The only exception that I see is that we might want to give more guarantees
> wrt data-race-freedom (DRF) as far as compiler reordering is concerned.  I'm
> mentioning just the compiler because I'm not aware of GCC supporting
> C11/C++11 on any archs whose ISA distinguishes between atomic and nonatomic
> accesses (ie, where DRF is visible at the ISA level); however, I've heard of
> upcoming archs that may do that.  Trying to promise some support for non-DRF
> programs that synchronize might make transition of legacy code to __atomic
> builtins easier, but it's a slippery slope; I don't think it's worthwhile
> for us to try to give really precise and hard guarantees, just because of
> the complexity involved.  I'm not quite sure what's best here.

I use enough whiteboard space trying to work through the well researched,
formalized, and discussed C++11 semantics; I would not like to see a subtly
different GNU++11 semantics!! Not least because I worry for the programmer who
thinks they've understood the guarantees of the specification and reverse
engineers GCC's output for confirmation, only to find GCC is giving stronger
guarantees than is strictly neccessary.

> <snip>
>
> Thus, we need to at least improve the __sync docs.  If we want to really
> make them C11-like, we need to update the docs, and there might be legacy
> code assuming different semantics that breaks.If we don't, we need to
> update the implementation of __sync RMWs.  I don't know what would be the
> easiest way to do that:
>
> 1) We could put __synch_synchronize around them on all archs, and just don't
> care about the performance overhead (thinking that we want to deprecate them
> anyway).  But unless the backends optimize unnecessary sync ops (eg, on x86,
> remove mfence adjacent to a lock'd RMW), there could be performance
> degradation for __sync-using code.

I'm with Andrew, I don't see the need to penalise everyone to ensure
AArch64/ARM memory ordering (Though I agree it would be nice as a stick to
encourage people to move forwards).

> 2) We could make __atomic RMWs with MEMMODEL_SEQ_CST stronger than required
> by C11.  This could result in other C11-conforming compilers to produce more
> efficient synchronization code, and is thus not a good option IMO.

I would not like to see this implemented.

> 3) We could do something just on ARM (and scan other arcs for similar
> issues).  That's perhaps the cleanest option.

Which leaves 3). From Andrew's two proposed solutions:

> a) introduce an additional memory model... MEMMODEL_SYNC or something
> which is even stronger than SEQ_CST.  This would involve fixing any places
> that assume SEQ_CST is the highest.  And then we use this from the places
> that expand sync calls as the memory model.

This seems a sensible approach and leads to a nicer design, but I worry that it
might be overkill for primitives which we ultimately want to reduce support for
and eventually deprecate.

> b) When we are expanding sync calls, we first look for target __sync
> patterns instead of atomic patterns.  If they exist, we use those.
> Otherwise we simply expand like we do today.  
>
> (b) may be easier to implement, but puts more onus on the target..
> probably involves more complex patterns since you need both atomic and
> sync patterns. My guess is some duplication of code will occur here.  But
> the impact is only to specific targets.

When I looked at this problem internally a few weeks back, this is exactly how
I expected things to work (we can add the documentation for the pattern names
to the list of things which need fixing, as it took me a while to figure out
why my new patterns were never expanded!).

I don't think this is particularly onerous for a target. The tough part in all
of this is figuring out the minimal cost instructions at the ISA level to use
for the various __atomic primitives. Extending support to a stronger model,
should be straightforward given explicit documentation of the stronger ordering
requirements.

Certainly, a target would have to do the same legwork for a) regardless, and
would have to pollute their atomic patterns with checks and code paths for
MEMMODEL_SYNC.

This also gives us an easier route to fixing any issues with the
acquire/release __sync primitives (__sync_lock_test_and_set and
__sync_lock_release) if we decide that these also need to be stronger than
their C++11 equivalents.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]