This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.


On 19/05/15 20:20, Torvald Riegel wrote:
On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
Hello,

On 15/05/15 17:22, Torvald Riegel wrote:
This patch improves the documentation of the built-ins for atomic
operations.

The "memory model" to "memory order" change does improve things but I think that
the patch has some problems. As it is now, it makes some of the descriptions
quite difficult to understand and seems to assume more familiarity with details
of the C++11 specification then might be expected.

I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.

Generally, the memory order descriptions seem to be targeted towards language
designers but don't provide for anybody trying to understand how to implement or
to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.


We seem to have different views about the purpose of the manual page. I'm treating it as a description of the built-in functions provided by gcc to generate the code needed to implement the C++11 model. That is, the built-ins are distinct from C++11 and their descriptions should be, as far as possible, independent of the methods used in the C++11 specification to describe the C++11 memory model.

I understand of course that the __atomics were added in order to support C++11 but that doesn't make them part of C++11 and, since __atomic functions can be made available when C11/C++11 may not be, it seems to make sense to try for stand-alone descriptions.

I'm also concerned that the patch, by describing things in terms of formal C++11 concepts, makes it more difficult for people to know what the built-ins can be expected to do and so make the built-in more difficult to use There is a danger that rather than take a risk with uncertainty about the behaviour of the __atomics, people will fall-back to the __sync functions simply because their expected behaviour is easier to work out.

I don't think that linking to external sites will help either, unless people already want to know C++11. Anybody who just wants to (e.g.) add a memory barrier will take one look at the __sync manual page and use the closest match from there instead.

Note that none of this requires a tutorial of any kind. I'm just suggesting that the manual should describe what behaviour should be expected of the code generated for the functions. For the memory orders, that would mean describing what constraints need to be met by the generated code. The requirement that the atomics should support C++11 could be met by making sure that the description of the expected behaviour is sufficient for C++11.

There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?
[...]
I agree it's not described in the manual, but we're implementing C++11.

(As above) I believe we're supporting the implementation of C++11 and that the distinction is important.

However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the builtins in way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.


My understanding is that happens-before is a relation used in the C++11 specification for a specific meaning. I believe that it's used to decide whether something is or is not a data race so saying that it applies to a gcc built-in would be wrong. Using the gcc built-in rather than the equivalent C++11 library function would result in program that C++11 regards as invalid. (Again, as I understand it.)


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier.

   [...]  If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
   external call is left to be resolved at run time.

=====
This is a slightly awkward sentence. Maybe it could be replaced with something
on the lines of "The generic function uses the lock-free built-in function when
the data-type size makes that possible, otherwise an external call is left to be
resolved at run-time."
=====

Changed to:
"It uses the lock-free built-in function if the specific data type size
makes that possible; otherwise, an external call is left to be resolved
at run time."

Ok for me.

-The memory models integrate both barriers to code motion as well as
-synchronization requirements with other threads.  They are listed here
-in approximately ascending order of strength.
+An atomic operation can both constrain code motion by the compiler and
+be mapped to a hardware instruction for synchronization between threads
+(e.g., a fence).  [...]

=====
This is a little unclear (and inaccurate, aarch64 can use two instructions
for fences). I also thought that atomic operations constrain code motion by the
hardware. Maybe break the link with the compiler and hardware: "An atomic
operation can both constrain code motion and act as a synchronization point
between threads".
=====

I removed "by the compiler" and used "hardware instruction_s_".

Ok.

   @table  @code
   @item __ATOMIC_RELAXED
-No barriers or synchronization.
+Implies no inter-thread ordering constraints.

====
It may be useful to be explicit that there are no restrctions on code motion.
====

But there are restrictions, for example those related to the forward
progress requirements or the coherency rules.

Those are C++11 restrictions, used in the formal description of the model. The expected behaviour of the HW insructions generated by the built-in doesn't require any restrictions to be imposed.

   @item __ATOMIC_CONSUME
-Data dependency only for both barrier and synchronization with another
-thread.
+This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
+memory order because of a deficiency in C++11's semantics for
+@code{memory_order_consume}.

=====
It would be useful to have a description of what the __ATOMIC_CONSUME was
meant to do, as well as the fact that it currently just maps to
__ATOMIC_ACQUIRE. (Or maybe just drop it from the documentation until it's
fixed.)
=====

I'll leave what it was meant to do to the ISO C++ discussions.  I think
that it's implemented using mo_acquire is already clear from the text,
or not?

Yes, I just meant don't drop the link to __ATOMIC_CONSUME.

It's probably worth saying something to discourage people from using this though. Something on the lines of "This memory-order should not be used as the implementation is likely to change".


   @item __ATOMIC_ACQUIRE
-Barrier to hoisting of code and synchronizes with release (or stronger)
-semantic stores from another thread.
+Creates an inter-thread happens-before constraint from the release (or
+stronger) semantic store to this acquire load.  Can prevent hoisting
+of code to before the operation.

=====
As noted before, it's not clear what the "inter-thread happens-before"
means in this context.

I don't see a way to make this truly freestanding without replicating
the specification of the C++11 memory model.

(As above) I don't believe that the happens-before applies to the built-ins.

Here and elsewhere:
"Can prevent <motion> of code" is ambiguous: it doesn't say under what
conditions code would or wouldn't be prevented from moving.

Yes, it's not a specification but just an illustration of what can
result from the specification.

But it makes it difficult to know what behaviour to expect, making it difficult to use.

[..] Describing which code movement is
allowed or not, precisely, is way too much detail IMO.  There are full
ISO C++ papers about that (N4455), and even those aren't enumerations of
all allowed code transformations.

A gcc built-in reduces to a code sequence to be executed by a single thread; describing the expected behaviour of that code sequence shouldn't be so difficult that it needs a paper. Note that this doesn't need a formal (in the sense of formal methods) description, just the sort of description that is normal for a compiler built-in.

=====

-Note that the scope of a C++11 memory model depends on whether or not
-the function being called is a @emph{fence} (such as
-@samp{__atomic_thread_fence}).  In a fence, all memory accesses are
-subject to the restrictions of the memory model.  When the function is
-an operation on a location, the restrictions apply only to those
-memory accesses that could affect or that could depend on the
-location.
+Note that in the C++11 memory model, @emph{fences} (e.g.,
+@samp{__atomic_thread_fence}) take effect in combination with other
+atomic operations on specific memory locations (e.g., atomic loads);
+operations on specific memory locations do not necessarily affect other
+operations in the same way.

====
Its very unclear what this paragraph is saying. It seems to suggest that fences
only work in combination with other operations. But that doesn't seem right
since a __atomic_thread_fence (with appropriate memory order) can be dropped
into any piece of code and will act in the way that memory barriers are commonly
understood to work.

Not quite.  If you control the code generation around the fence (e.g.,
so you control which loads/stores a compiler generates), and know which
HW instruction the C++ fence maps too, you can try use it this way.

However, that's not what the built-in is specified to result in --
instead, it's specified to implement a C++ fence.  And those take effect
in combination with the reads-from relation etc., for which one needs to
use atomic operations that access specific memory locations.


As above.

====

   @@ -9131,5 +9135,5 @@ if (_atomic_always_lock_free (sizeof (long long), 0))
   @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size, void *ptr)

   This built-in function returns true if objects of @var{size} bytes always
+generate lock-free atomic instructions for the target architecture.  If
+it is not known to be lock-free, a call is made to a runtime routine named
====
Probably unrelated to the point of this patch but does this mean
"If this built-in function is not known to be lock-free .."?
====


Yes, fixed.

I've also fixed s/model/order/ in the HLE built-ins docs and fixed a
typo there too.

Unless there are objections, I plan to commit the attached patch at the
end of this week.

It's obvious that I have concerns. These come from an apparent difference of opinion about what the built-ins and the manual page are for so I won't object to the patch going in.

Matthew


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]