[PATCH] Fix memory orders description in atomic ops built-ins docs.

Matthew Wahab matthew.wahab@arm.com
Thu May 21 15:56:00 GMT 2015


On 19/05/15 20:20, Torvald Riegel wrote:
> On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
>> Hello,
>>
>> On 15/05/15 17:22, Torvald Riegel wrote:
>>> This patch improves the documentation of the built-ins for atomic
>>> operations.
>>
>> The "memory model" to "memory order" change does improve things but I think that
>> the patch has some problems. As it is now, it makes some of the descriptions
>> quite difficult to understand and seems to assume more familiarity with details
>> of the C++11 specification than might be expected.
>
> I'd say that's a side effect of the C++11 memory model being the
> reference specification of the built-ins.
>
>> Generally, the memory order descriptions seem to be targeted towards language
>> designers but don't provide for anybody trying to understand how to implement or
>> to use the built-ins.
>
> I agree that the current descriptions aren't a tutorial on the C++11
> memory model.  However, given that the model is not GCC-specific, we
> aren't really in a need to provide a tutorial, in the same way that we
> don't provide a C++ tutorial.  Users can pick the C++11 memory model
> educational material of their choice, and we need to document what's
> missing to apply the C++11 knowledge to the built-ins we provide.
>

We seem to have different views about the purpose of the manual page. I'm treating it 
as a description of the built-in functions provided by gcc to generate the code 
needed to implement the C++11 model. That is, the built-ins are distinct from C++11 
and their descriptions should be, as far as possible, independent of the methods used 
in the C++11 specification to describe the C++11 memory model.

I understand of course that the __atomics were added in order to support C++11 but 
that doesn't make them part of C++11 and, since __atomic functions can be made 
available when C11/C++11 may not be, it seems to make sense to try for stand-alone 
descriptions.

I'm also concerned that the patch, by describing things in terms of formal C++11 
concepts, makes it more difficult for people to know what the built-ins can be 
expected to do, and so makes the built-ins more difficult to use. There is a danger 
that, rather than take a risk with uncertainty about the behaviour of the __atomics, 
people will fall back to the __sync functions simply because their expected behaviour 
is easier to work out.

I don't think that linking to external sites will help either, unless people already 
know, or want to learn, C++11. Anybody who just wants to (e.g.) add a memory barrier 
will take one look at the __sync manual page and use the closest match from there instead.

Note that none of this requires a tutorial of any kind. I'm just suggesting that the 
manual should describe what behaviour should be expected of the code generated for 
the functions. For the memory orders, that would mean describing what constraints 
need to be met by the generated code. The requirement that the atomics should support 
C++11 could be met by making sure that the description of the expected behaviour is 
sufficient for C++11.

> There are several resources for implementers, for example the mappings
> maintained by the Cambridge research group.  I guess it would be
> sufficient to have such material on the wiki.  Is there something
> specific that you'd like to see documented for implementers?
> [...]
> I agree it's not described in the manual, but we're implementing C++11.

(As above) I believe we're supporting the implementation of C++11 and that the 
distinction is important.

> However, I don't see why happens-before semantics wouldn't apply to
> GCC's implementation of the built-ins; there may be cases where we
> guarantee more, but if one uses the builtins in way allowed by the C++11
> model, one certainly gets behavior and happens-before relationships as
> specified by C++11.
>

My understanding is that happens-before is a relation with a specific meaning in the 
C++11 specification. I believe that it's used to decide whether something is or is 
not a data race, so saying that it applies to a gcc built-in would be wrong. Using 
the gcc built-in rather than the equivalent C++11 library function would result in a 
program that C++11 regards as invalid. (Again, as I understand it.)

>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 6004681..5b2ded8 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier.
>>
>>    [...]  If the data type size maps to one
>> -of the integral sizes that may have lock free support, the generic
>> -version uses the lock free built-in function.  Otherwise an
>> +of the integral sizes that may support lock-freedom, the generic
>> +version uses the lock-free built-in function.  Otherwise an
>>    external call is left to be resolved at run time.
>>
>> =====
>> This is a slightly awkward sentence. Maybe it could be replaced with something
>> on the lines of "The generic function uses the lock-free built-in function when
>> the data-type size makes that possible, otherwise an external call is left to be
>> resolved at run-time."
>> =====
>
> Changed to:
> "It uses the lock-free built-in function if the specific data type size
> makes that possible; otherwise, an external call is left to be resolved
> at run time."

Ok for me.

>> -The memory models integrate both barriers to code motion as well as
>> -synchronization requirements with other threads.  They are listed here
>> -in approximately ascending order of strength.
>> +An atomic operation can both constrain code motion by the compiler and
>> +be mapped to a hardware instruction for synchronization between threads
>> +(e.g., a fence).  [...]
>>
>> =====
>> This is a little unclear (and inaccurate, aarch64 can use two instructions
>> for fences). I also thought that atomic operations constrain code motion by the
>> hardware. Maybe break the link with the compiler and hardware: "An atomic
>> operation can both constrain code motion and act as a synchronization point
>> between threads".
>> =====
>
> I removed "by the compiler" and used "hardware instruction_s_".

Ok.

>>    @table  @code
>>    @item __ATOMIC_RELAXED
>> -No barriers or synchronization.
>> +Implies no inter-thread ordering constraints.
>>
>> ====
>> It may be useful to be explicit that there are no restrictions on code motion.
>> ====
>
> But there are restrictions, for example those related to the forward
> progress requirements or the coherency rules.

Those are C++11 restrictions, used in the formal description of the model. The 
expected behaviour of the HW instructions generated by the built-in doesn't require 
any restrictions to be imposed.

>>    @item __ATOMIC_CONSUME
>> -Data dependency only for both barrier and synchronization with another
>> -thread.
>> +This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
>> +memory order because of a deficiency in C++11's semantics for
>> +@code{memory_order_consume}.
>>
>> =====
>> It would be useful to have a description of what the __ATOMIC_CONSUME was
>> meant to do, as well as the fact that it currently just maps to
>> __ATOMIC_ACQUIRE. (Or maybe just drop it from the documentation until it's
>> fixed.)
>> =====
>
> I'll leave what it was meant to do to the ISO C++ discussions.  I think
> that it's implemented using mo_acquire is already clear from the text,
> or not?

Yes, I just meant don't drop the link to __ATOMIC_CONSUME.

It's probably worth saying something to discourage people from using this though, 
something along the lines of "This memory order should not be used, as the 
implementation is likely to change".

>>
>>    @item __ATOMIC_ACQUIRE
>> -Barrier to hoisting of code and synchronizes with release (or stronger)
>> -semantic stores from another thread.
>> +Creates an inter-thread happens-before constraint from the release (or
>> +stronger) semantic store to this acquire load.  Can prevent hoisting
>> +of code to before the operation.
>>
>> =====
>> As noted before, it's not clear what the "inter-thread happens-before"
>> means in this context.
>
> I don't see a way to make this truly freestanding without replicating
> the specification of the C++11 memory model.

(As above) I don't believe that the happens-before applies to the built-ins.

>> Here and elsewhere:
>> "Can prevent <motion> of code" is ambiguous: it doesn't say under what
>> conditions code would or wouldn't be prevented from moving.
>
> Yes, it's not a specification but just an illustration of what can
> result from the specification.

But that makes it difficult to know what behaviour to expect, and so makes the 
built-ins harder to use.

> [..] Describing which code movement is
> allowed or not, precisely, is way too much detail IMO.  There are full
> ISO C++ papers about that (N4455), and even those aren't enumerations of
> all allowed code transformations.

A gcc built-in reduces to a code sequence to be executed by a single thread; 
describing the expected behaviour of that code sequence shouldn't be so difficult 
that it needs a paper. Note that this doesn't need a formal (in the sense of formal 
methods) description, just the sort of description that is normal for a compiler 
built-in.

>> =====
>>
>> -Note that the scope of a C++11 memory model depends on whether or not
>> -the function being called is a @emph{fence} (such as
>> -@samp{__atomic_thread_fence}).  In a fence, all memory accesses are
>> -subject to the restrictions of the memory model.  When the function is
>> -an operation on a location, the restrictions apply only to those
>> -memory accesses that could affect or that could depend on the
>> -location.
>> +Note that in the C++11 memory model, @emph{fences} (e.g.,
>> +@samp{__atomic_thread_fence}) take effect in combination with other
>> +atomic operations on specific memory locations (e.g., atomic loads);
>> +operations on specific memory locations do not necessarily affect other
>> +operations in the same way.
>>
>> ====
>> Its very unclear what this paragraph is saying. It seems to suggest that fences
>> only work in combination with other operations. But that doesn't seem right
>> since a __atomic_thread_fence (with appropriate memory order) can be dropped
>> into any piece of code and will act in the way that memory barriers are commonly
>> understood to work.
>
> Not quite.  If you control the code generation around the fence (e.g.,
> so you control which loads/stores a compiler generates), and know which
> HW instruction the C++ fence maps to, you can try to use it this way.
>
> However, that's not what the built-in is specified to result in --
> instead, it's specified to implement a C++ fence.  And those take effect
> in combination with the reads-from relation etc., for which one needs to
> use atomic operations that access specific memory locations.
>

As above.

>> ====
>>
>>    @@ -9131,5 +9135,5 @@ if (_atomic_always_lock_free (sizeof (long long), 0))
>>    @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size, void *ptr)
>>
>>    This built-in function returns true if objects of @var{size} bytes always
>> +generate lock-free atomic instructions for the target architecture.  If
>> +it is not known to be lock-free, a call is made to a runtime routine named
>> ====
>> Probably unrelated to the point of this patch but does this mean
>> "If this built-in function is not known to be lock-free .."?
>> ====
>>
>
> Yes, fixed.
>
> I've also fixed s/model/order/ in the HLE built-ins docs and fixed a
> typo there too.
>
> Unless there are objections, I plan to commit the attached patch at the
> end of this week.

It's obvious that I have concerns. These come from an apparent difference of opinion 
about what the built-ins and the manual page are for, so I won't object to the patch 
going in.

Matthew


