This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st<op> instructions


On 10/30/18 8:32 PM, James Greenhalgh wrote:
> On Tue, Oct 02, 2018 at 11:19:09AM -0500, Richard Henderson wrote:
>> When the result of an operation is not used, we can ignore the
>> result by storing to XZR.  For two of the memory models, using
>> XZR with LD<op> has a preferred assembler alias, ST<op>.
> 
> ST<op> has different semantics to LD<op>, in particular, ST<op> is not
> ordered by a DMB LD; so this could weaken the LDADD and break C11 semantics.
> 
> The relevant Arm Arm text is:
> 
>   If the destination register is not one of WZR or XZR, LDADDA and
>   LDADDAL load from memory with acquire semantics.
> 
>   LDADDL and LDADDAL store to memory with release semantics.
> 
>   LDADD has no memory ordering requirements.
> 
> I'm taking this to mean that even if the result is unused, using XZR is not
> a valid transformation; it weakens the expected acquire semantics to
> unordered.
> 
> The example I have from Will Deacon on an internal bug database is:
> 
>   P0 (atomic_int* y,atomic_int* x) {
>     atomic_store_explicit(x,1,memory_order_relaxed);
>     atomic_thread_fence(memory_order_release);
>     atomic_store_explicit(y,1,memory_order_relaxed);
>   }
> 
>   P1 (atomic_int* y,atomic_int* x) {
>     int r0 = atomic_fetch_add_explicit(y,1,memory_order_relaxed);
>     atomic_thread_fence(memory_order_acquire);
>     int r1 = atomic_load_explicit(x,memory_order_relaxed);
>   }
> 
>   The outcome where y == 2 and P1 has r0 = 1 and r1 = 0 is illegal.
> 
> This example comes from a while back in my memory; so copying Will for
> any more detailed questions.
> 
> My impression is that this transformation is not safe, and so the patch is
> not OK.

Here's a revised patch.

Use ST<op> for relaxed and release orderings; retain the (non-XZR) scratch
register for all other orderings.  The scratch need not be early-clobber,
though, since the instruction has no mid-point at which its inputs are only
half consumed.
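
To make the rule above concrete, here is a minimal C11 sketch. It is not part
of the patch: st_alias_allowed, counter and the add_* functions are
hypothetical names chosen for illustration, and the instruction named in each
comment is only what I would expect from the description above, not verified
compiler output.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical predicate (not GCC's code): true when, per the rule
   described above, a result-unused atomic op may use the ST<op> alias. */
static bool st_alias_allowed(memory_order mo)
{
    return mo == memory_order_relaxed || mo == memory_order_release;
}

static atomic_int counter;

/* Result discarded, relaxed ordering: candidate for STADD. */
static void add_relaxed(int v)
{
    (void)atomic_fetch_add_explicit(&counter, v, memory_order_relaxed);
}

/* Result discarded, release ordering: candidate for STADDL. */
static void add_release(int v)
{
    (void)atomic_fetch_add_explicit(&counter, v, memory_order_release);
}

/* Result discarded, acquire ordering: expected to stay LDADDA with a
   real (non-XZR) scratch destination, so the acquire semantics hold. */
static void add_acquire(int v)
{
    (void)atomic_fetch_add_explicit(&counter, v, memory_order_acquire);
}
```

The three add_* functions differ only in the memory order, which is the only
input the predicate looks at; whether the result is used is assumed to have
been checked already.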


r~

Attachment: z
Description: Text document

