[PATCH][ARM] PR target/69875 Fix atomic_loaddi expansion

Kyrill Tkachov kyrylo.tkachov@foss.arm.com
Fri Feb 19 15:25:00 GMT 2016


Hi all,

The atomic_loaddi expander on arm has some issues and can benefit from a rewrite to properly
perform double-word atomic loads on various architecture levels.

Consider the code:
----------------------
#include <stdatomic.h>

atomic_ullong foo;

int glob;

int main(void) {
     atomic_load_explicit(&foo, memory_order_acquire);
     return glob;
}
---------------------

Compiled with -O2 -march=armv7-a -std=c11 this gives:
         movw    r3, #:lower16:glob
         movt    r3, #:upper16:glob
         dmb     ish
         movw    r2, #:lower16:foo
         movt    r2, #:upper16:foo
         ldrexd  r0, r1, [r2]
         ldr     r0, [r3]
         bx      lr

For the acquire memory model the barrier should be after the ldrexd, not before.
The same code is generated when compiled with -march=armv7ve. However, we can get away with a single LDRD
on such systems. In issue C.c of The ARM Architecture Reference Manual for ARMv7-A and ARMv7-R
recommends at chapter A3.5.3:
"In an implementation that includes the Large Physical Address Extension, LDRD and STRD accesses to 64-bit aligned
locations are 64-bit single-copy atomic".
We still need the barrier after the LDRD to enforce the acquire ordering semantics.

For ARMv8-A we can do even better and use the load double-word acquire instruction: LDAEXD, with no need for
a barrier afterwards.

I've discussed the required sequences with some kernel folk and had a read through:
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
and this is the patch I've come up with.

This patch handles all three of the above cases by rewriting the atomic_loaddi expander.
With this patch for the above code with -march=armv7-a we would now generate:
         movw    r3, #:lower16:foo
         movt    r3, #:upper16:foo
         ldrexd  r0, r1, [r3]
         movw    r3, #:lower16:glob
         movt    r3, #:upper16:glob
         dmb     ish
         ldr     r0, [r3]
         bx      lr

For -march=armv7ve:
         movw    r3, #:lower16:foo
         movt    r3, #:upper16:foo
         ldrd    r2, r3, [r3]
         movw    r3, #:lower16:glob
         movt    r3, #:upper16:glob
         dmb     ish
         ldr     r0, [r3]
         bx      lr

and for -march=armv8-a:
         movw    r3, #:lower16:foo
         movt    r3, #:upper16:foo
         ldaexd  r2, r3, [r3]
         movw    r3, #:lower16:glob
         movt    r3, #:upper16:glob
         ldr     r0, [r3]
         bx      lr

For the relaxed memory model the armv7ve and armv8-a can be relaxed to a single
LDRD instruction, without any barriers.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

P.S. The backport to the previous branches will look a bit different because the
ARM_FSET_HAS_CPU1 machinery in arm.h was introduced for GCC 6. I'll prepare a backport
separately if this is accepted.

2016-02-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/69875
     * config/arm/arm.h (TARGET_HAVE_LPAE): Define.
     * config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
     * config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
     (atomic_loaddi_1): Delete.
     (atomic_loaddi): Rewrite expander using the above changes.

2016-02-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/69875
     * gcc.target/arm/atomic_loaddi_acquire.x: New file.
     * gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
     * gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
     * gcc.target/arm/atomic_loaddi_1.c: New test.
     * gcc.target/arm/atomic_loaddi_2.c: Likewise.
     * gcc.target/arm/atomic_loaddi_3.c: Likewise.
     * gcc.target/arm/atomic_loaddi_4.c: Likewise.
     * gcc.target/arm/atomic_loaddi_5.c: Likewise.
     * gcc.target/arm/atomic_loaddi_6.c: Likewise.
     * gcc.target/arm/atomic_loaddi_7.c: Likewise.
     * gcc.target/arm/atomic_loaddi_8.c: Likewise.
     * gcc.target/arm/atomic_loaddi_9.c: Likewise.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-atomic-loaddi.patch
Type: text/x-patch
Size: 12071 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160219/74979aa7/attachment.bin>


More information about the Gcc-patches mailing list