[PATCH][ARM] PR target/69875 Fix atomic_loaddi expansion
Kyrill Tkachov
kyrylo.tkachov@foss.arm.com
Fri Feb 19 15:25:00 GMT 2016
Hi all,
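The atomic_loaddi expander on arm places the barrier on the wrong side of the load for acquire
semantics and misses cheaper sequences available on newer architecture levels. It can benefit from a
rewrite to properly perform double-word atomic loads on the various architecture levels.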
Consider the code:
----------------------
#include <stdatomic.h>

atomic_ullong foo;
int glob;

int main(void) {
  atomic_load_explicit(&foo, memory_order_acquire);
  return glob;
}
---------------------
Compiled with -O2 -march=armv7-a -std=c11 this gives:
movw r3, #:lower16:glob
movt r3, #:upper16:glob
dmb ish
movw r2, #:lower16:foo
movt r2, #:upper16:foo
ldrexd r0, r1, [r2]
ldr r0, [r3]
bx lr
For the acquire memory model the barrier should be after the ldrexd, not before.
The same code is generated when compiled with -march=armv7ve. However, on such systems a single LDRD
is sufficient. Issue C.c of the ARM Architecture Reference Manual for ARMv7-A and ARMv7-R
states in section A3.5.3:
"In an implementation that includes the Large Physical Address Extension, LDRD and STRD accesses to 64-bit aligned
locations are 64-bit single-copy atomic".
We still need the barrier after the LDRD to enforce the acquire ordering semantics.
For ARMv8-A we can do even better and use the load double-word acquire instruction: LDAEXD, with no need for
a barrier afterwards.
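As a rough illustration (not part of the patch), the three cases above can be summarised as a small decision function in C. The enum, struct and function names are made up for this sketch and do not appear in sync.md. It covers only the relaxed and acquire models discussed in this mail, and the relaxed case without LPAE is an assumption, since that combination is not spelled out here.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not taken from the patch.  */
enum load_insn { LOAD_LDREXD, LOAD_LDRD, LOAD_LDAEXD };
enum model { MODEL_RELAXED, MODEL_ACQUIRE };

struct load_seq
{
  enum load_insn insn;   /* Instruction performing the 64-bit load.  */
  bool barrier_after;    /* Whether a DMB follows the load.  */
};

/* has_lpae: LDRD to a 64-bit aligned address is single-copy atomic
   (e.g. -march=armv7ve).  has_acqrel: LDAEXD is available (ARMv8-A).  */
static struct load_seq
choose_load_seq (bool has_lpae, bool has_acqrel, enum model m)
{
  struct load_seq seq;

  if (m == MODEL_RELAXED)
    {
      /* No ordering required, only atomicity.  Assumption: without LPAE
         the load is still LDREXD, just without a barrier.  */
      seq.insn = has_lpae ? LOAD_LDRD : LOAD_LDREXD;
      seq.barrier_after = false;
    }
  else if (has_acqrel)
    {
      /* LDAEXD carries the acquire semantics itself.  */
      seq.insn = LOAD_LDAEXD;
      seq.barrier_after = false;
    }
  else
    {
      /* Acquire on ARMv7: do the load first, then the barrier.  */
      seq.insn = has_lpae ? LOAD_LDRD : LOAD_LDREXD;
      seq.barrier_after = true;
    }
  return seq;
}
```

For example, choose_load_seq (true, false, MODEL_ACQUIRE) models -march=armv7ve with an acquire load and yields LDRD followed by a barrier.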
I've discussed the required sequences with some kernel folk and had a read through:
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
and this is the patch I've come up with.
This patch handles all three of the above cases by rewriting the atomic_loaddi expander.
With this patch for the above code with -march=armv7-a we would now generate:
movw r3, #:lower16:foo
movt r3, #:upper16:foo
ldrexd r0, r1, [r3]
movw r3, #:lower16:glob
movt r3, #:upper16:glob
dmb ish
ldr r0, [r3]
bx lr
For -march=armv7ve:
movw r3, #:lower16:foo
movt r3, #:upper16:foo
ldrd r2, r3, [r3]
movw r3, #:lower16:glob
movt r3, #:upper16:glob
dmb ish
ldr r0, [r3]
bx lr
and for -march=armv8-a:
movw r3, #:lower16:foo
movt r3, #:upper16:foo
ldaexd r2, r3, [r3]
movw r3, #:lower16:glob
movt r3, #:upper16:glob
ldr r0, [r3]
bx lr
For the relaxed memory model the armv7ve and armv8-a sequences reduce to a single
LDRD instruction, without any barriers.
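For reference, the source-level difference between the two memory models looks roughly like the following. This is a minimal sketch and not one of the new testcases: the variable and function names are hypothetical, and the instructions GCC actually emits depend on -march and on the compiler version.

```c
#include <stdatomic.h>

/* Hypothetical global, playing the role of `foo` in the example above.  */
static atomic_ullong counter;

/* Relaxed: the 64-bit load must be atomic, but imposes no ordering
   on surrounding memory accesses.  */
static unsigned long long
load_relaxed (void)
{
  return atomic_load_explicit (&counter, memory_order_relaxed);
}

/* Acquire: memory accesses after this load in program order may not be
   reordered before it.  */
static unsigned long long
load_acquire (void)
{
  return atomic_load_explicit (&counter, memory_order_acquire);
}
```

Compiling both functions with -O2 -std=c11 and the different -march values is one way to compare the generated sequences.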
Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?
Thanks,
Kyrill
P.S. The backport to the previous branches will look a bit different because the
ARM_FSET_HAS_CPU1 machinery in arm.h was introduced for GCC 6. I'll prepare a backport
separately if this is accepted.
2016-02-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
PR target/69875
* config/arm/arm.h (TARGET_HAVE_LPAE): Define.
* config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
* config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
(atomic_loaddi_1): Delete.
(atomic_loaddi): Rewrite expander using the above changes.
2016-02-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
PR target/69875
* gcc.target/arm/atomic_loaddi_acquire.x: New file.
* gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
* gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
* gcc.target/arm/atomic_loaddi_1.c: New test.
* gcc.target/arm/atomic_loaddi_2.c: Likewise.
* gcc.target/arm/atomic_loaddi_3.c: Likewise.
* gcc.target/arm/atomic_loaddi_4.c: Likewise.
* gcc.target/arm/atomic_loaddi_5.c: Likewise.
* gcc.target/arm/atomic_loaddi_6.c: Likewise.
* gcc.target/arm/atomic_loaddi_7.c: Likewise.
* gcc.target/arm/atomic_loaddi_8.c: Likewise.
* gcc.target/arm/atomic_loaddi_9.c: Likewise.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-atomic-loaddi.patch
Type: text/x-patch
Size: 12071 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160219/74979aa7/attachment.bin>