This compiles __int128 f(__int128 *p, __int128 *q, __int128 x) { return __sync_val_compare_and_swap (p, *q, x); } into: f: ldp x6, x7, [x1] mov x4, x0 .L3: ldxp x0, x1, [x4] cmp x0, x6 ccmp x1, x7, 0, eq bne .L4 stlxp w5, x2, x3, [x4] cbnz w5, .L3 .L4: dmb ish ret This means if the compare fails, we return the value loaded via LDXP. However unless the STXP succeeds, this returned value is not single-copy atomic. So on failure we still need to execute STLXP.
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630198.html
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>: https://gcc.gnu.org/g:df8958e6bc5d050dab8bdc5954c1632fb0e98d18 commit r14-6021-gdf8958e6bc5d050dab8bdc5954c1632fb0e98d18 Author: Wilco Dijkstra <wilco.dijkstra@arm.com> Date: Thu Nov 30 16:14:35 2023 +0000 AArch64: Fix __sync_val_compare_and_swap [PR111404] __sync_val_compare_and_swap may be used on 128-bit types and either calls the outline atomic code or uses an inline loop. On AArch64 LDXP is only atomic if the value is stored successfully using STXP, but the current implementations do not perform the store if the comparison fails. In this case the value returned is not read atomically. gcc/ChangeLog: PR target/111404 * config/aarch64/aarch64.cc (aarch64_split_compare_and_swap): For 128-bit store the loaded value and loop if needed. libgcc/ChangeLog: PR target/111404 * config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using either new value or loaded value.