Well, yeah.  We can only really blame ARM for this: they provided a
double-word CAS but no way to implement a double-word atomic load that
does not also store.  I hesitate to place blame on the ARM architects,
a splendid and diligent bunch, but there it is.  I have no idea why
LDXP doesn't work as an atomic load on its own, but it does not.
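To make the complaint concrete, here is a minimal sketch in C of what
"an atomic load which also stores" looks like when the only double-word
primitive you can rely on is a CAS.  It uses the GCC/Clang
`__atomic_compare_exchange_n` builtin.  The `dword_t` type and the
`load_via_cas` name are hypothetical, and a 64-bit value stands in for
the double word so that the sketch builds on common hosts without
libatomic; on AArch64 the real case would be a 128-bit pair handled
with CASP or an LDXP/STXP loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "double word": two 32-bit halves packed
   into 64 bits, so the sketch needs no libatomic support. */
typedef uint64_t dword_t;

/* Emulate an atomic load with a compare-and-swap.  We guess that *p
   holds 0 and ask the CAS to replace 0 with 0:
   - if the guess is right, the CAS succeeds and writes 0 back;
   - if it is wrong, the CAS fails and copies the current value of *p
     into `expected`.
   Either way `expected` ends up holding an atomic snapshot of *p, but
   the operation is a read-modify-write and needs write access to the
   location. */
static dword_t load_via_cas(dword_t *p)
{
    dword_t expected = 0;
    (void)__atomic_compare_exchange_n(p, &expected, (dword_t)0, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

The value read is correct, but because the hardware treats the CAS as
a write, a "load" written this way may fault on read-only memory and
will pull the cache line into an exclusive state, which is exactly
what a plain atomic load should avoid.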

