Atomic operations on the ARM
Richard Earnshaw
rearnsha@arm.com
Mon Oct 7 06:11:00 GMT 2002
> Well, here's exchange_and_add:
> __asm__ ("\n"
> "0:\tldr\t%0,[%3]\n\t"
> "add\t%1,%0,%4\n\t"
> "swp\t%2,%1,[%3]\n\t"
> "cmp\t%0,%2\n\t"
> "swpne\t%1,%2,[%3]\n\t"
> "bne\t0b"
>
> The others are pretty similar. So it looks like the same thing to me.
> Ugh!
>
It's almost certain that the libstdc++ code came originally from the glibc
implementation...
> I'm not very familiar with ARM, so I'm just going by a quick reference
> guide here... but it appears that SWP is the only one of the normal
> atomic primitives available. So we have test-and-set/xchg, but that's
> about it.
>
That's about it, yes.
> You can do a 24-bit mutex (which is all at least one other platform
> offers) using swpb; then you use one byte of the _Atomic_word as a
> spinlock. That does still have some issues but should be possible
> without an ABI change since the size/alignment of the _Atomic_word
> don't actually change. Does that sound worthwhile?
>
> > We could get clever and use a single bit in _Atomic_word to be a mutex
> > bit, and effectively make it a bit-field, effectively
> >
> > typedef struct
> > {
> > int mutex:1;
> > signed int val:31
> > } _Atomic_word;
> >
> > But that would involve fixing the source code that directly accesses this
> > type and changing it to use set and read macros.
>
> Why, fancy that. It looks awfully familiar... :) See above for why I
> think it's advantageous, though. I don't think you could implement
> that safely with only swap-word and swap-byte, also. Need a whole
> byte.
Hmm, that might work, though it wastes more bits.  I was thinking of code
along the lines of:
Initial: Atomic_word address in Rsem, increment in Rval; Ra, Rb & Rc
are scratch
	mvn	Ra, #0			/* Ra <- -1 */
0:
	swp	Rb, Ra, [Rsem]
	tst	Rb, #1
	bne	0b
	add	Rc, Rb, Rval, asl #1
	str	Rc, [Rsem]
	mov	Rb, Rb, asr #1
Result is in Rb.
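As a sanity check of the arithmetic, here is a C model of the sequence
above (my sketch, not a proposed implementation): swp is simulated by a
plain function, so this only demonstrates the single-threaded behaviour,
not the atomicity.

```c
#include <assert.h>

/* Single-threaded model of ARM swp, which atomically exchanges a
   register with a word in memory; here simulated by a plain helper. */
static int swp(int newval, int *addr)
{
    int old = *addr;
    *addr = newval;
    return old;
}

/* Bit 0 is the lock bit; the real value lives in bits 1..31. */
int exchange_and_add(int *sem, int val)
{
    int old;
    /* Acquire: swap in -1 (all bits set, so bit 0 set) until the
       previous contents had bit 0 clear.  */
    do
        old = swp(-1, sem);
    while (old & 1);
    /* old holds the previous value shifted left by one; add the
       increment in the same representation.  Bit 0 of the result is
       clear, so the store also releases the lock.  */
    *sem = old + (val << 1);
    return old >> 1;            /* previous (unshifted) value */
}
```

Running this on a word initialised to 5 (stored as 5 << 1) with an
increment of 3 returns 5 and leaves the word holding 8, with bit 0 clear.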
Note that this works because we guarantee that bit zero of the stored value
really is the lock bit, so it can never be 1 when the semaphore is not
held.  While the lock is held we can't even read the value from the
Atomic_word address; we must wait for the locker to complete.
The implication is that all writers must first obtain the lock with the
looping part of the above code; a reader must repeatedly read the value
until bit 0 is clear, and the result is then the remaining bits shifted
down.
Hence
__Atomic_set_word:	/* Value in R1, address in R0 */
	mvn	R2, #0
0:
	swp	R3, R2, [R0]
	tst	R3, #1
	bne	0b
	mov	R1, R1, asl #1
	str	R1, [R0]
	mov	pc, lr
__Atomic_read_word:
0:
	ldr	R1, [R0]
	tst	R1, #1
	bne	0b
	mov	R0, R1, asr #1
	mov	pc, lr
(or in C code):
inline int
__Atomic_read_word (volatile _Atomic_word *addr)
{
  int val;

  while ((val = *addr) & 1)
    ;
  return val >> 1;
}
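For comparison, a C counterpart of the set operation above (again my
sketch, with swp modeled by a plain helper, so single-threaded only):

```c
#include <assert.h>

/* Single-threaded model of ARM swp, as before. */
static int swp(int newval, int *addr)
{
    int old = *addr;
    *addr = newval;
    return old;
}

/* C counterpart of __Atomic_set_word: acquire the bit-0 lock, then
   store the new value shifted left by one; since the shifted value has
   bit 0 clear, the store also releases the lock. */
void atomic_set_word(int *addr, int val)
{
    while (swp(-1, addr) & 1)
        ;
    *addr = val << 1;
}
```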
Your suggestion of using the whole low byte for the semaphore would mean
that the read_word code would not have to spin on the semaphore, but any
code that updates the value would have to do an additional load after
first acquiring the lock; it would then have to clear the lock byte out of
the loaded word, eg
__exchange_and_add:
	mov	Ra, #255
0:
	swpb	Rb, Ra, [Rsem]
	tst	Rb, #1		/* Lock byte is 0 or 255, so bit 0 suffices */
	bne	0b
	ldr	Rb, [Rsem]
	bic	Rb, Rb, #255	/* Clear out semaphore byte */
	add	Rc, Rb, Rval, asl #8
	str	Rc, [Rsem]
	mov	Rb, Rb, asr #8
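The byte-lock variant can likewise be modeled in C (my sketch; swpb is
simulated by a plain helper, so this is single-threaded, and it assumes a
little-endian layout where the lock byte is the lowest-addressed byte of
the word):

```c
#include <assert.h>

/* Single-threaded model of ARM swpb, the byte variant of swp. */
static unsigned char swpb(unsigned char newval, unsigned char *addr)
{
    unsigned char old = *addr;
    *addr = newval;
    return old;
}

/* The low byte is the lock (0 = free, 255 = held); the value lives in
   bits 8..31.  Little-endian assumed: casting the word's address to
   unsigned char * reaches the lock byte. */
int exchange_and_add_byte(int *sem, int val)
{
    int old;
    while (swpb(255, (unsigned char *)sem) & 1)
        ;                         /* spin until the lock byte was 0 */
    old = *sem & ~255;            /* extra load; clear the lock byte */
    *sem = old + (val << 8);      /* store new value, lock byte now 0 */
    return old >> 8;              /* previous (unshifted) value */
}
```

With a word initialised to 7 (stored as 7 << 8) and an increment of 3,
this returns 7 and leaves the word holding 10, lock byte clear.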
It would also need different code for a big-endian system, where the
semaphore byte would be at the other end of the word.
I think that, on balance, using a single bit would be more efficient,
despite the drawback of requiring a spin during reads.  After all, we
already expect conflicting locks to be very rare.
R.