Atomic operations on the ARM

Richard Earnshaw rearnsha@arm.com
Mon Oct 7 06:11:00 GMT 2002


> Well, here's exchange_and_add:
>   __asm__ ("\n"
>            "0:\tldr\t%0,[%3]\n\t"
>            "add\t%1,%0,%4\n\t"
>            "swp\t%2,%1,[%3]\n\t"
>            "cmp\t%0,%2\n\t"
>            "swpne\t%1,%2,[%3]\n\t"
>            "bne\t0b"
> 
> The others are pretty similar.  So it looks like the same thing to me. 
> Ugh!
> 

It's almost certain that the libstdc++ code came originally from the glibc 
implementation...


> I'm not very familiar with ARM, so I'm just going by a quick reference
> guide here... but it appears that SWP is the only one of the normal
> atomic primitives available.  So we have test-and-set/xchg, but that's
> about it.
> 

That's about it, yes.

> You can do a 24-bit mutex (which is all at least one other platform
> offers) using swpb; then you use one byte of the _Atomic_word as a
> spinlock.  That does still have some issues but should be possible
> without an ABI change since the size/alignment of the _Atomic_word
> don't actually change.  Does that sound worthwhile?
> 
> > We could get clever and use a single bit in _Atomic_word to be a mutex 
> > bit, and effectively make it a bit-field, effectively
> > 
> > typedef struct
> > {
> >   int mutex:1;
> >   signed int val:31;
> > } _Atomic_word;
> > 
> > But that would involve fixing the source code that directly accesses this 
> > type and changing it to use set and read macros.
> 
> Why, fancy that.  It looks awfully familiar... :)  See above for why I
> think it's advantageous, though.  I don't think you could implement
> that safely with only swap-word and swap-byte, also.  Need a whole
> byte.

Hmm, that might work, though it wastes more bits.  I was thinking of code 
along the lines of:

Initial:  _Atomic_word address in Rsem, increment in Rval; Ra, Rb and Rc 
are scratch

	mvn	Ra, #0			/* Ra <- -1: bit 0 set marks the lock */
0:
	swp	Rb, Ra, [Rsem]		/* Rb <- old word, memory now locked */
	tst	Rb, #1
	bne	0b			/* bit 0 set: lock already held, retry */
	add	Rc, Rb, Rval, asl #1	/* new value, shifted past the lock bit */
	str	Rc, [Rsem]		/* bit 0 clear, so the store releases the lock */
	mov	Rb, Rb, asr #1		/* old value, with the lock bit shifted out */

Result is in Rb.

Note that this works because we guarantee that bit zero of the stored word 
is really the lock bit, so it can never be 1 when the semaphore is not 
held.  While the lock is held we can't even read the value from the 
_Atomic_word address; we must wait for the locker to complete.

The implication is that all writers must first obtain the lock with the 
looping part of the above code, while a reader must repeatedly read the 
value until bit 0 is clear; the result is then the remaining bits shifted 
down.

Hence 

__Atomic_set_word:	/* Value in R1, address in R0 */
	mvn	R2, #0
0:
	swp	R3, R2, [R0]		/* acquire the lock */
	tst	R3, #1
	bne	0b
	mov	R1, R1, asl #1		/* shift the value past the lock bit */
	str	R1, [R0]		/* bit 0 clear, so the store releases the lock */
	mov	pc, lr

__Atomic_read_word:	/* Address in R0, result in R0 */
0:
	ldr	R1, [R0]
	tst	R1, #1
	bne	0b			/* bit 0 set: a writer holds the lock, retry */
	mov	R0, R1, asr #1
	mov	pc, lr

(or in C code):

inline int __Atomic_read_word (volatile _Atomic_word *addr)
{
  int val;

  while ((val = *addr) & 1)
    ;

  return val >> 1;
}
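
For reference, the exchange-and-add sequence above could also be wrapped up 
as GCC extended asm in the style of the quoted libstdc++ code.  This is 
only a sketch: the function name, the constraints and the _Atomic_word 
typedef here are illustrative, and it assumes a core that implements SWP; 
the asm body is just the sequence given earlier.

typedef int _Atomic_word;	/* stored as (value << 1), bit 0 is the lock */

static inline _Atomic_word
__exchange_and_add (volatile _Atomic_word *mem, int val)
{
  int old, lock, tmp;

  __asm__ __volatile__ (
	"\tmvn\t%1, #0\n"		/* lock word: all ones, bit 0 set */
	"0:\tswp\t%0, %1, [%3]\n\t"	/* old <- *mem, *mem <- lock */
	"tst\t%0, #1\n\t"
	"bne\t0b\n\t"			/* bit 0 set: lock held, retry */
	"add\t%2, %0, %4, asl #1\n\t"	/* new value, shifted past the lock bit */
	"str\t%2, [%3]"			/* bit 0 clear: the store releases the lock */
	: "=&r" (old), "=&r" (lock), "=&r" (tmp)
	: "r" (mem), "r" (val)
	: "cc", "memory");

  return old >> 1;			/* shift out the lock bit */
}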

Your suggestion of using the whole low byte for the semaphore would mean 
that the read_word code would not have to spin on the semaphore, but any 
code that updates the value would have to do an additional load after 
first acquiring the lock; it would then have to clear the lock bits out of 
the loaded word, e.g.:

__exchange_and_add:	/* Address in Rsem, increment in Rval */

	mov	Ra, #255
0:
	swpb	Rb, Ra, [Rsem]
	tst	Rb, #1		/* Lock byte is 0 or 255, so bit 0 set means lock held */
	bne	0b
	ldr	Rb, [Rsem]
	bic	Rb, Rb, #255	/* Clear out the semaphore byte */
	add	Rc, Rb, Rval, asl #8
	str	Rc, [Rsem]	/* Bottom byte clear, so the store releases the lock */
	mov	Rb, Rb, asr #8	/* Old value in Rb */

It would also need different code for a big-endian system, where the 
semaphore byte would be at the other end of the word.
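
For what it's worth, the read_word code under that byte-lock scheme really 
would be trivial on a little-endian system; roughly, and only as a sketch, 
assuming _Atomic_word is a plain int and that >> of a negative value is an 
arithmetic shift:

static inline int
__Atomic_read_word (const volatile _Atomic_word *addr)
{
  /* The value lives in the top 24 bits and the bottom byte holds only the
     lock, so a single word load is atomic and no spinning is needed.  */
  return *addr >> 8;
}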

I think that, on balance, using a single bit would be more efficient, 
despite the drawback of requiring a spin during reads.  After all, we 
already expect conflicting locks to be very rare.

R.


