[RFC] Fix full memory barrier on SPARC-V8

Geert Bosch bosch@adacore.com
Tue Jun 28 07:28:00 GMT 2011


On Jun 27, 2011, at 22:45, David Miller wrote:

> From: Geert Bosch <bosch@adacore.com>
> Date: Mon, 27 Jun 2011 22:21:47 -0400
> 
>> On Jun 27, 2011, at 19:53, David Miller wrote:
>> 
>>> Adding a ldstub here is going to be really expensive, on UltraSparc
>>> that can be 36+ cycles even on a cache hit.
>> 
>> Yes, synchronization in multi-CPU systems is expensive.
>> If it's really cheap, you're probably doing something wrong.
> 
> First, I fundamentally disagree with this assertion.  The reason
> proper memory barriers exist is so that you don't need nonsense like
> these proposed atomics to get proper memory operation ordering.
Sorry, I see now that I phrased this poorly; no offense intended. We both
agree that with TSO there is never a need for any STBAR instructions on
SPARCv8. The point is that TSO is not sufficient for strong consistency.
The reason for this is the existence of write buffers (see fig 6.1, or K-1
of the SPARC v8 architecture manual). In particular, note the CPU-local
bypass from the store buffer. Two processors both storing a value X in
location Y and then reading from Y might each see their own value. In
the end, one store will reach memory first and the stores will be ordered
there. The atomic load-store instructions are necessary to ensure that a
store is seen by the memory system before any subsequent load is performed.

The main issue is that SPARC's TSO does not guarantee Store-Load ordering.
So, only by issuing a SWAP(A) or LDSTUB(A) instruction can total ordering
of all loads and stores be guaranteed. 
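
To make the store-buffer argument concrete, here is a minimal single-threaded
model in C. It is only a sketch: the names (cpu_store, cpu_load, cpu_drain,
own_value_seen, dekker_entries) are hypothetical, and cpu_drain merely stands
in for the effect attributed above to SWAP/LDSTUB. It does not model real
SPARC hardware.

```c
#include <assert.h>

#define NCPU  2
#define NADDR 2
#define BUFSZ 4

/* Toy model: shared memory plus one FIFO store buffer per CPU. */
static int memory[NADDR];
static struct { int addr[BUFSZ], val[BUFSZ], n; } sb[NCPU];

static void reset(void) {
    for (int a = 0; a < NADDR; a++) memory[a] = 0;
    for (int c = 0; c < NCPU; c++) sb[c].n = 0;
}

/* A store only enters the local buffer; memory is not updated yet. */
static void cpu_store(int c, int a, int v) {
    assert(sb[c].n < BUFSZ);
    sb[c].addr[sb[c].n] = a;
    sb[c].val[sb[c].n] = v;
    sb[c].n++;
}

/* A load first checks the CPU's own buffer (the local bypass),
   newest entry first, and only then reads memory. */
static int cpu_load(int c, int a) {
    for (int i = sb[c].n - 1; i >= 0; i--)
        if (sb[c].addr[i] == a) return sb[c].val[i];
    return memory[a];
}

/* Assumption: stands in for what an atomic load-store forces,
   i.e. buffered stores reach memory, in order, before later loads. */
static void cpu_drain(int c) {
    for (int i = 0; i < sb[c].n; i++) memory[sb[c].addr[i]] = sb[c].val[i];
    sb[c].n = 0;
}

/* Both CPUs store to the same location and read it back.
   Returns 1 if each saw its own value and memory ended with the
   value of the store that drained last. */
static int own_value_seen(void) {
    reset();
    cpu_store(0, 0, 10);
    cpu_store(1, 0, 20);
    int r0 = cpu_load(0, 0);
    int r1 = cpu_load(1, 0);
    cpu_drain(0);
    cpu_drain(1);
    return r0 == 10 && r1 == 20 && memory[0] == 20;
}

/* Dekker-style entry: each CPU sets its own flag, then reads the
   other's. Returns how many CPUs read the other flag as 0. */
static int dekker_entries(int drain_after_store) {
    reset();
    cpu_store(0, 0, 1);
    if (drain_after_store) cpu_drain(0);
    cpu_store(1, 1, 1);
    if (drain_after_store) cpu_drain(1);
    int r0 = cpu_load(0, 1);
    int r1 = cpu_load(1, 0);
    return (r0 == 0) + (r1 == 0);
}
```

In this model dekker_entries(0) returns 2 (both CPUs believe they are alone),
while dekker_entries(1) returns 0, which is the Store-Load ordering that plain
stores under TSO do not give you.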
> 
> A proper membar on your v9 test system is orders of magnitude cheaper
> than this stbar+ldstub business.
That's true, but membar is a SPARC v9 instruction. The issue Eric and I
are addressing is only about SPARCv8. 

> You then go on to speak about LEON, does LEON implement PSO?
No, I'm not talking about PSO or SPARCv9 anywhere.
Just plain old SPARCv8, using the TSO model. This requires a
load-store instruction to guarantee a full memory barrier.

I'm not making this up; that is why I refer to the examples in
the SPARC v8 architecture manual that specifically state that
SWAP instructions need to be used instead of store instructions
to make Dekker's algorithm work.
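
As a rough illustration of that pattern (not the manual's code), the entry
step can be sketched in portable C11, with atomic_exchange standing in for
SWAP. The names flag, try_enter and leave are hypothetical, and how a given
compiler lowers these operations on SPARCv8 is exactly the question under
discussion, so nothing is assumed about that here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One "I want to enter" flag per participant (0 or 1). */
static atomic_int flag[2];

/* Announce intent with an atomic exchange rather than a plain store,
   then check the other side. Returns true if the other flag was clear.
   This is only the entry test, not a complete Dekker implementation
   (no turn variable, no retry loop). */
static bool try_enter(int me) {
    assert(me == 0 || me == 1);
    atomic_exchange(&flag[me], 1);            /* stand-in for SWAP */
    return atomic_load(&flag[1 - me]) == 0;   /* subsequent load */
}

/* Clear our flag when done (or when backing off). */
static void leave(int me) {
    assert(me == 0 || me == 1);
    atomic_store(&flag[me], 0);
}
```

A caller would back off with leave(me) and retry whenever try_enter(me)
returns false.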

  -Geert


