This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Implementing C++1x and C1x atomics (really an aside on SFENCE)


On 8/20/09, Boehm, Hans <hans.boehm@hp.com> wrote:
> > -----Original Message-----
> > From: Lawrence Crowl [mailto:crowl@google.com]
> > The problem is that gcc does support 80386.  It also supports
> > other processors that have less-than-complete support for
> > concurrency.  Just in the x86 line, we get some additional
> > capability in many new layers.
> >
> >   8086        LOCK XCHG
> >   80486       CMPXCHG XADD
> >   Pentium     CMPXCHG8B
> >   SSE         SFENCE
>
> Aside to an interesting discussion:
>
> I believe the current conclusion is that SFENCE should be ignored,
> except for library or compiler-generated code that uses
> non-temporal/coalescing stores, which I believe are also a recent
> addition.  Normal stores are ordered anyway, so it's not needed.
> Thus you are faced with a choice of either (a) implementing fences
> on the assumption that ordinary code may contain non-temporal stores,
> or (b) making sure that non-temporal stores are always surrounded by
> the appropriate fences.  This is really an important ABI issue, but
> it's something that I believe no ABI currently specifies.  Our
> conclusion in earlier discussions among a different group of people
> was that (b) made more sense, since non-temporal stores of various
> kinds seemed to be largely confined to a few library routines.

Hm.  I would expect that given the C++0x memory model, compilers could
be much more aggressive about using non-temporal stores, potentially
improving performance substantially.  That is, it may be better to
accept a slightly less efficient ABI for today's compilers to gain a
more efficient ABI for tomorrow's compilers.
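
For concreteness, here is a minimal sketch of what option (b) from the quoted text might look like: a routine that uses non-temporal stores and fences them itself before returning, so callers can keep relying on plain stores being ordered. The name `stream_fill` is hypothetical, and the sketch assumes the x86 SSE2 intrinsics `_mm_stream_si32` and `_mm_sfence` from `<emmintrin.h>`; on other targets it falls back to ordinary stores.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32; also pulls in _mm_sfence
#endif

// Hypothetical library routine illustrating option (b): non-temporal
// stores never escape the routine unfenced.
inline void stream_fill(int* dst, std::size_t n, int value) {
    assert(dst != nullptr || n == 0);
#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; ++i) {
        _mm_stream_si32(dst + i, value);  // non-temporal store (MOVNTI)
    }
    // Order the non-temporal stores before any store the caller issues
    // afterwards, e.g. a plain store that releases a lock.
    _mm_sfence();
#else
    // Assumption: no non-temporal stores on this target, so no fence.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
#endif
}
```

Under option (a), the trailing `_mm_sfence()` would instead move into every release operation, and a compiler would be free to emit the streaming stores in ordinary code.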

> It would be really nice if everyone somehow managed to agree on this.
> Inconsistency here, probably even between Windows and Linux, seems
> likely to result in really subtle bugs.
>
> Note that this also affects correctness of spinlock implementations,
> not just atomics.  A simple store to release a lock doesn't work if
> the critical section may contain unfenced non-temporal stores.

Yes, but the spinning acquire doesn't require the fence, only the
release.  So, is this additional instruction a performance problem?
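
To make the question concrete, here is a rough sketch of a spinlock under option (a), where the release path assumes the critical section may contain unfenced non-temporal stores. The class name `FencedSpinlock` and its members are hypothetical; the sketch assumes x86 with SSE2 and the `_mm_sfence` intrinsic, and it is not meant as what GCC or any ABI actually does.

```cpp
#include <atomic>
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>  // pulls in _mm_sfence
#endif

// Hypothetical spinlock: only unlock() pays for the extra fence.
class FencedSpinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // Acquire side: a locked exchange, no SFENCE required.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load until the lock looks free.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
#if defined(__SSE2__)
        // Make earlier non-temporal stores visible before the
        // releasing store below.
        _mm_sfence();
#endif
        locked_.store(false, std::memory_order_release);
    }

    bool is_locked() const {
        return locked_.load(std::memory_order_relaxed);
    }
};
```

The cost under discussion is the single `_mm_sfence()` in `unlock()`; the spinning loop in `lock()` is unchanged.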

>
> >   SSE2        MFENCE
> >   late AMD64  CMPXCHG16B
> >
> > So, we do not get to ignore the problem as a relic of 80386.


This email seems to have gotten side-tracked by my filters.  Sorry
for the delay.

-- 
Lawrence Crowl

