This is the mail archive of the
mailing list for the GCC project.
Re: Implementing C++1x and C1x atomics (really an aside on SFENCE)
- From: Lawrence Crowl <crowl at google dot com>
- To: "Boehm, Hans" <hans dot boehm at hp dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, Richard Guenther <richard dot guenther at gmail dot com>, Andrew Haley <aph at redhat dot com>, Paolo Bonzini <bonzini at gnu dot org>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Wed, 9 Sep 2009 15:51:18 -0700
- Subject: Re: Implementing C++1x and C1x atomics (really an aside on SFENCE)
On 8/20/09, Boehm, Hans <email@example.com> wrote:
> > -----Original Message-----
> > From: Lawrence Crowl [mailto:firstname.lastname@example.org]
> > The problem is that gcc does support 80386. It also supports
> > other processors that have less-than-complete support for
> > concurrency. Just in the x86 line, we get some additional
> > capability in many new layers.
> > 8086 LOCK XCHG
> > 80486 CMPXCHG XADD
> > Pentium CMPXCHG8B
> > SSE SFENCE
> Aside to an interesting discussion:
> I believe the current conclusion is that SFENCE should be ignored,
> except for library or compiler-generated code that uses
> non-temporal/coalescing stores, which I believe are also a recent
> addition. Normal stores are ordered anyway, so it's not needed.
> Thus you are faced with a choice of either (a) implementing fences
> on the assumption that ordinary code may contain non-temporal stores,
> or (b) making sure that non-temporal stores are always surrounded by
> the appropriate fences. This is really an important ABI issue, but
> it's something that I believe no ABI currently specifies. Our
> conclusion in earlier discussions among a different group of people
> was that (b) made more sense, since non-temporal stores of various
> kinds seemed to be largely confined to a few library routines.
Hm. I would expect that given the C++0x memory model, compilers could
be much more aggressive about using non-temporal stores, potentially
improving performance substantially. That is, it may be better to
accept a slightly less efficient ABI for today's compilers to gain a
more efficient ABI for tomorrow's compilers.
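
For concreteness, option (b) could look roughly like the sketch below: a
routine that uses non-temporal stores issues its own SFENCE before
returning, so callers can keep releasing locks with plain stores. The
name fill_nontemporal is hypothetical, and the sketch only places a
trailing fence; whether a leading fence is also required is part of what
an ABI would have to settle. Without SSE2 it falls back to ordinary stores.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32; also pulls in _mm_sfence
#endif

// Hypothetical library routine (not from any real library): fill n ints
// at dst with value, using non-temporal stores where SSE2 is available.
void fill_nontemporal(int* dst, std::size_t n, int value) {
#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; ++i) {
        _mm_stream_si32(dst + i, value);  // MOVNTI: weakly ordered store
    }
    // Option (b): the routine that issued the non-temporal stores fences
    // them itself, so they are ordered before any later ordinary store.
    _mm_sfence();
#else
    // Assumed fallback: ordinary stores, which need no SFENCE.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
#endif
}
```

Under this convention the fence cost is paid only by the few routines
that use such stores, not by every lock release.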
> It would be really nice if everyone somehow managed to agree on this.
> Inconsistency here, probably even between Windows and Linux, seems
> likely to result in really subtle bugs.
> Note that this also affects correctness of spinlock implementations,
> not just atomics. A simple store to release a lock doesn't work if
> the critical section may contain unfenced non-temporal stores.
Yes, but the spinning acquire doesn't require the fence, only the
release. So, is this additional instruction a performance problem?
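
A minimal sketch of what I mean, assuming option (a), where the lock
itself must tolerate unfenced non-temporal stores in the critical
section. The SpinLock class and its member names are illustrative only,
not an existing GCC or libc interface. The acquire side is an atomic
exchange (LOCK XCHG on x86) with no extra fence; only the release side
gains an SFENCE before the plain store.

```cpp
#include <atomic>
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_sfence
#endif

// Illustrative spinlock; names are assumptions made for this sketch.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // Spinning acquire: an atomic exchange, no additional fence.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // spin until the previous holder releases
        }
    }

    void unlock() {
#if defined(__SSE__)
        // The one additional instruction: order any non-temporal stores
        // from the critical section before the releasing store.
        _mm_sfence();
#endif
        locked_.store(false, std::memory_order_release);
    }

    bool is_locked() const {
        return locked_.load(std::memory_order_relaxed);
    }
};
```

The performance question is then just the cost of that one SFENCE on
each unlock.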
> > SSE2 MFENCE
> > late AMD64 CMPXCHG16B
> > So, we do not get to ignore the problem as a relic of 80386.
This email seems to have gotten side-tracked by my filters. Sorry
for the delay.