This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][libitm] Convert to c++11 atomics


On Tue, 2011-12-13 at 11:17 -0800, Richard Henderson wrote:
> On 11/30/2011 05:13 PM, Richard Henderson wrote:
> > The library is written in C++, so in theory we can use the real atomic<> templates, etc.  Except that we have the same horrid problem finding the C++ headers as we did for <type_traits>, so again we have a local copy of <atomic>.  Blah.  But given that it is a copy, the rest of the code is using the real interfaces.
> > 
> > This passes the testsuite on power7 (gcc110), which *is* picky about memory barriers, but I seem to recall that the larger external tests that Velox used were much better at picking out problems.  And I've misplaced those...
> > 
> > Torvald, if you'd be so kind as to cast another set of eyes across this, I'd be grateful.
> 
> I've committed the patch.

I've reviewed your patch, and there are quite a few changes that are
necessary (see the attached patch).

The memory orders and fences as in my patch are based on the C++11
memory model as specified by Batty et al. ("Mathematizing C++
Concurrency: The Post-Rapperswil Model", 2010).  I've tried to keep the
number of barriers (and their strength) as low as possible, but in some
cases the barriers that we seem to need in the C++11 model could be
merged into fewer HW barriers.  However, I think only performance on
non-TSO hardware (e.g., Power) is affected by this because on TSO HW
(e.g., x86) all but the seq_cst fences/memory-orders are no-ops.  The
compiler might be able to optimize some of the cases in the future
(e.g., by merging barriers redundant on a particular arch).
If someone has suggestions for how to optimize this AND has a proof for
why this will still work with the optimizations applied, please speak
up.  Corrections are also welcome, of course.
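
To illustrate the kind of memory orders discussed above, here is a
minimal message-passing sketch.  It is not taken from the patch; the
names payload, ready, producer() and consumer() are made up for this
example.  It only shows a release store paired with an acquire load,
which is one of the orderings that costs nothing extra on TSO hardware
but typically needs a real barrier on Power.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration only; none of this is libitm code.
static int payload = 0;                 // plain (nonatomic) data
static std::atomic<bool> ready(false);  // flag that carries the ordering

void producer()
{
  payload = 42;
  // Release store: everything written before it becomes visible to a
  // thread that observes ready == true with an acquire load.  On TSO
  // hardware (e.g., x86) this needs no HW barrier; on Power it
  // typically needs one (e.g., lwsync) before the store.
  ready.store(true, std::memory_order_release);
}

int consumer()
{
  // Acquire load: synchronizes with the release store in producer()
  // once it reads true.
  while (!ready.load(std::memory_order_acquire))
    std::this_thread::yield();
  return payload;  // happens after the write in producer()
}
```

Running producer() and consumer() on two different threads, consumer()
returns 42 without any seq_cst operation being involved.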

TM also presents an interesting case that was hard for me to map to the
C++11 model: We need to establish synchronizes-with / happens-before
relations for the loads/stores from/to application data that happen in
transactions.  C++11 requires atomics for this, even if we use barriers
to enforce the memory orders for those accesses.  However, we can't just
forge atomic accesses to nonatomic locations because this might not work
on all architectures (e.g., alignment constraints, metadata,...).

In this particular case (the validated-loads technique used in
method-gl.cc: load(), store(), and validate()), we do not actually need
the loads or stores to be atomic, but we do need the compiler to treat
them as if they were atomics wrt. reordering (e.g., wrt. adjacent
fences).  Right now, I'm relying on the fact that GCC doesn't
optimize atomics yet and am just using nonatomic loads/stores.  Perhaps
it might be TRTDT to add custom builtins for this to GCC, so that we can
model the requirements that we have in libitm precisely, and don't get
surprises once GCC starts to optimize code with atomics.
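
For readers unfamiliar with validated loads, here is a simplified
sketch of the general idea, written in the style of a seqlock.  It is
not the code from method-gl.cc: g_version, g_data, begin_snapshot(),
validated_load() and locked_store() are hypothetical names, and the
data is declared as a relaxed atomic so that the sketch is free of data
races under C++11.  libitm cannot do that for application data, which
is exactly the problem described above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical global versioned lock: an odd value means "write in progress".
static std::atomic<std::uint64_t> g_version(0);
// Stand-in for application data.  Assumption of this sketch: it is an
// atomic accessed with relaxed order, so fences order it as required.
static std::atomic<int> g_data(0);

// Reader: wait for an even (unlocked) version and remember it.
std::uint64_t begin_snapshot()
{
  std::uint64_t s;
  do
    s = g_version.load(std::memory_order_acquire);
  while (s & 1);
  return s;
}

// Reader: load the data, then check that the version is unchanged.
// Returns false if a writer intervened and the caller must restart.
bool validated_load(std::uint64_t snapshot, int *out)
{
  int v = g_data.load(std::memory_order_relaxed);
  // Order the data load before the version re-check.
  std::atomic_thread_fence(std::memory_order_acquire);
  if (g_version.load(std::memory_order_relaxed) != snapshot)
    return false;
  *out = v;
  return true;
}

// Writer: lock (even -> odd), store the data, unlock with the next even value.
void locked_store(int value)
{
  std::uint64_t locked;
  for (;;)
    {
      std::uint64_t v = g_version.load(std::memory_order_relaxed);
      if ((v & 1) == 0
          && g_version.compare_exchange_weak(v, v + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
        {
          locked = v + 1;
          break;
        }
    }
  // Order the version update before the data store.
  std::atomic_thread_fence(std::memory_order_release);
  g_data.store(value, std::memory_order_relaxed);
  g_version.store(locked + 1, std::memory_order_release);
}
```

A reader calls begin_snapshot() once and validated_load() for each
access; a false return means the snapshot is stale and the reader has
to take a new one and retry.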

Tested on x86 with STAMP and microbenchmarks.  But we can't rely on
tests to get confidence that this code works, so if anyone feels
sufficiently familiar with the C++11 model, please review this.


Torvald

Attachment: patch1
Description: Text document

