This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Tue, 2011-12-13 at 11:17 -0800, Richard Henderson wrote:
> On 11/30/2011 05:13 PM, Richard Henderson wrote:
> > The library is written in C++, so in theory we can use the real atomic<> templates, etc. Except that we have the same horrid problem finding the C++ headers as did for <type_traits>, so again we have a local copy of <atomic>. Blah. But given that it is a copy, the rest of the code is using the real interfaces.
> >
> > This passes the testsuite on power7 (gcc110), which *is* picky about memory barriers, but I seem to recall that the larger external tests that Velox used were much better at picking out problems. And I've misplaced those...
> >
> > Torvald, if you'd be so kind as to cast another set of eyes across this, I'd be grateful.
>
> I've committed the patch.

I've reviewed your patch, and quite a few changes are necessary (see the attached patch).

The memory orders and fences in my patch are based on the C++11 memory model as specified by Batty et al. ("Mathematizing C++ Concurrency: The Post-Rapperswil Model", 2010).

I've tried to keep the number of barriers (and their strength) as low as possible, but in some cases the barriers that we seem to need in the C++11 model could be merged into fewer HW barriers. However, I think this only affects performance on non-TSO hardware (e.g., Power), because on TSO HW (e.g., x86) all but the seq_cst fences/memory orders are no-ops. The compiler might be able to optimize some of these cases in the future (e.g., by merging barriers that are redundant on a particular arch).

If someone has suggestions for how to optimize this AND has a proof for why it will still work with the optimizations applied, please speak up. Corrections are also welcome, of course.

TM also presents an interesting case that was hard for me to map to the C++11 model: we need to establish synchronizes-with / happens-before relations for the loads/stores from/to application data that happen in transactions. C++11 requires atomics for this, even if we use barriers to enforce the memory orders for those accesses. However, we can't just forge atomic accesses to nonatomic locations because this might not work on all architectures (e.g., alignment constraints, metadata, ...).

In this particular case (the validated-loads technique used in method-gl.cc: load(), store(), and validate()), we do not actually need the loads or stores to be atomic, but we need the compiler to treat them as if they were atomics wrt. reordering etc. (e.g., wrt. adjacent fences). Right now, I'm relying on the fact that GCC doesn't optimize atomics yet, and I am just using nonatomic loads/stores. Perhaps the right thing to do would be to add custom builtins for this to GCC, so that we can model libitm's requirements precisely and don't get surprises once GCC starts to optimize code with atomics.

Tested on x86 with STAMP and microbenchmarks. But we can't rely on tests to gain confidence that this code works, so if anyone feels sufficiently familiar with the C++11 model, please review this.

Torvald
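
For readers of the archive, the validated-load idea discussed above can be sketched in strictly conforming C++11 roughly as follows. This is only an illustration under stated assumptions, not the code in method-gl.cc or in the attached patch: the names `VersionedCell`, `versioned_store`, and `validated_load` are hypothetical, a single writer at a time is assumed, and the data accesses are relaxed atomics (the substitute the standard asks for), whereas the message describes libitm as currently using plain nonatomic accesses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical illustration only; none of these names come from libitm.
struct VersionedCell {
    std::atomic<std::uint64_t> version{0};  // even: stable, odd: write in progress
    std::atomic<long> data{0};              // stand-in for application data
};

// Writer side. Assumes at most one writer at a time (e.g., it holds a
// global lock), so the version needs no compare-and-swap here.
void versioned_store(VersionedCell& c, long value) {
    std::uint64_t v = c.version.load(std::memory_order_relaxed);
    c.version.store(v + 1, std::memory_order_relaxed);    // mark "write in progress"
    std::atomic_thread_fence(std::memory_order_release);  // marker is ordered before the data store
    c.data.store(value, std::memory_order_relaxed);
    c.version.store(v + 2, std::memory_order_release);    // publish the new stable version
}

// Reader side. Returns true and sets `out` only if the value was read
// while the version was stable and unchanged; otherwise the caller retries.
bool validated_load(const VersionedCell& c, long& out) {
    std::uint64_t before = c.version.load(std::memory_order_acquire);
    if (before & 1) {
        return false;  // a write is in progress
    }
    long tmp = c.data.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // data load is ordered before the re-check
    std::uint64_t after = c.version.load(std::memory_order_relaxed);
    if (before != after) {
        return false;  // version moved: the value may be inconsistent
    }
    out = tmp;
    return true;
}
```

The two fences are what the data accesses must stay ordered against. If the data accesses were plain nonatomic loads and stores, the program would formally contain a data race, which is the mapping problem the message describes.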
Attachment:
patch1
Description: Text document