This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [PATCH][RFC] Remove volatile from data members in libstdc++


Richard -

I don't fully understand what you're saying here.

Clearly, according to the language standards, ordinary loads do not
guarantee atomicity.  A compiler that always loads an int in two halves
is clearly conforming under the current standards.  Hopefully you
agree?

So are you arguing that gcc always guarantees this, and since the
__sync_ primitives are gcc-specific anyway, there is no need to do
anything more?

I have not seen any such guarantee in the gcc documentation.  I will
grant you that in most cases the cheapest way to load an aligned int is
to do so atomically.  But not always.  Consider the following, where
both local and global are known not to be modified by this code:

	local = global;
	global2 = ...local...;
	l1: <code that needs lots of registers, but doesn't need local,
and doesn't call functions>
	global3 = ...local...;

It is likely that local will have to be spilled at l1.  It can either be
spilled to a stack location and reloaded from the stack location, or we
can skip the store, and just reload local from global.  I claim that the
latter is usually cheaper if the global is directly addressable.  And it
certainly makes the load from global look nonatomic.  And in ways that
would have been disallowed if global were declared volatile.

I have no idea whether gcc currently does this (maybe not) or whether
some future version of gcc might (I wouldn't bet against it).

The proposed C++ memory model allows such optimizations on ordinary,
non-atomic variables.  For code that follows pthread rules, it shouldn't
matter.

Hans

> -----Original Message-----
> From: Richard Guenther [mailto:rguenther@suse.de] 
> Sent: Friday, July 14, 2006 1:12 AM
> To: Boehm, Hans
> Cc: Ian Lance Taylor; Paolo Carlini; gcc-patches@gcc.gnu.org; 
> libstdc++@gcc.gnu.org
> Subject: RE: [PATCH][RFC] Remove volatile from data members 
> in libstdc++
> 
> On Thu, 13 Jul 2006, Boehm, Hans wrote:
> 
> > I think we are all in violent agreement that volatile isn't 
> guaranteed 
> > to solve the problem.
> > 
> > But I still claim it's closer than other viable mechanisms we 
> > currently have.  Thus it's the best available stop-gap workaround 
> > until the underlying issues get fixed.
> 
> I still claim that there are no underlying issues to fix 
> (other than optimizing rope to not use a mutex).  And I 
> still claim that if there were underlying issues to fix, 
> volatile would help nothing - in fact it would likely make the 
> problem worse, as it tends to widen possible race windows due 
> to extra loads and worse-optimized code.
> 
> But it's the libstdc++ maintainers call...
> 
> so, please take the bunch of patches I submitted and decide 
> what to do about them for -v3/-v7.
> 
> > The other alternatives that have been suggested are:
> > 
> > 1) The atomicity.h functionality in libstdc++.
> > 
> > 2) The __sync_ gcc intrinsics.
> > 
> > 3) volatile assembly code.
> 
> The only thing that should happen here wrt 1, 2 and 3 is that 
> v7 should overhaul its atomicity.h primitives to use the gcc 
> intrinsics where available and otherwise fall back to 
> assembly code (for most of the interesting architectures you 
> can steal from the linux kernel code in their 
> include/asm-$arch/atomic.h headers).
> 
> > I think neither (1) nor (2) really apply here.  They should be (and 
> > soon will be, I hope) used for atomic read-modify-write 
> operations.  
> > But they don't support plain atomic loads and stores.  For rope at 
> > least, it's the reference count loads (for copy-avoidance) 
> that are the issue.
> 
> There are no such things as atomic loads.  Even the kernel 
> relies on gcc to load properly aligned words from memory in 
> one piece.  Usually the hardware also relies on proper 
> alignment here (the largest "atomic" load is of an 
> aligned L1 cacheline).
> 
> > (3) is probably correct, but I don't think it's practical to put 
> > machine-specific in-line assembly code in such library code.
> > 
> > One could argue that the right solution is to add loads and 
> stores to 
> > (2).  But that quickly runs into another major weakness of 
> the current
> > primitives: They don't allow the right kind of control over memory 
> > ordering/visibility.  If you tried to follow the current design and 
> > include a full memory fence everywhere, the resulting atomic loads 
> > would be completely unusable, at least on machines like Pentium 4s 
> > that have slow fences.
> 
> ?  You are mixing two issues here.  First, compiler 
> optimization barriers; second, hardware load/store 
> barriers.  Of course the gcc sync builtins already deal with 
> both.  Have to.  Otherwise they would not work at all.
> 
> Again, the linux kernel is a good place to look at for what 
> hoops one needs to jump through on which architectures to get 
> proper optimization and CPU barriers and atomicity.
> 
> Richard.
> 
> --
> Richard Guenther <rguenther@suse.de>
> Novell / SUSE Labs
> 

