


Cilk+ questions


Hello everyone,

We thought we'd take this conversation to the list because it is related
to the GCC implementation of Cilk+. My questions to Balaji are quoted;
Balaji's replies are marked with BVI>.

On Fri, 2011-09-23 at 10:42 -0700, Iyer, Balaji V wrote:
> > 1) Implicit syncs are said to be called after destructors. However, 
> > this seems to be counter-intuitive for programmers. What was the 
> > reasoning to choose that instead of running destructors after the 
> > implicit sync? Was it about performance?
> > 
> > 
> > BVI) Implicit syncs: It is true that in most cases it would be better 
> > to sync before calling destructors in order to avoid race conditions.
> > However, destructors are often used to release locks.
> 
> Agreed, but I would assume that this is not that common. Are there any
> common cases _besides_ lock objects for block scope? OTOH, I can't think
> of any other objects that would hold locks throughout their whole
> lifetime instead of just during individual operations. Do you have
> any examples of other common objects/patterns?
> 
> BVI> We don't have any other examples off hand, but the lock-guard case is a
> BVI> very common one and is especially important now that it is part of the
> BVI> new C++ standard.

Hmm. Could we make a special case for lock guards? That is, have their
destructors run before the sync, so that we first call all lock-guard
destructors, then the sync, then the rest of the destructors? Locks
released by other destructors would then still be released only after
the sync.

However, this changes the destructor order, and so probably adds more
complexity and potential surprises than necessary. Does anybody have an
idea how to avoid being prone to both race conditions and deadlocks?
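
To make the trade-off concrete, here is a minimal sketch (C++ with the
Cilk Plus keywords from <cilk/cilk.h>; do_work(), use() and Widget are
made-up names, and I'm ignoring the separate issue that a std::mutex
held across a spawn may end up being released on a different thread
than the one that locked it):

    #include <cilk/cilk.h>
    #include <mutex>
    #include <vector>

    std::mutex m;
    struct Widget { std::vector<int> data; };   // made-up shared data type

    void do_work() { /* stand-in body; imagine it (indirectly) needs m */ }
    void use(const Widget &w) { (void)w.data.size(); }

    void lock_case()
    {
        std::lock_guard<std::mutex> guard(m);   // released by its destructor
        cilk_spawn do_work();
        // Implicit sync at the end of this spawning function.  If the sync
        // ran before ~lock_guard, we would wait here for do_work() while m
        // is still held; if do_work() (or anything it waits on) needs m,
        // that is a deadlock.  With destructors run first, m is released
        // before we wait.
    }

    void shared_object_case()
    {
        Widget local;
        cilk_spawn use(local);
        // With destructors run first, ~Widget destroys local before the
        // implicit sync, while a stolen use(local) may still be reading
        // it: a race.  An explicit cilk_sync; at this point avoids that.
    }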

> 
> > If a sync
> > occurs while a lock is held, the result is likely to be deadlock, so 
> > it is critical that the destructor that releases the lock be called 
> > before the sync.  Thus, there is no universally-correct location for 
> > the sync.  Either decision has a work-around: you can advance the sync 
> > by putting it in explicitly and you can advance the destructor call by 
> > putting in an extra set of braces.  The former is slightly easier and 
> > more flexible, thus the (somewhat arbitrary) decision to call 
> > destructors before syncing.
> 
> I don't really agree regarding what's better in this trade-off. If the
> to-be-destructed object is shared, the destructor will race with other
> potential uses, right? And I would argue that most destructors are not
> written in a thread-safe way, nor such that they leave the object in a
> state where other operations can (and do) detect that it has been
> destructed already.
> Thus, if the programmer makes a mistake, we have the trade-off between
> a race condition and a deadlock, and I'd really prefer the deadlock in
> this case, especially because we can more easily provide diagnostics
> to the programmer for a deadlock, than for a race. Also note that GCC
> doesn't include a proper race detection tool currently.
> 
> BVI> This is a good point, and one that we should consider thoroughly.  I
> BVI> would want to get a lot of feedback before making a change in this area.
> 

How do we get this feedback, and from whom?


> > Nevertheless, the programmer note is actually obsolete -- work at MIT 
> > has demonstrated an efficient implementation of hyperobjects using the 
> > memory-mapping hardware already on the CPU.  It does not use 
> > copy-on-write, but it does result in the same address being visible 
> > for a hyperobject in different concurrent strands.
> 
> So I guess they map the same memory region at several addresses? Are there
> any other specific reasons to require the equality relation to be handled
> via the address of the object?
> 
> BVI> Actually, it maps *different* memory regions to the *same* address.
> BVI> Thus, two distinct views can have the same address.  We are not requiring
> BVI> anything of any specific equality relation.  The programmer note was
> BVI> intended to point out a specific attribute of the implementation that
> BVI> could be exploited by the programmer to determine if a steal occurred.

Ah okay. If this is just an implementation artifact, then perhaps it
should not be part of the language specification, so that it does not
unnecessarily constrain other implementations.

> > The implementation note is not about
> > copy-on-write.  It observes that the view of the hyperobject changes 
> > only at a spawn or sync, so the compiler can optimize out the lookup 
> > in cases where flow analysis shows that the view cannot change.
> 
> Okay. I'll take another look at that.
> 
> 
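As a side note, for concreteness, a typical hyperobject use with the
reducer_opadd wrapper looks roughly like the sketch below (access syntax
may differ between versions of the reducer library):

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>

    long sum_array(const long *a, int n)
    {
        cilk::reducer_opadd<long> total;     // hyperobject, + as the monoid op
        cilk_for (int i = 0; i < n; ++i) {
            total += a[i];                   // each += does a hyperobject lookup
            if (a[i] < 0)
                total += 1;
            // No spawn or sync between the two += above, so per the note the
            // compiler may look up the view once and reuse it.
        }
        return total.get_value();            // serial code again: read the result
    }
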
> > 3) The spec states that the C++ hyperobject syntax might change. Can 
> > you share any details, or plans? This would be interesting to know; if 
> > this is going to be changed anyway, the GCC Cilk+ implementation 
> > should likely have the new (hopefully better?) syntax, or not?
> > 
> > BVI) Hyperobject syntax: We are currently exploring the possibility of 
> > creating "hyperpointers" whereby a normal object can be treated like a 
> > reducer by referencing it through a smart pointer type.  It is likely 
> > that the existing hyperobject syntax will remain supported for some 
> > time and may actually be superior for certain situations.
> 
> Do you have additional information about that?
> 
> BVI> I can give you the general idea: Instead of defining an object that has
> BVI> "hyperobject semantics", we define a normal (not-hyper) object, then
> BVI> create a hyperpointer to that object.  If you access the object by
> BVI> dereferencing the hyperpointer, you get hyperobject-like behavior.  If
> BVI> you access the object normally, you get normal behavior.  Thus, you
> BVI> create a normal object in serial code, manipulate it through a
> BVI> hyperpointer in parallel code, and view the result when the parallel
> BVI> code is done and you're back in serial code.

Hmm. So this would be some kind of automatic synchronization, or creating
copies and merging them automatically when necessary? More details would
still be nice to know.
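
Just to check my understanding, here is a purely hypothetical sketch of
the usage pattern described above; hyper_ptr is a made-up name, nothing
like it exists in the current library, and the merge step is exactly the
part I would like to know more about:

    #include <cilk/cilk.h>
    #include <vector>

    std::vector<int> results;                // plain object, built in serial code

    void parallel_part()
    {
        hyper_ptr< std::vector<int> > hp(&results);   // hypothetical smart pointer
        cilk_for (int i = 0; i < 1000; ++i)
            hp->push_back(i);   // dereferencing would give per-strand view behavior
        // back in serial code, results would hold the merged (reduced) contents
    }

    void serial_part()
    {
        results.push_back(42);  // plain access, normal (non-hyper) behavior
    }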


> > BVI) Lookups within monoids: the prohibition on doing a hyperobject 
> > lookup within a monoid operation is simply to prevent re-entering the 
> > hyperobject runtime, causing either infinite recursion, deadlock, or race conditions.
> > Only hyperobjects that use such an ill-formed monoid would be 
> > affected, but if the program hangs or crashes, then obviously 
> > everybody is affected ;-)
> 
> I see, so it's a reentrancy issue you don't want to deal with in the library.
> 
> BVI> We want to deal with it; we just haven't yet :-).  It's on our list.

So is the long-term goal to have the library deal with this and remove
the respective programming constraint in the language specification?
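
For reference, my understanding is that the constraint is about patterns
like the rough sketch below (assuming the C++ reducer library in
<cilk/reducer.h>; the exact Monoid interface may differ between library
versions, and other_sum is just a made-up second reducer):

    #include <cilk/reducer.h>
    #include <cilk/reducer_opadd.h>

    cilk::reducer_opadd<long> other_sum;     // some unrelated hyperobject

    struct bad_monoid : cilk::monoid_base<long>
    {
        static void reduce(long *left, long *right)
        {
            // Ill-formed: a hyperobject lookup from inside reduce()
            // re-enters the hyperobject runtime and can recurse,
            // deadlock, or race:
            // other_sum += *right;          // the forbidden lookup
            *left += *right;                 // the well-formed reduction
        }
    };

    cilk::reducer<bad_monoid> sum;           // reducer using the monoid above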

> 
> > Regarding the ABI, I was somewhat surprised to read that changing the 
> > number of workers requires the application to shut down the runtime 
> > and restart it. What's the reason for this constraint? Was this just 
> > to keep things simple? It seems to me that being able to change the 
> > number of workers at runtime would be good for system-wide scheduling, 
> > even if the Cilk runtime just treats the call as a hint that it can 
> > lazily adhere to.
> > 
> > BVI) The number of system workers is fixed because we use a simple 
> > linear array to hold the workers.  The system workers are allocated at 
> > the beginning of the array, and user workers are assigned slots in the 
> > array as they bind to the runtime.  By default, the array is sized to 
> > 3x the number of workers, though this can be overridden by an API 
> > call.  See the calculation of nworkers and max_user_workers in 
> > cilkg_get_user_settable_values in global_state.cpp for all the gory 
> > details.
> > 
> > If some other threading package is being used to create more user 
> > workers that bind to the runtime than there are slots in the array, we 
> > dynamically create workers for them that simply can't be stolen from.  
> > Our assumption is that the system is oversubscribed and stealing 
> > wasn't likely anyway.
> 
> It's not just that. We could potentially also want to reduce parallelism
> for a particular application (eg, if another application of higher
> priority is running at the same time). Sure, we could have more worker
> threads than necessary, but this won't be very good for performance. And
> there is CPU hotplug and all that (eg, think VMs), so I think it would
> be good (in the long term, likely) to make that adjustable at runtime.
> 
> BVI> Perhaps.  This gets discussed now and then, but we haven't seen enough
> BVI> use cases to make it high-priority.  Now that the runtime is open source,
> BVI> anybody who thinks that this is important is welcome to implement it.
> BVI> I'm sure that this issue will come up again.

Yes, this will likely come up again.

> 
> > Choosing the worker to steal from is a simple matter of generating a 
> > random number, mod-ing that value by the array size, and indexing into 
> > the array. Making the structure able to handle changes requires 
> > locking which will impact performance and may introduce race conditions.
> 
> Are you sure about that? If this change can happen lazily, then we can
> also synchronize lazily, like it's done in RCU for example...
> 
> BVI> Yes.  We have little doubt that this is technically feasible, if someone
> BVI> put enough time into it.  The naïve implementation would require frequent
> BVI> locking, but I have little doubt that an implementation is possible
> BVI> without so much locking.
> 
> > We've considered allowing users to change the number of workers 
> > dynamically, but thus far the costs have outweighed the benefits.
> 
> Can you elaborate? That is, which implementations have you considered,
> and what were the performance benefits/costs?
> 
> BVI> The main use for such a feature so far has been for writing tests and,
> BVI> though that is important, the amount of work to implement this feature
> BVI> did not seem worth the potential instability.  We don't rule it out in
> BVI> the future and, of course, the open-source community is welcome to take a
> BVI> swipe at it.

I see.
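
To be a bit more concrete about the lazy, RCU-like synchronization I had
in mind above, here is a rough sketch; it is purely illustrative and not
based on the actual structures in global_state.cpp:

    #include <atomic>

    struct worker;                        // stand-in for the real worker type

    struct worker_array {
        unsigned  nworkers;
        worker  **slots;
    };

    std::atomic<worker_array *> g_workers;    // currently published array

    // Thieves read a snapshot; a concurrent resize never mutates it in place.
    worker *choose_victim(unsigned rnd)
    {
        worker_array *snap = g_workers.load(std::memory_order_acquire);
        return snap->slots[rnd % snap->nworkers];
    }

    // Resizing builds a new array and publishes it with a single store; the
    // old array is reclaimed only after all thieves are known to have moved
    // past it (the grace period is what an RCU-style scheme must get right).
    void publish_new_array(worker_array *bigger)
    {
        worker_array *old = g_workers.exchange(bigger, std::memory_order_acq_rel);
        (void)old;   // defer freeing old until a grace period has elapsed
    }

The point is just that victim selection stays a load, a mod, and an
index; only the (rare) resize pays for the indirection.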


Torvald


