This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts


On 2018/12/17 10:32 PM, Thomas Schwinge wrote:
The reason there are deadlocks from inside the plugin on GOMP_PLUGIN_fatal() is when we hold the
struct gomp_device_descr's*device*  lock, which is also acquired when we execute atexit device shutdown handlers, hence the deadlock.

I don't think this is the case for the OpenACC entry points that grab at the openacc.async.* hooks,
Ah, OK, I see.  (But I thought that method of deadlock had been fixed by
some structural changes, to have plugin functions call the
non-terminating "GOMP_PLUGIN_error" and return some error, instead of
calling "GOMP_PLUGIN_fatal"?  I may be misremembering.  Or, that's
another TODO item for later, separately...  Or, if that's actually the
case, that this has been fixed in the way I described, then should these
functions also be changed accordingly: instead of "GOMP_PLUGIN_fatal"
call "GOMP_PLUGIN_error", and then return an error code?)

You remembered correctly, although...

though I can audit them again if deemed so.
My understanding had been that deadlock may happen if we're inside some
of these async/wait/serialize/synchronize functions, with "async" locked,
then run into an error, then libgomp prepares to abort, and at that time
shuts down the device, which will shut down the asyncqueues
("goacc_fini_asyncqueues"), which will again try to lock "async" -- which
it actually doesn't.  My misunderstanding, I guess?

...at least now, you can see that goacc_fini_asyncqueues() does not attempt to
acquire devicep->openacc.async.lock when doing cleanup.

Come to think of it, that might be a bug there. :P

"If there are two or more host threads executing and sharing the same accelerator device,
two asynchronous operations with the same async-value will be enqueued on the same activity queue"
Right, but then, in the very next sentence, it goes on to state: "If the
threads are not synchronized with respect to each other, the operations
may be enqueued in either order and therefore may execute on the device
in either order".  So this, and given that:

I actually didn't care much about that next sentence, since it's just stating the obvious :)

It also seem to imply that the multiple host threads are enqueuing operations to the same async queue, hence further
corroborating that queues are device-wide, not thread.

That said, I recall most (if not all) of the synchronization operations and behavior are all
defined to be with respect to operations of the local host thread only, so the spec mentioning interaction with
other host threads here may be moot, as there's no way meaningful way to synchronize with
the OpenACC activity of other host threads (though correct me if I forgot some API)
..., I concluded something must be wrong in the OpenACC 2.6,
2.16.1. "async clause" text, and no such (host-side) inter-thread
synchronization can be expected from OpenACC "async"/"wait".  I've also
already got that on my list of things to clarify with the OpenACC
technical committee, later on.

I just remembered, there does seem to be one area where device vs. thread-wide interpretation will be visible:
when using acc_get/set_cuda_stream(). Supposedly, given the specification's device-wide queue/stream model,
different host-threads should access the same CUDA stream when using acc_get/set_cuda_stream().
This will break if we made async queues to be thread-local.

Also, CUDA streams do not seem to support local-thread-operation-only synchronization.
I remember this was an issue in the old implementation inside the nvptx plugin as well, and we
had hacks to somewhat alleviate it (i.e. calling streams "single" or "multi" threaded)
Right.

Well, another issue that we might want to bring up to the OpenACC committee:)
I agree that if async queues spaces were simply thread-local then things would be much simpler.
OK, so you agree with that, good.

And, no problem foreseeable about simply moving the asyncqueues into
"goacc_thread" -- and removing the "async" lock?

I think we should still try to solve the potential deadlock problems, and stay close to the current
implementation just for now. We can ask the committee for further guidance later.

Chung-Lin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]