This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: gomp_target_fini


On Jan 22, 2016, at 2:16 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
>> Thomas, I've mentioned this issue before - there is sometimes just too much
>> irrelevant stuff to wade through in your patch submissions, and it
>> discourages review. The discussion of the actual problem begins more than
>> halfway through your multi-page mail. Please try to be more concise.
>> 
>> On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
>>> Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
>>> atexit handler, gomp_target_fini, which, with the device lock held, will
>>> call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
>>> clean up.
>>> 
>>> Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
>>> context is now in an inconsistent state
>> 
>>> Thus, any cuMemFreeHost invocations that are run during clean-up will now
>>> also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
>>> GOMP_PLUGIN_fatal, which again will trigger the same or another
>>> (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
>>> trying to lock the device again, which is still locked.
>> 
>>>    	libgomp/
>>>    	* error.c (gomp_vfatal): Call _exit instead of exit.
>> 
>> It seems unfortunate to disable the atexit handlers for everything for what
>> seems purely an nvptx problem.
>> 
>> What exactly happens if you don't register the cleanups with atexit in the
>> first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in the
>> cleanup functions?
> 
> I agree, _exit is just wrong, there could be important atexit hooks from the
> application.  You can set some flag that the libgomp or nvptx plugin atexit
> hooks should not do anything, or should do things differently.  But
> bypassing all atexit handlers is risky.
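A minimal sketch of the flag approach suggested above, assuming a single-threaded exit path. Every name here (`fatal_error_seen`, `target_fini_sketch`, `fatal_sketch`, `device_cleanups_run`) is hypothetical. They stand in for libgomp's `gomp_target_fini` and `gomp_vfatal` and are not libgomp's actual code. The fatal path still calls `exit`, so the application's atexit handlers run, while libgomp's own handler skips device clean-up once the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical flag: set by the fatal-error path before exit () runs the
   atexit handlers, so libgomp's handler knows the device is suspect.  */
static bool fatal_error_seen = false;

/* Counts how many times device clean-up actually ran (illustration only).  */
static int device_cleanups_run = 0;

/* Stand-in for gomp_target_fini, registered with atexit ().  */
static void
target_fini_sketch (void)
{
  if (fatal_error_seen)
    return;  /* Do not lock the device or call back into the plugin.  */
  /* Real code would lock the device and call GOMP_OFFLOAD_fini_device.  */
  device_cleanups_run++;
}

/* Stand-in for gomp_vfatal.  */
void
fatal_sketch (const char *msg)
{
  fprintf (stderr, "libgomp: %s\n", msg);
  if (fatal_error_seen)
    /* Re-entered, e.g. from inside another atexit handler: calling exit ()
       a second time is undefined behavior, so only here fall back to _exit.  */
    _exit (EXIT_FAILURE);
  fatal_error_seen = true;
  exit (EXIT_FAILURE);  /* Application atexit hooks still run.  */
}
```

In this sketch the handler would be registered once during initialization with `atexit (target_fini_sketch);`. If other threads can report fatal errors concurrently, the flag would need to be atomic or protected by a lock.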

I’d use the phrase “is wrong”.

Just create a semaphore that records that init was fully done: set it at the end of init, test it at the beginning of the cleanup, and reset it whenever you want to cancel the cleanup.  Think of it as an is_valid predicate.  Any operation that needs the device to be valid can query it first, and fail otherwise.
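The is_valid predicate described above might look like the following minimal sketch. `struct device_sketch`, `device_init`, `device_alloc`, `device_invalidate`, and `device_fini` are hypothetical names, not libgomp or nvptx plugin API. The buffer counter stands in for real host/device allocations, and a multi-threaded plugin would need to guard the flag with the device lock or make it atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state carrying the "is_valid" predicate.  */
struct device_sketch
{
  bool is_valid;     /* True only between a complete init and fini/invalidate.  */
  int live_buffers;  /* Stand-in for real host/device allocations.  */
};

/* Set the flag only once initialization has fully succeeded.  */
static void
device_init (struct device_sketch *dev)
{
  assert (!dev->is_valid);  /* Must not already be initialized.  */
  dev->live_buffers = 0;
  /* ... create the context, allocate resources ...  */
  dev->is_valid = true;
}

/* Any operation that needs a usable device queries the predicate first.  */
static bool
device_alloc (struct device_sketch *dev)
{
  if (!dev->is_valid)
    return false;
  dev->live_buffers++;
  return true;
}

/* Cancel all later clean-up, e.g. after a sticky launch failure.  */
static void
device_invalidate (struct device_sketch *dev)
{
  dev->is_valid = false;
}

/* atexit-time clean-up: a no-op unless the device is still valid.  */
static void
device_fini (struct device_sketch *dev)
{
  if (!dev->is_valid)
    return;
  dev->live_buffers = 0;  /* Real code would free the buffers here.  */
  dev->is_valid = false;
}
```

Here the plugin would call `device_invalidate` as soon as it sees an error such as CUDA_ERROR_LAUNCH_FAILED, so the later atexit clean-up returns immediately instead of calling cuMemFreeHost on the broken context and reaching GOMP_PLUGIN_fatal a second time.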
