This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: gomp_target_fini


Hi!

Ping.

On Wed, 23 Dec 2015 12:05:32 +0100, I wrote:
> Ping.
> 
> On Wed, 16 Dec 2015 13:30:21 +0100, I wrote:
> > On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > > > +/* This function finalizes all initialized devices.  */
> > > > > +
> > > > > +static void
> > > > > +gomp_target_fini (void)
> > > > > +{
> > > > > +  [...]
> > > > 
> > > > The question is what will this do if there are async target tasks still
> > > > running on some of the devices at this point (forgotten #pragma omp taskwait
> > > > or similar if target nowait regions are started outside of parallel region,
> > > > or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> > > > Also there is the question that the 
> > > > So I think the patch is ok with the above mentioned changes.
> > > 
> > > Here is what I've committed to trunk.
> > 
> > > --- a/libgomp/libgomp.h
> > > +++ b/libgomp/libgomp.h
> > > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
> > >    } cuda;
> > >  } acc_dispatch_t;
> > >  
> > > +/* Various state of the accelerator device.  */
> > > +enum gomp_device_state
> > > +{
> > > +  GOMP_DEVICE_UNINITIALIZED,
> > > +  GOMP_DEVICE_INITIALIZED,
> > > +  GOMP_DEVICE_FINALIZED
> > > +};
> > > +
> > >  /* This structure describes accelerator device.
> > >     It contains name of the corresponding libgomp plugin, function handlers for
> > >     interaction with the device, ID-number of the device, and information about
> > > @@ -933,8 +941,10 @@ struct gomp_device_descr
> > >    /* Mutex for the mutable data.  */
> > >    gomp_mutex_t lock;
> > >  
> > > -  /* Set to true when device is initialized.  */
> > > -  bool is_initialized;
> > > +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> > > +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> > > +     to FINALIZED state (at program shutdown).  */
> > > +  enum gomp_device_state state;
> > 
> > (ACK, but I assume we'll want to make sure that an OpenACC device is
> > never re-initialized if we're in/after the libgomp finalization phase.)
> > 
> > 
> > The issue mentioned above: "exit inside of parallel" is actually a
> > problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
> > libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
> > test cases now run into annoying "WARNING: program timed out".  Here is
> > what's happening, as I understand it: in
> > libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
> > returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.
> > 
> > > --- a/libgomp/target.c
> > > +++ b/libgomp/target.c
> > 
> > > +/* This function finalizes all initialized devices.  */
> > > +
> > > +static void
> > > +gomp_target_fini (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < num_devices; i++)
> > > +    {
> > > +      struct gomp_device_descr *devicep = &devices[i];
> > > +      gomp_mutex_lock (&devicep->lock);
> > > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > > +	{
> > > +	  devicep->fini_device_func (devicep->target_id);
> > > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > > +	}
> > > +      gomp_mutex_unlock (&devicep->lock);
> > > +    }
> > > +}
> > 
> > > @@ -2387,6 +2433,9 @@ gomp_target_init (void)
> > >        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > >  	goacc_register (&devices[i]);
> > >      }
> > > +
> > > +  if (atexit (gomp_target_fini) != 0)
> > > +    gomp_fatal ("atexit failed");
> > >  }
> > 
> > Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > atexit handler, gomp_target_fini, which, with the device lock held, will
> > call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > clean up.
> > 
> > Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > context is now in an inconsistent state, see
> > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:
> > 
> >     CUDA_ERROR_LAUNCH_FAILED = 719
> >         An exception occurred on the device while executing a
> >         kernel. Common causes include dereferencing an invalid device
> >         pointer and accessing out of bounds shared memory. The context
> >         cannot be used, so it must be destroyed (and a new one should be
> >         created). All existing device memory allocations from this
> >         context are invalid and must be reconstructed if the program is
> >         to continue using CUDA.
> > 
> > Thus, any cuMemFreeHost invocations that are run during clean-up will now
> > also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> > GOMP_PLUGIN_fatal, which again will trigger the same or another
> > (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> > trying to lock the device again, which is still locked.
> > 
> > (Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
> > errors, should we destroy the context right away, or toggle a boolean
> > flag to mark it as unusable, and use that as an indication to avoid the
> > follow-on failures of cuMemFreeHost just described above, for example?)
> > 
> > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
> > tells us:
> > 
> >     Since the behavior is undefined if the exit() function is called more
> >     than once, portable applications calling atexit() must ensure that the
> >     exit() function is not called at normal process termination when all
> >     functions registered by the atexit() function are called.
> > 
> > ... which we're violating here, at least in the nvptx plugin.  I have not
> > analyzed the intermic one.
> > 
> > As it happens, Chung-Lin has been working in that area:
> > <http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
> > which he recently re-posted:
> > <http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
> > <http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
> > <http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
> > I have not analyzed whether his changes would completely resolve the
> > problem just described, but at least conceptually they seem to be a step
> > into the right direction?  (Jakub?)
> > 
> > Now, to resolve the immediate problem, what is the right thing for us to
> > do?  Is the following simple change OK, or is there a reason to still run
> > atexit handlers if terminating under error conditions?
> > 
> > commit b1733e8f9df6ae7d6828e2194df1b314772701c5
> > Author: Thomas Schwinge <thomas@codesourcery.com>
> > Date:   Wed Dec 16 13:10:39 2015 +0100
> > 
> >     Avoid deadlocks in libgomp due to competing atexit handlers
> >     
> >     	libgomp/
> >     	* error.c (gomp_vfatal): Call _exit instead of exit.
> > ---
> >  libgomp/error.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git libgomp/error.c libgomp/error.c
> > index 094c24a..1ef7491 100644
> > --- libgomp/error.c
> > +++ libgomp/error.c
> > @@ -34,6 +34,7 @@
> >  #include <stdarg.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> > +#include <unistd.h>
> >  
> >  
> >  #undef gomp_vdebug
> > @@ -77,7 +78,7 @@ void
> >  gomp_vfatal (const char *fmt, va_list list)
> >  {
> >    gomp_verror (fmt, list);
> > -  exit (EXIT_FAILURE);
> > +  _exit (EXIT_FAILURE);
> >  }
> >  
> >  void


GrÃÃe
 Thomas

Attachment: signature.asc
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]