This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link)


Hi!

On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > +/* This function finalizes all initialized devices.  */
> > > +
> > > +static void
> > > +gomp_target_fini (void)
> > > +{
> > > +  [...]
> > 
> > The question is what will this do if there are async target tasks still
> > running on some of the devices at this point (forgotten #pragma omp taskwait
> > or similar if target nowait regions are started outside of parallel region,
> > or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> > Also there is the question that the 
> > So I think the patch is ok with the above mentioned changes.
> 
> Here is what I've committed to trunk.

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
>    } cuda;
>  } acc_dispatch_t;
>  
> +/* Various state of the accelerator device.  */
> +enum gomp_device_state
> +{
> +  GOMP_DEVICE_UNINITIALIZED,
> +  GOMP_DEVICE_INITIALIZED,
> +  GOMP_DEVICE_FINALIZED
> +};
> +
>  /* This structure describes accelerator device.
>     It contains name of the corresponding libgomp plugin, function handlers for
>     interaction with the device, ID-number of the device, and information about
> @@ -933,8 +941,10 @@ struct gomp_device_descr
>    /* Mutex for the mutable data.  */
>    gomp_mutex_t lock;
>  
> -  /* Set to true when device is initialized.  */
> -  bool is_initialized;
> +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> +     to FINALIZED state (at program shutdown).  */
> +  enum gomp_device_state state;

(ACK, but I assume we'll want to make sure that an OpenACC device is
never re-initialized if we're in/after the libgomp finalization phase.)


The issue mentioned above: "exit inside of parallel" is actually a
problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
test cases now run into annoying "WARNING: program timed out".  Here is
what's happening, as I understand it: in
libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      gomp_mutex_lock (&devicep->lock);
> +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> +	{
> +	  devicep->fini_device_func (devicep->target_id);
> +	  devicep->state = GOMP_DEVICE_FINALIZED;
> +	}
> +      gomp_mutex_unlock (&devicep->lock);
> +    }
> +}

> @@ -2387,6 +2433,9 @@ gomp_target_init (void)
>        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
>  	goacc_register (&devices[i]);
>      }
> +
> +  if (atexit (gomp_target_fini) != 0)
> +    gomp_fatal ("atexit failed");
>  }

Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
atexit handler, gomp_target_fini, which, with the device lock held, will
call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
clean up.

Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
context is now in an inconsistent state, see
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:

    CUDA_ERROR_LAUNCH_FAILED = 719
        An exception occurred on the device while executing a
        kernel. Common causes include dereferencing an invalid device
        pointer and accessing out of bounds shared memory. The context
        cannot be used, so it must be destroyed (and a new one should be
        created). All existing device memory allocations from this
        context are invalid and must be reconstructed if the program is
        to continue using CUDA.

Thus, any cuMemFreeHost invocations that are run during clean-up will now
also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
GOMP_PLUGIN_fatal, which again will trigger the same or another
(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
trying to lock the device again, which is still locked.

(Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
errors, should we destroy the context right away, or toggle a boolean
flag to mark it as unusable, and use that as an indication to avoid the
follow-on failures of cuMemFreeHost just described above, for example?)

<http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
tells us:

    Since the behavior is undefined if the exit() function is called more
    than once, portable applications calling atexit() must ensure that the
    exit() function is not called at normal process termination when all
    functions registered by the atexit() function are called.

... which we're violating here, at least in the nvptx plugin.  I have not
analyzed the intermic one.

As it happens, Chung-Lin has been working in that area:
<http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
which he recently re-posted:
<http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
I have not analyzed whether his changes would completely resolve the
problem just described, but at least conceptually they seem to be a step
into the right direction?  (Jakub?)

Now, to resolve the immediate problem, what is the right thing for us to
do?  Is the following simple change OK, or is there a reason to still run
atexit handlers if terminating under error conditions?

commit b1733e8f9df6ae7d6828e2194df1b314772701c5
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Dec 16 13:10:39 2015 +0100

    Avoid deadlocks in libgomp due to competing atexit handlers
    
    	libgomp/
    	* error.c (gomp_vfatal): Call _exit instead of exit.
---
 libgomp/error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git libgomp/error.c libgomp/error.c
index 094c24a..1ef7491 100644
--- libgomp/error.c
+++ libgomp/error.c
@@ -34,6 +34,7 @@
 #include <stdarg.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 
 #undef gomp_vdebug
@@ -77,7 +78,7 @@ void
 gomp_vfatal (const char *fmt, va_list list)
 {
   gomp_verror (fmt, list);
-  exit (EXIT_FAILURE);
+  _exit (EXIT_FAILURE);
 }
 
 void


GrÃÃe
 Thomas

Attachment: signature.asc
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]