This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [gomp4.5] depend nowait support for target


On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> > So, here is what I have for now.  Attached target-29.c testcase works fine with
> > MIC emul, however I don't know how to (and where) properly check for completion
> > of async execution on target.  And, similarly, where to do unmapping after that?
> > Do we need a callback from plugin to libgomp (as far as I understood, PTX
> > runtime supports this, but HSA doesn't), or libgomp will just check for
> > ttask->is_completed in task.c?
> 
> Here is the patch updated to have a task.c defined function that the plugin
> can call upon completion of async offloading exection.

Thanks.

> The testsuite coverage will need to improve, the testcase is wrong
> (contains data races - if you want to test parallel running of two target
> regions that both touch the same var, I'd say best would be to use
> #pragma omp atomic and or in 4 in one case and 1 in another case, then
> test if result is 5 (and similarly for the other var).
> Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
> not have it, but we'll need to find some solution for that.
> 
> Another thing to work on beyond testsuite coverage (it is desirable to test
> nowait target tasks (both depend and without depend) being awaited in all
> the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
> taskgroup, or if (0) task with depend clause waiting on that.
> 
> Also, I wonder what to do if #pragma omp target nowait is used outside of
> (host) parallel - when team is NULL.  All the tasking code in that case just
> executes tasks undeferred, which is fine for all but target nowait - there
> it is I'd say useful to be able to run a single host thread concurrently
> with some async offloading tasks.  So, I wonder if in that case,
> if we encounter target nowait with team == NULL, should not just create a
> dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
> if (0) starting above it and ending at program's end.  In OpenMP, the
> program's initial thread is implicitly surrounded by inactive parallel, so
> this isn't anything against the OpenMP execution model.  But we'd need to
> free the team somewhere in a destructor.
>
> Can you please try to cleanup the liboffloadmic side of this, so that
> a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> is used?

Do you mean something like the patch bellow?  I'll discuss it with liboffloadmic
maintainers.

> Can you make sure it works on XeonPhi non-emulated too?

I'm trying to do it, but it will take some time...

Unfortunately, target-32.c fails for me using emulation mode:

Program received signal SIGSEGV, Segmentation fault.
#0  0x00007ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422
#1  0x00007ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195
#2  0x00007ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:468
#3  0x00007ff4ab11570d in gomp_task_maybe_wait_for_dependencies (depend=0x7ff49b0d9de0) at libgomp/task.c:1539
#4  0x00007ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 <.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662
#5  0x00000000004011f9 in main._omp_fn ()
#6  0x00007ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at libgomp/team.c:119
#7  0x0000003b07e07ee5 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003b076f4b8d in clone () from /lib64/libc.so.6

However when I manually run commands from testsuite/libgomp.log under the same
environment, it passes.  Don't know where is the difference.

Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only
100% CPU usage in offload_target_main instead of 200%, so it seems that only one
target task is running concurrently.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6da09b1..772e198 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -220,6 +220,10 @@ static void
 register_main_image ()
 {
   __offload_register_image (&main_target_image);
+
+  /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
+     asynchronous task on target is completed.  */
+  __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -537,13 +541,3 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
 
   GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
 }
-
-/* Called by liboffloadmic when asynchronous function is completed.  */
-
-extern "C" void
-__gomp_offload_intelmic_async_completed (const void *async_data)
-{
-  TRACE ("(async_data = %p)", async_data);
-
-  GOMP_PLUGIN_target_task_completion ((void *) async_data);
-}
diff --git a/liboffloadmic/runtime/offload_host.cpp b/liboffloadmic/runtime/offload_host.cpp
index a150410..e52019d 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -64,8 +64,7 @@ static void __offload_fini_library(void);
 #define GET_OFFLOAD_NUMBER(timer_data) \
     timer_data? timer_data->offload_number : 0
 
-extern "C" void
-__gomp_offload_intelmic_async_completed (const void *);
+static void (*task_completion_callback)(void *);
 
 extern "C" {
 #ifdef TARGET_WINNT
@@ -2510,7 +2509,7 @@ extern "C" {
         const void *info
     )
     {
-	__gomp_offload_intelmic_async_completed (info);
+	task_completion_callback ((void *) info);
     }
 }
 
@@ -5672,6 +5671,11 @@ extern "C" void __offload_unregister_image(const void *target_image)
     }
 }
 
+extern "C" void __offload_register_task_callback(void (*cb)(void *))
+{
+    task_completion_callback = cb;
+}
+
 // Runtime trace interface for user programs
 
 void __offload_console_trace(int level)
diff --git a/liboffloadmic/runtime/offload_host.h b/liboffloadmic/runtime/offload_host.h
index afd5c99..2a43fd6 100644
--- a/liboffloadmic/runtime/offload_host.h
+++ b/liboffloadmic/runtime/offload_host.h
@@ -376,6 +376,9 @@ extern "C" bool __offload_target_image_is_executable(const void *target_image);
 extern "C" bool __offload_register_image(const void* image);
 extern "C" void __offload_unregister_image(const void* image);
 
+// Registers asynchronous task completion callback
+extern "C" void __offload_register_task_callback(void (*cb)(void *));
+
 // Initializes offload runtime library.
 DLL_LOCAL extern int __offload_init_library(void);
 

  -- Ilya


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]