This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [gomp4.5] depend nowait support for target
- From: Ilya Verbin <iverbin at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Aldy Hernandez <aldyh at redhat dot com>, gcc-patches at gcc dot gnu dot org, Kirill Yukhin <kirill dot yukhin at gmail dot com>, Thomas Schwinge <thomas at codesourcery dot com>, Alexander Monakov <amonakov at ispras dot ru>, Martin Jambor <mjambor at suse dot cz>
- Date: Thu, 12 Nov 2015 20:43:53 +0300
- Subject: Re: [gomp4.5] depend nowait support for target
- Authentication-results: sourceware.org; auth=none
- References: <20150908092014 dot GA1847 at tucnak dot redhat dot com> <20151002192801 dot GA24765 at msticlxl57 dot ims dot intel dot com> <20151015140156 dot GE478 at tucnak dot redhat dot com> <20151019194754 dot GB1855 at msticlxl57 dot ims dot intel dot com> <20151111165222 dot GL5675 at tucnak dot redhat dot com>
On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> > So, here is what I have for now. Attached target-29.c testcase works fine with
> > MIC emul, however I don't know how to (and where) properly check for completion
> > of async execution on target. And, similarly, where to do unmapping after that?
> > Do we need a callback from plugin to libgomp (as far as I understood, PTX
> > runtime supports this, but HSA doesn't), or libgomp will just check for
> > ttask->is_completed in task.c?
>
> Here is the patch updated to have a task.c defined function that the plugin
> can call upon completion of async offloading exection.
Thanks.
> The testsuite coverage will need to improve, the testcase is wrong
> (contains data races - if you want to test parallel running of two target
> regions that both touch the same var, I'd say best would be to use
> #pragma omp atomic and or in 4 in one case and 1 in another case, then
> test if result is 5 (and similarly for the other var).
> Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
> not have it, but we'll need to find some solution for that.
>
> Another thing to work on beyond testsuite coverage (it is desirable to test
> nowait target tasks (both depend and without depend) being awaited in all
> the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
> taskgroup, or if (0) task with depend clause waiting on that.
>
> Also, I wonder what to do if #pragma omp target nowait is used outside of
> (host) parallel - when team is NULL. All the tasking code in that case just
> executes tasks undeferred, which is fine for all but target nowait - there
> it is I'd say useful to be able to run a single host thread concurrently
> with some async offloading tasks. So, I wonder if in that case,
> if we encounter target nowait with team == NULL, should not just create a
> dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
> if (0) starting above it and ending at program's end. In OpenMP, the
> program's initial thread is implicitly surrounded by inactive parallel, so
> this isn't anything against the OpenMP execution model. But we'd need to
> free the team somewhere in a destructor.
>
> Can you please try to cleanup the liboffloadmic side of this, so that
> a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> is used?
Do you mean something like the patch bellow? I'll discuss it with liboffloadmic
maintainers.
> Can you make sure it works on XeonPhi non-emulated too?
I'm trying to do it, but it will take some time...
Unfortunately, target-32.c fails for me using emulation mode:
Program received signal SIGSEGV, Segmentation fault.
#0 0x00007ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422
#1 0x00007ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195
#2 0x00007ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:468
#3 0x00007ff4ab11570d in gomp_task_maybe_wait_for_dependencies (depend=0x7ff49b0d9de0) at libgomp/task.c:1539
#4 0x00007ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 <.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662
#5 0x00000000004011f9 in main._omp_fn ()
#6 0x00007ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at libgomp/team.c:119
#7 0x0000003b07e07ee5 in start_thread () from /lib64/libpthread.so.0
#8 0x0000003b076f4b8d in clone () from /lib64/libc.so.6
However when I manually run commands from testsuite/libgomp.log under the same
environment, it passes. Don't know where is the difference.
Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only
100% CPU usage in offload_target_main instead of 200%, so it seems that only one
target task is running concurrently.
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6da09b1..772e198 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -220,6 +220,10 @@ static void
register_main_image ()
{
__offload_register_image (&main_target_image);
+
+ /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
+ asynchronous task on target is completed. */
+ __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
}
/* liboffloadmic loads and runs offload_target_main on all available devices
@@ -537,13 +541,3 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
}
-
-/* Called by liboffloadmic when asynchronous function is completed. */
-
-extern "C" void
-__gomp_offload_intelmic_async_completed (const void *async_data)
-{
- TRACE ("(async_data = %p)", async_data);
-
- GOMP_PLUGIN_target_task_completion ((void *) async_data);
-}
diff --git a/liboffloadmic/runtime/offload_host.cpp b/liboffloadmic/runtime/offload_host.cpp
index a150410..e52019d 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -64,8 +64,7 @@ static void __offload_fini_library(void);
#define GET_OFFLOAD_NUMBER(timer_data) \
timer_data? timer_data->offload_number : 0
-extern "C" void
-__gomp_offload_intelmic_async_completed (const void *);
+static void (*task_completion_callback)(void *);
extern "C" {
#ifdef TARGET_WINNT
@@ -2510,7 +2509,7 @@ extern "C" {
const void *info
)
{
- __gomp_offload_intelmic_async_completed (info);
+ task_completion_callback ((void *) info);
}
}
@@ -5672,6 +5671,11 @@ extern "C" void __offload_unregister_image(const void *target_image)
}
}
+extern "C" void __offload_register_task_callback(void (*cb)(void *))
+{
+ task_completion_callback = cb;
+}
+
// Runtime trace interface for user programs
void __offload_console_trace(int level)
diff --git a/liboffloadmic/runtime/offload_host.h b/liboffloadmic/runtime/offload_host.h
index afd5c99..2a43fd6 100644
--- a/liboffloadmic/runtime/offload_host.h
+++ b/liboffloadmic/runtime/offload_host.h
@@ -376,6 +376,9 @@ extern "C" bool __offload_target_image_is_executable(const void *target_image);
extern "C" bool __offload_register_image(const void* image);
extern "C" void __offload_unregister_image(const void* image);
+// Registers asynchronous task completion callback
+extern "C" void __offload_register_task_callback(void (*cb)(void *));
+
// Initializes offload runtime library.
DLL_LOCAL extern int __offload_init_library(void);
-- Ilya