This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[hsa] Redesign busy loop waiting so that a kernel dispatch signal can be reused
- From: Martin Liška <mliska at suse dot cz>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 24 Nov 2015 15:55:13 +0100
- Subject: [hsa] Redesign busy loop waiting so that a kernel dispatch signal can be reused
Hello.
The following patch is a workaround for Carrizo devices, which tend to have
problems propagating signal values because of an issue with the L2 cache.
Committed to the branch.
Martin
From ca4475aedb47e49b4bdc0a8980f200ec93b31d61 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 24 Nov 2015 10:41:54 +0100
Subject: [PATCH 1/2] Redesign busy loop waiting so that a kernel dispatch
signal can be reused
libgomp/ChangeLog:
2015-11-24 Martin Liska <mliska@suse.cz>
* plugin/plugin-hsa.c (GOMP_OFFLOAD_run): Rewrite busy loop
that does a workaround for Carrizo machines.
---
libgomp/plugin/plugin-hsa.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index b866a78..99ec8e1 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -1219,24 +1219,26 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void** args)
__atomic_store_n ((uint16_t*)(&packet->header), header, __ATOMIC_RELEASE);
hsa_signal_store_release (agent->command_q->doorbell_signal, index);
- /* TODO: fixup, following workaround is necessary to run kernel from
- kernel dispatch mechanism on a Carrizo machine. */
-
- for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
- {
- hsa_signal_t child_s;
- child_s.handle = shadow->children_dispatches[i]->signal;
-
- HSA_DEBUG ("Waiting for children completion signal: %lu\n",
- shadow->children_dispatches[i]->signal);
- while (hsa_signal_wait_acquire
- (child_s, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX,
- HSA_WAIT_STATE_BLOCKED) != 0);
- }
+ /* TODO: GPU agents in Carrizo APUs cannot properly update L2 cache for
+ signal wait and signal load operations on their own and we need to
+ periodically call the hsa_signal_load_acquire on completion signals of
+ children kernels in the CPU to make that happen. As soon as the
+ limitation is resolved, this workaround can be removed. */
HSA_DEBUG ("Kernel dispatched, waiting for completion\n");
- while (hsa_signal_wait_acquire (s, HSA_SIGNAL_CONDITION_LT, 1,
- UINT64_MAX, HSA_WAIT_STATE_BLOCKED) != 0);
+
+ /* Root signal waits with 1ms timeout. */
+ while (hsa_signal_wait_acquire (s, HSA_SIGNAL_CONDITION_LT, 1, 1000 * 1000,
+ HSA_WAIT_STATE_BLOCKED) != 0)
+ for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
+ {
+ hsa_signal_t child_s;
+ child_s.handle = shadow->children_dispatches[i]->signal;
+
+ HSA_DEBUG ("Waiting for children completion signal: %lu\n",
+ shadow->children_dispatches[i]->signal);
+ hsa_signal_load_acquire (child_s);
+ }
release_kernel_dispatch (shadow);
--
2.6.3
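
For readers without an HSA runtime at hand, the control flow of the new loop
can be sketched in isolation. This is a minimal model under stated
assumptions, not the plugin code: mock_signal, mock_wait_lt_1,
mock_load_acquire and wait_with_child_polling are hypothetical names invented
for this sketch. They stand in for hsa_signal_t, hsa_signal_wait_acquire with
a timeout, and hsa_signal_load_acquire. The mock wait simply decrements a
counter to imitate a kernel that finishes after a few timeouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an HSA signal; NOT the real hsa_signal_t.  */
typedef struct
{
  int64_t value;       /* Current signal value; 0 means "completed".  */
  unsigned load_count; /* How often the host has loaded this signal.  */
} mock_signal;

/* Mock of a timed wait on the root signal.  Each call models one wait
   that either times out or observes completion, and advances the
   simulated kernel by one step.  Returns the observed value.  */
static int64_t
mock_wait_lt_1 (mock_signal *s)
{
  if (s->value > 0)
    s->value--;
  return s->value;
}

/* Mock of a host-side load of a child completion signal.  In the patch
   the return value is ignored; the load itself is what matters.  */
static int64_t
mock_load_acquire (mock_signal *s)
{
  s->load_count++;
  return s->value;
}

/* Wait for ROOT to reach 0.  After every wait that returns without
   completion, load each child signal once, mirroring the shape of the
   rewritten loop.  Returns the number of such incomplete waits.  */
static unsigned
wait_with_child_polling (mock_signal *root, mock_signal *children,
                         size_t n_children)
{
  assert (root != NULL);
  assert (children != NULL || n_children == 0);

  unsigned timeouts = 0;
  while (mock_wait_lt_1 (root) != 0)
    {
      timeouts++;
      for (size_t i = 0; i < n_children; i++)
        mock_load_acquire (&children[i]);
    }
  return timeouts;
}
```

The point of the structure is that the host never blocks indefinitely on any
single signal: the root wait is bounded, and the child signals are only
loaded, never waited on, so they are not consumed and can be reused.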