This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[gomp4] Fix task cancellation
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Henderson <rth at redhat dot com>, Torvald Riegel <triegel at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Tue, 8 Oct 2013 20:27:31 +0200
- Subject: [gomp4] Fix task cancellation
- Authentication-results: sourceware.org; auth=none
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
Hi!
I've noticed that occasionally, on a larger box under high load, the
cancel-taskgroup-{2.c,2.C,3.C} tests would time out.
The problem is that sometimes gomp_team_barrier_clear_task_pending
wasn't called when GOMP_taskgroup_end (or possibly GOMP_taskwait)
consumed the last few queued tasks.
We have the task_count counter (the number of GOMP_TASK_{WAITING,TIED} tasks
that haven't finished yet) and task_running_count, which is incremented
and decremented only in gomp_barrier_handle_tasks,
and we use these for two purposes:
1) to find out whether it makes sense to wake some threads possibly sleeping
on a barrier
2) to find out whether it is ok to call gomp_team_barrier_clear_task_pending
Pre-gomp-4_0-branch, we incremented task_running_count in both
gomp_barrier_handle_tasks and GOMP_taskwait (GOMP_taskgroup_end
didn't exist yet), so task_running_count was usable for 2),
but could be way off for 1): task_running_count could be much bigger than
nthreads even when, say, all the running tasks were on one or two threads,
all caused by GOMP_taskwait inside a task spawned from GOMP_taskwait,
and so on.  I've since changed it to increment task_running_count only in
gomp_barrier_handle_tasks, so it is guaranteed to be <= nthreads
and can be used reliably for 1), but that apparently broke the spots
using it for 2).  So, this patch introduces another counter, the number of
GOMP_TASK_WAITING tasks that are ready to be scheduled but not yet scheduled,
and we simply clear the pending-tasks bit when that counter drops to zero.
Tested on x86_64-linux and i686-linux.
2013-10-08 Jakub Jelinek <jakub@redhat.com>
* libgomp.h (struct gomp_team): Add task_queued_count field.
Add comments about task_{,queued_,running_}count.
* team.c (gomp_new_team): Clear task_queued_count.
* task.c (GOMP_task): Increment task_queued_count.
(gomp_task_run_pre): Decrement task_queued_count. If it is
decremented to zero, call gomp_team_barrier_clear_task_pending.
(gomp_task_run_post_handle_dependers): Increment task_queued_count.
(gomp_barrier_handle_tasks): Don't call
gomp_team_barrier_clear_task_pending here.
--- libgomp/libgomp.h.jj 2013-10-04 13:48:39.000000000 +0200
+++ libgomp/libgomp.h 2013-10-08 19:17:12.303460999 +0200
@@ -385,8 +385,18 @@ struct gomp_team
gomp_mutex_t task_lock;
struct gomp_task *task_queue;
- int task_count;
- int task_running_count;
+ /* Number of all GOMP_TASK_{WAITING,TIED} tasks in the team. */
+ unsigned int task_count;
+ /* Number of GOMP_TASK_WAITING tasks currently waiting to be scheduled. */
+ unsigned int task_queued_count;
+ /* Number of GOMP_TASK_{WAITING,TIED} tasks currently running
+ directly in gomp_barrier_handle_tasks; tasks spawned
+ from e.g. GOMP_taskwait or GOMP_taskgroup_end don't count, even when
+ that is called from a task run from gomp_barrier_handle_tasks.
+ task_running_count should be always <= team->nthreads,
+ and if current task isn't in_tied_task, then it will be
+ even < team->nthreads. */
+ unsigned int task_running_count;
int work_share_cancelled;
int team_cancelled;
--- libgomp/team.c.jj 2013-10-04 18:44:31.000000000 +0200
+++ libgomp/team.c 2013-10-08 19:03:25.067640468 +0200
@@ -172,6 +172,7 @@ gomp_new_team (unsigned nthreads)
gomp_mutex_init (&team->task_lock);
team->task_queue = NULL;
team->task_count = 0;
+ team->task_queued_count = 0;
team->task_running_count = 0;
team->work_share_cancelled = 0;
team->team_cancelled = 0;
--- libgomp/task.c.jj 2013-09-27 12:04:12.000000000 +0200
+++ libgomp/task.c 2013-10-08 19:50:46.057284830 +0200
@@ -407,6 +407,7 @@ GOMP_task (void (*fn) (void *), void *da
team->task_queue = task;
}
++team->task_count;
+ ++team->task_queued_count;
gomp_team_barrier_set_task_pending (&team->barrier);
do_wake = team->task_running_count + !parent->in_tied_task
< team->nthreads;
@@ -434,6 +435,8 @@ gomp_task_run_pre (struct gomp_task *chi
team->task_queue = NULL;
}
child_task->kind = GOMP_TASK_TIED;
+ if (--team->task_queued_count == 0)
+ gomp_team_barrier_clear_task_pending (&team->barrier);
if ((gomp_team_barrier_cancelled (&team->barrier)
|| (taskgroup && taskgroup->cancelled))
&& !child_task->copy_ctors_done)
@@ -538,6 +541,7 @@ gomp_task_run_post_handle_dependers (str
team->task_queue = task;
}
++team->task_count;
+ ++team->task_queued_count;
++ret;
}
free (child_task->dependers);
@@ -670,8 +674,6 @@ gomp_barrier_handle_tasks (gomp_barrier_
}
team->task_running_count++;
child_task->in_tied_task = true;
- if (team->task_count == team->task_running_count)
- gomp_team_barrier_clear_task_pending (&team->barrier);
}
gomp_mutex_unlock (&team->task_lock);
if (do_wake)
Jakub