This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [gomp3] Real OpenMP 3.0 tasking support


On Wed, Jun 11, 2008 at 05:04:59PM +0200, Johannes Singler wrote:
> >Please see the libgomp.c/sort-1.c and libgomp.fortran/strassen.f90
> >testcases/benchmarks to see tasking in action.
> 
> I looked into sort-1.c. Some results on a 2xquadcore machine:
> 
> singler@i10pc121:~/scratch> OMP_NUM_THREADS=1 ./a.out
> Threads: 1
> sort1: 3.11416
> sort2: 3.11768
> sort3: 3.12816
> singler@i10pc121:~/scratch> OMP_NUM_THREADS=2 ./a.out
> Threads: 2
> sort1: 1.61807
> sort2: 1.81284
> sort3: 1.62286
> singler@i10pc121:~/scratch> OMP_NUM_THREADS=4 ./a.out
> Threads: 4
> sort1: 0.905296
> sort2: 2.50576
> sort3: 1.62746
> singler@i10pc121:~/scratch> OMP_NUM_THREADS=8 ./a.out
> Threads: 8
> sort1: 0.546678
> sort2: 1.35682
> sort3: 1.64447
> 
> sort1 scales nicely, while sort2 has some jerky behavior (probably 
> because it is completely unbalanced). sort3 (the one using the task 
> construct) seems to be stuck at a speedup of a bit less than 2. This is 
> why I suspect that only two threads do actual work (possibly leaving the 
> last block to sort to the master thread).

Yeah, you are right.  Fix below.  Here is what I get now on a quadcore:

for j in sort-1.exe strassen.exe; do for i in 1 2 3 4; do echo OMP_NUM_THREADS=$i ./$j 10000000; OMP_NUM_THREADS=$i ./$j 10000000; done; done
OMP_NUM_THREADS=1 ./sort-1.exe 10000000
Threads: 1
sort1: 1.13085
sort2: 1.14096
sort3: 1.1561
OMP_NUM_THREADS=2 ./sort-1.exe 10000000
Threads: 2
sort1: 0.59218
sort2: 0.666475
sort3: 0.604476
OMP_NUM_THREADS=3 ./sort-1.exe 10000000
Threads: 3
sort1: 0.420473
sort2: 0.810649
sort3: 0.430028
OMP_NUM_THREADS=4 ./sort-1.exe 10000000
Threads: 4
sort1: 0.338535
sort2: 0.656172
sort3: 0.354156
OMP_NUM_THREADS=1 ./strassen.exe 10000000
 Time for matmul      =   1.459543
 Time for Strassen    =   0.926005
 Time for Strassen MP =   0.900968
OMP_NUM_THREADS=2 ./strassen.exe 10000000
 Time for matmul      =   1.479224
 Time for Strassen    =   1.173954
 Time for Strassen MP =   0.500256
OMP_NUM_THREADS=3 ./strassen.exe 10000000
 Time for matmul      =   1.471209
 Time for Strassen    =   0.916251
 Time for Strassen MP =   0.396559
OMP_NUM_THREADS=4 ./strassen.exe 10000000
 Time for matmul      =   1.476634
 Time for Strassen    =   1.173638
 Time for Strassen MP =   0.366120

The fix for this bug is just calling gomp_team_barrier_set_task_pending
unconditionally, as whenever we bump task_count we have some waiting tasks.
Without it, you are right and only 2 threads were doing something:
the pending bit was set the first time a GOMP_TASK_WAITING task was created,
which woke up one thread that started running explicit tasks.  But after
that thread picked up the first task, it cleared the pending bit (as at that
point the number of running tasks was equal to the number of all explicit
tasks) and nothing ever set it again.  The first thread then only called
the task dispatch routine when it reached the barrier as the last thread.
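
In other words, the old enqueue code (the lines removed in the hunk at the
end of the patch, shown here slightly simplified) only set the pending bit
on the 0 -> 1 transition of task_count:

  /* Old code: pending is set only when task_count goes from 0 to 1.
     After the first woken thread clears the bit, later enqueues never
     set it again, so the remaining threads keep sleeping on the
     barrier.  */
  if (team->task_count++ == 0)
    gomp_team_barrier_set_task_pending (&team->barrier);
  do_wake = team->task_running_count < team->nthreads;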

The rest of the changes are just a small optimization: if one thread is
running the implicit task or if(0) tasks from the implicit task, but all
other threads in the team are already running some GOMP_TASK_TIED task,
there is no need to call futex_wake.
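
Sketching the new enqueue path from the same hunk (simplified; the queue
manipulation above it is unchanged):

  /* New code: always mark the barrier as having pending tasks, since
     task_count was just bumped.  Skip the futex_wake when every thread
     not already running a tied task is accounted for; the parent's own
     thread is counted via !parent->in_tied_task, because it is not
     included in task_running_count while running the implicit task or
     an if(0) task of it.  */
  ++team->task_count;
  gomp_team_barrier_set_task_pending (&team->barrier);
  do_wake = team->task_running_count + !parent->in_tied_task
	    < team->nthreads;
  gomp_mutex_unlock (&team->task_lock);
  if (do_wake)
    gomp_team_barrier_wake (&team->barrier, 1);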

2008-06-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_task): Add in_tied_task field.
	* task.c (gomp_init_task): Initialize it.
	(GOMP_task): Likewise.  Call gomp_team_barrier_set_task_pending
	unconditionally.  Don't call gomp_team_barrier_wake if
	current task is implicit or if(0) from implicit and number of
	running tasks is equal to nthreads - 1.

--- libgomp/libgomp.h.jj	2008-06-06 12:38:07.000000000 +0200
+++ libgomp/libgomp.h	2008-06-11 18:53:30.000000000 +0200
@@ -253,6 +253,7 @@ struct gomp_task
   void *fn_data;
   enum gomp_task_kind kind;
   bool in_taskwait;
+  bool in_tied_task;
   gomp_sem_t taskwait_sem;
 };
 
--- libgomp/task.c.jj	2008-06-06 12:38:08.000000000 +0200
+++ libgomp/task.c	2008-06-11 18:57:03.000000000 +0200
@@ -43,6 +43,7 @@ gomp_init_task (struct gomp_task *task, 
   task->icv = *prev_icv;
   task->kind = GOMP_TASK_IMPLICIT;
   task->in_taskwait = false;
+  task->in_tied_task = false;
   task->children = NULL;
   gomp_sem_init (&task->taskwait_sem, 0);
 }
@@ -103,6 +104,7 @@ GOMP_task (void (*fn) (void *), void *da
 
       gomp_init_task (&task, thr->task, gomp_icv (false));
       task.kind = GOMP_TASK_IFFALSE;
+      task.in_tied_task = thr->task->in_tied_task;
       thr->task = &task;
       if (__builtin_expect (cpyfn != NULL, 0))
 	{
@@ -134,6 +136,7 @@ GOMP_task (void (*fn) (void *), void *da
 		      & ~(uintptr_t) (arg_align - 1));
       gomp_init_task (task, parent, gomp_icv (false));
       task->kind = GOMP_TASK_IFFALSE;
+      task->in_tied_task = parent->in_tied_task;
       thr->task = task;
       if (cpyfn)
 	cpyfn (arg, data);
@@ -143,6 +146,7 @@ GOMP_task (void (*fn) (void *), void *da
       task->kind = GOMP_TASK_WAITING;
       task->fn = fn;
       task->fn_data = arg;
+      task->in_tied_task = true;
       gomp_mutex_lock (&team->task_lock);
       if (parent->children)
 	{
@@ -170,9 +174,10 @@ GOMP_task (void (*fn) (void *), void *da
 	  task->prev_queue = task;
 	  team->task_queue = task;
 	}
-      if (team->task_count++ == 0)
-	gomp_team_barrier_set_task_pending (&team->barrier);
-      do_wake = team->task_running_count < team->nthreads;
+      ++team->task_count;
+      gomp_team_barrier_set_task_pending (&team->barrier);
+      do_wake = team->task_running_count + !parent->in_tied_task
+		< team->nthreads;
       gomp_mutex_unlock (&team->task_lock);
       if (do_wake)
 	gomp_team_barrier_wake (&team->barrier, 1);


	Jakub

