[PATCH][libgomp/gomp-3_0-branch] Support for OpenMP in user threads, and pooling for nested threads

Jakub Jelinek jakub@redhat.com
Wed Apr 30 18:06:00 GMT 2008


On Tue, Apr 29, 2008 at 03:45:47PM +0200, Johannes Singler wrote:
> That's true, for the parallel sorters, we can move to tasks. But in that
> case, we would like to nest parallel regions inside tasks. (Or is that
> forbidden?)

It is not forbidden, but why exactly would you want to do that?  Looking
at quicksort.h, the best approach is to create one
#pragma omp parallel num_threads (num_threads) region in parallel_sort_qs
(why do you call omp_set_num_threads when you could just use the
num_threads clause?) and then run parallel_sort_qs_conquer inside of
#pragma omp single nowait.
After the parallel_sort_qs_divide call, instead of #pragma omp parallel
sections, you just have:
#pragma omp task
  parallel_sort_qs_conquer(begin, begin + split, comp);
#pragma omp task
  parallel_sort_qs_conquer(begin + split, end, comp);

and let the libgomp tasking scheduler do its job.  If the recursion
in the algorithm needed some merge work afterwards rather than divide work
before the recursion, this would be harder to do with tasks.
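
A minimal sketch of that structure, not tested against the actual
quicksort.h.  The names qs_conquer_sketch and parallel_sort_sketch, the
three-way partition and the cutoff of 2048 are assumptions made up for
illustration; the real code uses parallel_sort_qs_divide for the split.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical stand-in for parallel_sort_qs_conquer: divide first, then
// hand both halves to the tasking scheduler.  No merge step follows, so
// no taskwait is needed.
template <typename It, typename Comp>
void qs_conquer_sketch(It begin, It end, Comp comp)
{
  typedef typename std::iterator_traits<It>::value_type T;
  if (end - begin < 2048)  // arbitrary cutoff (assumption)
    {
      std::sort(begin, end, comp);
      return;
    }
  const T pivot = *(begin + (end - begin) / 2);
  // Three-way partition: [begin, lo) < pivot, [lo, hi) == pivot, [hi, end) > pivot.
  It lo = std::partition(begin, end,
                         [&](const T &x) { return comp(x, pivot); });
  It hi = std::partition(lo, end,
                         [&](const T &x) { return !comp(pivot, x); });
  // The iterators and comp are copied into each task, so the tasks stay
  // valid after this function returns.
  #pragma omp task firstprivate(begin, lo, comp)
  qs_conquer_sketch(begin, lo, comp);
  #pragma omp task firstprivate(hi, end, comp)
  qs_conquer_sketch(hi, end, comp);
}

// One parallel region; a single thread starts the recursion and the rest
// of the team picks up tasks.  The barrier at the end of the parallel
// region waits for all outstanding tasks.
template <typename It, typename Comp>
void parallel_sort_sketch(It begin, It end, Comp comp, int num_threads)
{
  #pragma omp parallel num_threads(num_threads)
  {
    #pragma omp single nowait
    qs_conquer_sketch(begin, end, comp);
  }
}
```

Built without -fopenmp the pragmas are ignored and the same code runs
sequentially.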

Anyway, with the current gomp branch code the above might be too expensive,
because whenever some firstprivate argument needs constructing, libgomp
creates the stack for the task right away, uses makecontext, switches to
the newly created task's context, runs the constructors there, and after
the GOMP_task_start () call switches back to the creating thread.
For firstprivate clauses where no construction is needed it is ok to copy
the variables into a temporary buffer, not allocate any stack for the task
and in most cases avoid using swapcontext (ok, there are untied tasks).
There is a workaround - make the C++ classes shared and pass around
some integer in firstprivate.
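
A rough sketch of that workaround, assuming a made-up class Counted with
a non-trivial copy constructor and a made-up function sum_with_tasks.
The objects stay shared and only an index is firstprivate, so no copy
constructor runs at task creation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class whose copy constructor is not trivial.
struct Counted
{
  static int copies;  // counts copy constructor calls, for illustration
  int value;
  explicit Counted(int v) : value(v) {}
  Counted(const Counted &o) : value(o.value) { ++copies; }
};
int Counted::copies = 0;

// Each task gets only the integer index as firstprivate; the container
// of class objects is shared, so creating a task copies no Counted.
long sum_with_tasks(const std::vector<Counted> &items)
{
  long total = 0;
  #pragma omp parallel
  #pragma omp single nowait
  for (std::size_t i = 0; i < items.size(); ++i)
    {
      #pragma omp task default(shared) firstprivate(i)
      {
        #pragma omp atomic
        total += items[i].value;
      }
    }
  // The barrier at the end of the parallel region has waited for all tasks.
  return total;
}
```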

I wonder whether instead of running the initial part of the outlined
function for the task and stopping at GOMP_task_start () when something
needs copy-ctors we couldn't outline another helper function which would
run the constructors inside of some structure (in a buffer allocated at task
creation time) and map such firstprivate arguments not to automatic
variables in the task function, but to fields of that structure.
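
To make the idea concrete, here is a toy sketch in plain C++ of what
such a helper could look like.  Everything in it (task_args,
task_copy_helper, task_body, create_task, run_task) is hypothetical and
is not existing libgomp or compiler output; it only shows the copy
constructors running in a buffer at task creation time, with the task
body reading fields of that structure.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical structure for: #pragma omp task firstprivate(name, n)
struct task_args
{
  std::string name;  // needs a copy constructor
  int n;             // trivially copyable
  std::size_t *out;  // where the task stores its result (illustration only)
};

// Hypothetical outlined copy helper: constructs the firstprivate copies
// directly in the buffer.  No task stack or context switch is involved.
void task_copy_helper(void *dst, const void *src)
{
  new (dst) task_args(*static_cast<const task_args *>(src));
}

// Hypothetical outlined task body: the firstprivate variables are fields
// of the structure rather than automatic variables.
void task_body(void *data)
{
  task_args *a = static_cast<task_args *>(data);
  *a->out = a->name.size() + static_cast<std::size_t>(a->n);
  a->~task_args();  // destroy the copies when the task is done
}

// Toy stand-in for the runtime side.
struct pending_task
{
  void (*fn)(void *);
  void *data;
};

pending_task create_task(void (*fn)(void *),
                         void (*cpy)(void *, const void *),
                         const void *src, std::size_t size)
{
  pending_task t;
  t.fn = fn;
  t.data = ::operator new(size);  // buffer allocated at task creation time
  cpy(t.data, src);               // constructors run here, in the creator
  return t;
}

void run_task(pending_task t)
{
  t.fn(t.data);
  ::operator delete(t.data);
}
```

The task can then be queued like one without constructors, and whichever
thread picks it up later just calls the body on the buffer.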

Anyway, I guess I first need to finish the tasking libgomp support somehow,
and then this can be changed afterwards.

> >I'd say most OpenMP programs will just use non-nested parallels,
> 
> I'm pretty sure that will change with even more cores and the task
> construct being used more widely.

I'm not convinced, but really, if we don't slow down the non-nested parallel
case, I'm open to speedups for the nested cases.

> What do you mean? One global pool, or one per user thread?

We need one pool per non-nested thread (created when a non-nested thread first
encounters #pragma omp parallel and destroyed using a pthread_key destructor),
then we could have pools for nested teams that the nested contexts can just
pick.  And then there can be a global dock for threads if we have fewer
threads overall than CPUs and we park them for cases where num_threads
increases somewhere or a new pool needs to be created.
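
The per-thread part could look roughly like the sketch below.  The names
pool_sketch, get_pool and live_pools are made up for illustration, and
the pool itself is just a placeholder; only the pthread_key_create,
pthread_once, pthread_getspecific and pthread_setspecific calls are real
POSIX API.

```cpp
#include <pthread.h>
#include <atomic>

// Placeholder for a per-thread pool (hypothetical layout).
struct pool_sketch
{
  int parked_threads;
};

pthread_key_t pool_key;
pthread_once_t pool_key_once = PTHREAD_ONCE_INIT;
std::atomic<int> live_pools(0);  // for illustration only

// Key destructor: runs when a user thread that owns a pool exits.
void destroy_pool(void *p)
{
  delete static_cast<pool_sketch *>(p);
  --live_pools;
}

void make_pool_key()
{
  pthread_key_create(&pool_key, destroy_pool);
}

// Would be called when a non-nested thread first encounters
// #pragma omp parallel: creates the pool lazily, then reuses it.
pool_sketch *get_pool()
{
  pthread_once(&pool_key_once, make_pool_key);
  void *p = pthread_getspecific(pool_key);
  if (p == nullptr)
    {
      pool_sketch *fresh = new pool_sketch();
      ++live_pools;
      pthread_setspecific(pool_key, fresh);
      p = fresh;
    }
  return static_cast<pool_sketch *>(p);
}
```

Note that the key destructor does not run for the initial thread when
the program leaves main normally, only for threads that exit via
pthread_exit or by returning from their start routine.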

	Jakub


