This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[libgomp] task scheduler rewrite and task priorities implementation


Yo!

As promised, here is the work implementing a new API for the task scheduler, rewriting the scheduler to fit into this new API, and implementing the task priorities that are in OpenMP > 4.1. There are also lots of cleanups and documentation.

For the record, task priorities allow a priority clause for tasks such that higher priority tasks are preferred.

	#pragma omp task priority(999)

First, this patchset is pretty invasive. My original idea was to just insert the tasks in order into the double linked lists, but as you and rth pointed out, this wouldn't scale well. So, instead of a 20 line patch, you get 2900 :-). But, you get a clean interface and lots and lots of cleanups.

There were a million ways of doing this, each with its own trade-off, but ultimately I had to pick one and go with it. I've tried to keep the common case inlined and fast. This is the case of only 0-priority items in the queues. It should behave exactly like before, albeit with a comparison to see if we're the common case or not.

There are many optimizations we could do:
	- Move the allocation of priority nodes outside of the lock.
	- Create a cache of priority nodes instead of allocating each
	  time.
	- Thread the splay tree with additional links to make
	  accessing parent, or max, or whatever threads even quicker.
	- Move everything into task.c so we could inline everything.
	  (I'd prefer not to, and keep things in priority_queue.[ch]).
	- Move part of the next/prev priority node links into gomp_task
	  (Not sure if this would work without making a mess of things).
	- etc etc.

The list is endless. We could micro optimize this to death. If at all possible, could we concentrate on agreeing on the general implementation, making the common case fast, and perhaps micro-optimizing as followups?

The only FIXME I have is your suggested use of offsetof() for the gomp_task pointer (data) in priority_node. I really don't see anyway of doing this. Suggestions welcome.

Everything is working. Tested with no regressions on a 56-core x86-64 Linux machine with:

	OMP_NUM_THREADS=[2 4 5 16 56]

Committed as obvious.  I'm going on vacation.

Aldy

p.s. Just kidding ;-).

Attachment: curr
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]