[gomp4] assign unused gwv clauses to auto/independent parallel acc loops

Cesar Philippidis cesar@codesourcery.com
Wed Sep 9 16:39:00 GMT 2015


This patch assigns any available gang, worker or vector level
parallelism to auto and independent loops inside acc parallel regions.
This is done in omplower for two reasons:

  1. At the moment, it's too late to do this in oacc-xform, because
     by then ompexpand, which is responsible for partitioning loops,
     has already run. This will likely be revisited later when we add
     support for kernels.

  2. omplower already has several tree walkers that scan for nesting
     errors, data mappings, etc. This is just one more tree walk, for
     acc parallel regions.

There are a couple of problems with this patch. First, I make no attempt
to determine the optimal work-sharing clause for a particular loop.
Instead, I assign the lowest available level of parallelism (i.e. gang
before worker before vector) to the outermost loop. At this point,
that's better than nothing. The second issue is that, while adding
clauses does let ompexpand partition acc loops, we do not yet set
default values for num_gangs, num_workers and vector_length (although
we do set vector_length to 32 when num_workers != 1).
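The assignment policy described above can be modelled in a few lines of
C. This is a minimal sketch under stated assumptions, not the code in
auto-independent-omplow.diff: `struct acc_loop`, the GANG/WORKER/VECTOR
bit encoding and `assign_levels` are hypothetical, the nest is assumed
to be a single chain of nested loops, and "available" is taken to mean
a level that is inside every level used by enclosing loops and outside
every level explicitly requested by inner loops.

```c
/* Hypothetical bit values for the three OpenACC levels, ordered from
   outermost (gang) to innermost (vector).  Not GCC's internal encoding.  */
enum { GANG = 1u << 0, WORKER = 1u << 1, VECTOR = 1u << 2 };

/* Hypothetical model of one acc loop in a chain of nested loops.  */
struct acc_loop
{
  unsigned mask;   /* gang/worker/vector clauses present, as bits */
  int auto_indep;  /* nonzero for auto or independent loops */
};

/* Greedily give each auto/independent loop that has no explicit
   clause the lowest level that is still free: strictly inside every
   level used by enclosing loops and strictly outside every level
   explicitly requested by inner loops.  nest[0] is the outermost loop.
   A loop for which no level is free is left unpartitioned.  */
static void
assign_levels (struct acc_loop *nest, int depth)
{
  unsigned outer = 0;  /* levels used by enclosing loops so far */

  for (int i = 0; i < depth; i++)
    {
      if (nest[i].mask == 0 && nest[i].auto_indep)
        {
          unsigned inner = 0;  /* levels explicitly used further in */
          for (int j = i + 1; j < depth; j++)
            inner |= nest[j].mask;
          unsigned inner_low = inner & (~inner + 1u);  /* lowest set bit */

          for (unsigned lvl = GANG; lvl <= VECTOR; lvl <<= 1)
            /* Because lvl is a single bit, lvl > outer holds exactly
               when lvl is inside every level recorded in OUTER.  */
            if (lvl > outer && (inner == 0 || lvl < inner_low))
              {
                nest[i].mask = lvl;
                break;
              }
        }
      outer |= nest[i].mask;
    }
}
```

For example, a three-deep nest of auto loops with no clauses comes out
as gang, worker, vector; if the middle loop already says worker, the
outer loop still gets gang and the inner one vector. In this model a
fourth auto loop would be left unpartitioned, since no level remains.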

It should be noted that this optimization only applies to acc loops
inside parallel regions. I could probably extend it to acc loops inside
acc routines, but technically acc routines are only supposed to have one
level of parallelism anyway. It could probably also be extended to
handle independent loops inside kernels regions.
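For concreteness, here is the kind of loop nest the patch targets: acc
loops marked auto or independent inside an acc parallel region, with
no explicit gang, worker or vector clause. `scale_matrix` is a made-up
function, and the comments give the clauses the policy above would
presumably choose; they are not output from the patch. Built without
-fopenacc, GCC ignores the pragmas and the function runs sequentially.

```c
/* Hypothetical example: an acc parallel region containing an
   independent loop and an auto loop, neither of which names a gang,
   worker or vector clause.  A is an N x M matrix stored row-major.  */
void
scale_matrix (float *a, int n, int m, float s)
{
#pragma acc parallel copy(a[0:n*m])
  {
    /* Outermost eligible loop: the lowest free level, gang, would
       presumably be assigned here.  */
#pragma acc loop independent
    for (int i = 0; i < n; i++)
      {
        /* Next loop in: presumably the next free level, worker.  */
#pragma acc loop auto
        for (int j = 0; j < m; j++)
          a[i * m + j] *= s;
      }
  }
}
```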

Is this patch ok for gomp-4_0-branch or should I hold off until the
kernels situation gets resolved?

Cesar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: auto-independent-omplow.diff
Type: text/x-patch
Size: 15450 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150909/402cffe1/attachment.bin>

