[openacc] tile, independent, default, private and firstprivate support in c/++

Nathan Sidwell nathan@codesourcery.com
Thu Nov 5 17:13:00 GMT 2015


On 11/05/15 12:01, Thomas Schwinge wrote:

> On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis <cesar@codesourcery.com> wrote:
>> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:

>> Sorry, I must have mis-phrased it. The spec is unclear here. There are
>> three possible ways to interpret 'acc parallel loop reduction':
>>
>>    1. acc parallel reduction
>>       acc loop
>
> This is what you propose in your patch, but I don't think that makes
> sense, or does it?  I'm happy to learn otherwise, but in my current
> understanding, a reduction clause needs to be attached (at least) to the
> innermost construct where reductions are to be processed.  (Let's also

Correct, interpretation #1 above must be wrong.

> consider multi-level gang/worker/vector loops/reductions.)  So, either:
>
>>    2. acc parallel
>>       acc loop reduction
>
> ... this, or even this:
>
>>    3. acc parallel reduction
>>       acc loop reduction
>
> ..., which I'm not sure what the execution model implementation requires.
> (Nathan?)

Interpretation #2 is sufficient, I think. However, both #2 and #3 lack a
'copy (reduction_var)' clause; without one, nothing changes the variable's
default data attribute of 'firstprivate' (I'm working on that patch). Perhaps
'reduction' on 'parallel' is meant to imply the copy (because that's what
makes sense), but the std doesn't say so.

In summary, it's probably safe to implement interpretation #3. That way we can
implement the hypothesis that a reduction on the outer construct implies copy.
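
To make the difference between interpretations #2 and #3 concrete, here is a
minimal sketch. The function names are hypothetical, and the explicit
'copy (sum)' clause is an assumption added for illustration: it spells out the
copy that the hypothesis above would have 'reduction' on 'parallel' imply.
Built without OpenACC support, the pragmas are ignored and both loops simply
run serially.

```c
#include <assert.h>

/* Interpretation #2: reduction attached to the loop only.
   The explicit copy(sum) is an assumption for illustration; without it
   'sum' would default to firstprivate and the result would be lost. */
int sum_interp2(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;
#pragma acc parallel copyin(a[0:n]) copy(sum)
    {
#pragma acc loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}

/* Interpretation #3: reduction attached to both constructs.
   copy(sum) is again written out explicitly (an assumption); under the
   hypothesis discussed above, the outer reduction would imply it. */
int sum_interp3(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;
#pragma acc parallel copyin(a[0:n]) copy(sum) reduction(+:sum)
    {
#pragma acc loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}
```

Both functions are expected to return the same total for the same input; the
only difference is where the reduction clause is attached.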

> And while we're at it: the very same question also applies to the private
> clause, which -- contrary to all other (as far as I remember) clauses --
> also is applicable to both the parallel and loop constructs:
>
>      #pragma acc parallel loop private([...])
>
> ... is to be decomposed into which of the following:
>
>      #pragma acc parallel private([...])
>      #pragma acc loop
>
>      #pragma acc parallel
>      #pragma acc loop private([...])
>
>      #pragma acc parallel private([...])
>      #pragma acc loop private([...])
>
> (There is no private clause allowed to be specified with the kernels
> construct for what it's worth, but that doesn't mean we couldn't use it
> internally, of course, if so required.)

I think interpretations #2 and #3 both make sense, and I suspect they result
in the same emitted code.
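
As a rough sketch of the second and third decompositions of 'private' (the
function names, the data clauses, and the scratch variable 'tmp' are
hypothetical, chosen only for illustration):

```c
#include <assert.h>

/* Second decomposition: private attached to the loop only. */
void square_loop_private(const int *in, int *out, int n)
{
    assert(n >= 0);
    int tmp;
#pragma acc parallel copyin(in[0:n]) copyout(out[0:n])
    {
#pragma acc loop private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = in[i] * in[i];   /* tmp is written before it is read */
            out[i] = tmp;
        }
    }
}

/* Third decomposition: private attached to both constructs. */
void square_both_private(const int *in, int *out, int n)
{
    assert(n >= 0);
    int tmp;
#pragma acc parallel copyin(in[0:n]) copyout(out[0:n]) private(tmp)
    {
#pragma acc loop private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = in[i] * in[i];   /* tmp is written before it is read */
            out[i] = tmp;
        }
    }
}
```

Because 'tmp' is always assigned inside the loop body before use, either
placement should produce the same results here, which matches the suspicion
that the two decompositions end up equivalent.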

nathan

-- 
Nathan Sidwell


