This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [openacc] tile, independent, default, private and firstprivate support in c/++


On 11/05/15 12:01, Thomas Schwinge wrote:

On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis <cesar@codesourcery.com> wrote:
On 11/05/2015 04:14 AM, Thomas Schwinge wrote:

Sorry, I must have mis-phrased it. The spec is unclear here. There are
three possible ways to interpret 'acc parallel loop reduction':

   1. acc parallel reduction
      acc loop

This is what you propose in your patch, but I don't think that makes
sense, or does it?  I'm happy to learn otherwise, but in my current
understanding, a reduction clause needs to be attached (at least) to the
innermost construct where reductions are to be processed.

Correct, the above interpretation must be wrong.

(Let's also consider multi-level gang/worker/vector loops/reductions.)  So, either:

   2. acc parallel
      acc loop reduction

... this, or even this:

   3. acc parallel reduction
      acc loop reduction

..., which I'm not sure what the execution model implementation requires.
(Nathan?)

Interpretation #2 is sufficient, I think. However, both #2 and #3 lack a 'copy (reduction_var)' clause; without one, nothing changes the reduction variable's default data attribute of 'firstprivate' (I'm working on that patch). Perhaps 'reduction' on 'parallel' is meant to imply that copy (because that's what makes sense), but the standard doesn't say so.

In summary, it's probably safe to implement interpretation #3. That way we can act on the hypothesis that a reduction on the outer construct implies copy.
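
As a rough illustration of interpretation #3, here is a minimal sketch in C. The function names sum_combined and sum_split are hypothetical, and the explicit copy(sum) clause is an assumption: it spells out the "reduction on the outer construct implies copy" hypothesis by hand. It shows a hand-written decomposition, not what the compiler emits. Built without -fopenacc, the pragmas are ignored and both functions simply run serially.

```c
#include <assert.h>

/* Combined construct, as a user would write it. */
long sum_combined(const int *a, int n)
{
    assert(n >= 0);
    long sum = 0;
#pragma acc parallel loop reduction(+:sum) copyin(a[0:n])
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Interpretation #3 written out by hand: the reduction appears on both
   constructs, and copy(sum) (an assumption, see above) overrides the
   default 'firstprivate' attribute so the result reaches the host. */
long sum_split(const int *a, int n)
{
    assert(n >= 0);
    long sum = 0;
#pragma acc parallel reduction(+:sum) copy(sum) copyin(a[0:n])
    {
#pragma acc loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}
```

If the hypothesis holds, both functions should return the same value for the same input.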

And while we're at it: the very same question also applies to the private
clause, which -- contrary to all other (as far as I remember) clauses --
also is applicable to both the parallel and loop constructs:

     #pragma acc parallel loop private([...])

... is to be decomposed into which of the following:

     #pragma acc parallel private([...])
     #pragma acc loop

     #pragma acc parallel
     #pragma acc loop private([...])

     #pragma acc parallel private([...])
     #pragma acc loop private([...])

(There is no private clause allowed to be specified with the kernels
construct for what it's worth, but that doesn't mean we couldn't use it
internally, of course, if so required.)

I think either interpretation #2 or #3 makes sense, and I suspect both result in the same emitted code.
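
For reference, interpretation #2 of the private case might look like the sketch below in C. The function names square_combined and square_split and the scratch variable tmp are hypothetical, as are the copyin/copyout clauses added to make the example self-contained. This is a hand-written decomposition under those assumptions, not the compiler's actual lowering; without -fopenacc the pragmas are ignored and the loops run serially.

```c
#include <assert.h>

/* Combined construct: tmp is a per-iteration scratch variable. */
void square_combined(const int *in, int *out, int n)
{
    assert(n >= 0);
    int tmp;
#pragma acc parallel loop private(tmp) copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; i++) {
        tmp = in[i];
        out[i] = tmp * tmp;
    }
}

/* Interpretation #2 written out by hand: private only on the loop. */
void square_split(const int *in, int *out, int n)
{
    assert(n >= 0);
    int tmp;
#pragma acc parallel copyin(in[0:n]) copyout(out[0:n])
    {
#pragma acc loop private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = in[i];
            out[i] = tmp * tmp;
        }
    }
}
```

Since tmp is written before it is read in every iteration, placing private on the parallel construct as well (interpretation #3) should not change the values stored in out.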

nathan

--
Nathan Sidwell

