Bug 70895

Summary: OpenACC: loop reduction does not work. Output is zero.
Product: gcc Reporter: Ralph Trenkler <ralph.trenkler>
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: chunglin.tang, nathan, tschwinge, vries
Priority: P3 Keywords: openacc
Version: 6.1.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2016-05-02 00:00:00
Attachments: compiler configure command line, compiler call, program in f90, *.i-file

Description Ralph Trenkler 2016-05-01 19:17:24 UTC
Created attachment 38389 [details]
compiler configure command line, compiler call, program in f90, *.i-file

Problem with OpenACC: The output of the attached fortran 90 program "acc-test-04.f90" is zero but not "pi" (3.14...), as it should be. The compiler version is 6.1.0.
Comment 1 Thomas Schwinge 2016-05-02 10:56:34 UTC
(In reply to Ralph Trenkler from comment #0)
> Created attachment 38389 [details]
> compiler configure command line, compiler call, program in f90, *.i-file

Thanks for that!

> Problem with OpenACC: The output of the attached fortran 90 program
> "acc-test-04.f90" is zero but not "pi" (3.14...), as it should be. The
> compiler version is 6.1.0.

There is no "copy(pi)" or "reduction(+:pi)" clause on the outer "acc parallel" construct, so the offloaded computation of "pi" will not be copied back; hence the result of zero.  (Cesar CCed; please shout if I'm interpreting the OpenACC 2.0a specification wrongly.)

Also, due to the gang/worker/vector loop scheduling algorithm that's currently used, you'll want to add explicit "gang worker vector" clauses to the one-level "acc loop" construct.  (Nathan CCed for your information.)

Do you have any further questions?
Comment 2 cesar 2016-05-02 14:27:58 UTC
Thomas is correct. Note that when gcc-6.2 is released you should be able to replace

   !$acc parallel vector_length(vl)
   !$acc loop reduction(+:pi) private(t) 

with

   !$acc parallel loop reduction(+:pi) vector_length(vl)

and that will automatically add a copy clause for 'pi'. See PR70626 for more details.

Furthermore, as Thomas mentioned, gcc-6 does not automatically assign parallelism to loops inside parallel regions. Consequently, you need to explicitly use num_gangs, num_workers and vector_length to determine the amount of parallelism and gang, worker and vector to partition the acc loops accordingly. Also note that only nvptx targets are accelerated in gcc-6; the host code runs in a single thread.
Comment 3 Thomas Schwinge 2016-05-02 14:58:21 UTC
(In reply to cesar from comment #2)
> Furthermore, as Thomas mentioned, gcc-6 does not automatically assign
> parallelism to loops inside parallel regions.

:-) GCC 6.1 does do that that.

> Consequently, you need to
> explicitly use num_gangs, num_workers and vector_length to determine the
> amount of parallelism and gang, worker and vector to partition the acc loops
> accordingly.

GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers, and a vector length of 32 (so, you don't need to specify "num_gangs() num_workers() vector_length()" clauses).  What it will not do (and what was the point of my earlier note in #c1), is assign more than one of OpenACC's parallelism levels (gang, worker, vector) to a one-level loop constructs, which is why you'll want to specify "gang worker vector" clauses for the loop construct.
Comment 4 cesar 2016-05-02 15:04:53 UTC
(In reply to Thomas Schwinge from comment #3)
> (In reply to cesar from comment #2)

> > Consequently, you need to
> > explicitly use num_gangs, num_workers and vector_length to determine the
> > amount of parallelism and gang, worker and vector to partition the acc loops
> > accordingly.
> 
> GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers,
> and a vector length of 32 (so, you don't need to specify "num_gangs()
> num_workers() vector_length()" clauses).  What it will not do (and what was
> the point of my earlier note in #c1), is assign more than one of OpenACC's
> parallelism levels (gang, worker, vector) to a one-level loop constructs,
> which is why you'll want to specify "gang worker vector" clauses for the
> loop construct.

Thomas is correct. I've been focusing too much on the front ends and not the loop partitioning infrastructure. Sorry for the noise.
Comment 5 Thomas Schwinge 2016-08-02 21:57:54 UTC
We've gotten that aspect of the OpenACC specification clarified as follows (comments inside parens are mine): "A scalar variable referenced in a parallel construct without predetermined (already implemented) or explicitly determined data attributes (already implemented) will have its data attributes implicitly determined as described in this paragraph. If the scalar variable appears in a reduction clause on the parallel construct (already implemented), or in a reduction clause on a loop construct in the parallel construct (relevant for this PR70895), or is assigned in an atomic construct in the parallel construct (another change), the scalar variable is treated as if it appeared in a copy clause for the parallel construct. In all other cases, the scalar variable is treated as if it appeared in a firstprivate clause for the parallel construct (already implemented)."

Chung-Lin, please sanity-check that this will make work Ralph's example program as attached, and then implement these semantics for trunk, gcc-6-branch, gomp-4_0-branch (gcc/gimplify.c; hopefully the same patch will do for all three branches; please shout if not).  In addition to Ralph's example program as attached, also add ample testsuite coverage, for various nesting of OpenACC constructs, for all of C, C++, and Fortran.

If the change is non-trivial to detect that a "scalar variable [...] is assigned in an atomic construct" ("another change" above), please leave that aspect aside for the moment.
Comment 6 Chung-Lin Tang 2016-08-18 14:46:51 UTC
Author: cltang
Date: Thu Aug 18 14:46:19 2016
New Revision: 239576

URL: https://gcc.gnu.org/viewcvs?rev=239576&root=gcc&view=rev
Log:
2016-08-18  Chung-Lin Tang  <cltang@codesourcery.com>

	PR middle-end/70895
	gcc/
	* gimplify.c (omp_add_variable): Adjust/add variable mapping on
	enclosing parallel construct for reduction variables on OpenACC loop
	directives.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
	firstprivate clauses.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit
	copy clauses.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/c-c++-common/goacc/reduction-1.c
    trunk/gcc/testsuite/c-c++-common/goacc/reduction-2.c
    trunk/gcc/testsuite/c-c++-common/goacc/reduction-3.c
    trunk/gcc/testsuite/c-c++-common/goacc/reduction-4.c
    trunk/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
    trunk/libgomp/ChangeLog
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c
    trunk/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
    trunk/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
Comment 7 Chung-Lin Tang 2016-08-18 14:56:43 UTC
Author: cltang
Date: Thu Aug 18 14:56:11 2016
New Revision: 239577

URL: https://gcc.gnu.org/viewcvs?rev=239577&root=gcc&view=rev
Log:
Backport from mainline:

2016-08-18  Chung-Lin Tang  <cltang@codesourcery.com>

	PR middle-end/70895
	gcc/
	* gimplify.c (omp_add_variable): Adjust/add variable mapping on
	enclosing parallel construct for reduction variables on OpenACC loop
	directives.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
	firstprivate clauses.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit
	copy clauses.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.


Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/gimplify.c
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-1.c
    branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-2.c
    branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-3.c
    branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-4.c
    branches/gcc-6-branch/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
    branches/gcc-6-branch/libgomp/ChangeLog
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
    branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
Comment 8 Chung-Lin Tang 2016-08-18 15:02:22 UTC
Author: cltang
Date: Thu Aug 18 15:01:50 2016
New Revision: 239579

URL: https://gcc.gnu.org/viewcvs?rev=239579&root=gcc&view=rev
Log:
Backport from mainline:

2016-08-18  Chung-Lin Tang  <cltang@codesourcery.com>

	PR middle-end/70895
	gcc/
	* gimplify.c (omp_add_variable): Adjust/add variable mapping on
	enclosing parallel construct for reduction variables on OpenACC loop
	directives.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
	firstprivate clauses.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit
	copy clauses.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.


Modified:
    branches/gomp-4_0-branch/gcc/gimplify.c
    branches/gomp-4_0-branch/gcc/testsuite/ChangeLog.gomp
    branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-1.c
    branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-2.c
    branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-3.c
    branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-4.c
    branches/gomp-4_0-branch/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
    branches/gomp-4_0-branch/libgomp/ChangeLog.gomp
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
Comment 9 Tom de Vries 2018-01-23 16:43:12 UTC
Chung-Lin, is there something that remains to be done here, or can this be closed?
Comment 10 Thomas Schwinge 2018-03-02 15:18:38 UTC
(In reply to Tom de Vries from comment #9)
> is there something that remains to be done here, or can this be closed?

As it should turn out, what we implemented is still not correctly modelling the intentions of OpenACC, so there are further changes required -- which we need to propagate from our development branch into trunk (and possibly release branches, too).