Summary: | OpenACC: loop reduction does not work. Output is zero. | ||
---|---|---|---|
Product: | gcc | Reporter: | Ralph Trenkler <ralph.trenkler> |
Component: | middle-end | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | chunglin.tang, nathan, tschwinge, vries |
Priority: | P3 | Keywords: | openacc |
Version: | 6.1.0 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2016-05-02 00:00:00 | |
Attachments: | compiler configure command line, compiler call, program in f90, *.i-file |
(In reply to Ralph Trenkler from comment #0) > Created attachment 38389 [details] > compiler configure command line, compiler call, program in f90, *.i-file Thanks for that! > Problem with OpenACC: The output of the attached fortran 90 program > "acc-test-04.f90" is zero but not "pi" (3.14...), as it should be. The > compiler version is 6.1.0. There is no "copy(pi)" or "reduction(+:pi)" clause on the outer "acc parallel" construct, so the offloaded computation of "pi" will not be copied back; hence the result of zero. (Cesar CCed; please shout if I'm interpreting the OpenACC 2.0a specification wrongly.) Also, due to the gang/worker/vector loop scheduling algorithm that's currently used, you'll want to add explicit "gang worker vector" clauses to the one-level "acc loop" construct. (Nathan CCed for your information.) Do you have any further questions? Thomas is correct. Note that when gcc-6.2 is released you should be able to replace !$acc parallel vector_length(vl) !$acc loop reduction(+:pi) private(t) with !$acc parallel loop reduction(+:pi) vector_length(vl) and that will automatically add a copy clause for 'pi'. See PR70626 for more details. Furthermore, as Thomas mentioned, gcc-6 does not automatically assign parallelism to loops inside parallel regions. Consequently, you need to explicitly use num_gangs, num_workers and vector_length to determine the amount of parallelism and gang, worker and vector to partition the acc loops accordingly. Also note that only nvptx targets are accelerated in gcc-6; the host code runs in a single thread. (In reply to cesar from comment #2) > Furthermore, as Thomas mentioned, gcc-6 does not automatically assign > parallelism to loops inside parallel regions. :-) GCC 6.1 does do that that. > Consequently, you need to > explicitly use num_gangs, num_workers and vector_length to determine the > amount of parallelism and gang, worker and vector to partition the acc loops > accordingly. GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers, and a vector length of 32 (so, you don't need to specify "num_gangs() num_workers() vector_length()" clauses). What it will not do (and what was the point of my earlier note in #c1), is assign more than one of OpenACC's parallelism levels (gang, worker, vector) to a one-level loop constructs, which is why you'll want to specify "gang worker vector" clauses for the loop construct. (In reply to Thomas Schwinge from comment #3) > (In reply to cesar from comment #2) > > Consequently, you need to > > explicitly use num_gangs, num_workers and vector_length to determine the > > amount of parallelism and gang, worker and vector to partition the acc loops > > accordingly. > > GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers, > and a vector length of 32 (so, you don't need to specify "num_gangs() > num_workers() vector_length()" clauses). What it will not do (and what was > the point of my earlier note in #c1), is assign more than one of OpenACC's > parallelism levels (gang, worker, vector) to a one-level loop constructs, > which is why you'll want to specify "gang worker vector" clauses for the > loop construct. Thomas is correct. I've been focusing too much on the front ends and not the loop partitioning infrastructure. Sorry for the noise. We've gotten that aspect of the OpenACC specification clarified as follows (comments inside parens are mine): "A scalar variable referenced in a parallel construct without predetermined (already implemented) or explicitly determined data attributes (already implemented) will have its data attributes implicitly determined as described in this paragraph. If the scalar variable appears in a reduction clause on the parallel construct (already implemented), or in a reduction clause on a loop construct in the parallel construct (relevant for this PR70895), or is assigned in an atomic construct in the parallel construct (another change), the scalar variable is treated as if it appeared in a copy clause for the parallel construct. In all other cases, the scalar variable is treated as if it appeared in a firstprivate clause for the parallel construct (already implemented)." Chung-Lin, please sanity-check that this will make work Ralph's example program as attached, and then implement these semantics for trunk, gcc-6-branch, gomp-4_0-branch (gcc/gimplify.c; hopefully the same patch will do for all three branches; please shout if not). In addition to Ralph's example program as attached, also add ample testsuite coverage, for various nesting of OpenACC constructs, for all of C, C++, and Fortran. If the change is non-trivial to detect that a "scalar variable [...] is assigned in an atomic construct" ("another change" above), please leave that aspect aside for the moment. Author: cltang Date: Thu Aug 18 14:46:19 2016 New Revision: 239576 URL: https://gcc.gnu.org/viewcvs?rev=239576&root=gcc&view=rev Log: 2016-08-18 Chung-Lin Tang <cltang@codesourcery.com> PR middle-end/70895 gcc/ * gimplify.c (omp_add_variable): Adjust/add variable mapping on enclosing parallel construct for reduction variables on OpenACC loop directives. gcc/testsuite/ * gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test. * c-c++-common/goacc/reduction-1.c: Likewise. * c-c++-common/goacc/reduction-2.c: Likewise. * c-c++-common/goacc/reduction-3.c: Likewise. * c-c++-common/goacc/reduction-4.c: Likewise. libgomp/ * testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit firstprivate clauses. * testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit copy clauses. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/c-c++-common/goacc/reduction-1.c trunk/gcc/testsuite/c-c++-common/goacc/reduction-2.c trunk/gcc/testsuite/c-c++-common/goacc/reduction-3.c trunk/gcc/testsuite/c-c++-common/goacc/reduction-4.c trunk/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 trunk/libgomp/ChangeLog trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c trunk/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 trunk/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 Author: cltang Date: Thu Aug 18 14:56:11 2016 New Revision: 239577 URL: https://gcc.gnu.org/viewcvs?rev=239577&root=gcc&view=rev Log: Backport from mainline: 2016-08-18 Chung-Lin Tang <cltang@codesourcery.com> PR middle-end/70895 gcc/ * gimplify.c (omp_add_variable): Adjust/add variable mapping on enclosing parallel construct for reduction variables on OpenACC loop directives. gcc/testsuite/ * gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test. * c-c++-common/goacc/reduction-1.c: Likewise. * c-c++-common/goacc/reduction-2.c: Likewise. * c-c++-common/goacc/reduction-3.c: Likewise. * c-c++-common/goacc/reduction-4.c: Likewise. libgomp/ * testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit firstprivate clauses. * testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit copy clauses. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise. Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/gimplify.c branches/gcc-6-branch/gcc/testsuite/ChangeLog branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-1.c branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-2.c branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-3.c branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/reduction-4.c branches/gcc-6-branch/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 branches/gcc-6-branch/libgomp/ChangeLog branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 branches/gcc-6-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 Author: cltang Date: Thu Aug 18 15:01:50 2016 New Revision: 239579 URL: https://gcc.gnu.org/viewcvs?rev=239579&root=gcc&view=rev Log: Backport from mainline: 2016-08-18 Chung-Lin Tang <cltang@codesourcery.com> PR middle-end/70895 gcc/ * gimplify.c (omp_add_variable): Adjust/add variable mapping on enclosing parallel construct for reduction variables on OpenACC loop directives. gcc/testsuite/ * gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test. * c-c++-common/goacc/reduction-1.c: Likewise. * c-c++-common/goacc/reduction-2.c: Likewise. * c-c++-common/goacc/reduction-3.c: Likewise. * c-c++-common/goacc/reduction-4.c: Likewise. libgomp/ * testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit firstprivate clauses. * testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit copy clauses. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise. Modified: branches/gomp-4_0-branch/gcc/gimplify.c branches/gomp-4_0-branch/gcc/testsuite/ChangeLog.gomp branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-1.c branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-2.c branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-3.c branches/gomp-4_0-branch/gcc/testsuite/c-c++-common/goacc/reduction-4.c branches/gomp-4_0-branch/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 branches/gomp-4_0-branch/libgomp/ChangeLog.gomp branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 branches/gomp-4_0-branch/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 Chung-Lin, is there something that remains to be done here, or can this be closed? (In reply to Tom de Vries from comment #9) > is there something that remains to be done here, or can this be closed? As it should turn out, what we implemented is still not correctly modelling the intentions of OpenACC, so there are further changes required -- which we need to propagate from our development branch into trunk (and possibly release branches, too). |
Created attachment 38389 [details] compiler configure command line, compiler call, program in f90, *.i-file Problem with OpenACC: The output of the attached fortran 90 program "acc-test-04.f90" is zero but not "pi" (3.14...), as it should be. The compiler version is 6.1.0.