This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
- From: Cesar Philippidis <cesar_philippidis at mentor dot com>
- To: Thomas Schwinge <thomas at codesourcery dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 28 Jul 2014 10:36:03 -0700
- Subject: Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
- References: <53B9D780 dot 4040904 at mentor dot com> <87wqaxh0x7 dot fsf at kepler dot schwinge dot homeip dot net>
On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> Hi Cesar!
>
> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
>
> I think I have found one issue in this code -- but please verify that my
> understanding of reductions is correct. Namely:
>
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> +/* Helper function to finalize local data for the reduction arrays. The
>> +   reduction array needs to be reduced to the original reduction variable.
>> +   FIXME: This function assumes that there are vector_length threads in
>> +   total.  Also, it assumes that there are at least vector_length iterations
>> +   in the for loop.  */
>> +
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
>> +                         omp_context *ctx)
>> +{
>> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
>> +
>> +  tree c, var, array, loop_header, loop_body, loop_exit;
>> +  gimple stmt;
>> +
>> +  /* Create for loop.
>> +
>> +     let var = the original reduction variable
>> +     let array = reduction variable array
>> +
>> +     var = array[0]
>> +     for (i = 1; i < nthreads; i++)
>> +       var op= array[i]
>> +  */
>
> This should also consider the reduction variable's original value. Test
> case (which does the expected thing if modified for OpenMP):
>
> #include <stdlib.h>
>
> int
> main(void)
> {
> #define I 5
> #define N 11
> #define A 8
>
>   int a = A;
>   int s = I;
>
> #pragma acc parallel vector_length(N)
>   {
>     int i;
> #pragma acc loop reduction(+:s)
>     for (i = 0; i < N; ++i)
>       s += a;
>   }
>
>   if (s != I + N * A)
>     abort ();
>
>   return 0;
> }
>
> OK to check in the following?
Reductions can be specified with both the parallel and loop constructs.
According to section 2.5.11 in the OpenACC spec, a reduction in a parallel
construct should behave as you described:

  At the end of the region, the values for each gang are combined
  using the reduction operator, and the result combined with the
  value of the original variable and stored in the original
  variable.

However, in section 2.7.11, a reduction in a loop construct behaves as
follows:

  At the end of the loop, the values for each thread are combined
  using the specified reduction operator, and the result stored
  in the original variable at the end of the parallel or kernels
  region.

The parallel reduction behavior does make more sense, though. I'll ask
the OpenACC gurus if there's a typo in section 2.7.11. It does refer to
parallel reduction.
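
To make the difference concrete, here is a rough sketch in plain C (no
OpenACC, no GCC internals) of the two finalization behaviors for the '+'
operator. The function names are made up for illustration and are not part
of the patch. The partials array stands in for the reduction array, and I'm
assuming each thread's private copy starts at 0, the identity for '+'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold the per-thread partial results the way the
   comment in the patch describes (var = array[0]; var op= array[i]),
   with op fixed to '+' for illustration.  */
int
combine_partials (const int *array, size_t nthreads)
{
  assert (nthreads > 0);
  int var = array[0];
  for (size_t i = 1; i < nthreads; i++)
    var += array[i];
  return var;
}

/* Section 2.7.11 read literally: the combined result is stored in the
   original variable, so the original value is discarded.  */
int
finalize_loop_literal (int orig, const int *array, size_t nthreads)
{
  (void) orig;  /* Intentionally unused under this reading.  */
  return combine_partials (array, nthreads);
}

/* Section 2.5.11: the combined result is further combined with the
   original value before being stored.  */
int
finalize_with_original (int orig, const int *array, size_t nthreads)
{
  return orig + combine_partials (array, nthreads);
}
```

With the values from your test case (I = 5, N = 11, A = 8, one iteration
per thread), every partial is 8, so the literal 2.7.11 reading yields 88
while the 2.5.11 reading yields 93, which is the I + N * A the test expects.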
Thanks,
Cesar