This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1


On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
> 
> I think I have found one issue in this code -- but please verify that my
> understanding of reductions is correct.  Namely:
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> +/* Helper function to finalize local data for the reduction arrays. The
>> +   reduction array needs to be reduced to the original reduction variable.
>> +   FIXME: This function assumes that there are vector_length threads in
>> +   total.  Also, it assumes that there are at least vector_length iterations
>> +   in the for loop.  */
>> +
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
>> +			 omp_context *ctx)
>> +{
>> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
>> +
>> +  tree c, var, array, loop_header, loop_body, loop_exit;
>> +  gimple stmt;
>> +
>> +  /* Create for loop.
>> +
>> +     let var = the original reduction variable
>> +     let array = reduction variable array
>> +
>> +     var = array[0]
>> +     for (i = 1; i < nthreads; i++)
>> +       var op= array[i]
>> + */
> 
> This should also consider the reduction variable's original value.  Test
> case (which does the expected thing if modified for OpenMP):
> 
>     #include <stdlib.h>
>     
>     int
>     main(void)
>     {
>     #define I 5
>     #define N 11
>     #define A 8
>     
>       int a = A;
>       int s = I;
>     
>     #pragma acc parallel vector_length(N)
>       {
>         int i;
>     #pragma acc loop reduction(+:s)
>         for (i = 0; i < N; ++i)
>           s += a;
>       }
>     
>       if (s != I + N * A)
>         abort ();
>     
>       return 0;
>     }
> 
> OK to check in the following?

Reductions can be specified with both the parallel and loop constructs.
According to section 2.5.11 in the OpenACC spec, a reduction in a parallel
construct should behave as you described:

	At the end of the region, the values for each gang are combined
	using the reduction operator, and the result combined with the
	value of the original variable and stored in the original
	variable.

However, in section 2.7.11, a reduction in a loop construct behaves as
follows:

	At the end of the loop, the values for each thread are combined
	using the specified reduction operator, and the result stored
	in the original variable at the end of the parallel or kernels
	region.

The behavior specified for the parallel construct makes more sense,
though. I'll ask the OpenACC gurus whether there's a typo in section
2.7.11, since it does refer to parallel reductions.
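
To make the difference between the two readings concrete, here is a
minimal sketch in plain C. It is not the omp-low.c implementation: the
helper names finalize_overwrite and finalize_with_original are
hypothetical, and it assumes a "+" reduction with one private partial
result per thread.

```c
#include <assert.h>

/* Literal reading of section 2.7.11 (loop construct): combine the
   per-thread partial results and store the result, discarding whatever
   value the original variable held before the loop.  This mirrors the
   "var = array[0]; for (i = 1; ...) var op= array[i]" loop in the patch.
   Hypothetical helper, for illustration only.  */
static int
finalize_overwrite (const int *partial, int nthreads)
{
  assert (nthreads >= 1);	/* partial[0] must exist */
  int var = partial[0];
  for (int i = 1; i < nthreads; i++)
    var += partial[i];
  return var;
}

/* Section 2.5.11 wording (parallel construct): combine the partial
   results, then combine that with the original variable's value.
   Hypothetical helper, for illustration only.  */
static int
finalize_with_original (int original, const int *partial, int nthreads)
{
  assert (nthreads >= 0);
  int var = original;
  for (int i = 0; i < nthreads; i++)
    var += partial[i];
  return var;
}
```

With the values from the test case above (I = 5, N = 11, A = 8, and each
of the 11 threads contributing a partial sum of 8), the first helper
yields 88 and the second yields 93, which is the I + N * A that the test
case expects.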

Thanks,
Cesar

