Gang-level reductions in OpenACC routine

Thomas Schwinge thomas@codesourcery.com
Tue Nov 30 12:20:01 GMT 2021


Hi!

On 2020-03-19T17:12:02+0000, Kwok Cheung Yeung <kwok_yeung@mentor.com> wrote:
> On 18/03/2020 11:34 pm, Kwok Cheung Yeung wrote:
>> I was looking at the regression in c-c++-common/goacc/nested-reductions.c, which
>> has the following excess warnings in acc_routine:
>>
>> /scratch/kyeung/openacc/og10/nvidia/src/gcc-og10-branch/gcc/testsuite/c-c++-common/goacc/nested-reductions.c:360:15:
>> warning: insufficient partitioning available to parallelize loop
>> /scratch/kyeung/openacc/og10/nvidia/src/gcc-og10-branch/gcc/testsuite/c-c++-common/goacc/nested-reductions.c:369:17:
>> warning: insufficient partitioning available to parallelize loop
>> /scratch/kyeung/openacc/og10/nvidia/src/gcc-og10-branch/gcc/testsuite/c-c++-common/goacc/nested-reductions.c:375:17:
>> warning: insufficient partitioning available to parallelize loop
>> /scratch/kyeung/openacc/og10/nvidia/src/gcc-og10-branch/gcc/testsuite/c-c++-common/goacc/nested-reductions.c:320:6:
>> warning: region is gang partitioned but does not contain gang partitioned code
>>
>> It is caused by the following code in the patch 'Make OpenACC orphan
>> gang reductions errors' (originally by Cesar):
>>
>> +      /* Orphan reductions cannot have gang partitioning.  */
>> +      if ((loop->flags & OLF_REDUCTION)
>> +         && oacc_get_fn_attrib (current_function_decl)
>> +         && !lookup_attribute ("omp target entrypoint",
>> +                               DECL_ATTRIBUTES (current_function_decl)))
>> +       this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);

Right.  However, that code doesn't implement what the OpenACC
specification actually says.  ;-)
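
To make the effect of the quoted hunk concrete, here is a minimal standalone sketch of what it does to the partitioning mask.  This is not GCC code: 'pick_mask' and its boolean parameters are hypothetical stand-ins for the three conditions, and the 'DIM_*' names are assumed to mirror the 'GOMP_DIM_*' definitions in GCC's gomp-constants.h.

```c
#include <stdbool.h>

/* Assumed to mirror GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR and
   GOMP_DIM_MASK; restated here for illustration only.  */
enum { DIM_GANG = 0, DIM_WORKER = 1, DIM_VECTOR = 2 };
#define DIM_MASK(X) (1u << (X))

/* Hypothetical model of the quoted condition:
   IS_REDUCTION  models 'loop->flags & OLF_REDUCTION',
   IS_OACC_FN    models a non-null 'oacc_get_fn_attrib (...)',
   IS_ENTRYPOINT models the "omp target entrypoint" attribute.  */
unsigned
pick_mask (unsigned this_mask, bool is_reduction, bool is_oacc_fn,
           bool is_entrypoint)
{
  /* A reduction loop in an OpenACC function that is not an offload
     region entry point gets demoted to worker partitioning.  */
  if (is_reduction && is_oacc_fn && !is_entrypoint)
    this_mask = DIM_MASK (DIM_WORKER);
  return this_mask;
}
```

In this model, a loop that would otherwise get 'DIM_MASK (DIM_GANG)' inside a routine ends up with 'DIM_MASK (DIM_WORKER)' without any diagnostic, which is the demotion discussed further down.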

>> The problem is that acc_routine is not declared with 'omp target entrypoint',
>> but it does have '#pragma acc routine gang' applied to it. From what I
>> understand of the OpenACC spec, this means that the function can be called from
>> the accelerator, and may contain a loop at the gang-level.

Right.

>> So is allowing gang
>> reductions for functions with '#pragma acc routine gang' (but not for worker or
>> vector) the right thing to do here?

No, that's precisely the thing that the compiler needs to diagnose.  See
OpenACC 2.6, 2.9.11. "reduction clause", which places a restriction such
that "The 'reduction' clause may not be specified on an orphaned 'loop'
construct with the 'gang' clause, or on an orphaned 'loop' construct that
will generate gang parallelism in a procedure that is compiled with the
'routine gang' clause."

Cesar apparently read the last part to mean that inside a 'routine gang',
a 'loop reduction' with implicit 'gang' level of parallelism should be
demoted to 'worker' level of parallelism.  But what actually is meant,
simply, is that in such cases we raise the same "gang reduction on an
orphan loop" error diagnostic that we raise for explicit 'gang' level of
parallelism.  (..., and adjust our offending test cases).
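
As a rough illustration of that restriction (a sketch only; the function names are made up, and it assumes a build along the lines of 'gcc -fopenacc'), the first function below keeps the reduction at 'worker' level and so does not fall under the restriction, while the commented-out variant is the kind of code that is meant to be rejected:

```c
/* Without OpenACC enabled the pragmas are ignored and the loop simply
   runs sequentially, so the function still returns the plain sum.  */

/* Not affected by the restriction: the reduction sits on an explicit
   'worker' loop, so no gang parallelism is generated for it.  */
#pragma acc routine gang
int
sum_worker (const int *a, int n)
{
  int sum = 0;
#pragma acc loop worker reduction (+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Meant to be diagnosed with "gang reduction on an orphan loop":

   #pragma acc routine gang
   int
   sum_gang (const int *a, int n)
   {
     int sum = 0;
   #pragma acc loop gang reduction (+:sum)
     for (int i = 0; i < n; i++)
       sum += a[i];
     return sum;
   }

   Dropping the explicit 'gang' from that 'loop' directive leaves the
   level of parallelism implicit; if gang parallelism is then assigned,
   the same diagnostic applies instead of a silent demotion to
   'worker'.  */
```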

Now, re your og10 etc. change:

>     Allow gang-level reductions in OpenACC routines with gang-level parallelism

>       gcc/
>       * omp-offload.c (oacc_loop_auto_partitions): Check for 'omp declare
>       target' attributes with a gang clause attached.

> --- a/gcc/omp-offload.c
> +++ b/gcc/omp-offload.c
> @@ -1374,14 +1374,32 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,

>        /* Orphan reductions cannot have gang partitioning.  */
>        if ((loop->flags & OLF_REDUCTION)
> -       && oacc_get_fn_attrib (current_function_decl)
> -       && !lookup_attribute ("omp target entrypoint",
> +       && oacc_get_fn_attrib (current_function_decl))
> +     {
> +       bool gang_p = false;
> +       tree attr
> +           = lookup_attribute ("omp declare target",
> +                               DECL_ATTRIBUTES (current_function_decl));
> +
> +       if (attr)
> +         for (tree c = TREE_VALUE (attr); c; c = OMP_CLAUSE_CHAIN (c))
> +           if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_GANG)
> +             {
> +               gang_p = true;
> +               break;
> +             }
> +
> +       if (lookup_attribute ("omp target entrypoint",
>                               DECL_ATTRIBUTES (current_function_decl)))
> -     this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
> +         gang_p = true;
> +
> +       if (!gang_p)
> +         this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
> +     }

..., I don't understand what exactly that is meant to do: as far as I can
tell, we always get 'gang_p == true' from that code?

Instead, I've pushed to master branch
commit 365cd5f9ba812c389b404a53d99ab5dded5097f4 '[OpenACC] Remove
erroneous "Orphan reductions cannot have gang partitioning" handling',
see attached.  This implements the desired "gang reduction on an orphan
loop" error diagnostics also for these implicit 'gang' cases, via the
middle-end checking that I've just added in
commit 77d24d43644909852998043335b5a0e09d1e8f02
'Consolidate OpenACC "gang reduction on an orphan loop" checking'.


Regards
 Thomas


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-OpenACC-Remove-erroneous-Orphan-reductions-cannot-ha.patch
Type: text/x-diff
Size: 26686 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20211130/6373b443/attachment-0001.bin>

