[Bug target/88756] New: [nvptx, openacc] Override too many num_workers in nvptx plugin, instead of erroring out

vries at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Jan 8 13:55:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88756

            Bug ID: 88756
           Summary: [nvptx, openacc] Override too many num_workers in
                    nvptx plugin, instead of erroring out
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider this minimized/modified test-case:
...
$ cat libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c  
/* { dg-do run } */

#include <stdlib.h>

const int nw = 64;

int
main (void)
{
  const int n = 10;
  int i;
  int array[n];

  for (i = 0; i < n; i++)
    array[i] = i + 1;

  {
    int res, vres;

    res = 0;
    #pragma acc parallel num_workers (nw) copy (res)
    #pragma acc loop worker reduction (+:res)
    for (i = 0; i < n; i++)
      res = res + array[i];

    vres = 0;
    for (i = 0; i < n; i++)
      vres = vres + array[i];
    if (res != vres)
      abort ();
  }

  return 0;
}
...

When compiling with c.exp, we have:
...
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  (test for excess errors)
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
...
while with c++.exp, we have:
...
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  (test for excess errors)
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
...

Looking first at C++: the C++ front-end delivers a hard-coded constant for
num_workers:
...
$ grep parallel reduction-1.c.004t.original 
    #pragma acc parallel map(tofrom:res) num_workers(64)
...

and the num_workers constant is then overridden in the compiler to 32:
...
In function 'main._omp_fn.0':
libgomp.oacc-c-c++-common/reduction-1.c:21:13: warning: using num_workers (32),
ignoring 64
$ grep FUNC_MAP reduction-1.s
//:FUNC_MAP "main$_omp_fn$0", 0x1, 0x20, 0x20
...

OTOH, the C front-end delivers a variable for num_workers (only at -O0;
otherwise it does the same as the C++ front-end):
...
$ grep parallel reduction-1.c.004t.original 
    #pragma acc parallel map(tofrom:res) num_workers(nw)
...

which cannot be overridden in the compiler (that is, in
nvptx_goacc_validate_dims), given that its value isn't known:
...
$ grep FUNC_MAP reduction-1.s
//:FUNC_MAP "main$_omp_fn$0", 0x1, 0, 0x20
...
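To make the two FUNC_MAP lines easier to compare, here is a minimal C sketch that extracts the function name and the three launch dimensions from such a line. It assumes, as the two outputs above suggest, that the values are the gang, worker and vector dimensions in that order, and that 0 marks a dimension whose value was not known at compile time. `parse_func_map` and `struct launch_dims` are hypothetical names, not part of GCC or libgomp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Launch dimensions from one FUNC_MAP line, in the order gang, worker,
   vector (assumed from the outputs above).  0 marks a dimension whose
   value was not known at compile time.  */
struct launch_dims
{
  int gang, worker, vector;
};

/* Hypothetical parser for a line such as
     //:FUNC_MAP "main$_omp_fn$0", 0x1, 0, 0x20
   Copies the quoted function name into NAME (truncated to NAME_LEN - 1
   characters) and the three dimensions into DIMS.  Returns 1 on success
   and 0 if the line does not have that shape.  */
static int
parse_func_map (const char *line, char *name, size_t name_len,
		struct launch_dims *dims)
{
  static const char prefix[] = "//:FUNC_MAP \"";
  const char *start, *end;
  size_t len;

  assert (name_len > 0);
  if (strncmp (line, prefix, sizeof prefix - 1) != 0)
    return 0;
  start = line + sizeof prefix - 1;
  end = strchr (start, '"');
  if (end == NULL)
    return 0;

  /* Copy the function name, truncating if the buffer is too small.  */
  len = (size_t) (end - start);
  if (len >= name_len)
    len = name_len - 1;
  memcpy (name, start, len);
  name[len] = '\0';

  /* %i accepts both "0x20" and a bare "0", which is how a zero
     dimension appears in the output above.  */
  return sscanf (end + 1, " , %i , %i , %i",
		 &dims->gang, &dims->worker, &dims->vector) == 3;
}
```

Applied to the two lines in this report, it yields a worker dimension of 32 for the C++ build and 0 for the C build at -O0, with gang 1 and vector 32 in both.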

and at runtime we run into a GOMP_PLUGIN_fatal in the libgomp nvptx plugin:
...
libgomp: The Nvidia accelerator has insufficient resources to launch
'main$_omp_fn$0' with num_workers = 64; recompile the program with 'num_workers
= 32' on that offloaded region or '-fopenacc-dim=:32'
...

For the user, it's somewhat confusing that this passes with a warning when
compiled as C++, but fails at run time when compiled as C.

The difference originates in the front-ends, but it doesn't seem to be
OpenACC-specific. So while it looks possible to fix this in the C front-end
(basically, make c_fully_fold_internal apply to launch dims even for
!optimize), I'm not sure that's a good or acceptable idea.

[ And, given this difference, it's probably good to test this behaviour in a
dedicated test-case, but otherwise avoid const int for dimension settings in
libgomp.oacc-c-c++-common test-cases, which are tested as both C and C++.  At
first glance, this would mean fixing
libgomp.oacc-c-c++-common/reduction-[1-5].c . ]
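One possible way to avoid const int in such a test-case is sketched below, under the assumption that a preprocessor macro is an acceptable replacement. Because `NW` expands to the literal 64 before either front-end runs, the C variant also sees a constant num_workers at -O0, like the C++ one. Both variants would then get the compile-time override and its warning, which the test would still have to expect. `reduce_with_workers` is a hypothetical wrapper name; compiled without -fopenacc, the pragmas are ignored and the loops simply run on the host.

```c
#include <stdlib.h>

/* Dimension settings as macros rather than "const int" variables:
   after preprocessing, both the C and the C++ front-end see literal
   constants, so the two variants of the test take the same path.  */
#define NW 64
#define N 10

/* Hypothetical wrapper around the reduction from the test-case above.
   Returns the offloaded sum and aborts if it differs from a host-side
   recomputation, as the original test does.  */
static int
reduce_with_workers (void)
{
  int array[N];
  int i, res = 0, vres = 0;

  for (i = 0; i < N; i++)
    array[i] = i + 1;

#pragma acc parallel num_workers (NW) copy (res)
#pragma acc loop worker reduction (+:res)
  for (i = 0; i < N; i++)
    res = res + array[i];

  for (i = 0; i < N; i++)
    vres = vres + array[i];
  if (res != vres)
    abort ();

  return res;
}
```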

OTOH, we can also look at the consequences of the front-end difference, which
are either:
- an override in the compiler, or
- a failure at runtime.

[ In other words, we can abstract away from the const int handling, and
conclude that we see the same difference for:
...
int nw = 64;
    #pragma acc parallel num_workers (nw) copy (res)
...
and
...
    #pragma acc parallel num_workers (64) copy (res)
...
]

While it's clear that in the compiler (that is, in nvptx_goacc_validate_dims)
we can't do better, I wonder why we don't do the same in the plugin, that is,
override with a warning.

We would have the more acceptable difference of "compile with warning and run"
vs "compile and run with warning".
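A minimal sketch of the proposed plugin behaviour: clamp and warn instead of calling GOMP_PLUGIN_fatal. `choose_num_workers` is a hypothetical helper, not libgomp code. `max_workers` stands for the limit the plugin already computes for its error message (32 in the example above), and the warning text simply mirrors the compile-time one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not libgomp's API: pick the number of workers to
   launch with.  REQUESTED is the num_workers value known at run time,
   MAX_WORKERS the largest value the device supports for this kernel.
   Rather than failing, clamp to MAX_WORKERS and print a warning to DIAG
   in the same form as the compile-time warning.  */
static int
choose_num_workers (int requested, int max_workers, FILE *diag)
{
  assert (max_workers > 0);
  if (requested > max_workers)
    {
      fprintf (diag, "warning: using num_workers (%d), ignoring %d\n",
	       max_workers, requested);
      return max_workers;
    }
  return requested;
}
```

With the values from this report, `choose_num_workers (64, 32, stderr)` prints the warning and returns 32, so the C build would run with a warning where the C++ build compiles with one.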

