[Bug target/105873] [amdgcn][OpenMP] task reductions fail with "team master not responding; slave thread aborting"

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Jun 7 13:52:08 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105873

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think it might be interesting to see which private copies (address and
value) each task uses in the following test:
#pragma omp declare target
int
foo (void)
{
  int result = 0;
  void **buf = __builtin_malloc (8192 * 2 * sizeof (void *));
#pragma omp taskgroup task_reduction(+: result)
  {
    for(int i = 0; i < 8192; ++i) {
#pragma omp task in_reduction(+: result)
      {
        result += 1;
        buf[2 * i] = &result;
        buf[2 * i + 1] = (void *) (__INTPTR_TYPE__) result;
      }
    }
  }
  for(int i = 0; i < 8192; ++i)
    __builtin_printf ("%d %p %d\n", i, buf[2 * i],
                      (int) (__INTPTR_TYPE__) buf[2 * i + 1]);
  __builtin_free (buf);
  return result;
}
#pragma omp end declare target

int
main ()
{
  int r = 0;
  #pragma omp target parallel map(from:r) num_threads(8)
  #pragma omp single
  r = foo ();
  if (r != 8192)
    __builtin_abort ();
  return 0;
}

If it also reproduces with a value smaller than 8192, it would be nice to use
the smallest value that still reproduces.
If it hangs before printing those values, it would be nice to find out where.
The code in foo calls GOMP_taskgroup_start and
GOMP_taskgroup_reduction_register (the latter should allocate the 8 private
copies of result), then in each task body it calls GOMP_task_reduction_remap,
and finally, at the end of the taskgroup, it calls GOMP_taskgroup_end, performs
the reduction over those 8 private copies serially, and then calls
GOMP_taskgroup_reduction_unregister to free them.
Adding some printfs to the library side somewhere might help, e.g. to find out
if GOMP_taskgroup_end has been reached (it should be reached by a single thread
only) and whether it has finished.
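
To make the expected sequence easier to compare against what the printfs show,
here is a rough host-only model of it in plain C.  It does not call libgomp:
the struct and the model_* functions are hypothetical names invented for this
sketch, and the round-robin assignment of tasks to threads is an assumption
(real task scheduling is not deterministic).  It only illustrates the idea of
one private copy per thread, a remap in each task, and a serial combine at the
end.

```c
#include <stdlib.h>

#define NUM_THREADS 8    /* matches num_threads(8) in the test case */
#define NUM_TASKS   8192 /* matches the loop bound in foo */

/* Hypothetical stand-in for the per-taskgroup reduction data.  */
struct reduction_model
{
  int *priv;    /* one private copy of "result" per thread */
  int nthreads;
};

/* Models the register step: allocate the private copies and initialize
   them to 0, the identity for "+".  Returns 0 on allocation failure.  */
static int
model_register (struct reduction_model *m, int nthreads)
{
  m->priv = calloc (nthreads, sizeof (int));
  m->nthreads = nthreads;
  return m->priv != NULL;
}

/* Models the remap step: hand the task the private copy that belongs
   to the thread executing it.  */
static int *
model_remap (struct reduction_model *m, int thread_id)
{
  return &m->priv[thread_id];
}

/* Models the end of the taskgroup: combine the private copies serially
   into the original value, then free them (the unregister step).  */
static int
model_finish (struct reduction_model *m, int result)
{
  for (int t = 0; t < m->nthreads; ++t)
    result += m->priv[t];
  free (m->priv);
  m->priv = NULL;
  return result;
}

/* Host-only analogue of foo.  Returns -1 if allocation fails.  */
int
model_foo (void)
{
  struct reduction_model m;
  int result = 0;

  if (!model_register (&m, NUM_THREADS))
    return -1;
  for (int i = 0; i < NUM_TASKS; ++i)
    {
      /* Assumption: task i runs on thread i % NUM_THREADS.  */
      int *p = model_remap (&m, i % NUM_THREADS);
      *p += 1;
    }
  return model_finish (&m, result);
}
```

In this model each private copy ends up holding 1024 and model_foo returns
8192.  On the device, the addresses stored in buf should likewise fall into at
most 8 distinct values if the remapping works as intended.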

