[Bug target/105873] [amdgcn][OpenMP] task reductions fail with "team master not responding; slave thread aborting"
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Jun 7 13:52:08 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105873
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think it might be interesting to see which private values are used when:
#pragma omp declare target
int
foo (void)
{
  int result = 0;
  void **buf = __builtin_malloc (8192 * 2 * sizeof (void *));
  #pragma omp taskgroup task_reduction(+: result)
  {
    for(int i = 0; i < 8192; ++i) {
      #pragma omp task in_reduction(+: result)
      {
        result += 1;
        buf[2 * i] = &result;
        buf[2 * i + 1] = (void *) (__INTPTR_TYPE__) result;
      }
    }
  }
  for(int i = 0; i < 8192; ++i)
    __builtin_printf ("%d %p %d\n", i, buf[2 * i], (int) (__INTPTR_TYPE__)
                      buf[2 * i + 1]);
  __builtin_free (buf);
  return result;
}
#pragma omp end declare target
int
main ()
{
  int r = 0;
  #pragma omp target parallel map(from:r) num_threads(8)
  #pragma omp single
  r = foo ();
  if (r != 8192)
    __builtin_abort ();
  return 0;
}
If it also reproduces with a value smaller than 8192, it would be nice to use
the smallest value that still reproduces.
If it hangs before printing that output, it would be nice to find out where.
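
One possible way to look for a smaller reproducer is to make the task count a
parameter and try increasing sizes from the host. This is only a rough sketch:
foo_n and first_failing_size are made-up names that are not part of the test
case above, and the doubling sequence and the 8192 cap are arbitrary choices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of foo: same taskgroup/task reduction pattern,
   but with the number of tasks passed in instead of hard-coded.  */
#pragma omp declare target
int
foo_n (int n)
{
  int result = 0;
  #pragma omp taskgroup task_reduction(+: result)
  {
    for (int i = 0; i < n; ++i)
      {
        #pragma omp task in_reduction(+: result)
        result += 1;
      }
  }
  return result;
}
#pragma omp end declare target

/* Hypothetical host-side driver: tries 1, 2, 4, ... up to max tasks and
   returns the first size that gives a wrong sum, or 0 if all sizes pass.  */
int
first_failing_size (int max)
{
  assert (max > 0 && max <= 8192);
  for (int n = 1; n <= max; n *= 2)
    {
      int r = -1;
      /* Print before launching, so the last line shows the size that hung.  */
      printf ("trying %d tasks\n", n);
      fflush (stdout);
      #pragma omp target parallel map(from: r) num_threads(8)
      #pragma omp single
      r = foo_n (n);
      if (r != n)
        return n;
    }
  return 0;
}
```

Calling first_failing_size (8192) from main would cover the original size. A
wrong sum is reported through the return value; a hang is only visible as the
last "trying" line that was printed.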
The code in foo calls GOMP_taskgroup_start and
GOMP_taskgroup_reduction_register (the latter should allocate the 8 private
copies of result), then in the task body it calls GOMP_task_reduction_remap and
finally, at the end of the taskgroup, it calls GOMP_taskgroup_end, performs the
reduction over those 8 private copies serially, and then calls
GOMP_taskgroup_reduction_unregister to free them.
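
The sequence described above can be modelled in plain C to make the expected
data flow concrete. This is not libgomp code and does not use the real entry
points or their signatures: struct red_model and all model_* functions are
invented for illustration, and the round-robin choice of thread is just a
stand-in for whatever the scheduler does.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented model: one private copy of the reduction variable per thread.  */
struct red_model
{
  int nthreads;
  int *priv;
};

/* Models the allocation done at registration: nthreads private copies,
   zero-initialized because 0 is the identity for "+".  */
static int
model_register (struct red_model *m, int nthreads)
{
  assert (nthreads > 0);
  m->nthreads = nthreads;
  m->priv = calloc (nthreads, sizeof (int));
  return m->priv != NULL;
}

/* Models the remap step: a task gets the copy of the thread running it.  */
static int *
model_remap (struct red_model *m, int thread_id)
{
  assert (thread_id >= 0 && thread_id < m->nthreads);
  return &m->priv[thread_id];
}

/* Models the end of the taskgroup: combine the private copies serially
   into the original value, then free them (the unregister step).  */
static int
model_end_and_unregister (struct red_model *m, int orig)
{
  for (int t = 0; t < m->nthreads; ++t)
    orig += m->priv[t];
  free (m->priv);
  m->priv = NULL;
  return orig;
}

/* Runs ntasks "tasks" that each add 1; returns the reduced sum or -1
   if the private copies could not be allocated.  */
int
model_run (int ntasks, int nthreads)
{
  struct red_model m;
  if (!model_register (&m, nthreads))
    return -1;
  for (int i = 0; i < ntasks; ++i)
    {
      int *p = model_remap (&m, i % nthreads);
      *p += 1;
    }
  return model_end_and_unregister (&m, 0);
}
```

In this model, model_run (8192, 8) gives 8192, and each task only ever sees one
of 8 distinct addresses, which is what the pointers stored in buf would be
expected to show in the test case.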
Adding some printfs to the library side somewhere might help, e.g. to find out
if GOMP_taskgroup_end has been reached (it should be reached by a single thread
only) and whether it has finished.