This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libgomp/69625] New: deadlock in libgomp.c/doacross-1.c test
- From: "vogt at linux dot vnet.ibm.com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 02 Feb 2016 14:08:48 +0000
- Subject: [Bug libgomp/69625] New: deadlock in libgomp.c/doacross-1.c test
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69625
Bug ID: 69625
Summary: deadlock in libgomp.c/doacross-1.c test
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vogt at linux dot vnet.ibm.com
CC: jakub at gcc dot gnu.org
Target Milestone: ---
Target: s390x
Created attachment 37554
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37554&action=edit
.s file of test program
On s390x with -march=z196 -O2/-O3 the test hangs in a deadlock (doacross-2.c,
doacross-3.c, and doacross-1.C hang as well, but I haven't looked at those
yet). I've stripped the test down to this:
-- snip --
#include <stdio.h>

#define N 64

int b[N / 16][8][4];

int
main ()
{
  int i, j, k, l;
  (void) l;
#pragma omp parallel
  {
    printf("+++\n");
#pragma omp for schedule(static, 0) ordered (3) nowait
    for (i = 2; i < N / 16 - 1; i++)
      for (j = 0; j < 8; j += 2)
        for (k = 1; k <= 3; k++)
          {
#pragma omp atomic write
            b[i][j][k] = 111111;
#pragma omp ordered depend(sink: i, j - 2, k - 1) \
                    depend(sink: i - 2, j - 2, k + 1)
#pragma omp ordered depend(sink: i - 3, j + 2, k - 2)
            if (j >= 2 && k > 1)
              {
#pragma omp atomic read
                l = b[i][j - 2][k - 1];
              }
#pragma omp atomic write
            b[i][j][k] = 222222;
            if (i >= 4 && j >= 2 && k < 3)
              {
#pragma omp atomic read
                l = b[i - 2][j - 2][k + 1];
              }
#pragma omp ordered depend(source)
#pragma omp atomic write
            b[i][j][k] = 333333;
          }
    printf("---\n");
  }
  printf("done\n");
  return 0;
}
-- snip --
(See the attachment for the full .s file. I'm running the test under gdb on
an LPAR with 17 cores.)
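For context, this is my understanding of how the ordered clauses should be
lowered, based on the libgomp entry points that show up in the disassembly
further down; a hand-written sketch, not actual compiler output, and the
function name iteration_sketch is mine:
-- snip --
extern void GOMP_doacross_wait (long, ...);
extern void GOMP_doacross_post (long *);

/* Sketch of the expected lowering for one iteration (i, j, k): each
   depend(sink: ...) should become a wait on the named earlier
   iteration, and depend(source) a post of the current one.  */
static void
iteration_sketch (long i, long j, long k)
{
  GOMP_doacross_wait (i, j - 2, k - 1);
  GOMP_doacross_wait (i - 2, j - 2, k + 1);
  GOMP_doacross_wait (i - 3, j + 2, k - 2);
  /* ... loop body ... */
  long counts[3] = { i, j, k };
  GOMP_doacross_post (counts);
}
-- snip --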
The function GOMP_parallel starts threads 2 to 17, which enter and leave the
parallel region (they print both "+++" and "---") and then hang in a
team_barrier_wait_final() call in gomp_thread_start. Only then does thread 1
run the thread function, i.e. these two statements in GOMP_parallel:
  gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
  fn (data);
Thread 1 comes across
  0x0000000080000b7a <+522>: brasl %r14,0x800007b0 <GOMP_doacross_wait@plt>
with %r10 == 2 (which presumably contains k), then continues through
  0x0000000080000cf6 <+902>: brasl %r14,0x800006f0 <GOMP_doacross_post@plt>
and finally comes back to
  0x0000000080000b7a <+522>: brasl %r14,0x800007b0 <GOMP_doacross_wait@plt>
with %r10 == 3. In GOMP_doacross_wait() it ends up calling doacross_spin() and
never gets out of that again:
  doacross_spin (array, flattened, cur);
  0x000003fff7ef5562 <+282>: lg   %r1,0(%r5)
  0x000003fff7ef5568 <+288>: clgr %r1,%r2
  0x000003fff7ef556c <+292>: jle  0x3fff7ef5562 <GOMP_doacross_wait+282>
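In C terms the spin corresponds to something like this (a reconstruction from
the three instructions above, with names of my choosing, not the actual
libgomp source):
-- snip --
/* 'addr' corresponds to %r5 and 'expected' to %r2; the loop keeps
   re-reading *addr (lg) and branches back (clgr + jle) for as long
   as the loaded value stays <= expected.  */
static void
spin_reconstruction (volatile unsigned long *addr, unsigned long expected)
{
  unsigned long cur;
  do
    cur = *addr;
  while (cur <= expected);
}
-- snip --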
The value of %r1 (= *%r5, i.e. *array?) remains 6 (since there is no other
thread left that could modify it), while the value of %r2 is 0xfffffffb4a1.
To me this looks as if doacross_spin() is comparing an integer value with an
address or other rubbish.
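If that guess is right, the loop can never terminate; plugging the observed
values into the reconstruction above makes that obvious (a hypothetical
illustration, not libgomp code):
-- snip --
int
main ()
{
  unsigned long posted = 6;                 /* value read into %r1; no thread
                                               is left to ever bump it       */
  unsigned long expected = 0xfffffffb4a1UL; /* value in %r2; looks like an
                                               address, not an iteration     */
  while (posted <= expected)
    ;  /* spins forever */
  return 0;
}
-- snip --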
Any ideas what's going on?