[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128
vries at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Apr 19 14:32:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381
--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
For this example:
...
#define n 1024
int
main (void)
{
#pragma acc parallel vector_length(128)
{
#pragma acc loop vector
for (int i = 0; i < n; i++)
;
#pragma acc loop vector
for (int i = 0; i < n; i++)
;
}
return 0;
}
...
we currently generate:
...
.entry main$_omp_fn$0
{
.reg.u64 %r24;
.reg.u64 %r25;
.reg.u64 %r26;
.reg.u64 %r27;
.reg.pred %r28;
{
.reg.u32 %x;
mov.u32 %x, %tid.x;
setp.ne.u32 %r28, %x, 0;
}
bar.sync 0;
@%r28 bra $L2;
// join 4;
// fork 4;
$L2:
bar.sync 0;
ret;
}
...
So if we fix the branch-around-nothing problem here, we'll get back-to-back
bar.syncs again, and may run into the JIT bug again.
We may want to insert dummy ops in between (it would be nice if something less
heavy than a membar.cta would work).