This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128
- From: "vries at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 19 Apr 2018 14:32:07 +0000
- Subject: [Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128
- Auto-submitted: auto-generated
- References: <bug-85381-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381
--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
For this example:
...
#define n 1024
int
main (void)
{
#pragma acc parallel vector_length(128)
  {
#pragma acc loop vector
    for (int i = 0; i < n; i++)
      ;
#pragma acc loop vector
    for (int i = 0; i < n; i++)
      ;
  }
  return 0;
}
...
we currently generate:
...
.entry main$_omp_fn$0
{
        .reg.u64 %r24;
        .reg.u64 %r25;
        .reg.u64 %r26;
        .reg.u64 %r27;
        .reg.pred %r28;
        {
                .reg.u32 %x;
                mov.u32 %x, %tid.x;
                setp.ne.u32 %r28, %x, 0;
        }
        bar.sync 0;
        @%r28 bra $L2;
        // join 4;
        // fork 4;
$L2:
        bar.sync 0;
        ret;
}
...
so if we fix the branch-around-nothing problem here, we'll get back-to-back
bar.syncs again, and may run into the JIT bug again.
We may want to insert dummy ops in between (it would be nice if something less
heavy than a membar.cta would work).
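For illustration only, a hand-written sketch (not actual compiler output) of
what that could look like, assuming the empty branch and its then-unused
predicate setup are removed and a membar.cta is used as the separator:
...
.entry main$_omp_fn$0
{
        bar.sync 0;
        // dummy op, so that the two bar.syncs are not back-to-back
        membar.cta;
        bar.sync 0;
        ret;
}
...
Whether a cheaper instruction than membar.cta is enough to avoid the JIT
problem is an open question.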