This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
For this example:
...
#define n 1024

int
main (void)
{
  #pragma acc parallel vector_length(128)
  {
    #pragma acc loop vector
    for (int i = 0; i < n; i++)
      ;

    #pragma acc loop vector
    for (int i = 0; i < n; i++)
      ;
  }

  return 0;
}
...

we currently generate:
...
.entry main$_omp_fn$0
{
        .reg.u64 %r24;
        .reg.u64 %r25;
        .reg.u64 %r26;
        .reg.u64 %r27;
        .reg.pred %r28;
        {
                .reg.u32        %x;
                mov.u32 %x, %tid.x;
                setp.ne.u32     %r28, %x, 0;
        }
        bar.sync        0;
        @%r28   bra     $L2;
        // join 4;
        // fork 4;
$L2:
        bar.sync        0;
        ret;
}
...

so if we fix the branch-around-nothing problem here, we'll get back-to-back
bar.syncs again, and may run into the JIT bug again.

We may want to insert dummy ops in between (it would be nice if something less
heavy than a membar.cta worked).
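
As a hand-written sketch (not actual compiler output), assuming the predicate
setup and the branch around the empty join/fork region are simply dropped, the
kernel would reduce to something like:
...
.entry main$_omp_fn$0
{
        bar.sync        0;
        // join 4;
        // fork 4;
        bar.sync        0;
        ret;
}
...

and with a dummy op in between, here using membar.cta only because it is the
candidate mentioned above, not because it is known to avoid the JIT problem:
...
        bar.sync        0;
        membar.cta;     // placeholder dummy op (assumption)
        bar.sync        0;
        ret;
...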
