This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/85653] New: [nvptx] Work around subsequent bar.sync JIT/ptxas bug


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85653

            Bug ID: 85653
           Summary: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01023.html :
...
Hi,

when compiling this testcase with the og7 branch:
...
int
main (void)
{
  long long v1;
#pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
#pragma acc loop
  for (v1 = 0; v1 < 20; v1 += 2)
    ;

  return 0;
}
...

this ptx is generated:
...
{

  // fork 4;
  bar.sync 0;
  // forked 4;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...

This triggers some bug on my Quadro M1200 (I'm assuming in the ptxas/JIT
compiler) that hangs the testcase. I can work around this by adding a
membar.cta before the bar.sync, or two membar.ctas in between, but I'm not
really sure what a minimal workaround should look like (I reported the bug to
Nvidia, and I'm hoping they will answer that question).
...

Nvidia came back confirming the workaround of two membar.ctas in between.

We'll want to implement this workaround, even though the bug shouldn't be
triggering on either og7 or trunk.
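
For illustration, applied to the ptx quoted above, the workaround would
presumably look something like the sketch below. The exact placement of the two
membar.ctas relative to the forked/joining markers is an assumption; the only
requirement taken from the report is that both sit between the two bar.syncs:
...
{

  // fork 4;
  bar.sync 0;
  // forked 4;
  // assumed placement: two membar.ctas between the subsequent bar.syncs
  membar.cta;
  membar.cta;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...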
