This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/85653] New: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
- From: "vries at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 04 May 2018 14:13:10 +0000
- Subject: [Bug target/85653] New: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85653
Bug ID: 85653
Summary: [nvptx] Work around subsequent bar.sync JIT/ptxas bug
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01023.html :
...
Hi,
when compiling this testcase with the og7 branch:
...
int
main (void)
{
  long long v1;
#pragma acc parallel num_gangs (640) num_workers (1) vector_length (128)
#pragma acc loop
  for (v1 = 0; v1 < 20; v1 += 2)
    ;
  return 0;
}
...
this ptx is generated:
...
{
  // fork 4;
  bar.sync 0;
  // forked 4;
  // joining 4;
  bar.sync 0;
  // join 4;
  ret;
}
...
This triggers a bug on my Quadro M1200 (I'm assuming in the ptxas/JIT
compiler) that hangs the testcase. I can work around this by adding a
membar.cta before the bar.sync, or two membar.ctas in between, but I'm not
really sure what a minimal workaround should look like (I reported the bug to
NVIDIA, and I'm hoping they will answer that question).
...
NVIDIA came back confirming the workaround of placing two membar.cta
instructions in between the two bar.sync instructions.
We'll want to implement this workaround, even though the bug shouldn't be
triggering on either og7 or trunk.
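
To illustrate the idea, here is a minimal sketch of such a workaround, working
on PTX text lines rather than on the compiler's internal representation. It is
not the nvptx backend's implementation: the function names
(insert_membar_between_barsyncs, is_bar_sync, is_noop_line) are hypothetical,
and placing both membar.cta instructions directly before the second bar.sync is
an assumption. The report only says they go "in between".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skip leading blanks and tabs.  */
static const char *
skip_space (const char *line)
{
  while (*line == ' ' || *line == '\t')
    line++;
  return line;
}

/* Hypothetical helper: true if LINE is a bar.sync instruction.  */
static int
is_bar_sync (const char *line)
{
  return strncmp (skip_space (line), "bar.sync", 8) == 0;
}

/* Hypothetical helper: true if LINE is blank or a PTX "//" comment,
   that is, it emits no instruction.  */
static int
is_noop_line (const char *line)
{
  line = skip_space (line);
  return *line == '\0' || (line[0] == '/' && line[1] == '/');
}

/* Copy IN[0..N) to OUT.  Whenever a bar.sync follows an earlier bar.sync
   with only comments or blank lines in between, insert two "membar.cta;"
   lines directly before the later one (placement is an assumption).
   Returns the number of lines written; OUT_CAP is the capacity of OUT.  */
size_t
insert_membar_between_barsyncs (const char **in, size_t n,
                                const char **out, size_t out_cap)
{
  size_t m = 0;
  int prev_was_barsync = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (is_bar_sync (in[i]))
        {
          if (prev_was_barsync)
            {
              assert (m + 2 <= out_cap);
              out[m++] = "membar.cta;";
              out[m++] = "membar.cta;";
            }
          prev_was_barsync = 1;
        }
      else if (!is_noop_line (in[i]))
        /* A real instruction separates the barriers; nothing to do.  */
        prev_was_barsync = 0;

      assert (m < out_cap);
      out[m++] = in[i];
    }
  return m;
}
```

Applied to the nine lines of the ptx above, this sketch would emit eleven
lines, with the two membar.cta instructions sitting between "// joining 4;"
and the second "bar.sync 0;".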