This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/83920] New: [nvptx] bad predicate reset
- From: "cesar at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 17 Jan 2018 23:22:02 +0000
- Subject: [Bug target/83920] New: [nvptx] bad predicate reset
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83920
Bug ID: 83920
Summary: [nvptx] bad predicate reset
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: cesar at gcc dot gnu.org
Target Milestone: ---
Created attachment 43164
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43164&action=edit
gemm test case
Here <https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00516.html> Tom posted a
patch to work around a PTX JIT bug. However, the workaround may clobber a live
predicate register. Consider the offloaded code for the attached gemm.f90
(built using trunk with -fopenacc -O3; the underlying problem is also present
in og7 and affects da-1.c).
The nvptx compiler generates:
$L35:
        selp.u32 %r348, 1, 0, %r191;
        shfl.idx.b32 %r348, %r348, 0, 31;
        setp.ne.u32 %r191, %r348, 0;
        @%r191 bra.uni $L2;
        @%r341 bra $L34;
        mov.u32 %r155, %r61;
        shl.b64 %r158, %r43, 5;
        cvt.s64.s32 %r192, %r155;
        add.u64 %r161, %r192, 1;
        add.u32 %r193, %r53, -1;
        cvt.u64.u32 %r166, %r193;
        add.u64 %r194, %r41, 2;
        add.u64 %r195, %r194, %r166;
        mad.lo.u64 %r156, %r43, %r161, %r195;
        shl.b64 %r169, %r39, 5;
        mad.lo.u64 %r167, %r161, %r39, %r37;
        setp.eq.f32 %r266, %r49, 0f00000000;
        setp.le.s32 %r267, %r53, 0;
        add.u32 %r270, %r55, -1;
        mov.f32 %r271, 0f00000000;
        setp.eq.f32 %r272, %r49, 0f3f800000;
$L34:
$L11:
        setp.eq.u32 %r266, 1, 0;
        @%r341 bra $L33;
$L33:
        selp.u32 %r347, 1, 0, %r266;
        shfl.idx.b32 %r347, %r347, 0, 31;
        setp.ne.u32 %r266, %r347, 0;
        @!%r266 bra.uni $L22;
        bra $L3;
$L12:
Note how %r266 is defined in block $L35, but then it gets clobbered in block
$L33.
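
To make the effect concrete, here is a rough Python model of the sequence
above. It is only a sketch under assumptions that are not stated in this
report: that %r341 is true in every lane except lane 0, and that the
selp/shfl.idx/setp.ne triple simply broadcasts lane 0's predicate to the whole
warp. The names run and broadcast_from_lane0 are made up for illustration.

```python
WARP_SIZE = 32


def broadcast_from_lane0(values):
    """Model selp + shfl.idx (source lane 0) + setp.ne: all lanes take lane 0's value."""
    return [values[0]] * len(values)


def run(beta, apply_reset):
    """Return the value of %r266 that the branch after $L33 would test."""
    # Assumption: %r341 is true for every lane except lane 0.
    skip = [lane != 0 for lane in range(WARP_SIZE)]

    # $L35: lanes that do not skip compute %r266 = (beta == 0.0).
    # None stands for "never written" in the lanes that branched to $L34.
    r266 = [None if skip[lane] else (beta == 0.0) for lane in range(WARP_SIZE)]

    # $L11: setp.eq.u32 %r266, 1, 0 is unguarded, so it runs in every lane,
    # including lane 0, which already holds a live value.
    if apply_reset:
        r266 = [1 == 0 for _ in range(WARP_SIZE)]

    # $L33: broadcast lane 0's predicate to the warp.
    r266 = broadcast_from_lane0(r266)
    return r266[0]


if __name__ == "__main__":
    for beta in (0.0, 1.0):
        print(beta, run(beta, apply_reset=False), run(beta, apply_reset=True))
```

In this model, with the reset in place, the predicate tested after $L33 is
false regardless of beta, so the beta == 0 result computed in $L35 is lost.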
This corresponds to the case where beta == 0 in gemm.f90. I think there
might be other PTX JIT bugs lurking here, because the test program still works
as intended.