This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/83920] [nvptx] bad predicate reset


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83920

--- Comment #8 from cesar at gcc dot gnu.org ---
I tweaked your proposed fix as follows:

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 55c7e3cbf90..24625cd303f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4104,8 +4104,11 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
                    mov.u32 %x,%tid.x;
                    setp.ne.u32 %rnotvzero,%x,0;
                 }
+          reg.pred %rcond2; // Scratch copy of the original rcond.

+          mov.pred %rcond2, %rcond;
                 @%rnotvzero bra Lskip;
+          mov.pred %rcond, %rcond2
                 setp.<op>.<type> %rcond,op1,op2;
                 Lskip:
                 selp.u32 %rcondu32,1,0,%rcond;
@@ -4126,8 +4129,11 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
             There is nothing in the PTX spec to suggest that this is wrong, or
             to explain why the extra initialization is needed.  So, we classify
             it as a JIT bug, and the extra initialization as workaround.  */
-     emit_insn_before (gen_movbi (pvar, const0_rtx),
-                       bb_first_real_insn (from));
+   rtx_insn *from_insn = bb_first_real_insn (from);
+   rtx ptmp = gen_reg_rtx (GET_MODE (pvar));
+   emit_insn_before (gen_rtx_SET (ptmp, pvar), from_insn);
+   emit_insn_before (gen_movbi (pvar, const0_rtx), from_insn);
+   emit_insn_before (gen_rtx_SET (pvar, ptmp), tail);
 #endif
          emit_insn_before (nvptx_gen_vcast (pvar), tail);
        }

This generates the following assembly code for gemm.f90:

$L34:
$L11:
                mov.pred        %r413, %r314;
                setp.eq.u32     %r314, 1, 0;
        @%r402  bra     $L33;
$L33:
                mov.pred        %r314, %r413;
                selp.u32        %r414, 1, 0, %r314;
                shfl.idx.b32    %r414, %r414, 0, 31;
                setp.ne.u32     %r314, %r414, 0;
        @!%r314 bra.uni $L22;
                bra     $L3;
$L12:
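
For reference, here is a minimal sketch of what the listing above does, read lane by lane. It is a host-side C model and not GCC or CUDA code. The struct, the function names, and the assumption that all 32 lanes are active and converged are hypothetical and chosen only for illustration. It models the instruction semantics as written and does not reproduce the JIT bug or the lsdalton failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical model of one warp: per-lane copies of the registers
   named in the listing.  */
struct warp
{
  bool r314[WARP_SIZE];     /* condition predicate */
  bool r413[WARP_SIZE];     /* scratch copy added by the patch */
  uint32_t r414[WARP_SIZE]; /* u32 image of the predicate */
};

/* Model of shfl.idx.b32 with a fixed source lane: every lane ends up
   with the value held by lane SRC.  Assumes a full, converged warp.  */
static void
shfl_idx (uint32_t *reg, unsigned src)
{
  assert (src < WARP_SIZE);
  uint32_t v = reg[src];
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    reg[lane] = v;
}

/* Replay the instructions between $L11 and the final branch.  */
static void
simulate_sequence (struct warp *w)
{
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    {
      w->r413[lane] = w->r314[lane];          /* mov.pred %r413, %r314 */
      w->r314[lane] = (1 == 0);               /* setp.eq.u32 %r314, 1, 0 */
      /* @%r402 bra $L33: the target is the very next label, so taken
         and not-taken lanes both continue here.  */
      w->r314[lane] = w->r413[lane];          /* mov.pred %r314, %r413 */
      w->r414[lane] = w->r314[lane] ? 1 : 0;  /* selp.u32 %r414, 1, 0, %r314 */
    }

  shfl_idx (w->r414, 0);                      /* shfl.idx.b32 ..., 0, 31 */

  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    w->r314[lane] = (w->r414[lane] != 0);     /* setp.ne.u32 %r314, %r414, 0 */
}
```

In this model the reset to false is overwritten by the restore before anything reads it, because the guarded block between the branch and $L33 is empty in this listing. After the shuffle, every lane's %r314 equals whatever lane 0 held on entry.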

I'm not sure what's going on here: this patch causes illegal memory access
errors in lsdalton. Any thoughts?

Maybe a more involved workaround would be to leave %r314 alone and use the
scratch %r413 register as the predicate. But then wouldn't that prevent the PRE
code-hoisting optimization that moved the computation of %r314 outside of the
loop in the first place?

Is the original PTX JIT bug still present in the current Nvidia drivers? You
mentioned that this problem first appeared in 381.22. I wonder if it has been
resolved in 387.
