[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

vries at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sat Jan 20 07:33:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> This looks like another nvidia driver problem (with driver version 384.111).

Confirmed.

The empty branch

>  @ %r36 bra $L5;
>  $L5:

is translated into:
...
        /*0128*/               @P0 BRA `(.L_1);
.L_1:
...
so there is no SYNC after the branch and no SSY before it: nothing makes the
warp reconverge at .L_1.

Consequently, when executing the shfl.idx a bit later:
...
        /*0158*/                   SHFL.IDX PT, R0, R0, RZ, 0x1f;
        /*0168*/                   SHFL.IDX PT, R2, R2, RZ, 0x1f;
...
the warp is still in divergent mode, and the shuffles give undefined results.
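
To illustrate why the missing reconvergence matters, here is a toy Python
model. It is only a sketch, not the hardware's actual behavior and not part of
the test case. It assumes that a divergent warp executes each group of lanes
separately, and that a shfl.idx whose source lane is outside the current group
returns an undefined value (modelled as None). The names run, shfl_idx and
UNDEFINED are hypothetical.

```python
# Toy model of a 32-lane warp. Assumption-laden sketch, not real hardware.
WARP_SIZE = 32
UNDEFINED = None  # stands in for an "undefined result"


def shfl_idx(values, active, src_lane):
    """Model shfl.idx: every active lane reads values[src_lane].

    Assumption: if the source lane is not active, the result is UNDEFINED.
    Inactive lanes keep their old value.
    """
    out = list(values)
    for lane in range(WARP_SIZE):
        if lane in active:
            out[lane] = values[src_lane] if src_lane in active else UNDEFINED
    return out


def run(pred, reconverge):
    """Model 'empty predicated branch, then shfl.idx from lane 0'.

    pred[lane] says whether that lane takes the branch. With reconverge=True
    the warp is whole again at the label (the SSY/SYNC case); with
    reconverge=False the two lane groups stay split (the bare BRA case).
    """
    values = [100 + lane for lane in range(WARP_SIZE)]
    taken = {lane for lane in range(WARP_SIZE) if pred[lane]}
    not_taken = set(range(WARP_SIZE)) - taken

    if reconverge:
        groups = [taken | not_taken]
    else:
        # A uniform branch leaves one empty group, which is dropped here.
        groups = [g for g in (not_taken, taken) if g]

    result = list(values)
    for active in groups:
        shuffled = shfl_idx(values, active, 0)
        for lane in active:
            result[lane] = shuffled[lane]
    return result
```

With a predicate that is true only on even lanes, run(pred, reconverge=False)
leaves every odd lane with None, while run(pred, reconverge=True) gives all
lanes the value of lane 0.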

Inserting some sort of nop in the branched-around part:
...
  @ %r36 bra $L5;
    {
      .reg .u32 %nop_src;
      .reg .u32 %nop_dst;
      mov.u32 %nop_dst, %nop_src;
    }
  $L5:
...
makes the test pass, because the branch is then translated with an SSY/SYNC
pair that reconverges the warp at .L_1:
...
        /*0128*/                   SSY `(.L_1);
        /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
        /*0138*/                   SYNC (*"TARGET= .L_1 "*);
.L_1:
...

