[nvptx, PR84954, committed] Fix prevent_branch_around_nothing

Tom de Vries Tom_deVries@mentor.com
Tue Mar 20 09:44:00 GMT 2018


[ was: Re: [PATCH, 2/2][nvptx, PR83589] Workaround for 
branch-around-nothing JIT bug ]

On 01/24/2018 11:41 AM, Tom de Vries wrote:
> Hi,
> 
> this patch adds a workaround for the nvptx target JIT bug PR83589 - 
> "[nvptx] mode-transitions.c and private-variables.{c,f90} execution 
> FAILs at GOMP_NVPTX_JIT=-O0".
> 
> 
> When compiling a branch-around-nothing (where the branch is warp 
> neutering, so it's a divergent branch):
> ...
>    .reg .pred %r36;
>    {
>      .reg .u32 %x;
>      mov.u32 %x,%tid.x;
>      setp.ne.u32 %r36,%x,0;
>    }
> 
>    @ %r36 bra $L5;
>    $L5:
> ...
> 
> The JIT fails to generate a convergence point here:
> ...
>           /*0128*/               @P0 BRA `(.L_1);
> .L_1:
> ...
> 
> Consequently, we execute subsequent code in divergent mode, and when 
> executing a shfl.idx a bit later we run into the undefined behaviour 
> that shfl.idx has when executing in divergent mode.
> 
> The workaround detects branch-around-nothing, and inserts a ptx 
> operation that does nothing (I'm calling it a fake nop, I haven't been 
> able to come up with a better term yet):
> ...
>    @ %r36 bra $L5;
>      {
>        .reg .u32 %nop_src;
>        .reg .u32 %nop_dst;
>        mov.u32 %nop_dst, %nop_src;
>      }
>    $L5:
> ...
> which makes the test pass, because then we generate a convergence point 
> here at .L1:
> ...
>          /*0128*/                   SSY `(.L_1);
>          /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
>          /*0138*/                   SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
> 
> The workaround is not minimal given that it inserts the fake nop in all 
> branch-around-nothings it detects, not just the warp neutering ones, but 
> I think this is more robust than trying to identify the warp neutering 
> branches. Furthermore, I'm not going for optimality here anyway. The 
> optimal way to fix this is making sure we don't generate 
> branch-around-nothing, but that's for stage1.
> 
> Build and reg-tested on x86_64 with nvptx accelerator.
> 
> I'd like to commit in stage4, but I'd appreciate a review of the code. 
> Does the patch look OK?
> 
> Thanks,
> - Tom
> 
> 0002-nvptx-PR83589-Workaround-for-branch-around-nothing-JIT-bug.patch
> 
> 
> [nvptx, PR83589] Workaround for branch-around-nothing JIT bug
> 
> 2018-01-23  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/83589
> 	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
> 	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
> 	Add strict parameter.
> 	(prevent_branch_around_nothing): Insert dummy insn between branch to
> 	label and label with no ptx insn inbetween.
> 	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
> 
> 	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
> 
> ---
>   gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
>   gcc/config/nvptx/nvptx.md                          |  9 +++
>   .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
>   3 files changed, 122 insertions(+)
> 

> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> +   insn inbetween the branch and the label.  This works around a JIT bug
> +   observed at driver version 384.111, at -O0 for sm_50.  */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> +  rtx_insn *seen_label = 0;
> +    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +      {
> +	if (seen_label == 0)
> +	  {
> +	    if (INSN_P (insn) && condjump_p (insn))
> +	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> +	    continue;
> +	  }
> +
> +	if (NOTE_P (insn))
> +	  continue;
> +
> +	if (INSN_P (insn))
> +	  switch (recog_memoized (insn))
> +	    {
> +	    case CODE_FOR_nvptx_fork:
> +	    case CODE_FOR_nvptx_forked:
> +	    case CODE_FOR_nvptx_joining:
> +	    case CODE_FOR_nvptx_join:
> +	      continue;
> +	    default:
> +	      seen_label = 0;
> +	      continue;
> +	    }
> +
> +	if (LABEL_P (insn) && insn == seen_label)
> +	  emit_insn_before (gen_fake_nop (), insn);
> +
> +	seen_label = 0;
> +      }
> +  }

Consider testcase:
...
int
main (void)
{
   int a[10];
#pragma acc parallel loop worker
   for (int i = 0; i < 10; i++)
     a[i] = i;

   return 0;
}
...

At -O2, we generate this, and fail to generate a fake nop:
...
   @ %r34 bra.uni $L8;
   @ %r33 bra $L9;
   // join 2;
  $L9:
  $L8:
...

What is happening in prevent_branch_around_nothing is:
- seen_label is NULL
- we process "@ %r34 bra.uni $L8" and seen_label becomes $L8
- we process "@ %r33 bra $L9" and since seen_label != NULL, we end up in
   the default case in the switch and reset seen_label to NULL
- we process the labels, seen_label remains NULL, and no fake nop is
   generated

What we want to happen instead, is that when processing "@ %r33 bra 
$L9", seen_label is updated to $L9. Patch below implements that.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to stage4 trunk.

Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-nvptx-Fix-prevent_branch_around_nothing.patch
Type: text/x-patch
Size: 1043 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180320/9374d28c/attachment.bin>


More information about the Gcc-patches mailing list