[nvptx, PR84954, committed] Fix prevent_branch_around_nothing
Tom de Vries
Tom_deVries@mentor.com
Tue Mar 20 09:44:00 GMT 2018
[ was: Re: [PATCH, 2/2][nvptx, PR83589] Workaround for
branch-around-nothing JIT bug ]
On 01/24/2018 11:41 AM, Tom de Vries wrote:
> Hi,
>
> this patch adds a workaround for the nvptx target JIT bug PR83589 -
> "[nvptx] mode-transitions.c and private-variables.{c,f90} execution
> FAILs at GOMP_NVPTX_JIT=-O0".
>
>
> When compiling a branch-around-nothing (where the branch is warp
> neutering, so it's a divergent branch):
> ...
> Â .reg .pred %r36;
> Â {
> Â Â Â .reg .u32 %x;
> Â Â Â mov.u32 %x,%tid.x;
> Â Â Â setp.ne.u32 %r36,%x,0;
> Â }
>
> Â @ %r36 bra $L5;
> Â $L5:
> ...
>
> The JIT fails to generate a convergence point here:
> ...
> Â Â Â Â Â Â Â Â /*0128*/Â Â Â Â Â Â Â Â Â Â Â Â Â Â @P0 BRA `(.L_1);
> .L_1:
> ...
>
> Consequently, we execute subsequent code in divergent mode, and when
> executing a shfl.idx a bit later we run into the undefined behaviour
> that shfl.idx has when executing in divergent mode.
>
> The workaround detects branch-around-nothing, and inserts a ptx
> operation that does nothing (I'm calling it a fake nop, I haven't been
> able to come up with a better term yet):
> ...
> Â @ %r36 bra $L5;
> Â Â Â {
> Â Â Â Â Â .reg .u32 %nop_src;
> Â Â Â Â Â .reg .u32 %nop_dst;
> Â Â Â Â Â mov.u32 %nop_dst, %nop_src;
> Â Â Â }
> Â $L5:
> ...
> which makes the test pass, because then we generate a convergence point
> here at .L1:
> ...
> Â Â Â Â Â Â Â /*0128*/Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â SSY `(.L_1);
> Â Â Â Â Â Â Â /*0130*/Â Â Â Â Â Â Â Â Â Â Â Â Â Â @P0 SYNC (*"TARGET= .L_1 "*);
> Â Â Â Â Â Â Â /*0138*/Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
>
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but
> I think this is more robust than trying to identify the warp neutering
> branches. Furthermore, I'm not going for optimality here anyway. The
> optimal way to fix this is making sure we don't generate
> branch-around-nothing, but that's for stage1.
>
> Build and reg-tested on x86_64 with nvptx accelerator.
>
> I'd like to commit in stage4, but I'd appreciate a review of the code.
> Does the patch look OK?
>
> Thanks,
> - Tom
>
> 0002-nvptx-PR83589-Workaround-for-branch-around-nothing-JIT-bug.patch
>
>
> [nvptx, PR83589] Workaround for branch-around-nothing JIT bug
>
> 2018-01-23 Tom de Vries <tom@codesourcery.com>
>
> PR target/83589
> * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
> (nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
> Add strict parameter.
> (prevent_branch_around_nothing): Insert dummy insn between branch to
> label and label with no ptx insn inbetween.
> * config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
>
> * testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
>
> ---
> gcc/config/nvptx/nvptx.c | 92 ++++++++++++++++++++++
> gcc/config/nvptx/nvptx.md | 9 +++
> .../testsuite/libgomp.oacc-c-c++-common/pr83589.c | 21 +++++
> 3 files changed, 122 insertions(+)
>
> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> + insn inbetween the branch and the label. This works around a JIT bug
> + observed at driver version 384.111, at -O0 for sm_50. */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> + rtx_insn *seen_label = 0;
> + for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> + {
> + if (seen_label == 0)
> + {
> + if (INSN_P (insn) && condjump_p (insn))
> + seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> + continue;
> + }
> +
> + if (NOTE_P (insn))
> + continue;
> +
> + if (INSN_P (insn))
> + switch (recog_memoized (insn))
> + {
> + case CODE_FOR_nvptx_fork:
> + case CODE_FOR_nvptx_forked:
> + case CODE_FOR_nvptx_joining:
> + case CODE_FOR_nvptx_join:
> + continue;
> + default:
> + seen_label = 0;
> + continue;
> + }
> +
> + if (LABEL_P (insn) && insn == seen_label)
> + emit_insn_before (gen_fake_nop (), insn);
> +
> + seen_label = 0;
> + }
> + }
Consider testcase:
...
int
main (void)
{
int a[10];
#pragma acc parallel loop worker
for (int i = 0; i < 10; i++)
a[i] = i;
return 0;
}
...
At -O2, we generate this, and fail to generate a fake nop:
...
@ %r34 bra.uni $L8;
@ %r33 bra $L9;
// join 2;
$L9:
$L8:
...
What is happening in prevent_branch_around_nothing is:
- seen_label is NULL
- we process "@ %r34 bra.uni $L8" and seen_label becomes $L8
- we process "@ %r33 bra $L9" and since seen_label != NULL, we end up in
the default case in the switch and reset seen_label to NULL
- we process the labels, seen_label remains NULL, and no fake nop is
generated
What we want to happen instead, is that when processing "@ %r33 bra
$L9", seen_label is updated to $L9. Patch below implements that.
Build and reg-tested on x86_64 with nvptx accelerator.
Committed to stage4 trunk.
Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-nvptx-Fix-prevent_branch_around_nothing.patch
Type: text/x-patch
Size: 1043 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180320/9374d28c/attachment.bin>
More information about the Gcc-patches
mailing list