This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug
- From: Richard Biener <rguenther at suse dot de>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 24 Jan 2018 12:00:00 +0100 (CET)
- References: <34fb1d00-dc5d-04f2-d601-ee6fe710ac3b@mentor.com>
On Wed, 24 Jan 2018, Tom de Vries wrote:
> Hi,
>
> this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx]
> mode-transitions.c and private-variables.{c,f90} execution FAILs at
> GOMP_NVPTX_JIT=-O0".
>
>
> When compiling a branch-around-nothing (where the branch is warp neutering, so
> it's a divergent branch):
> ...
> .reg .pred %r36;
> {
> .reg .u32 %x;
> mov.u32 %x,%tid.x;
> setp.ne.u32 %r36,%x,0;
> }
>
> @ %r36 bra $L5;
> $L5:
> ...
>
> The JIT fails to generate a convergence point here:
> ...
> /*0128*/ @P0 BRA `(.L_1);
> .L_1:
> ...
>
> Consequently, we execute subsequent code in divergent mode, and when executing
> a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has
> when executing in divergent mode.
>
> The workaround detects branch-around-nothing, and inserts a ptx operation that
> does nothing (I'm calling it a fake nop, I haven't been able to come up with a
> better term yet):
> ...
> @ %r36 bra $L5;
> {
> .reg .u32 %nop_src;
> .reg .u32 %nop_dst;
> mov.u32 %nop_dst, %nop_src;
> }
> $L5:
> ...
> which makes the test pass, because then we generate a convergence point here
> at .L_1:
> ...
> /*0128*/ SSY `(.L_1);
> /*0130*/ @P0 SYNC (*"TARGET= .L_1 "*);
> /*0138*/ SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
>
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but I
> think this is more robust than trying to identify the warp neutering branches.
> Furthermore, I'm not going for optimality here anyway. The optimal way to fix
> this is making sure we don't generate branch-around-nothing, but that's for
> stage1.
>
> Built and reg-tested on x86_64 with nvptx accelerator.
>
> I'd like to commit in stage4, but I'd appreciate a review of the code. Does
> the patch look OK?
Ok for stage4, but this isn't a review ;)
Richard.
> Thanks,
> - Tom
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
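
The workaround described in the quoted message (detect a conditional branch whose target label immediately follows it, then insert a PTX operation that does nothing) can be sketched on PTX text as below. This is only an illustration, not the patch under discussion: the real change lives in the GCC nvptx backend and does not post-process PTX text. The function name `insert_fake_nops`, the two regular expressions, and the assumption that each instruction or label sits on its own line are all hypothetical choices made for this sketch. The fake nop body is copied from the listing in the message.

```python
import re

# Hypothetical patterns for this sketch only; they assume one PTX
# instruction or label per line, as in the listings quoted above.
BRANCH_RE = re.compile(r"^\s*@\s*!?%\w+\s+bra(?:\.uni)?\s+(\$\w+);\s*$")
LABEL_RE = re.compile(r"^\s*(\$\w+):\s*$")

# The "fake nop" from the message: a move between two otherwise unused
# registers, wrapped in its own scope.
FAKE_NOP = [
    "{",
    ".reg .u32 %nop_src;",
    ".reg .u32 %nop_dst;",
    "mov.u32 %nop_dst, %nop_src;",
    "}",
]


def insert_fake_nops(ptx_lines):
    """Return a copy of ptx_lines with a fake nop after every
    predicated branch whose target is the very next label."""
    out = []
    for i, line in enumerate(ptx_lines):
        out.append(line)
        branch = BRANCH_RE.match(line)
        if branch is None:
            continue
        # Skip blank lines to find the next real line after the branch.
        j = i + 1
        while j < len(ptx_lines) and not ptx_lines[j].strip():
            j += 1
        if j == len(ptx_lines):
            continue
        label = LABEL_RE.match(ptx_lines[j])
        # Branch-around-nothing: the branch jumps to the label that
        # directly follows it, so nothing is skipped.
        if label is not None and label.group(1) == branch.group(1):
            out.extend(FAKE_NOP)
    return out


if __name__ == "__main__":
    example = ["@ %r36 bra $L5;", "$L5:"]
    print("\n".join(insert_fake_nops(example)))
```

Like the workaround in the message, the sketch does not try to tell warp-neutering branches from other ones; it treats every branch-around-nothing it finds the same way.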