This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH 06/11] [nvptx] only use one bar.sync barriers in OpenACC offloaded code
- From: <cesar at codesourcery dot com>
- To: <tdevries at suse dot de>, <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 24 Jul 2018 13:47:39 -0700
- Subject: [PATCH 06/11] [nvptx] only use one bar.sync barriers in OpenACC offloaded code
- References: <cover.1532464999.git.cesar@codesourcery.com>
From: Cesar Philippidis <cesar@codesourcery.com>
This patch teaches nvptx_single to always use barrier '0' for CTA
synchronization. This started off as a cosmetic change, but later on
each large vector (i.e. one that is larger than a PTX warp) will need to
use its own unique thread barrier to avoid thread divergence.
Consequently, this patch begins the process of teaching the nvptx
state propagator how to use a common thread barrier for each
propagation level.
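
As an illustration only, the effect of the change can be sketched with a
small stand-alone model. It assumes that the boolean passed to
nvptx_cta_sync selects the PTX barrier number (false selects barrier 0,
true selects barrier 1), which matches the description above but is not
shown in this diff. The names model_cta_sync_barrier and
model_emit_bar_sync are hypothetical and do not exist in nvptx.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for nvptx_cta_sync.  ASSUMPTION: the flag is
   used directly as the PTX barrier number (false -> 0, true -> 1).  */
static int
model_cta_sync_barrier (bool flag)
{
  return flag ? 1 : 0;
}

/* Write the bar.sync instruction the model would emit into BUF and
   return the number of characters written (as snprintf does).  */
static int
model_emit_bar_sync (char *buf, size_t len, bool flag)
{
  assert (buf != NULL);
  return snprintf (buf, len, "bar.sync %d;", model_cta_sync_barrier (flag));
}
```

Under that assumption, the call sites touched by this patch go from
model_emit_bar_sync (buf, len, true), which yields "bar.sync 1;", to
model_emit_bar_sync (buf, len, false), which yields "bar.sync 0;", so
every CTA-level synchronization in nvptx_single and nvptx_process_pars
ends up on barrier 0.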
2018-XX-YY  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (nvptx_single): Always pass false to
	nvptx_cta_sync.
	(nvptx_process_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit
ac0a55b8e72363a09f7968474744c51c1fa7720a)
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4d46d89..1f954a6 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4374,7 +4374,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
           /* This barrier is needed to avoid worker zero clobbering
              the broadcast buffer before all the other workers have
              had a chance to read this instance of it. */
-          emit_insn_before (nvptx_cta_sync (true), tail);
+          emit_insn_before (nvptx_cta_sync (false), tail);
         }
 
       extract_insn (tail);
@@ -4501,7 +4501,7 @@ nvptx_process_pars (parallel *par)
         {
           /* Insert begin and end synchronizations. */
           emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
-          emit_insn_before (nvptx_cta_sync (true), par->join_insn);
+          emit_insn_before (nvptx_cta_sync (false), par->join_insn);
         }
     }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
--
2.7.4