This is the mail archive of the mailing list for the GCC project.

[nvptx, committed] Only use one logical barrier resource

[ was: Re: [nvptx] vector length patch series ]
On 14-12-18 20:58, Tom de Vries wrote:
> 0010-nvptx-only-use-one-bar.sync-barriers-in-OpenACC-offl.patch


- Tom

[nvptx] Only use one logical barrier resource

For OpenACC loops, we generate code in this style:
        @%r41   bra.uni $L5;
        @%r40   bra     $L6;
                mov.u64 %r32, %ar0;
                cvta.shared.u64 %r39, __worker_bcast;
                st.u64  [%r39], %r32;
                bar.sync        0;
        @%r40   bra     $L4;
                cvta.shared.u64 %r38, __worker_bcast;
                ld.u64  %r32, [%r38];
                bar.sync        1;

The first barrier is there to ensure that no thread reads the broadcast buffer
before it's written.  The second barrier is there to ensure that no thread
overwrites the broadcast buffer before all threads have read it (as well as
implementing the obligatory synchronization after a worker loop).

We've been using the logical barrier resources '0' and '1' for these two
barriers, but there's no reason why we can't use the same one.
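
As a rough CPU-side analogy (not PTX, and not part of the patch), the sketch
below reuses one POSIX barrier object for both synchronization points of the
broadcast pattern.  All names here (barrier0, worker_bcast, NUM_WORKERS,
run_broadcast_rounds) are hypothetical, and a pthread barrier only
approximates bar.sync semantics; the point is just that the two waits never
overlap in time, so a single resource can serve both.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4                 /* hypothetical worker count */

static pthread_barrier_t barrier0;    /* the single barrier resource */
static uint64_t worker_bcast;         /* stands in for __worker_bcast */
static int mismatches[NUM_WORKERS];   /* per-worker count of stale reads */
static int rounds_to_run;

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    for (int round = 1; round <= rounds_to_run; round++) {
        uint64_t expected = (uint64_t)round * 1000;

        /* Worker zero fills the broadcast buffer.  */
        if (id == 0)
            worker_bcast = expected;

        /* First sync: nobody reads before the buffer is written.  */
        pthread_barrier_wait(&barrier0);

        if (worker_bcast != expected)
            mismatches[id]++;

        /* Second sync, same barrier: nobody overwrites the buffer
           before every worker has read it.  */
        pthread_barrier_wait(&barrier0);
    }
    return NULL;
}

/* Run the broadcast pattern for ROUNDS iterations and return the total
   number of stale or premature reads observed (0 if the single barrier
   orders every write and read correctly).  */
int run_broadcast_rounds(int rounds)
{
    pthread_t threads[NUM_WORKERS];
    int total = 0;
    int rc;

    rounds_to_run = rounds;
    rc = pthread_barrier_init(&barrier0, NULL, NUM_WORKERS);
    assert(rc == 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        mismatches[i] = 0;
        rc = pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
        total += mismatches[i];
    }
    pthread_barrier_destroy(&barrier0);
    return total;
}
```

Calling run_broadcast_rounds (1000) from a small test driver should report no
mismatches: every worker passes the first wait before any worker can reach
the second, so the barrier is always free again by the time it is needed.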

Use logical barrier resource '0' for both barriers, so that the OpenACC
implementation claims fewer resources.

Built and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  <>

	* config/nvptx/nvptx.c (nvptx_single): Always pass false to
	(nvptx_process_pars): Likewise.

 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9f834d35200..a354811194c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4351,7 +4351,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  /* This barrier is needed to avoid worker zero clobbering
 	     the broadcast buffer before all the other workers have
 	     had a chance to read this instance of it.  */
-	  emit_insn_before (nvptx_wsync (true), tail);
+	  emit_insn_before (nvptx_wsync (false), tail);
       extract_insn (tail);
@@ -4476,7 +4476,7 @@ nvptx_process_pars (parallel *par)
 	  /* Insert begin and end synchronizations.  */
 	  emit_insn_before (nvptx_wsync (false), par->forked_insn);
-	  emit_insn_before (nvptx_wsync (true), par->join_insn);
+	  emit_insn_before (nvptx_wsync (false), par->join_insn);
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
