This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

From: Alexander Monakov <amonakov at ispras dot ru>
To: Bernd Schmidt <bschmidt at redhat dot com>
Cc: gcc-patches at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
Date: Wed, 21 Oct 2015 00:51:10 +0300 (MSK)
Subject: Re: [gomp4 09/14] libgomp: provide barriers on NVPTX
References: <1445366076-16082-1-git-send-email-amonakov at ispras dot ru> <1445366076-16082-10-git-send-email-amonakov at ispras dot ru> <5626AA66 dot 4080306 at redhat dot com>

On Tue, 20 Oct 2015, Bernd Schmidt wrote:

> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > On NVPTX, there are 16 hardware barriers for each thread team, and each
> > barrier has a variable waiter count.  The instruction 'bar.sync N, M;' waits
> > on barrier number N until M threads have arrived.  M should be pre-multiplied
> > by the warp width.  It's also possible to 'post' the barrier without
> > suspending by using 'bar.arrive'.
> >
> > We should be able to provide gomp barrier via a combination of ptx barriers
> > and atomics.  This patch is a first step in that direction.
> >
> > It's mostly a copy of Linux implementation, and it's very likely that
> > functions more complex than gomp_barrier_wait_end are implemented
> > incorrectly.
> > I will have to review all of that (and optimize, hopefully).
> >
> > I'm not sure if naked asm()'s are OK.  It's possible to implement a builtin
> > instead for a minor beautification.  Thoughts?
> 
> I have no concerns about naked asms. I'm more concerned about whether this
> actually works - how much testing has this had?

It does survive the libgomp C/C++ tests, which at least exercise the simplest
barrier, gomp_barrier_wait_end.

> My experience has been that there is practically no way of using bar.sync
> reliably, since we can't control warp divergence and reconvergence at the
> ptx level but the hardware bar.sync instruction only works when executed by
> all threads in a warp at the same time.

I don't think it's that bad.  Divergence and reconvergence are implicit: a
non-uniform branch is a divergence point, and the corresponding reconvergence
point is at its immediate post-dominator.  I do miss a way to force
reconvergence at a given point, "resurrecting" masked-out warp members, though.
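
To make the divergence/reconvergence rule concrete, here is a minimal host-side
sketch in C.  It is only a model, not device code: the lane_mask type, the
helper names, and the choice of "odd lane id" as the branch condition are all
hypothetical.  It shows a non-uniform branch splitting the active mask in two,
and the join at the post-dominator restoring exactly the lanes that were active
before the branch (so lanes that were already masked out stay masked out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per lane of a 32-lane warp. */
typedef uint32_t lane_mask;

/* Lanes of ACTIVE that take the branch.  The condition is an assumption
   made for illustration: the lane id is odd (bits 1, 3, 5, ...). */
static lane_mask taken_lanes(lane_mask active)
{
  return active & 0xAAAAAAAAu;
}

/* Lanes of ACTIVE that fall through instead. */
static lane_mask fallthrough_lanes(lane_mask active)
{
  return active & ~taken_lanes(active);
}

/* At the immediate post-dominator the two paths join, so the active
   mask becomes the union of both sides of the branch. */
static lane_mask reconverge(lane_mask taken, lane_mask fallthrough)
{
  assert((taken & fallthrough) == 0);  /* a lane follows exactly one path */
  return taken | fallthrough;
}
```

Starting from a full mask, the two sides are 0xAAAAAAAA and 0x55555555 and the
join yields 0xFFFFFFFF again.  Starting from 0x0000FFFF, the join yields
0x0000FFFF: nothing in this model brings the upper 16 lanes back.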

For bar.sync behavior the documentation gives an explicit guarantee: every
time a warp encounters a bar.sync instruction, it bumps the count by the warp
width (32), irrespective of how many warp members are active at the time of
encounter.
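
As a sanity check on the arithmetic, the counting rule can be modeled on the
host in a few lines of C.  This is a sketch of the behavior as described above,
not a test of real hardware; WARP_WIDTH, bar_thread_count, arrivals and
barrier_complete are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { WARP_WIDTH = 32 };

/* Thread-count operand M for 'bar.sync N, M;' when NWARPS warps are
   expected to arrive: the warp count pre-multiplied by the warp width. */
static unsigned bar_thread_count(unsigned nwarps)
{
  return nwarps * WARP_WIDTH;
}

/* Arrival count after each warp in ACTIVE[0..NWARPS-1] has executed
   bar.sync once.  ACTIVE[w] is the mask of lanes active in warp w at
   that moment; under the rule described above it does not affect the
   count, since every warp contributes WARP_WIDTH. */
static unsigned arrivals(const uint32_t *active, unsigned nwarps)
{
  unsigned count = 0;
  for (unsigned w = 0; w < nwarps; w++)
    {
      assert(active[w] != 0);  /* some lane must execute the instruction */
      count += WARP_WIDTH;
    }
  return count;
}

/* Nonzero when the modeled barrier has collected exactly M arrivals. */
static int barrier_complete(const uint32_t *active, unsigned nwarps,
                            unsigned m)
{
  return arrivals(active, nwarps) == m;
}
```

With four warps whose active masks are 0xFFFFFFFF, 0x1, 0x0000FFFF and
0x80000000, the model reports 128 arrivals, which matches bar_thread_count(4),
even though only 50 lanes are actually active.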

Alexander

