This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [gomp4 09/14] libgomp: provide barriers on NVPTX


On Tue, 20 Oct 2015, Bernd Schmidt wrote:

> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > On NVPTX, there are 16 hardware barriers for each thread team, and each
> > barrier has a variable waiter count.  The instruction 'bar.sync N, M;'
> > waits on barrier number N until M threads have arrived.  M should be
> > pre-multiplied by warp width.  It's also possible to 'post' the barrier
> > without suspending, using 'bar.arrive'.
> >
> > We should be able to provide gomp barrier via a combination of ptx barriers
> > and atomics.  This patch is a first step in that direction.
> >
> > It's mostly a copy of the Linux implementation, and it's very likely that
> > functions more complex than gomp_barrier_wait_end are implemented
> > incorrectly.  I will have to review all of that (and optimize, hopefully).
> >
> > I'm not sure if naked asm()'s are OK.  It's possible to implement a builtin
> > instead for a minor beautification.  Thoughts?
> 
> I have no concerns about naked asms. I'm more concerned about whether this
> actually works - how much testing has this had?

It does survive the libgomp C/C++ tests, which at least exercise the simplest
barrier, gomp_barrier_wait_end.

> My experience has been that there is practically no way of using bar.sync
> reliably, since we can't control warp divergence and reconvergence at the
> ptx level but the hardware bar.sync instruction only works when executed by
> all threads in a warp at the same time.

I don't think it's that bad.  Divergence and reconvergence are implicit: a
non-uniform branch is a divergence point, and the corresponding reconvergence
point is at its immediate post-dominator.  I do miss a way to force
reconvergence at a given point, though, "resurrecting" masked-out warp members.
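
To make the divergence model concrete, here is a rough host-side sketch in
plain C (not PTX, and not part of the patch).  It models a warp's active lanes
as a 32-bit mask; all function names are made up for illustration, and the
model assumes the simple "split at the branch, rejoin at the post-dominator"
behavior described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i of a mask is set when lane i of the warp is
   active.  PREDICATE_MASK has bit i set when lane i takes the branch.  */

/* Lanes that take the branch.  */
static uint32_t
taken_lanes (uint32_t active_mask, uint32_t predicate_mask)
{
  return active_mask & predicate_mask;
}

/* Lanes that fall through.  */
static uint32_t
fallthrough_lanes (uint32_t active_mask, uint32_t predicate_mask)
{
  return active_mask & ~predicate_mask;
}

/* A branch is a divergence point only when it is non-uniform, i.e. when
   both sides end up with at least one active lane.  */
static int
is_divergent (uint32_t active_mask, uint32_t predicate_mask)
{
  return taken_lanes (active_mask, predicate_mask) != 0
	 && fallthrough_lanes (active_mask, predicate_mask) != 0;
}

/* At the immediate post-dominator the two paths join again, so the active
   set is the union of both sides.  */
static uint32_t
reconverge (uint32_t taken, uint32_t fallthrough)
{
  /* No lane can be on both paths at once.  */
  assert ((taken & fallthrough) == 0);
  return taken | fallthrough;
}
```

In this model, a full warp branching on "lane is even" splits into two
16-lane halves and gets all 32 lanes back at the join, while a branch whose
predicate is the same in every active lane never splits at all.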

For bar.sync behavior the documentation gives an explicit guarantee: every
time a warp encounters a bar.sync instruction, it bumps the count by the warp
width (32), irrespective of how many warp members are active at the time of
encounter.
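
As a sketch of the arithmetic this implies (plain C on the host, not the
libgomp code; the helper names are hypothetical), the M operand of
'bar.sync N, M' can be computed by rounding the thread count up to whole
warps, and the barrier's count can be modeled as 32 per arriving warp.  The
rounding for a partially filled last warp is my assumption, following from
the per-warp counting described above.

```c
#include <assert.h>

#define WARP_WIDTH 32

/* Hypothetical helper: value to pass as M in 'bar.sync N, M' for a team of
   NTHREADS threads.  Assumes a partially filled last warp still counts as
   a full warp, since the count is bumped per warp.  */
static unsigned
bar_thread_count (unsigned nthreads)
{
  assert (nthreads > 0);
  return (nthreads + WARP_WIDTH - 1) / WARP_WIDTH * WARP_WIDTH;
}

/* Modeled arrival count after NWARPS warps have each executed bar.sync
   once: every warp adds the full warp width, however many of its lanes
   were active at the time.  */
static unsigned
arrivals_after (unsigned nwarps)
{
  return nwarps * WARP_WIDTH;
}

/* Nonzero when the modeled barrier with expected count M is satisfied.  */
static int
barrier_complete (unsigned nwarps, unsigned m)
{
  return arrivals_after (nwarps) >= m;
}
```

For example, a team of 33 threads occupies two warps, so under this model M
is 64 and the barrier completes once both warps have reached the bar.sync,
even if only one lane of the second warp is active.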

Alexander

