This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: The nvptx port

From: Jakub Jelinek <jakub at redhat dot com>
To: Cesar Philippidis <cesar_philippidis at mentor dot com>
Cc: Bernd Schmidt <bernds at codesourcery dot com>, Richard Henderson <rth at redhat dot com>, gcc-patches at gcc dot gnu dot org, Thomas Schwinge <thomas_schwinge at mentor dot com>
Date: Fri, 14 Nov 2014 17:18:36 +0100
Subject: Re: The nvptx port
Authentication-results: sourceware.org; auth=none
References: <20141114082948 dot GH5026 at tucnak dot redhat dot com> <5465E2CF dot 6000105 at codesourcery dot com> <20141114113945 dot GM5026 at tucnak dot redhat dot com> <5465F1B8 dot 80205 at codesourcery dot com> <546621CD dot 40007 at mentor dot com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Fri, Nov 14, 2014 at 07:37:49AM -0800, Cesar Philippidis wrote:
> > Hmm. It's worthwhile to keep in mind that GPU threads really behave
> > somewhat differently from CPUs (they don't really execute
> > independently); the OMP model may just be a poor match for the
> > architecture in general.
> > One could busywait on a spinlock, but AFAIK there isn't really a way to
> > put a thread to sleep. By not executing independently, I mean this: I
> > believe if one thread in a warp is waiting on the spinlock, all the
> > other ones are also busywaiting. There may be other effects that seem
> > odd if one approaches it from a CPU perspective - for example you
> > probably want only one thread in a warp to try to take the spinlock.
> 
> Thread synchronization in CUDA is different from conventional CPUs.
> Using the gang/thread terminology, there's no way to synchronize two
> threads in two different gangs in PTX without invoking separate kernels.
> Basically, after a kernel is invoked, the host/accelerator (the latter
> using dynamic parallelism) waits for the kernel to finish, and that
> effectively creates a barrier.

I believe that in OpenMP terminology a gang is a team, and inter-team
barriers are not supposed to work.  I think locks and atomic instructions
are still supposed to work across teams, though, and so are critical
regions, so I really hope atomics are atomic even inter-gang.  So for
synchronization (mutexes and semaphores, from which barriers are implemented,
though perhaps bar.arrive and bar.sync could be used instead) we mainly need
synchronization within the gang.
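
To make the within-the-gang case concrete, here is a rough host-side sketch
in plain C++ of a barrier for one fixed-size team built only on atomic
operations.  It is not the libgomp implementation and not PTX: the class
name TeamBarrier and its methods are made up for this illustration, and it
uses atomics directly rather than the mutexes and semaphores mentioned above.
The arrive()/arrive_and_wait() split is only loosely modelled on
bar.arrive and bar.sync.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counting barrier for one team of fixed size.
// Illustration only; the names are not taken from libgomp or PTX.
class TeamBarrier {
public:
    explicit TeamBarrier(unsigned team_size) : size_(team_size) {
        assert(team_size > 0);
    }

    // Signal arrival without blocking.  Returns the generation the
    // caller arrived in, to be handed to passed() later.
    unsigned arrive() {
        unsigned gen = generation_.load(std::memory_order_acquire);
        if (arrived_.fetch_add(1, std::memory_order_acq_rel) + 1 == size_) {
            // Last arriver: reset the counter first, then open the barrier
            // by bumping the generation.
            arrived_.store(0, std::memory_order_relaxed);
            generation_.fetch_add(1, std::memory_order_release);
        }
        return gen;
    }

    // True once every team member has arrived in generation gen.
    bool passed(unsigned gen) const {
        return generation_.load(std::memory_order_acquire) != gen;
    }

    // Blocking variant: arrive, then busy-wait until the team is complete.
    void arrive_and_wait() {
        unsigned gen = arrive();
        while (!passed(gen)) {
            // Spin.  There is no way to put the caller to sleep here.
        }
    }

private:
    const unsigned size_;
    std::atomic<unsigned> arrived_{0};
    std::atomic<unsigned> generation_{0};
};
```

Each member of the team would call arrive_and_wait() at the synchronization
point.  The spin loop is exactly the kind of busy-waiting the quoted text
warns about, so on a GPU one would presumably want only one thread per warp
to execute it.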

> Also, keep in mind that PTX doesn't have a global TID. The user needs to
> calculate it using ctaid/tid and friends.

Ok.  Is %gridid needed for that combo too?
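
For reference, the ctaid/tid calculation for a one-dimensional launch can be
modelled on the host as below.  This is only a sketch: the struct SpecialRegs
and both function names are invented for the example, with fields named after
the PTX special registers %tid.x, %ntid.x, %ctaid.x and %nctaid.x.  It leaves
%gridid out on the assumption that %gridid identifies the launch rather than
a position within it; that assumption is not confirmed anywhere in this
thread.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the PTX special registers for a 1-D launch.
// The struct is hypothetical and exists only for this illustration.
struct SpecialRegs {
    std::uint32_t tid_x;     // %tid.x:    thread index within its CTA
    std::uint32_t ntid_x;    // %ntid.x:   number of threads per CTA
    std::uint32_t ctaid_x;   // %ctaid.x:  CTA index within the grid
    std::uint32_t nctaid_x;  // %nctaid.x: number of CTAs in the grid
};

// Flat thread ID, unique within a single kernel launch (one grid).
inline std::uint64_t global_tid(const SpecialRegs &r) {
    assert(r.tid_x < r.ntid_x && r.ctaid_x < r.nctaid_x);
    // Widen before multiplying so large grids do not overflow 32 bits.
    return static_cast<std::uint64_t>(r.ctaid_x) * r.ntid_x + r.tid_x;
}

// Total number of threads in the launch.
inline std::uint64_t global_nthreads(const SpecialRegs &r) {
    return static_cast<std::uint64_t>(r.nctaid_x) * r.ntid_x;
}
```

For example, thread 5 of CTA 3 in a launch with 128 threads per CTA gets the
flat ID 3 * 128 + 5 = 389.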

	Jakub

Follow-Ups:
- Re: The nvptx port
  - From: Cesar Philippidis

References:
- The nvptx port
  - From: Jakub Jelinek
- Re: The nvptx port
  - From: Bernd Schmidt
- Re: The nvptx port
  - From: Jakub Jelinek
- Re: The nvptx port
  - From: Bernd Schmidt
- Re: The nvptx port
  - From: Cesar Philippidis
