This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
- From: Jakub Jelinek <jakub@redhat.com>
- To: Alexander Monakov <amonakov@ispras.ru>
- Cc: Nathan Sidwell <nathan@acm.org>, gcc-patches@gcc.gnu.org, Bernd Schmidt <bschmidt@redhat.com>, Dmitry Melnik <dm@ispras.ru>, Thomas Schwinge <thomas@codesourcery.com>
- Date: Wed, 2 Dec 2015 16:12:05 +0100
- Subject: Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
- Authentication-results: sourceware.org; auth=none
- References: <1448983707-18854-1-git-send-email-amonakov@ispras.ru> <1448983707-18854-3-git-send-email-amonakov@ispras.ru> <20151202104034.GG5675@tucnak.redhat.com> <565EEBF7.8070105@acm.org> <20151202131013.GL5675@tucnak.redhat.com> <alpine.LNX.2.20.1512021750530.7950@monopod.intra.ispras.ru>
- Reply-to: Jakub Jelinek <jakub@redhat.com>
On Wed, Dec 02, 2015 at 05:54:51PM +0300, Alexander Monakov wrote:
> On Wed, 2 Dec 2015, Jakub Jelinek wrote:
>
> > On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > > I don't know the HW well enough: is there any power consumption, heat, etc.
> > > > difference between the two approaches? I mean, does the HW consume a different
> > > > amount of power if only one thread in a warp executes code and the other
> > > > threads in the same warp just jump around it, vs. having all threads busy?
> > >
> > > Having all threads busy will increase power consumption. It's also bad if
> > > the other vectors are executing memory access instructions. However, for
> >
> > Then the uniform SIMT approach might not be such a good idea.
>
> Why? Remember that the tradeoff is copying registers (and in OpenACC, stacks
> too). We don't know how the costs balance. My intuition is that copying is
> worse than what I'm doing.
>
> Anyhow, for good performance the offloaded code needs to be running in vector
> regions most of the time, where the concern doesn't apply.
But you never know whether people will actually use #pragma omp simd regions;
sometimes they will, sometimes they won't, and if uniform SIMT increases
power consumption, it might not be desirable.
If we have a reasonable IPA pass to discover which addressable variables can
be shared by multiple threads and which can't, then we could use soft-stack
only for those that can be shared by multiple PTX threads (different warps,
or different threads within the same warp); then we wouldn't need to copy any
stack, just broadcast the scalar vars.
Jakub