This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

From: Jakub Jelinek <jakub at redhat dot com>
To: Nathan Sidwell <nathan at acm dot org>
Cc: Alexander Monakov <amonakov at ispras dot ru>, gcc-patches at gcc dot gnu dot org, Bernd Schmidt <bschmidt at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>, Thomas Schwinge <thomas at codesourcery dot com>
Date: Wed, 2 Dec 2015 14:10:13 +0100
Subject: Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
Authentication-results: sourceware.org; auth=none
References: <1448983707-18854-1-git-send-email-amonakov at ispras dot ru> <1448983707-18854-3-git-send-email-amonakov at ispras dot ru> <20151202104034 dot GG5675 at tucnak dot redhat dot com> <565EEBF7 dot 8070105 at acm dot org>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> On 12/02/15 05:40, Jakub Jelinek wrote:
> > I don't know the HW well enough: is there any difference in power
> > consumption, heat, etc. between the two approaches?  I mean, does the HW
> > consume a different amount of power if only one thread in a warp executes
> > code and the other threads in the same warp just jump around it, vs.
> > having all threads busy?
> 
> Having all threads busy will increase power consumption.  It's also bad if
> the other vectors are executing memory access instructions.  However, for

Then the uniform SIMT approach might not be such a good idea.

> small blocks, it is probably a win over the jump around approach.  One of
> the optimizations for the future of the neutering algorithm is to add such
> predication for small blocks and keep branching for the larger blocks.
> 
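
To make the trade-off above concrete, here is a toy model, not a measurement
and not what the backend emits.  It only counts how many lane-instructions a
32-lane warp executes under each scheme; the function names and the
small_block_limit threshold are hypothetical, and real power behaviour
depends on the hardware.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA hardware


def jump_around(block_len):
    """Lane 0 executes the block; the other lanes branch past it."""
    return [block_len if lane == 0 else 0 for lane in range(WARP_SIZE)]


def uniform(block_len):
    """Every lane executes the block; side effects are kept for lane 0 only."""
    return [block_len] * WARP_SIZE


def pick_strategy(block_len, small_block_limit=4):
    """Hybrid: predicate small blocks, keep branching for larger ones.

    small_block_limit is an arbitrary illustrative cutoff.
    """
    return "predicate" if block_len <= small_block_limit else "branch"


if __name__ == "__main__":
    for n in (2, 64):
        print(n, sum(jump_around(n)), sum(uniform(n)), pick_strategy(n))
```

In this model the uniform scheme always executes WARP_SIZE times as many
lane-instructions as the jump-around scheme, which is why it would only be
attractive for blocks short enough that the branch overhead dominates.
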
> >How exactly does OpenACC copy the stack?  At least for OpenMP, one could
> >have automatic vars whose addresses are passed to simd regions in different
> >functions, say like:
> 
> The stack frame of the current function is copied when entering a
> partitioned region.  (There is no visibility of caller's frame and such.)
> Again, optimization would be trying to only copy the stack that's used in
> the partitioned region.

Is that always the whole stack, from the current stack pointer up to the top
of the stack, so sometimes a few bytes and sometimes a few kilobytes or more
each time?
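
A rough sketch of the two copying policies being contrasted, under my own
assumptions: frames are listed innermost first, sizes are in bytes, and
copied_frames/bytes_copied are hypothetical helpers, not anything in the
OpenACC implementation.

```python
def copied_frames(frame_sizes, whole_stack):
    """Indices of the frames copied on entry to a partitioned region.

    Index 0 is the current function's frame; higher indices are callers.
    """
    return range(len(frame_sizes)) if whole_stack else range(1)


def bytes_copied(frame_sizes, whole_stack):
    """Total bytes copied under the chosen policy."""
    return sum(frame_sizes[i] for i in copied_frames(frame_sizes, whole_stack))


if __name__ == "__main__":
    frames = [16, 4096, 64]  # current function, its caller, the caller's caller
    print(bytes_copied(frames, whole_stack=False))
    print(bytes_copied(frames, whole_stack=True))
    # An automatic variable living in the caller's frame (index 1) is only
    # covered when the whole stack is copied.
    print(1 in copied_frames(frames, whole_stack=False))
    print(1 in copied_frames(frames, whole_stack=True))
```

Copying only the current frame keeps the cost small and predictable, but it
does not cover an automatic variable in a caller's frame whose address is
passed down, which is the OpenMP case I was asking about.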

	Jakub

Follow-Ups:
- Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
  - From: Nathan Sidwell
- Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
  - From: Alexander Monakov

References:
- [gomp-nvptx 0/9] Codegen bits for NVPTX OpenMP SIMD
  - From: Alexander Monakov
- [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
  - From: Alexander Monakov
- Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
  - From: Jakub Jelinek
- Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
  - From: Nathan Sidwell
