This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [RFC] [nvptx] Try to cope with cuLaunchKernel returning CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES

From: Thomas Schwinge <thomas at codesourcery dot com>
To: Alexander Monakov <amonakov at ispras dot ru>
Cc: Nathan Sidwell <nathan at codesourcery dot com>, <gcc-patches at gcc dot gnu dot org>, Bernd Schmidt <bschmidt at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>
Date: Tue, 19 Jan 2016 15:55:51 +0100
Subject: Re: [RFC] [nvptx] Try to cope with cuLaunchKernel returning CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
References: <1453195932 dot 96 dot 0 dot 59001766349 dot issue17226 at mentor dot com> <87oacheqlz dot fsf at hertz dot schwinge dot homeip dot net> <alpine dot LNX dot 2 dot 20 dot 1601191600540 dot 24832 at monopod dot intra dot ispras dot ru> <alpine dot LNX dot 2 dot 20 dot 1601191703460 dot 24832 at monopod dot intra dot ispras dot ru>

Hi!

On Tue, 19 Jan 2016 17:07:17 +0300, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > > ... to determine an optimal number of threads per block given the number
> > > of registers (maybe just querying CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
> > > would do that already?).
> > 
> > I have implemented that for OpenMP offloading, but also since CUDA 6.0 there's
> > cuOcc* (occupancy query) interface that allows to simply ask the driver about
> > the per-function launch limit.

You mean you already have implemented something along the lines I
proposed?

> Sorry, I should have mentioned that CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK is
> indeed sufficient for limiting threads per block, which is trivially
> translatable into workers per gang in OpenACC.

That's good to know, thanks!
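To make the translation from a per-function thread limit to workers per gang concrete, here is a minimal sketch in C. It assumes the limit has already been obtained with cuFuncGetAttribute (CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK), and that a block holds workers * vector_length threads. The helper name clamp_workers and its parameters are hypothetical and are not taken from the patch under discussion.

```c
#include <assert.h>

/* Hypothetical helper, not from the nvptx plugin: clamp the requested number
   of workers per gang so that workers * vector_length stays within the
   per-function thread limit.

   Assumption: max_threads_per_block was filled in beforehand by
     cuFuncGetAttribute (&max_threads_per_block,
                         CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, function);  */
int
clamp_workers (int requested_workers, int vector_length,
               int max_threads_per_block)
{
  assert (requested_workers > 0 && vector_length > 0);
  assert (max_threads_per_block >= vector_length);

  /* Largest worker count whose threads still fit into one block.  */
  int max_workers = max_threads_per_block / vector_length;

  return requested_workers < max_workers ? requested_workers : max_workers;
}
```

With a vector length of 32 and a limit of 512 threads, a request for 32 workers would be reduced to 16, while a request for 8 workers would be left unchanged.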

> IMO it's also a cleaner
> approach in this case, compared to iterative backoff (if, again, the
> implementation is free to do that).

It is not explicitly spelled out in OpenACC 2.0a, but it got clarified in
OpenACC 2.5.  See "2.5.7. num_workers clause": "[...]  The implementation
may use a different value than specified based on limitations imposed by
the target architecture".
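For contrast, the iterative backoff alternative mentioned in the quoted text could look roughly like the sketch below. Everything here is an assumption made for illustration: the halving policy, the callback type try_launch_fn, and the stand-in callback fits_limit, which simulates a launch that fails whenever workers * 32 exceeds a given thread limit. In a real plugin the callback would wrap cuLaunchKernel and report failure on CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical launch callback: returns true on success and false when the
   launch fails for lack of resources.  */
typedef bool (*try_launch_fn) (int workers, void *data);

/* Stand-in for a real launch, used only for illustration: succeeds when
   workers * 32 threads fit within the limit pointed to by DATA.  */
bool
fits_limit (int workers, void *data)
{
  int max_threads_per_block = *(int *) data;
  return workers * 32 <= max_threads_per_block;
}

/* Retry the launch with half as many workers until it succeeds.  Returns
   the worker count that worked, or 0 if even a single worker failed.  */
int
launch_with_backoff (int requested_workers, try_launch_fn try_launch,
                     void *data)
{
  assert (requested_workers > 0 && try_launch != NULL);

  for (int workers = requested_workers; workers > 0; workers /= 2)
    if (try_launch (workers, data))
      return workers;

  return 0;
}
```

Unlike querying the limit up front, this approach only learns the usable worker count by attempting launches, and halving can settle on a value below the actual maximum.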

> When mentioning cuOcc* I was thinking about finding an optimal number of
> blocks per device, which is a different story.

:-)


Grüße
 Thomas
