This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC] [nvptx] Try to cope with cuLaunchKernel returning CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Thomas Schwinge <thomas at codesourcery dot com>
- Cc: Nathan Sidwell <nathan at codesourcery dot com>, gcc-patches at gcc dot gnu dot org, Bernd Schmidt <bschmidt at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 19 Jan 2016 18:10:00 +0300 (MSK)
On Tue, 19 Jan 2016, Thomas Schwinge wrote:
> Hi!
>
> On Tue, 19 Jan 2016 17:07:17 +0300, Alexander Monakov <amonakov@ispras.ru> wrote:
> > On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > > > ... to determine an optimal number of threads per block given the number
> > > > of registers (maybe just querying CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
> > > > would do that already?).
> > >
> > > I have implemented that for OpenMP offloading, but also, since CUDA 6.0, there's
> > > the cuOcc* (occupancy query) interface, which lets you simply ask the driver
> > > about the per-function launch limit.
>
> You mean you have already implemented something along the lines I
> proposed?
Yes. I was implementing OpenMP teams, and it made sense to add limiting of warps
per block at the same time (i.e., query CU_FUNC_ATTRIBUTE_... and reduce the
number of threads per team if the default or requested value is too high). I
intend to post that patch shortly as part of a larger series. The patch itself
is simple enough, although a small tweak will be needed to make it apply to
OpenACC too.
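A minimal sketch of the warps-per-block limiting described above, assuming the per-function thread limit has already been obtained from the driver (for example via cuFuncGetAttribute with the CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK attribute mentioned earlier in the thread) and assuming a warp size of 32. The helper name limit_warps_per_block is hypothetical; it is not taken from the patch being discussed or from libgomp.

```c
/* Hypothetical helper, not the patch under discussion: clamp the default
   or requested number of warps per block (threads per team, or OpenACC
   workers) so that the launch stays within the per-function limit.

   In a plugin, func_max_threads would come from the driver, e.g.
     cuFuncGetAttribute (&func_max_threads,
                         CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, function);
   That call is left out here so the sketch builds without cuda.h.  */

#define WARP_SIZE 32  /* Assumed warp size.  */

int
limit_warps_per_block (int requested_warps, int func_max_threads)
{
  /* This sketch sizes blocks in whole warps, so round the thread
     limit down to a multiple of WARP_SIZE.  */
  int max_warps = func_max_threads / WARP_SIZE;

  /* A result of 0 means not even one warp fits; the caller should
     report an error instead of attempting the launch.  */
  return requested_warps < max_warps ? requested_warps : max_warps;
}
```

For instance, a kernel whose register usage limits it to 640 threads per block would be launched with 20 warps even if 32 were requested, rather than having cuLaunchKernel fail with CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.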
Alexander