This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] [nvptx] Try to cope with cuLaunchKernel returning CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES


On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > ... to determine an optimal number of threads per block given the number
> > of registers (maybe just querying CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
> > would do that already?).
> 
> I have implemented that for OpenMP offloading, but since CUDA 6.0 there is
> also the cuOcc* (occupancy query) interface, which lets you simply ask the
> driver about the per-function launch limit.

Sorry, I should have mentioned that CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK is
indeed sufficient for limiting threads per block, and that limit translates
directly into a limit on workers per gang in OpenACC.  IMO it is also a cleaner
approach here than iterative backoff (again, if the implementation is free to
do that).
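A minimal sketch of that translation in C. It assumes each OpenACC worker occupies one 32-thread warp (vector length fixed at 32), which is an illustrative assumption, and the helper name is hypothetical. The limit itself would be obtained before launch with cuFuncGetAttribute (&limit, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, function), rather than discovered by retrying after cuLaunchKernel fails:

```c
/* Assumption (illustrative): each OpenACC worker occupies one warp of
   VECTOR_LENGTH threads, so threads per block = workers * VECTOR_LENGTH.  */
#define VECTOR_LENGTH 32

/* Hypothetical helper: clamp the requested workers per gang to what the
   kernel can actually be launched with.  MAX_THREADS_PER_BLOCK is the value
   reported for the kernel via CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.
   Returns 0 if not even one full warp fits.  */
static int
clamp_workers_per_gang (int requested, int max_threads_per_block)
{
  int max_workers = max_threads_per_block / VECTOR_LENGTH;
  return requested < max_workers ? requested : max_workers;
}
```

For example, if the driver reports a limit of 640 threads for a register-heavy kernel, a request for 32 workers becomes 20 (640 / 32), while a request for 8 workers against a limit of 1024 is left unchanged. One attribute query replaces a loop of failed launches.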

When I mentioned cuOcc*, I was thinking of finding an optimal number of
blocks per device, which is a different problem.
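As a rough sketch of that different problem, assuming the goal is simply one full wave of resident blocks across the device, one could multiply the SM count by the per-SM active block count that the occupancy query returns. Both the heuristic and the helper name are illustrative assumptions, not a description of existing libgomp code:

```c
/* Hypothetical helper: number of gangs (CUDA blocks) that fills every SM
   once at the chosen block size.
   SM_COUNT would come from CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
   BLOCKS_PER_SM from cuOccupancyMaxActiveBlocksPerMultiprocessor for the
   kernel, block size and dynamic shared memory size in question.  */
static int
gangs_per_device (int sm_count, int blocks_per_sm)
{
  /* 0 signals that no block of this size can be resident at all.  */
  if (sm_count < 1 || blocks_per_sm < 1)
    return 0;
  return sm_count * blocks_per_sm;
}
```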

Alexander

