This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions
- From: Tom de Vries <tdevries at suse dot de>
- To: Cesar Philippidis <cesar at codesourcery dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>, Thomas Schwinge <thomas at codesourcery dot com>
- Date: Mon, 2 Jul 2018 16:14:17 +0200
- Subject: Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org>
On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>> Now it follows the formula contained in
>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>> Any reason we're not using the cuda runtime functions to get the
>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>> to determine launch configuration in nvptx ) ?
> There are two reasons:
> 1) cuda_occupancy.h depends on the CUDA runtime to extract the device
> properties instead of the CUDA driver API. However, we can always
> teach libgomp how to populate the cudaDeviceProp struct using the
> driver API.
> 2) CUDA is not always present on the build host, and that's why
> libgomp maintains its own cuda.h. So at the very least, this
> functionality would be good to have in libgomp as a fallback.
Libgomp maintains its own cuda.h to "allow building GCC with PTX
offloading even without CUDA being installed".
The libgomp nvptx plugin however uses the cuda driver API to launch
kernels etc, so we can assume that's always available at launch time.
And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
Configuration", the occupancy API is also available in the driver API.
What we cannot assume to be available is the occupancy API itself,
which was only added in CUDA 6.5.  So it's fine to have a fallback for
that case (properly isolated in utility functions), but for CUDA 6.5
and up we want to use the occupancy API.
> It's not good to have a program fail with insufficient-hardware-resources
> errors when that is avoidable.
Right. In fact, you're trying to address two separate things here:
avoiding launch failures and improving the occupancy heuristic, so
please split the patch.