This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry
On 08/01/2018 07:12 AM, Tom de Vries wrote:
>>>> + gangs = grids * (blocks / warp_size);
>>>
>>> So, we launch with gangs == grids * workers ? Is that intentional?
>>
>> Yes. At least that's what I've been using in og8. Setting num_gangs =
>> grids alone caused significant slow downs.
>>
>
> Well, what you're saying here is: increasing num_gangs increases
> performance.
>
> You don't explain why you multiply with workers specifically.

I set it that way because I think the occupancy calculator is
determining the occupancy of a single multiprocessor unit, rather than
the entire GPU. Looking at the og8 code again, I had

  num_gangs = 2 * threads_per_sm / warp_size * dev_size

which corresponds to

  2 * grids * blocks / warp_size

Because blocks is generally smaller than threads_per_block, the driver
occupancy calculator ends up launching fewer gangs.
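
To make the comparison concrete, here is a minimal sketch in C of the
three candidate defaults mentioned in this thread. It assumes that grids
and blocks are the grid and block sizes reported by the occupancy
calculator. The struct and function names are hypothetical and are not
taken from the patch.

```c
/* Hypothetical container for the values discussed in this thread.
   In the plugin they would come from the CUDA driver and the device;
   here they are plain inputs so the arithmetic can be checked.  */
struct launch_params
{
  int grids;      /* grid size suggested by the occupancy calculator */
  int blocks;     /* block size (in threads) suggested by the calculator */
  int warp_size;  /* threads per warp */
};

/* gangs = grids * (blocks / warp_size), as in the quoted patch hunk.  */
static int
gangs_patch (struct launch_params p)
{
  return p.grids * (p.blocks / p.warp_size);
}

/* 2 * grids * blocks / warp_size, the og8 default expressed in terms
   of grids and blocks.  */
static int
gangs_og8 (struct launch_params p)
{
  return 2 * p.grids * p.blocks / p.warp_size;
}

/* gangs = grids, the simpler alternative.  */
static int
gangs_grids_only (struct launch_params p)
{
  return p.grids;
}
```

With made-up inputs grids = 40, blocks = 512 and warp_size = 32, these
give 640, 1280 and 40 gangs respectively, so the patch launches half as
many gangs as the og8 formula and 16 times as many as grids alone.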

I don't have a firm position on this default behavior. Perhaps we
should just set

  gangs = grids

That's probably an improvement over what's there now.

Cesar