This is the mail archive of the mailing list for the GCC project.


Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

On 08/01/2018 07:12 AM, Tom de Vries wrote:

>>>> +	      gangs = grids * (blocks / warp_size);
>>> So, we launch with gangs == grids * workers ? Is that intentional?
>> Yes. At least that's what I've been using in og8. Setting num_gangs =
>> grids alone caused significant slow downs.
> Well, what you're saying here is: increasing num_gangs increases
> performance.
> You don't explain why you multiply with workers specifically.

I set it that way because I think the occupancy calculator is
determining the occupancy of a single multiprocessor unit, rather than
the entire GPU. Looking at the og8 code again, I had

   num_gangs = 2 * threads_per_sm / warp_size * dev_size

which corresponds to

   2 * grids * blocks / warp_size

Because blocks is generally smaller than threads_per_block, the driver
occupancy calculator ends up launching fewer gangs.
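
To make the comparison concrete, here is a rough sketch of the three
candidate defaults as plain C arithmetic. The function names are
hypothetical and do not appear in the patch or in og8; grids and blocks
stand for the values suggested by the driver occupancy calculator, and
any numbers plugged in are illustrative only, not measurements.

```c
/* Hypothetical helpers, not code from the patch.  'grids' and 'blocks'
   stand for the grid size and block size suggested by the driver
   occupancy calculator; 'warp_size' is 32 on current NVIDIA hardware.  */

/* Default proposed in the patch: gangs = grids * (blocks / warp_size).  */
static int
gangs_patch (int grids, int blocks, int warp_size)
{
  return grids * (blocks / warp_size);
}

/* og8 default, rewritten in terms of grids and blocks:
   2 * grids * blocks / warp_size.  */
static int
gangs_og8 (int grids, int blocks, int warp_size)
{
  return 2 * grids * blocks / warp_size;
}

/* Simplest alternative: one gang per grid element.  */
static int
gangs_grids_only (int grids)
{
  return grids;
}
```

With made-up inputs grids = 80, blocks = 640 and warp_size = 32, these
give 1600, 3200 and 80 gangs respectively, so for the same inputs the
patch formula is half the og8 one, and gangs = grids is smaller again
by a factor of blocks / warp_size.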

I don't have a firm position on this default behavior. Perhaps we
should just set

  gangs = grids

That's probably an improvement over what's there now.

