[PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry
Tom de Vries
tdevries@suse.de
Sat Aug 4 20:16:00 GMT 2018
On 08/03/2018 05:37 PM, Cesar Philippidis wrote:
>> But I still see no rationale why blocks is used here, and I wonder
>> whether something like num_gangs = grids * 64 would give similar results.
> My original intent was to keep the load proportional to the block size.
> So, in the case where a block size is limited by shared memory or the
> register file capacity, the runtime wouldn't excessively over-assign
> gangs to the multiprocessor units if their state is going to be swapped
> out even more than necessary.
So, that's your rationale. Please add a comment describing this.
Thanks,
- Tom
More information about the Gcc-patches mailing list