[PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

Tom de Vries tdevries@suse.de
Sat Aug 4 20:16:00 GMT 2018


On 08/03/2018 05:37 PM, Cesar Philippidis wrote:
>> But I still see no rationale why blocks is used here, and I wonder
>> whether something like num_gangs = grids * 64 would give similar results.

> My original intent was to keep the load proportional to the block size.
> So, in the case where a block size is limited by shared memory or the
> register file capacity, the runtime wouldn't excessively over-assign
> gangs to the multiprocessor units if their state is going to be swapped
> out even more than necessary.

So, that's your rationale. Please add a comment describing this.
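
For illustration, the rationale could be captured in a comment roughly like the one below. This is only a sketch and not the code from the patch: the function names, the parameter names, and the division by the warp size are assumptions made for the example. Here `grids` and `blocks` stand for the grid size and block size suggested by an occupancy query.

```c
/* Hypothetical helper, not taken from the patch.

   Keep the number of gangs proportional to the block size.  If the
   block size is limited by shared memory or by register file
   capacity, a smaller block size yields fewer gangs, so the runtime
   does not over-assign gangs to multiprocessors whose state would
   then be swapped out more than necessary.  */
int
gangs_proportional (int grids, int blocks, int warp_size)
{
  /* Dividing by the warp size is an assumption of this sketch.  */
  return grids * (blocks / warp_size);
}

/* The alternative raised in the review: a fixed multiplier that
   ignores the block size.  */
int
gangs_fixed (int grids)
{
  return grids * 64;
}
```

With grids = 40 and a warp size of 32, the proportional variant gives 1280 gangs for a block size of 1024 but only 320 for a block size of 256, while the fixed variant gives 2560 in both cases.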

Thanks,
- Tom


