[gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
Bernd Schmidt
bschmidt@redhat.com
Wed Oct 21 00:07:00 GMT 2015
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> (This patch serves as a straw man proposal to have something concrete for
> discussion and further patches)
>
> On PTX, stack memory is private to each thread. When master thread constructs
> 'omp_data_o' on its own stack and passes it to other threads via
> GOMP_parallel by reference, other threads cannot use the resulting pointer.
> We need to arrange for structures passed between threads to be placed in
> global, or better, in PTX __shared__ memory (private to each CUDA thread block).
I guess the question is - why is it better? Do you have multiple thread
blocks active in your execution model, and do they require different
omp_data_o structures? Are accesses to it performance-critical (more so
than any other access)? If the answers are "no", then I think you
probably want to fall back to just normal malloced memory or a regular
static variable, as shared memory is a fairly limited resource.
It might be slightly cleaner to describe the copy as a new builtin
call that is always generated and expanded to nothing on normal targets,
rather than modifying existing calls in the IL. Or maybe:
p = __builtin_omp_select_location (&stack_local_var, size)
....
__builtin_omp_maybe_free (p);
where the select_location could get simplified to a malloc for nvptx,
hopefully making the stack variable unused and discarded.
> Using separate variables is wasteful: they should go into a union to reduce
> shared memory consumption.
Not sure what you mean by separate variables?
Bernd