[gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

Bernd Schmidt bschmidt@redhat.com
Wed Oct 21 00:07:00 GMT 2015


On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> (This patch serves as a straw man proposal to have something concrete for
> discussion and further patches)
>
> On PTX, stack memory is private to each thread.  When master thread constructs
> 'omp_data_o' on its own stack and passes it to other threads via
> GOMP_parallel by reference, other threads cannot use the resulting pointer.
> We need to arrange for structures passed between threads to be in global,
> or better, in PTX __shared__ memory (private to each CUDA thread block).

I guess the question is: why is it better? Do you have multiple thread
blocks active in your execution model, and do they require different
omp_data_o structures? Are accesses to it performance-critical (more so
than any other access)? If the answers are "no", then I think you
probably want to fall back to plain malloc'ed memory or a regular
static variable, since shared memory is a fairly limited resource.

It might be slightly cleaner to have the copy described as a new builtin
call that is always generated and expanded to nothing on normal targets,
rather than modifying existing calls in the IL. Or maybe:

  p = __builtin_omp_select_location (&stack_local_var, size);
  ....
  __builtin_omp_maybe_free (p);

where the select_location could get simplified to a malloc for nvptx,
hopefully making the stack variable unused so it can be discarded.
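Spelled out a bit more (purely a sketch of the intent, reusing the
made-up names from the sketch above; the builtins are only the ones
proposed here, not existing GCC interfaces), the per-target expansions
would be roughly:

  struct omp_data_s omp_data_o;   /* the usual stack copy */
  struct omp_data_s *p;

  /* Normal targets: could simply return &omp_data_o.
     nvptx: could expand to malloc (size), so the pointer handed to
     GOMP_parallel is usable from every thread in the team.  */
  p = __builtin_omp_select_location (&omp_data_o, sizeof omp_data_o);

  p->x = x;                       /* build the data through p; on nvptx
                                     the stack slot is then dead */
  GOMP_parallel (foo_omp_fn, p, 0, 0);

  /* Normal targets: expands to nothing.
     nvptx: expands to free (p).  */
  __builtin_omp_maybe_free (p);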

> Using separate variables is wasteful: they should go into a union to reduce
> shared memory consumption.

Not sure what you mean by separate variables?


Bernd


