Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX


On Tue, Oct 20, 2015 at 09:34:28PM +0300, Alexander Monakov wrote:
> (This patch serves as a straw man proposal to have something concrete for
> discussion and further patches)
> 
> On PTX, stack memory is private to each thread.  When the master thread
> constructs 'omp_data_o' on its own stack and passes it to other threads via
> GOMP_parallel by reference, other threads cannot use the resulting pointer.
> We need to arrange for structures passed between threads to be in global
> memory, or better, in PTX __shared__ memory (private to each CUDA thread
> block).

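To make sure we're talking about the same thing, the pattern in question is
roughly the following (a simplified sketch; the identifiers are made up, the
real compiler-generated names look more like .omp_data_s.N and foo._omp_fn.N):

struct omp_data_s_sketch
{
  int *p;   /* just the address of a shared variable */
};

extern void GOMP_parallel (void (*) (void *), void *, unsigned, unsigned);

static void
outlined_fn (void *data)   /* executed by every thread of the team */
{
  struct omp_data_s_sketch *d = (struct omp_data_s_sketch *) data;
  *d->p += 1;   /* each thread reads through the pointer it was handed */
}

void
caller (int *p)
{
  struct omp_data_s_sketch o;   /* automatic, i.e. on the master thread's
				   stack, which is thread-private on PTX */
  o.p = p;
  GOMP_parallel (outlined_fn, &o, 0, 0);   /* other threads receive &o */
}
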
Can you please clarify what exactly works and what doesn't, and whether this
is just a performance issue or something else?
Because the .omp_data_o* variables are just one small part of the picture.
That structure sometimes holds the shared variables themselves; in that case
the model is that the variables are first copied into the structure and,
after the end of the parallel region, copied back from it to their original
location.  Often, though, it holds just the addresses of the shared
variables: the variables keep living in their original location and only the
address is stored in the .omp_data_o* field.  In that case the variable could
very well be just a private automatic variable of the initial thread, living
on its stack.  On top of that, .omp_data_o* contains fields for
firstprivate/lastprivate/reduction etc. variables, typically the addresses of
the original variables; for firstprivate I believe it can also be the
variables themselves in certain cases.
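Spelled out as a rough sketch (illustrative only, not actual compiler output;
the field names are made up), the argument block can look like:

/* For something like
     int s, t, fp;
     #pragma omp parallel shared (s, t) firstprivate (fp)
   the block can hold either values or addresses, depending on the case:  */

struct omp_data_o_sketch
{
  int  s;	/* shared by copy: copied in before the region,
		   copied back after it ends */
  int *t;	/* shared by reference: t stays in its original location,
		   possibly on the initial thread's stack, and only its
		   address is stored here */
  int  fp;	/* firstprivate: typically the address of the original,
		   but in certain cases the value itself, as here */
};
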
In any case, the user can do stuff like:
#pragma omp declare target
extern void use (int);  /* assumed to be defined elsewhere in target code */
void bar (int *p)
{
  #pragma omp parallel shared (p)
  {
    use (*p);
  }
}
void foo (void)
{
  int a = 6;
  bar (&a);
}
#pragma omp end declare target
void baz (void)
{
  #pragma omp target
  foo ();
}
and then, even if you arrange for the p variable itself to be copied to the
heap or to .shared for the duration of the parallel region, what it points to
still lives in the initial thread's stack.

If this is just a performance thing, can't you e.g. just copy the
.omp_data_o* structure inside GOMP_parallel into either some .shared buffer
or a heap-allocated object and copy it back at the end?
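I mean something along these lines, as a rough sketch only
(team_shared_alloc, team_shared_free and start_team are hypothetical
placeholders, and the sketch assumes the runtime knows the size of the data
block, which GOMP_parallel is not currently told):

#include <stddef.h>
#include <string.h>

extern void *team_shared_alloc (size_t);   /* hypothetical .shared/heap allocator */
extern void team_shared_free (void *);     /* hypothetical */
extern void start_team (void (*) (void *), void *, unsigned, unsigned); /* hypothetical */

void
GOMP_parallel_sketch (void (*fn) (void *), void *data, size_t data_size,
		      unsigned num_threads, unsigned flags)
{
  void *copy = team_shared_alloc (data_size);
  memcpy (copy, data, data_size);              /* copy .omp_data_o* in */
  start_team (fn, copy, num_threads, flags);   /* workers see the copy, not the stack */
  memcpy (data, copy, data_size);              /* copy it back at the end */
  team_shared_free (copy);
}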

	Jakub

