Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
- Date: Wed, 21 Oct 2015 09:07:58 +0300 (MSK)
- Subject: Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
- References: <1445366076-16082-1-git-send-email-amonakov at ispras dot ru> <1445366076-16082-7-git-send-email-amonakov at ispras dot ru> <5626D6C0 dot 70901 at redhat dot com>
On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > (This patch serves as a straw man proposal to have something concrete for
> > discussion and further patches)
> >
> > On PTX, stack memory is private to each thread. When the master thread
> > constructs 'omp_data_o' on its own stack and passes it to other threads
> > via GOMP_parallel by reference, the other threads cannot use the
> > resulting pointer. We need to arrange for structures passed between
> > threads to be in global, or better, in PTX __shared__ memory (private
> > to each CUDA thread block).
>
> I guess the question is - why is it better? Do you have multiple thread blocks
> active in your execution model,
'#pragma omp teams' should map to spawning multiple thread blocks, so yes, at
least that is the plan (though honestly I don't see how it affects the
heap-vs-shared-memory decision here).
> and do they require different omp_data_o structures?
Yes, each omp_data_o should be private to a team.
> Are accesses to it performance critical (more so than any other access?)
I'm not sure how to address the "more so than ..." part, but since omp_data_o
is accessed by all threads after entering a parallel region, potentially many
times throughout the region, it does seem helpful to place it in shared
memory.
I expect there will be other instances like this one, where some on-stack data
will need to be moved to team-shared storage for nvptx.
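Roughly, today's lowering produces (simplified pseudo-C; names loosely follow
the omp-low dumps):

  struct omp_data_s { int i; /* ...captured variables... */ };

  /* Master thread: omp_data_o lives on its private (.local) PTX stack.  */
  struct omp_data_s omp_data_o;
  omp_data_o.i = i;
  GOMP_parallel (main._omp_fn.0, &omp_data_o, 0, 0);

  /* The outlined main._omp_fn.0 then reads fields through that pointer
     from every thread -- on PTX that is an access into another thread's
     .local memory, which does not work.  */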
> It might be slightly cleaner to have the copy described as a new builtin
> call that is always generated and expanded to nothing on normal targets
> rather than modifying existing calls in the IL. Or maybe:
>
> p = __builtin_omp_select_location (&stack_local_var, size) ....
> __builtin_omp_maybe_free (p);
>
> where the select_location could get simplified to a malloc for nvptx,
> hopefully making the stack variable unused and discarded.
Agreed.
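To spell out my understanding of the intended expansions (a sketch using your
proposed names; the exact interface is still to be worked out):

  /* Host targets: */
  p = &stack_local_var;   /* select_location folds to the stack address */
  /* ... */
  /* maybe_free expands to nothing */

  /* nvptx: */
  p = malloc (size);      /* or a slot carved out of __shared__ memory */
  /* ... */
  free (p);               /* maybe_free pairs with the allocation */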
> > Using separate variables is wasteful: they should go into a union to
> > reduce shared memory consumption.
>
> Not sure what you mean by separate variables?
If two parallel regions are nested in a target region, there will be two
omp_data_o variables, potentially of different types, but they can reuse the
same storage. The patch does not achieve that, because it simply emits a
separate static __shared__ declaration for each original variable; see the
sketch below.
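That is, instead of one declaration per variable (struct names below are
illustrative):

  static __shared__ struct omp_data_s0 shared_data_0;
  static __shared__ struct omp_data_s1 shared_data_1;

it would be better to overlay them:

  static __shared__ union
  {
    struct omp_data_s0 d0;
    struct omp_data_s1 d1;
  } shared_data;

since the two regions do not have their omp_data_o live at the same time.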
Alexander