Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
- Date: Wed, 21 Oct 2015 09:07:58 +0300 (MSK)
- Subject: Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
- References: <1445366076-16082-1-git-send-email-amonakov at ispras dot ru> <1445366076-16082-7-git-send-email-amonakov at ispras dot ru> <5626D6C0 dot 70901 at redhat dot com>
On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > (This patch serves as a straw man proposal to have something concrete for
> > discussion and further patches)
> >
> > On PTX, stack memory is private to each thread. When the master thread
> > constructs 'omp_data_o' on its own stack and passes it to other threads
> > via GOMP_parallel by reference, the other threads cannot use the
> > resulting pointer. We need to arrange for structures passed between
> > threads to be in global, or better, in PTX __shared__ memory (private
> > to each CUDA thread block).
>
> I guess the question is - why is it better? Do you have multiple thread blocks
> active in your execution model,
'#pragma omp teams' should map to spawning multiple thread blocks, so yes, at
least that is the plan (though honestly I don't see how it affects the
heap-vs-shared-memory decision here).
> and do they require different omp_data_o structures?
Yes, each omp_data_o should be private to a team.
> Are accesses to it performance critical (more so than any other access?)
I'm not sure how to address the "more so than ..." part, but since omp_data_o
is accessed by all threads after entering a parallel region, potentially many
times throughout the region, it does seem helpful to place it in shared
memory.
I expect there will be other instances like this one, where some on-stack data
will need to be moved to team-shared storage for nvptx.
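Roughly, today's lowering produces (simplified pseudo-C; names loosely follow
the omp-low dumps):

  struct omp_data_s { int i; /* ...captured variables... */ };

  /* Master thread: omp_data_o lives on its private (.local) PTX stack.  */
  struct omp_data_s omp_data_o;
  omp_data_o.i = i;
  GOMP_parallel (main._omp_fn.0, &omp_data_o, 0, 0);

  /* The outlined main._omp_fn.0 then reads fields through that pointer
     from every thread -- on PTX that is an access into another thread's
     .local memory, which does not work.  */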
> It might be slightly cleaner to have the copy described as a new builtin
> call that is always generated and expanded to nothing on normal targets
> rather than modifying existing calls in the IL. Or maybe:
>
> p = __builtin_omp_select_location (&stack_local_var, size) ....
> __builtin_omp_maybe_free (p);
>
> where the select_location could get simplified to a malloc for nvptx,
> hopefully making the stack variable unused and discarded.
Agreed.
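To spell out my understanding of the intended expansions (a sketch using your
proposed names; the exact interface is still to be worked out):

  /* Host targets: */
  p = &stack_local_var;   /* select_location folds to the stack address */
  /* ... */
  /* maybe_free expands to nothing */

  /* nvptx: */
  p = malloc (size);      /* or a slot carved out of __shared__ memory */
  /* ... */
  free (p);               /* maybe_free pairs with the allocation */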
> > Using separate variables is wasteful: they should go into a union to
> > reduce shared memory consumption.
>
> Not sure what you mean by separate variables?
If two parallel regions are nested in a target region, there will be two
omp_data_o variables, potentially of different types, but they can reuse the
same storage. The patch does not achieve that, because it simply emits a
separate static __shared__ declaration for each original variable; see the
sketch below.
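That is, instead of one declaration per variable (struct names below are
illustrative):

  static __shared__ struct omp_data_s0 shared_data_0;
  static __shared__ struct omp_data_s1 shared_data_1;

it would be better to overlay them:

  static __shared__ union
  {
    struct omp_data_s0 d0;
    struct omp_data_s1 d1;
  } shared_data;

since the two regions do not have their omp_data_o live at the same time.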
Alexander