[gomp4 00/14] NVPTX: further porting

Fri Oct 23 17:36:00 GMT 2015

On Fri, 23 Oct 2015, Jakub Jelinek wrote:
> Thus, if .shared function local is allowed, we'd need to emit two copies of
> foo, one which assumes it is run in the teams context and one which assumes
> it is run in the parallel context.  If automatic vars can be only .local,
> we are just in big trouble and I guess we really want to investigate what
> others supporting PTX/Cuda are trying to do here.

.shared is statically allocated.  There's an implementation of nvptx
offloading in Clang/LLVM at https://github.com/clang-omp ; it puts data
that can be shared either in .shared or in global memory (user-configurable,
I think).  I'm not sure how they deal with recursion, or with the
uncertainty you describe for the 'foo' function in your example.
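
As a rough illustration of why static allocation and recursion do not mix,
here is a minimal C sketch.  It is only an analogy: a C `static` local
stands in for a function-local .shared variable, and the function names
(aliases_static, aliases_automatic) are hypothetical.  It is not PTX and
does not show what clang-omp actually emits.

```c
#include <stddef.h>

/* Statically allocated local: one object shared by every activation,
   which is the assumed behaviour of a function-local .shared variable.
   Returns 1 if the inner activation sees the outer activation's object. */
static int aliases_static(const int *outer)
{
    static int slot;            /* single copy for all calls */

    if (outer == NULL)          /* outer activation: recurse once */
        return aliases_static(&slot);
    return outer == &slot;      /* inner activation: same storage? */
}

/* Automatic local: each activation gets its own object, which is what
   recursion normally relies on (the .local-like case). */
static int aliases_automatic(const int *outer)
{
    int slot = 0;               /* fresh copy per call */

    if (outer == NULL)
        return aliases_automatic(&slot);
    return outer == &slot;
}
```

Calling aliases_static(NULL) yields 1 (both activations share the storage),
while aliases_automatic(NULL) yields 0 (each activation has its own copy).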

Can you point me to other compilers implementing OpenMP offloading for PTX?

Alexander