This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[gomp4 00/14] NVPTX: further porting
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
- Date: Tue, 20 Oct 2015 21:34:22 +0300
- Subject: [gomp4 00/14] NVPTX: further porting
Hello,
This patch series moves libgomp/nvptx porting further along to get initial
bits of parallel execution working, mostly unbreaking the testsuite. Please
have a look! I'm interested in feedback, and would like to know if it's
suitable to become a part of a branch.
This patch series ports enough of libgomp.c to get warp-level parallelism
working for OpenMP offloading. The overall approach is as follows.
I've opted not to use dynamic parallelism. It increases the hardware
requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link
time (libcudadevrt.a), and imposes overhead at run time. The last point might
be moot if we don't manage to make libgomp's own overhead low, but still my
judgement is that a hard dependency on dynamic parallelism is problematic.
The plugin launches one (for now) thread block with 8 warps, which begin
executing a new function in libgomp, gomp_nvptx_main. The warps form a
(pre-allocated) pool. Warp 0 is responsible for initialization and final
cleanup, and proceeds to execute target region functions. The other warps
proceed to gomp_thread_start.
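
To make the division of labour concrete, here is a rough host-side model in
plain C of how the 8 warps of the block might be split into roles. It is only
a sketch and not the code from the patches: warp_of_thread, role_of_warp,
simulate_block and struct launch_stats are hypothetical names invented for
illustration, and the warp size of 32 is the usual NVIDIA value rather than
something stated above.

```c
#include <assert.h>

#define WARP_SIZE 32   /* assumption: lanes per warp on NVIDIA hardware */
#define NUM_WARPS 8    /* the plugin launches one block with 8 warps */

enum warp_role { ROLE_MASTER, ROLE_WORKER };

/* Hypothetical helper: map a linear thread index in the block to its warp. */
int warp_of_thread(int tid)
{
  assert(tid >= 0 && tid < WARP_SIZE * NUM_WARPS);
  return tid / WARP_SIZE;
}

/* Warp 0 initializes, runs the target region function and cleans up;
   every other warp belongs to the pre-allocated pool of workers. */
enum warp_role role_of_warp(int warp)
{
  return warp == 0 ? ROLE_MASTER : ROLE_WORKER;
}

struct launch_stats { int masters; int workers; };

/* Model of the dispatch done on entry: count the warps taking each path.
   In the real library the worker path would end up in gomp_thread_start. */
struct launch_stats simulate_block(void)
{
  struct launch_stats s = { 0, 0 };
  for (int warp = 0; warp < NUM_WARPS; warp++)
    {
      if (role_of_warp(warp) == ROLE_MASTER)
        s.masters++;
      else
        s.workers++;
    }
  return s;
}
```

Calling simulate_block() in this model yields one master warp and seven
workers, which matches the pool layout described above.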
With these patches, the libgomp testsuite mostly passes. The remaining
failures are as follows:
libgomp.c/target-{1,7,critical-1}.c: segfault in accelerator code
libgomp.c/thread-limit-2.c: fails to link because 'usleep' is unavailable on
NVPTX. Note that the test does not run anything on the device because the
target region has an 'if (0)' clause.
libgomp.c++/examples-4/declare_target-2.C: libgomp: Can't map target variables
(size mismatch). Will investigate later.
libgomp.c++/target-1.C: same as libgomp.c/target-1.c, segfault on device.
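
For reference, the thread-limit-2.c situation can be reproduced with a shape
like the one below. This is a reduced illustration, not the actual test:
run_region is a made-up name, and the explanation that the region body is
still compiled for the offload target (so 'usleep' has to resolve at NVPTX
link time even though the region only ever runs on the host) is my reading of
the failure rather than something verified here.

```c
#define _DEFAULT_SOURCE   /* expose usleep() under strict -std= modes */
#include <unistd.h>

/* Hypothetical reduction: 'if (0)' forces host fallback at run time, but
   the body of the region still references usleep. */
int run_region(void)
{
  int done = 0;
  #pragma omp target if (0) map(tofrom: done)
  {
    usleep(10);   /* no NVPTX implementation available */
    done = 1;
  }
  return done;
}
```

Built without offloading (or without -fopenmp at all, in which case the
pragma is ignored), run_region() simply executes on the host and returns 1.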
I didn't run the libgomp/gfortran testsuite yet. I'd like your input on
dealing with testsuite breaks (XFAIL?).
I have not rebased my private branch in a while, so context in
gcc/config/nvptx is probably out-of-date in places.
Yours,
Alexander
nvptx: emit kernels for 'omp target entrypoint' only for OpenACC
nvptx: emit pointers to OpenMP target region entry points
nvptx: expand support for address spaces
nvptx: fix output of _Bool global variables
omp-low: set 'omp target entrypoint' only on entrypoints
omp-low: copy omp_data_o to shared memory on NVPTX
libgomp nvptx plugin: launch target functions via gomp_nvptx_main
libgomp nvptx: populate proc.c
libgomp: provide barriers on NVPTX
libgomp: arrange a team of pre-started threads via gomp_nvptx_main
libgomp: avoid variable-length stack allocation in team.c
libgomp: fixup error.c on nvptx
libgomp: provide minimal GOMP_teams
libgomp: use more generic implementations on nvptx
gcc/config/nvptx/nvptx.c | 78 +++++++++++++--
gcc/omp-low.c | 58 +++++++++--
libgomp/config/nvptx/alloc.c | 0
libgomp/config/nvptx/bar.c | 210 ++++++++++++++++++++++++++++++++++++++++
libgomp/config/nvptx/bar.h | 129 +++++++++++++++++++++++-
libgomp/config/nvptx/barrier.c | 0
libgomp/config/nvptx/critical.c | 57 -----------
libgomp/config/nvptx/error.c | 0
libgomp/config/nvptx/iter.c | 0
libgomp/config/nvptx/iter_ull.c | 0
libgomp/config/nvptx/loop.c | 0
libgomp/config/nvptx/loop_ull.c | 0
libgomp/config/nvptx/ordered.c | 0
libgomp/config/nvptx/parallel.c | 0
libgomp/config/nvptx/proc.c | 40 ++++++++
libgomp/config/nvptx/single.c | 0
libgomp/config/nvptx/target.c | 39 ++++++++
libgomp/config/nvptx/task.c | 0
libgomp/config/nvptx/team.c | 0
libgomp/config/nvptx/work.c | 0
libgomp/error.c | 5 +
libgomp/libgomp.h | 10 +-
libgomp/plugin/plugin-nvptx.c | 23 ++++-
libgomp/task.c | 7 +-
libgomp/team.c | 92 +++++++++++++++++-
25 files changed, 664 insertions(+), 84 deletions(-)
delete mode 100644 libgomp/config/nvptx/alloc.c
delete mode 100644 libgomp/config/nvptx/barrier.c
delete mode 100644 libgomp/config/nvptx/critical.c
delete mode 100644 libgomp/config/nvptx/error.c
delete mode 100644 libgomp/config/nvptx/iter.c
delete mode 100644 libgomp/config/nvptx/iter_ull.c
delete mode 100644 libgomp/config/nvptx/loop.c
delete mode 100644 libgomp/config/nvptx/loop_ull.c
delete mode 100644 libgomp/config/nvptx/ordered.c
delete mode 100644 libgomp/config/nvptx/parallel.c
delete mode 100644 libgomp/config/nvptx/single.c
delete mode 100644 libgomp/config/nvptx/task.c
delete mode 100644 libgomp/config/nvptx/team.c
delete mode 100644 libgomp/config/nvptx/work.c