This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[gomp4 00/14] NVPTX: further porting


Hello,

This patch series moves libgomp/nvptx porting further along to get initial
bits of parallel execution working, mostly unbreaking the testsuite.  Please
have a look!  I'm interested in feedback, and would like to know if it's
suitable to become a part of a branch.

The series ports enough of libgomp.c to get warp-level parallelism working
for OpenMP offloading.  The overall approach is as follows.

I've opted not to use dynamic parallelism.  It increases the hardware
requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link
time (libcudadevrt.a), and imposes overhead at run time.  The last point might
be moot if we don't manage to make libgomp's own overhead low, but still my
judgement is that a hard dependency on dynamic parallelism is problematic.

The plugin launches one (for now) thread block with 8 warps, which begin
executing a new function in libgomp, gomp_nvptx_main.  The warps form a
(pre-allocated) pool.  Warp 0 is responsible for initialization and final
cleanup, and proceeds to execute target region functions.  Other warps proceed
to gomp_thread_start.
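
To make the warp roles above concrete, here is a rough host-side C model of
the layout.  It is not code from the patches.  It assumes 32 threads per warp
(so the single block would have 256 threads), and WARP_SIZE, NUM_WARPS,
warp_of_thread, role_of_warp and block_threads are hypothetical names used
only for illustration.

```c
#include <assert.h>

/* Illustration only: a host-side model of the launch layout, not device
   code and not taken from the patch series.  All names are hypothetical.  */
#define WARP_SIZE 32   /* assumed number of threads per warp */
#define NUM_WARPS 8    /* warps in the single thread block */

enum warp_role { ROLE_MASTER, ROLE_POOL_WORKER };

/* Total number of threads in the one thread block being modelled.  */
int block_threads(void)
{
  return WARP_SIZE * NUM_WARPS;
}

/* Map a linear thread index within the block to its warp index.  */
int warp_of_thread(int tid)
{
  assert(tid >= 0 && tid < WARP_SIZE * NUM_WARPS);
  return tid / WARP_SIZE;
}

/* Warp 0 does initialization, runs target region functions and does the
   final cleanup; every other warp is a pre-started pool worker.  */
enum warp_role role_of_warp(int warp)
{
  return warp == 0 ? ROLE_MASTER : ROLE_POOL_WORKER;
}
```

In this model, threads 0-31 make up warp 0 (the master), while thread 40
falls in warp 1 and would be one of the pool workers.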

With these patches, the libgomp testsuite mostly passes.  The remaining
failures are as follows:

libgomp.c/target-{1,7,critical-1}.c: segfault in accelerator code

libgomp.c/thread-limit-2.c: fails to link because 'usleep' is unavailable on
NVPTX.  Note, the test does not run anything on the device because the target
region has an 'if (0)' clause.

libgomp.c++/examples-4/declare_target-2.C: libgomp: Can't map target variables
(size mismatch).  Will investigate later.

libgomp.c++/target-1.C: same as libgomp.c/target-1.c, segfault on device.
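
For the thread-limit-2.c link failure, the situation is roughly the one
sketched below.  This is a hypothetical reduction, not the actual test, and
run_region is a made-up name.  My assumption is that the region body is still
compiled for the device even though 'if (0)' keeps execution on the host, so
the reference to usleep has to resolve when the device image is linked.

```c
#define _DEFAULT_SOURCE   /* make usleep visible under strict -std modes */
#include <unistd.h>

/* Hypothetical reduction, not the real thread-limit-2.c.  Returns 1 once
   the region body has executed.  */
int run_region(void)
{
  int ran = 0;

  /* 'if (0)' selects host execution at run time.  An offloading compiler
     is still assumed to emit a device copy of this body, which is where
     the unresolved reference to usleep would come from.  */
  #pragma omp target if (0) map(tofrom: ran)
  {
    usleep(1);
    ran = 1;
  }

  return ran;
}
```

Built without offloading (or without -fopenmp at all, in which case the
pragma is ignored), this simply runs on the host and links fine.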

I haven't run the libgomp/gfortran testsuite yet.  I'd like your input on
how to deal with the testsuite failures (XFAIL?).

I have not rebased my private branch in a while, so context in
gcc/config/nvptx is probably out-of-date in places.

Yours,
Alexander


  nvptx: emit kernels for 'omp target entrypoint' only for OpenACC
  nvptx: emit pointers to OpenMP target region entry points
  nvptx: expand support for address spaces
  nvptx: fix output of _Bool global variables
  omp-low: set 'omp target entrypoint' only on entrypoints
  omp-low: copy omp_data_o to shared memory on NVPTX
  libgomp nvptx plugin: launch target functions via gomp_nvptx_main
  libgomp nvptx: populate proc.c
  libgomp: provide barriers on NVPTX
  libgomp: arrange a team of pre-started threads via gomp_nvptx_main
  libgomp: avoid variable-length stack allocation in team.c
  libgomp: fixup error.c on nvptx
  libgomp: provide minimal GOMP_teams
  libgomp: use more generic implementations on nvptx

 gcc/config/nvptx/nvptx.c        |  78 +++++++++++++--
 gcc/omp-low.c                   |  58 +++++++++--
 libgomp/config/nvptx/alloc.c    |   0
 libgomp/config/nvptx/bar.c      | 210 ++++++++++++++++++++++++++++++++++++++++
 libgomp/config/nvptx/bar.h      | 129 +++++++++++++++++++++++-
 libgomp/config/nvptx/barrier.c  |   0
 libgomp/config/nvptx/critical.c |  57 -----------
 libgomp/config/nvptx/error.c    |   0
 libgomp/config/nvptx/iter.c     |   0
 libgomp/config/nvptx/iter_ull.c |   0
 libgomp/config/nvptx/loop.c     |   0
 libgomp/config/nvptx/loop_ull.c |   0
 libgomp/config/nvptx/ordered.c  |   0
 libgomp/config/nvptx/parallel.c |   0
 libgomp/config/nvptx/proc.c     |  40 ++++++++
 libgomp/config/nvptx/single.c   |   0
 libgomp/config/nvptx/target.c   |  39 ++++++++
 libgomp/config/nvptx/task.c     |   0
 libgomp/config/nvptx/team.c     |   0
 libgomp/config/nvptx/work.c     |   0
 libgomp/error.c                 |   5 +
 libgomp/libgomp.h               |  10 +-
 libgomp/plugin/plugin-nvptx.c   |  23 ++++-
 libgomp/task.c                  |   7 +-
 libgomp/team.c                  |  92 +++++++++++++++++-
 25 files changed, 664 insertions(+), 84 deletions(-)
 delete mode 100644 libgomp/config/nvptx/alloc.c
 delete mode 100644 libgomp/config/nvptx/barrier.c
 delete mode 100644 libgomp/config/nvptx/critical.c
 delete mode 100644 libgomp/config/nvptx/error.c
 delete mode 100644 libgomp/config/nvptx/iter.c
 delete mode 100644 libgomp/config/nvptx/iter_ull.c
 delete mode 100644 libgomp/config/nvptx/loop.c
 delete mode 100644 libgomp/config/nvptx/loop_ull.c
 delete mode 100644 libgomp/config/nvptx/ordered.c
 delete mode 100644 libgomp/config/nvptx/parallel.c
 delete mode 100644 libgomp/config/nvptx/single.c
 delete mode 100644 libgomp/config/nvptx/task.c
 delete mode 100644 libgomp/config/nvptx/team.c
 delete mode 100644 libgomp/config/nvptx/work.c

