This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[gomp-nvptx 00/13] SIMD, teams, Fortran


Hello,

I'm pushing this patch series to the gomp-nvptx branch.  It adds the
following:

  - backend and omp-low.c changes for SIMT-style SIMD region handling
  - libgomp changes for running the Fortran testsuite
  - libgomp changes for spawning multiple OpenMP teams

I'll perform a trunk merge and copyright years update on the branch shortly.
There are 4 tests that still fail in the libgomp testsuite with NVPTX offloading:

  - 2 due to missing 'usleep'
  - 2 due to unimplemented 'target nowait'/GOMP_OFFLOAD_run_async.

The most interesting part of the series is the omp-low.c additions for lowering
SIMD regions for SIMT execution.  I've taken care to insert new code only when
the region could be offloaded to NVPTX, and to make sure that the added code
can be easily cleaned up on the host compiler side.

However, there's one infrastructure piece that I haven't managed to nail down
yet.  Outside of SIMD regions we run in a non-default mode, with per-warp
soft-stacks and with atomics instrumented to take effect once per warp.  We
need to switch to the opposite mode at SIMD region boundaries.  While switching
atomics is easy, I see no way to model stack switching in GCC IL, except by
doing it at function boundaries (which is then also easy from the backend
point of view).  As a result, we need to outline SIMD regions for NVPTX into
separate functions, unless they are already outlined by virtue of being
combined into an 'omp parallel' or 'omp task'.

To achieve that, I think there are two general possibilities:

1) After LTO stream-in, in omp_device_lower, in the accel compiler only.  I'm not
sure how hard it would be; outlining this late is not something GCC normally
does, although tree-parloops does it.  I think this option isn't preferable.

2) Up front during omp-lowering: properly outline it together with parallel
and task regions, and tweak inlining so that inlining back happens on the host
side only.  It looks like I'd need to invent a new ephemeral GIMPLE statement,
say OMP_SIMTREG, that is handled like the other 'taskreg' kinds (OMP_PARALLEL
and OMP_TASK), and artificially inject it into the IL.  Or, to avoid excessive
surgery, it may be better to reuse an existing taskreg kind (OMP_PARALLEL) and
attach an artificial clause instead, signalling that this "parallel" is for
outlining a SIMD region.

Thoughts, comments?

Thanks.
Alexander

