[gomp-nvptx 0/5] Reorganize soft-stack setup

Alexander Monakov amonakov@ispras.ru
Mon Feb 15 18:44:00 GMT 2016


I've committed the following 5-patch series to the amonakov/gomp-nvptx git branch.
The first two patches are unrelated fixes to previously landed code.  Patches
3-5 reorganize the way initial soft-stack setup is done.

Previously, soft stacks were allocated by libgomp/config/nvptx/team.c in
a function that wrapped gomp_nvptx_main.  However:

  - the default device heap is only 8 MB, which is not enough for multiple
    teams with 128 KiB per-warp stacks; the libgomp plugin would need to
    increase the heap size;
  - device heap persists between launches, so it's possible to leak
    soft-stack allocations if a team exits without cleaning up;
  - device malloc is rather slow, so I'd like to eliminate or reuse device
    allocations as much as possible; it's easier to arrange reuse of soft
    stack storage from the host side;
  - there's a chicken-and-egg problem with setting up soft stacks from C code.
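
As a rough check on the first bullet: assuming binary units (8 MB taken as
8 * 1024 * 1024 bytes) and ignoring allocator overhead, the default heap holds
at most 64 stacks of 128 KiB.  The helper name below is hypothetical and only
illustrates the arithmetic:

```c
#include <stddef.h>

/* Hypothetical helper: upper bound on the number of per-warp soft stacks
   that fit in a device heap, ignoring allocator overhead and any other
   device-side allocations.  */
static size_t
max_warp_stacks (size_t heap_bytes, size_t stack_bytes)
{
  return stack_bytes ? heap_bytes / stack_bytes : 0;
}

/* Same bound expressed in whole teams, for an assumed number of warps
   per team.  */
static size_t
max_teams (size_t heap_bytes, size_t stack_bytes, size_t warps_per_team)
{
  if (warps_per_team == 0)
    return 0;
  return max_warp_stacks (heap_bytes, stack_bytes) / warps_per_team;
}
```

With an assumed 8 warps per team, that leaves room for at most 8 teams.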

The above motivates a transition to a scheme where the libgomp core is
oblivious to soft-stack setup; instead, the storage is allocated by the
libgomp plugin (via cuMemAlloc) and passed to the compiler-emitted entry
function as the 2nd (base pointer) and 3rd (per-warp size) arguments.  This
addresses bullets 1-2 above.  Bullet 4 is addressed because the entry code is
emitted in assembly by the backend.  Bullet 3 is left to a followup change:
cuMemAlloc is roughly as slow on the host as malloc is on the device, but we
should be able to reuse allocations on the host.
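
To illustrate how a base pointer and a per-warp size could be turned into
individual stacks, here is a minimal host-side sketch.  It assumes that stacks
grow downwards (so each warp starts at the top of its slot) and that slots are
laid out team by team; the function names and the layout are assumptions made
for illustration, not necessarily what the patches emit:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: total bytes the plugin would have to allocate for a launch
   with the given number of teams and warps per team.  */
static size_t
total_stack_bytes (size_t per_warp_size, unsigned teams,
                   unsigned warps_per_team)
{
  return per_warp_size * teams * warps_per_team;
}

/* Hypothetical: initial stack pointer for one warp.  Slots are numbered
   team-major; the "+ 1" selects the top of the slot because the stack is
   assumed to grow towards lower addresses.  */
static uintptr_t
warp_stack_top (uintptr_t base, size_t per_warp_size, unsigned team,
                unsigned warps_per_team, unsigned warp)
{
  size_t slot = (size_t) team * warps_per_team + warp;
  return base + per_warp_size * (slot + 1);
}
```

Under these assumptions, 4 teams of 8 warps with 128 KiB stacks need 4 MiB of
storage in total, obtained with a single allocation on the host side.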

This changes the binary interface between the libgomp plugin (GOMP_OFFLOAD_run)
and compiler-emitted kernel entry functions for OpenMP target regions.  For
now, I am free to do that on the branch without worries, but if a similar
change is required in the future after a release, the libgomp plugin should be
able to detect which arguments the entry expects.  Assuming the argument list
is only appended to, the plugin only needs to know the argument count.  So a
possible solution is to invent a tagging mechanism when the change needs to
be made, and provide the default 3 arguments to untagged entries.  Old libgomp
plugins unaware of the change should be able to detect that they supplied too
few arguments to entries emitted by a new compiler, because cuLaunchKernel
will fail.
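
A minimal sketch of how such argument-count detection might look on the plugin
side follows.  No tagging mechanism exists yet, so the struct, its fields, and
the function names are all hypothetical:

```c
#include <stddef.h>

/* Untagged entries take the 3 arguments described above.  */
#define DEFAULT_ENTRY_ARGS 3

/* Hypothetical per-entry metadata the plugin would read from the image.  */
struct entry_info
{
  const char *name;
  int has_tag;          /* nonzero if the compiler emitted a tag */
  unsigned tagged_argc; /* argument count recorded in the tag */
};

/* Number of arguments the entry expects.  */
static unsigned
entry_arg_count (const struct entry_info *e)
{
  return e->has_tag ? e->tagged_argc : DEFAULT_ENTRY_ARGS;
}

/* Since the argument list is only appended to, a plugin that knows how to
   supply PLUGIN_ARGC arguments can serve any entry expecting at most that
   many, passing just the leading ones.  */
static int
plugin_can_launch (const struct entry_info *e, unsigned plugin_argc)
{
  return entry_arg_count (e) <= plugin_argc;
}
```

A plugin that predates tagging cannot perform this check at all, which is why
it would have to rely on the launch failing instead.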

Alexander Monakov (5):
  libgomp plugin: correct types
  Revert "nvptx plugin: bump heap size to 1GB"
  nvptx backend: set up stacks in entry code
  libgomp: remove __nvptx_stacks setup code
  libgomp plugin: manage soft-stack storage

 gcc/ChangeLog.gomp-nvptx      |  6 +++++
 gcc/config/nvptx/nvptx.c      | 57 ++++++++++++++++++++++++++++++------------
 libgomp/ChangeLog.gomp-nvptx  | 26 +++++++++++++++++++
 libgomp/config/nvptx/team.c   | 31 ++++-------------------
 libgomp/plugin/plugin-nvptx.c | 58 +++++++++++++++++++++++++++++++++++--------
 5 files changed, 126 insertions(+), 52 deletions(-)


