This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH, nvptx] Expand OpenACC child function arguments to use CUDA params space

Hi Tom,
this is a completely new implementation of an earlier optimization
that Cesar submitted:

The objective is to transform the original single-record-pointer argument
form (OpenMP/pthreads originated) to multiple scalar parameters, that
the CUDA runtime will place directly in the .params space for GPU kernels:

#pragma acc parallel copy(a, b) copyin(c)
  a += b;
  b -= c;

compiles to GIMPLE as:

__attribute__((oacc function (1, 1, 32), omp target entrypoint))
main._omp_fn.0 (const struct .omp_data_t.8 & restrict .omp_data_i)
  _3 = .omp_data_i_2(D)->a;
  _4 = *_3;
  _5 = .omp_data_i_2(D)->b;
  _6 = *_5;

this patch adds pass to transform into:

__attribute__((oacc function (1, 1, 32), omp target entrypoint))
main._omp_fn.0 (int * c, int * b, int * a)
  _3 = a;
  _4 = *_3;
  _5 = b;
  _6 = *_5;

Cesar's original implementation tried to do this in the middle-end,
which required lots of changes throughout the compiler, libgomp interface,
etc. and required a dependency on libffi for the CPU-host fallback child
function (since there is no longer a known, fixed single-pointer argument
interface to all child functions)

This new implementation works by modifying the GIMPLE for child functions
directly at the very start (before, actually) of RTL expansion, and thus
is placed in TARGET_EXPAND_TO_RTL_HOOK, as the core issue is we inherently
need something different generated between the host-fallback vs for the GPU.

The new nvptx_expand_to_rtl_hook modifies the function decl type and
arguments, and scans the gimple body to remove occurrences of .omp_data_i.*
Detection of OpenACC child functions is done through "omp target entrypoint"
and "oacc function" attributes. Because OpenMP target child functions
have a more elaborate wrapper generated for them, this pass only supports
OpenACC right now.

The libgomp nvptx plugin changes are also quite contained, with lots of
now unneeded profiling code deleted (since we no longer first cuAlloc a
buffer for the argument record before cuLaunchKernel)

libgomp has tested with this patch x86_64-linux (nvptx-none accelerator)
without regressions (I'm currently undergoing more gcc tests as well).
Is this okay for trunk?


	* config/nvptx/nvptx.c (nvptx_expand_to_rtl_hook): New function
	implementing CUDA .params space transformation.
	(TARGET_EXPAND_TO_RTL_HOOK): implement hook with

	* plugin/plugin-nvptx.c (nvptx_exec): Adjust arguments, add
	kernel argument setup code, adjust cuLaunchKernel calling code.
	(GOMP_OFFLOAD_openacc_exec): Adjust nvptx_exec call, delete
	profiling code.
	(GOMP_OFFLOAD_openacc_async_exec): Likewise.
	(cuda_free_argmem): Delete function.

Attachment: openacc-cuda-params.patch
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]