This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH,WIP] Use functional parameters for data mappings in OpenACC child functions
- From: Cesar Philippidis <cesar at codesourcery dot com>
- To: Jakub Jelinek <jakub at redhat dot com>, Alexander Monakov <amonakov at ispras dot ru>, Thomas Schwinge <thomas at codesourcery dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 21 Dec 2017 13:46:56 -0800
- Subject: Re: [PATCH,WIP] Use functional parameters for data mappings in OpenACC child functions
- Authentication-results: sourceware.org; auth=none
- References: <7bfb59de-a141-46b4-0c6d-91ecd5b2a766@codesourcery.com>
On 12/18/2017 02:58 PM, Cesar Philippidis wrote:
> Jakub,
>
> I'd like your thoughts on the following problem.
>
> One of the offloading bottlenecks with GPU acceleration in OpenACC is
> the nontrivial offloaded function invocation overhead. At present, GCC
> generates code to pass a struct containing one field for each of the
> data mappings used in the OMP child function. I'm guessing a struct is
> used because pthread_create only accepts a single for new threads. What
> I'd like to do is to create the child function with one argument per
> data mapping. This has a number of advantages:
>
> 1. No device memory needs to be managed for the child function data
> mapping struct.
>
> 2. On PTX targets, the .param address space is cached. Using
> individual parameters for function arguments will allow the nvptx
> back end to generate a more relaxed "execution model" because the
> thread initialization code will be accessing cache memory instead
> of global memory.
>
> 3. It was my hope that this would set a path to eliminate the
> GOMP_MAP_FIRSTPRIVATE_INT optimization, by replacing those mappings
> with the actual value directly.
>
> 1) is huge for programs, such as cloverleaf, which launch a lot of small
> parallel regions a lot of times.
>
> For the execution model in 2), OpenACC begins each parallel region in a
> gang-redundant, worker-single and vector-single state. To transition
> from a single-threaded (or single vector lane) state to a multi-threaded
> partitioned state, GCC needs to emit code to propagate live variables,
> both on the stack and registers to the spawned threads. A lot of loops,
> including DGEMV from BLAS, can be executed in a fully-redundant state.
> Executing code redundantly has the advantage of not requiring any state
> transition code. The problem here is that because a) the struct is in
> global memory, and b) not all of the GPU threads are executing the same
> instruction at the same time. Consequently, initializing each thread in
> a fully redundant manner actually hurts performance. When I rewrote the
> same test case passing the data mappings via individual parameters, that
> optimization improved performance compared to GCC trunk's baseline.
>
> Lastly, 3) is more of a simplification than anything else. I'm not too
> concerned about this because those variables only get initialized once.
> So long as they don't require a separate COPYIN data mapping, the
> performance hit should be negligible.
>
> In this first attempt at using parameters I taught lower_omp_target how
> to create child functions for OpenACC parallel regions with individual
> parameters for the data mappings instead of using a large struct. This
> works for the most part, but I realized too late that pthread_create
> only passes one argument to each thread it creates. It should be noted
> that I left the kernels implementation as-is, using the global struct
> argument because kernels in GCC is largely ineffective and it usually
> falls back to executing code on the host CPU. Eventually, we want to
> redo kernels, but not until we get the parallel code running efficiently.
>
> For fallback host targets, libgomp is using libffi to pass arguments to
> the offloaded functions. This works OK at the moment because the host
> code is always single-threaded. Ideally, that would change in the
> future, but I'm not aware of any immediate plans to do so.
>
> Question: is this approach acceptable for Stage 1 in May, or should I
> make the offloaded function parameter expansion target-specific? I can
> think a couple of ways to make this target-specific:
>
> a. Create two child functions during lowering, one with individual
> parameters for the data mappings, and another which takes in a
> single struct. The latter then calls the former immediately on
> on entry.
>
> b. Teach oaccdevlow to expand the incoming struct into individual
> parameters.
>
> I'm concerned that b) is going to be a large pass. The SRA pass is
> somewhat large at 5k. While this should be simpler, I'm not sure by how
> much (probably a lot because it won't need to preform as much analysis).
>
> While this patch is functional, it's not complete. I still need to tweak
> a couple of things in the runtime. But I don't want to spend too much
> time on it if we decide to go with a different approach.
>
> Any thoughts are welcome.
>
> By the way, next we'll be working on increasing vector_length on nvptx
> targets. In conjunction with that, we'll simplifying the OpenACC
> execution model in the nvptx BE, along with adding a new reduction
> finalizer.
After thinking about this some more, I decided that it would be better
expand the offloaded function arguments into individual parameters
during omp lowering, rather than writing a separate pass later on. I
don't see too many disadvantages of using libffi after a pthread is
spawned by the host. If anything, the pthread's use of libffi is
equivalent of preforming SRA by the accelerator anyway.
I've committed this patch to openacc-gcc-7-branch.
Note that I had to xfail
libgomp.oacc-c-c++-common/combined-directives-1.c because I disabled
struct analysis analysis on parallel regions. Unfortunately, that makes
kernels slightly less effective. But more often than not, kernels
regions fall back to host execution anyway.
Cesar
2017-12-21 Cesar Philippidis <cesar@codesourcery.com>
Makefile.def: Make libgomp depend on libffi.
configure.ac: Likewise.
Makefile.in: Regenerate.
configure: Regenerate.
gcc/fortran/
* types.def: (BF_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
Define.
gcc/
* builtin-types.def (BF_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR):
Define.
* config/nvptx/nvptx.c (nvptx_expand_cmp_swap): Handle PARM_DECLs.
* omp-builtins.def (BUILD_IN_GOACC_PARALLEL): Call
GOACC_parallel_keyed_v2.
* omp-expand.c (expand_omp_target): Update call to
BUILT_IN_GOACC_PARALLEL.
* omp-low.c (struct omp_context): Add parm_map member.
(lookup_parm): New function.
(build_receiver_ref): Lookup parm_map decls.
(install_parm_decl): New function.
(install_var_field): Install parm_map decl for OpenACC parallel region
data clauses.
(delete_omp_context): Clean parm_map.
(scan_sharing_clauses): Install subarray variable mapping into parm_map.
(create_omp_child_function): Defer creation of child function for
OpenACC parallel regions.
(scan_omp_target): Likewise.
(append_decl_arg): New function.
(lower_omp_target): Create an child offloaded function using one
parameter per data mapping for OpenACC parallel regions.
* tree-ssa-structalias.c (find_func_aliases_for_builtin_call):
Ignore OpenACC parallel regions.
(find_func_clobbers): Likewise.
(ipa_pta_execute): Likewise.
libgomp/
* Makefile.am: Add libffi build dependency.
* configure.ac: Likewise.
* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
* libgomp-plugin.h: Define GOMP_OFFLOAD_openacc_exec_params and
GOMP_OFFLOAD_openacc_async_exec_params.
* libgomp.h (acc_dispatch_t): Use them here.
* libgomp.map (GOACC_parallel_keyed_v2): Declare.
* libgomp_g.h (GOACC_parallel_keyed_v2): Likewise.
* oacc-host.c (host_openacc_exec_params): New function.
(host_openacc_async_exec_params): Likewise.
* oacc-parallel.c (goacc_call_host_fn): Likewise.
(GOACC_parallel_keyed_internal): Likewise.
(GOACC_parallel_keyed): Wrapper for GOACC_parallel_keyed_internal.
(GOACC_parallel_keyed_v2): Likewise.
* plugin/plugin-nvptx.c (nvptx_exec): Replace CUDeviceptr dp parameter
with void **kargs.
(openacc_exec_internal): New function.
(GOMP_OFFLOAD_openacc_exec_params): New function.
(GOMP_OFFLOAD_openacc_exec): Update to call openacc_exec_internal.
(openacc_async_exec_internal): New function.
(GOMP_OFFLOAD_openacc_async_exec_params): New function.
(GOMP_OFFLOAD_openacc_async_exec): Update call to
openacc_async_exec_internal.
* target.c (gomp_load_plugin_for_device): Handle
openacc_exec_params and openacc_async_exec_params.
* testsuite/Makefile.in: Regenerate.
* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
Xfail on offloaded targets.
diff --git a/Makefile.def b/Makefile.def
index abfa9efe959..5e94062fa75 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -550,6 +550,7 @@ dependencies = { module=configure-target-libgo; on=all-target-libstdc++-v3; };
dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
dependencies = { module=all-target-libgo; on=all-target-libffi; };
dependencies = { module=all-target-libgo; on=all-target-libatomic; };
+dependencies = { module=configure-target-libgomp; on=configure-target-libffi; };
dependencies = { module=configure-target-libstdc++-v3; on=configure-target-libgomp; };
dependencies = { module=configure-target-liboffloadmic; on=configure-target-libgomp; };
dependencies = { module=configure-target-libsanitizer; on=all-target-libstdc++-v3; };
@@ -564,6 +565,7 @@ dependencies = { module=install-target-libgo; on=install-target-libatomic; };
dependencies = { module=install-target-libgfortran; on=install-target-libquadmath; };
dependencies = { module=install-target-libgfortran; on=install-target-libgcc; };
dependencies = { module=install-target-libsanitizer; on=install-target-libstdc++-v3; };
+dependencies = { module=install-target-libgomp; on=install-target-libffi; };
dependencies = { module=install-target-libsanitizer; on=install-target-libgcc; };
dependencies = { module=install-target-libvtv; on=install-target-libstdc++-v3; };
dependencies = { module=install-target-libvtv; on=install-target-libgcc; };
diff --git a/Makefile.in b/Makefile.in
index b824e0a0ca1..9b4497e3943 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -55803,6 +55803,7 @@ configure-target-libgo: maybe-all-target-libstdc++-v3
all-target-libgo: maybe-all-target-libbacktrace
all-target-libgo: maybe-all-target-libffi
all-target-libgo: maybe-all-target-libatomic
+configure-target-libgomp: maybe-configure-target-libffi
configure-target-libstdc++-v3: maybe-configure-target-libgomp
configure-stage1-target-libstdc++-v3: maybe-configure-stage1-target-libgomp
@@ -55849,6 +55850,7 @@ install-target-libgo: maybe-install-target-libatomic
install-target-libgfortran: maybe-install-target-libquadmath
install-target-libgfortran: maybe-install-target-libgcc
install-target-libsanitizer: maybe-install-target-libstdc++-v3
+install-target-libgomp: maybe-install-target-libffi
install-target-libsanitizer: maybe-install-target-libgcc
install-target-libvtv: maybe-install-target-libstdc++-v3
install-target-libvtv: maybe-install-target-libgcc
diff --git a/configure b/configure
index 32a38633ad8..ed47944d8f9 100755
--- a/configure
+++ b/configure
@@ -3472,11 +3472,19 @@ case "${target}" in
ft32-*-*)
noconfigdirs="$noconfigdirs target-libffi"
;;
+ nvptx-*-*)
+ noconfigdirs="$noconfigdirs target-libffi"
+ ;;
*-*-lynxos*)
noconfigdirs="$noconfigdirs target-libffi"
;;
esac
+libgomp_deps="target-libffi"
+if echo " ${noconfigdirs} " | grep " target-libffi " > /dev/null 2>&1 ; then
+ libgomp_deps=""
+fi
+
# Disable the go frontend on systems where it is known to not work. Please keep
# this in sync with contrib/config-list.mk.
case "${target}" in
@@ -6460,6 +6468,15 @@ esac
# $build_configdirs and $target_configdirs.
# If we have the source for $noconfigdirs entries, add them to $notsupp.
+# libgomp depends on libffi. Remove it from nonsupp if necessary.
+if ! (echo " $noconfigdirs " | grep " target-libgomp " >/dev/null 2>&1); then
+ if echo " $noconfigdirs " | grep " target-libffi " >/dev/null 2>&1; then
+ if test "x${libgomp_deps}" != x; then
+ noconfigdirs=`echo " $noconfigdirs " | sed -e "s/ target-libffi / /"`
+ fi
+ fi
+fi
+
notsupp=""
for dir in . $skipdirs $noconfigdirs ; do
dirname=`echo $dir | sed -e s/target-//g -e s/build-//g`
diff --git a/configure.ac b/configure.ac
index 12377499295..a3b9e116a05 100644
--- a/configure.ac
+++ b/configure.ac
@@ -800,11 +800,19 @@ case "${target}" in
ft32-*-*)
noconfigdirs="$noconfigdirs target-libffi"
;;
+ nvptx-*-*)
+ noconfigdirs="$noconfigdirs target-libffi"
+ ;;
*-*-lynxos*)
noconfigdirs="$noconfigdirs target-libffi"
;;
esac
+libgomp_deps="target-libffi"
+if echo " ${noconfigdirs} " | grep " target-libffi " > /dev/null 2>&1 ; then
+ libgomp_deps=""
+fi
+
# Disable the go frontend on systems where it is known to not work. Please keep
# this in sync with contrib/config-list.mk.
case "${target}" in
@@ -2127,6 +2135,15 @@ esac
# $build_configdirs and $target_configdirs.
# If we have the source for $noconfigdirs entries, add them to $notsupp.
+# libgomp depends on libffi. Remove it from nonsupp if necessary.
+if ! (echo " $noconfigdirs " | grep " target-libgomp " >/dev/null 2>&1); then
+ if echo " $noconfigdirs " | grep " target-libffi " >/dev/null 2>&1; then
+ if test "x${libgomp_deps}" != x; then
+ noconfigdirs=`echo " $noconfigdirs " | sed -e "s/ target-libffi / /"`
+ fi
+ fi
+fi
+
notsupp=""
for dir in . $skipdirs $noconfigdirs ; do
dirname=`echo $dir | sed -e s/target-//g -e s/build-//g`
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index ac9894467ec..7f647c65162 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -763,6 +763,10 @@ DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
BT_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+ BT_VOID, BT_INT, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
+ BT_PTR, BT_PTR, BT_PTR)
+
DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
BT_PTR, BT_INT, BT_INT)
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a7b4c09bf6c..55c7e3cbf90 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4737,6 +4737,10 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
NULL_RTX, mode, EXPAND_NORMAL);
rtx pat;
+ /* 'mem' might be a PARM_DECL. If so, convert it to a register. */
+ if (!REG_P (mem))
+ mem = copy_to_mode_reg (GET_MODE (mem), mem);
+
mem = gen_rtx_MEM (mode, mem);
if (!REG_P (cmp))
cmp = copy_to_mode_reg (mode, cmp);
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 1f8a5a1277c..3c3ad69d848 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -252,3 +252,7 @@ DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
BT_PTR, BT_PTR, BT_PTR)
+
+DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+ BT_VOID, BT_INT, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
+ BT_PTR, BT_PTR, BT_PTR)
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 69b73f4b8c4..a9ec667aa54 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -38,8 +38,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end",
DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
- BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed_v2",
+ BT_FN_VOID_INT_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
ATTR_NOTHROW_LIST)
DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index bf1f127d8d6..f674c74ec82 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7097,19 +7097,21 @@ expand_omp_target (struct omp_region *region)
gomp_target *entry_stmt;
gimple *stmt;
edge e;
- bool offloaded, data_region;
+ bool offloaded, data_region, oacc_parallel;
entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
new_bb = region->entry;
+ oacc_parallel = false;
offloaded = is_gimple_omp_offloaded (entry_stmt);
switch (gimple_omp_target_kind (entry_stmt))
{
+ case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+ oacc_parallel = true;
case GF_OMP_TARGET_KIND_REGION:
case GF_OMP_TARGET_KIND_UPDATE:
case GF_OMP_TARGET_KIND_ENTER_DATA:
case GF_OMP_TARGET_KIND_EXIT_DATA:
- case GF_OMP_TARGET_KIND_OACC_PARALLEL:
case GF_OMP_TARGET_KIND_OACC_KERNELS:
case GF_OMP_TARGET_KIND_OACC_UPDATE:
case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
@@ -7171,7 +7173,7 @@ expand_omp_target (struct omp_region *region)
.OMP_DATA_I may have been converted into a different local
variable. In which case, we need to keep the assignment. */
tree data_arg = gimple_omp_target_data_arg (entry_stmt);
- if (data_arg)
+ if (data_arg && !oacc_parallel)
{
basic_block entry_succ_bb = single_succ (entry_bb);
gimple_stmt_iterator gsi;
@@ -7489,6 +7491,11 @@ expand_omp_target (struct omp_region *region)
/* The maximum number used by any start_ix, without varargs. */
auto_vec<tree, 11> args;
args.quick_push (device);
+ if (start_ix == BUILT_IN_GOACC_PARALLEL)
+ {
+ tree use_params = oacc_parallel ? integer_one_node : integer_zero_node;
+ args.quick_push (use_params);
+ }
if (offloaded)
args.quick_push (build_fold_addr_expr (child_fn));
args.quick_push (t1);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e790f0f1bb2..a2869e49ebd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -89,6 +89,7 @@ struct omp_context
/* Map variables to fields in a structure that allows communication
between sending and receiving threads. */
splay_tree field_map;
+ splay_tree parm_map;
tree record_type;
tree sender_decl;
tree receiver_decl;
@@ -321,6 +322,14 @@ maybe_lookup_decl (const_tree var, omp_context *ctx)
}
static inline tree
+lookup_parm (const_tree var, omp_context *ctx)
+{
+ splay_tree_node n;
+ n = splay_tree_lookup (ctx->parm_map, (splay_tree_key) var);
+ return (tree) n->value;
+}
+
+static inline tree
lookup_field (tree var, omp_context *ctx)
{
splay_tree_node n;
@@ -501,15 +510,21 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx)
{
tree x, field = lookup_field (var, ctx);
- /* If the receiver record type was remapped in the child function,
- remap the field into the new record type. */
- x = maybe_lookup_field (field, ctx);
- if (x != NULL)
- field = x;
+ if (is_oacc_parallel (ctx))
+ x = lookup_parm (var, ctx);
+ else
+ {
+ /* If the receiver record type was remapped in the child function,
+ remap the field into the new record type. */
+ x = maybe_lookup_field (field, ctx);
+ if (x != NULL)
+ field = x;
+
+ x = build_simple_mem_ref (ctx->receiver_decl);
+ TREE_THIS_NOTRAP (x) = 1;
+ x = omp_build_component_ref (x, field);
+ }
- x = build_simple_mem_ref (ctx->receiver_decl);
- TREE_THIS_NOTRAP (x) = 1;
- x = omp_build_component_ref (x, field);
if (by_ref)
{
x = build_simple_mem_ref (x);
@@ -644,6 +659,32 @@ build_sender_ref (tree var, omp_context *ctx)
return build_sender_ref ((splay_tree_key) var, ctx);
}
+static void
+install_parm_decl (tree var, tree type, omp_context *ctx)
+{
+ if (!is_oacc_parallel (ctx))
+ return;
+
+ splay_tree_key key = (splay_tree_key) var;
+ tree decl_name = NULL_TREE, t;
+ location_t loc = UNKNOWN_LOCATION;
+
+ if (DECL_P (var))
+ {
+ decl_name = get_identifier (get_name (var));
+ loc = DECL_SOURCE_LOCATION (var);
+ }
+ t = build_decl (loc, PARM_DECL, decl_name, type);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_NAMELESS (t) = 1;
+ DECL_ARG_TYPE (t) = type;
+ DECL_CONTEXT (t) = current_function_decl;
+ TREE_USED (t) = 1;
+ TREE_READONLY (t) = 1;
+
+ splay_tree_insert (ctx->parm_map, key, (splay_tree_value) t);
+}
+
/* Add a new field for VAR inside the structure CTX->SENDER_DECL. If
BASE_POINTERS_RESTRICT, declare the field with restrict. */
@@ -764,7 +805,10 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx,
}
if (mask & 1)
- splay_tree_insert (ctx->field_map, key, (splay_tree_value) field);
+ {
+ splay_tree_insert (ctx->field_map, key, (splay_tree_value) field);
+ install_parm_decl (var, type, ctx);
+ }
if ((mask & 2) && ctx->sfield_map)
splay_tree_insert (ctx->sfield_map, key, (splay_tree_value) sfield);
}
@@ -1068,6 +1112,8 @@ delete_omp_context (splay_tree_value value)
splay_tree_delete (ctx->field_map);
if (ctx->sfield_map)
splay_tree_delete (ctx->sfield_map);
+ if (ctx->parm_map)
+ splay_tree_delete (ctx->parm_map);
/* We hijacked DECL_ABSTRACT_ORIGIN earlier. We need to clear it before
it produces corrupt debug information. */
@@ -1506,6 +1552,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
insert_field_into_struct (ctx->record_type, field);
splay_tree_insert (ctx->field_map, (splay_tree_key) decl,
(splay_tree_value) field);
+ install_parm_decl (decl, ptr_type_node, ctx);
}
}
break;
@@ -1800,10 +1847,13 @@ omp_maybe_offloaded_ctx (omp_context *ctx)
}
/* Build a decl for the omp child function. It'll not contain a body
- yet, just the bare decl. */
+ yet, just the bare decl. Unlike omp child functions, acc child
+ functions for parallel regions have one argument per data
+ mapping. */
static void
-create_omp_child_function (omp_context *ctx, bool task_copy)
+create_omp_child_function (omp_context *ctx, bool task_copy,
+ unsigned int map_cnt = 0)
{
tree decl, type, name, t;
@@ -1825,6 +1875,13 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
type = build_function_type_list (void_type_node, ptr_type_node,
cilk_var_type, cilk_var_type, NULL_TREE);
}
+ else if (is_oacc_parallel (ctx))
+ {
+ tree *arg_types = (tree *) alloca (sizeof (tree) * map_cnt);
+ for (unsigned int i = 0; i < map_cnt; i++)
+ arg_types[i] = ptr_type_node;
+ type = build_function_type_array (void_type_node, map_cnt, arg_types);
+ }
else
type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
@@ -1899,35 +1956,37 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
DECL_ARGUMENTS (decl) = t;
}
- tree data_name = get_identifier (".omp_data_i");
- t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
- ptr_type_node);
- DECL_ARTIFICIAL (t) = 1;
- DECL_NAMELESS (t) = 1;
- DECL_ARG_TYPE (t) = ptr_type_node;
- DECL_CONTEXT (t) = current_function_decl;
- TREE_USED (t) = 1;
- TREE_READONLY (t) = 1;
- if (cilk_for_count)
- DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
- DECL_ARGUMENTS (decl) = t;
- if (!task_copy)
- ctx->receiver_decl = t;
- else
+ if (!is_oacc_parallel (ctx))
{
- t = build_decl (DECL_SOURCE_LOCATION (decl),
- PARM_DECL, get_identifier (".omp_data_o"),
+ tree data_name = get_identifier (".omp_data_i");
+ t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
ptr_type_node);
DECL_ARTIFICIAL (t) = 1;
DECL_NAMELESS (t) = 1;
DECL_ARG_TYPE (t) = ptr_type_node;
DECL_CONTEXT (t) = current_function_decl;
TREE_USED (t) = 1;
- TREE_ADDRESSABLE (t) = 1;
- DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+ TREE_READONLY (t) = 1;
+ if (cilk_for_count)
+ DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
DECL_ARGUMENTS (decl) = t;
+ if (!task_copy)
+ ctx->receiver_decl = t;
+ else
+ {
+ t = build_decl (DECL_SOURCE_LOCATION (decl),
+ PARM_DECL, get_identifier (".omp_data_o"),
+ ptr_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_NAMELESS (t) = 1;
+ DECL_ARG_TYPE (t) = ptr_type_node;
+ DECL_CONTEXT (t) = current_function_decl;
+ TREE_USED (t) = 1;
+ TREE_ADDRESSABLE (t) = 1;
+ DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+ DECL_ARGUMENTS (decl) = t;
+ }
}
-
/* Allocate memory for the function structure. The call to
allocate_struct_function clobbers CFUN, so we need to restore
it afterward. */
@@ -2608,6 +2667,7 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
ctx = new_omp_context (stmt, outer_ctx);
ctx->field_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
+ ctx->parm_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
ctx->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
ctx->record_type = lang_hooks.types.make_type (RECORD_TYPE);
name = create_tmp_var_name (".omp_data_t");
@@ -2621,8 +2681,11 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
bool base_pointers_restrict = false;
if (offloaded)
{
- create_omp_child_function (ctx, false);
- gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
+ if (!is_oacc_parallel (ctx))
+ {
+ create_omp_child_function (ctx, false);
+ gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
+ }
base_pointers_restrict = omp_target_base_pointers_restrict_p (clauses);
if (base_pointers_restrict
@@ -7921,6 +7984,18 @@ convert_from_firstprivate_int (tree var, tree orig_type, bool is_ref,
return var;
}
+static tree
+append_decl_arg (tree var, tree decl_args, omp_context *ctx)
+{
+ if (!is_oacc_parallel (ctx))
+ return NULL_TREE;
+
+ tree temp = lookup_parm (var, ctx);
+ DECL_CHAIN (temp) = decl_args;
+
+ return temp;
+}
+
/* Lower the GIMPLE_OMP_TARGET in the current statement
in GSI_P. CTX holds context information for the directive. */
@@ -7934,7 +8009,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
gimple_seq tgt_body, olist, ilist, fplist, new_body;
location_t loc = gimple_location (stmt);
bool offloaded, data_region;
- unsigned int map_cnt = 0;
+ unsigned int map_cnt = 0, init_cnt = 0;
offloaded = is_gimple_omp_offloaded (stmt);
switch (gimple_omp_target_kind (stmt))
@@ -7980,11 +8055,83 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
}
else if (data_region)
tgt_body = gimple_omp_body (stmt);
- child_fn = ctx->cb.dst_fn;
push_gimplify_context ();
fplist = NULL;
+ /* Determine init_cnt to finish initialize ctx. */
+
+ if (is_oacc_parallel (ctx))
+ {
+ for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
+ switch (OMP_CLAUSE_CODE (c))
+ {
+ tree var;
+
+ default:
+ break;
+ case OMP_CLAUSE_MAP:
+ case OMP_CLAUSE_TO:
+ case OMP_CLAUSE_FROM:
+ init_oacc_firstprivate:
+ var = OMP_CLAUSE_DECL (c);
+ if (!DECL_P (var))
+ {
+ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+ || (!OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_FIRSTPRIVATE_POINTER)))
+ init_cnt++;
+ continue;
+ }
+
+ if (DECL_SIZE (var)
+ && TREE_CODE (DECL_SIZE (var)) != INTEGER_CST)
+ {
+ tree var2 = DECL_VALUE_EXPR (var);
+ gcc_assert (TREE_CODE (var2) == INDIRECT_REF);
+ var2 = TREE_OPERAND (var2, 0);
+ gcc_assert (DECL_P (var2));
+ var = var2;
+ }
+
+ if (offloaded
+ && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+ && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
+ || (OMP_CLAUSE_MAP_KIND (c)
+ == GOMP_MAP_FIRSTPRIVATE_REFERENCE)))
+ {
+ continue;
+ }
+
+ if (!maybe_lookup_field (var, ctx))
+ continue;
+
+ init_cnt++;
+ break;
+
+ case OMP_CLAUSE_FIRSTPRIVATE:
+ if (is_oacc_parallel (ctx))
+ goto init_oacc_firstprivate;
+ init_cnt++;
+ break;
+
+ case OMP_CLAUSE_USE_DEVICE_PTR:
+ case OMP_CLAUSE_IS_DEVICE_PTR:
+ init_cnt++;
+ break;
+ }
+
+ /* Initialize the offloaded child function. */
+
+ create_omp_child_function (ctx, false, init_cnt);
+ gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
+ }
+
+ child_fn = ctx->cb.dst_fn;
+
+ /* Clause Pass 1: Scan and prepare sender decls VALUE_EXPRs for
+ usage on the child function. */
for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
switch (OMP_CLAUSE_CODE (c))
{
@@ -8247,6 +8394,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (offloaded)
{
+ if (is_oacc_parallel (ctx))
+ gcc_assert (init_cnt == map_cnt);
target_nesting_level++;
lower_omp (&tgt_body, ctx);
target_nesting_level--;
@@ -8293,6 +8442,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
vec_alloc (vsize, map_cnt);
vec_alloc (vkind, map_cnt);
unsigned int map_idx = 0;
+ tree decl_args = NULL_TREE;
for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
switch (OMP_CLAUSE_CODE (c))
@@ -8488,6 +8638,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (s == NULL_TREE)
s = integer_one_node;
s = fold_convert (size_type_node, s);
+ decl_args = append_decl_arg (ovar, decl_args, ctx);
purpose = size_int (map_idx++);
CONSTRUCTOR_APPEND_ELT (vsize, purpose, s);
if (TREE_CODE (s) != INTEGER_CST)
@@ -8628,6 +8779,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
else
s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
s = fold_convert (size_type_node, s);
+ decl_args = append_decl_arg (ovar, decl_args, ctx);
purpose = size_int (map_idx++);
CONSTRUCTOR_APPEND_ELT (vsize, purpose, s);
if (TREE_CODE (s) != INTEGER_CST)
@@ -8667,6 +8819,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
}
gimplify_assign (x, var, &ilist);
s = size_int (0);
+ decl_args = append_decl_arg (ovar, decl_args, ctx);
purpose = size_int (map_idx++);
CONSTRUCTOR_APPEND_ELT (vsize, purpose, s);
gcc_checking_assert (tkind
@@ -8679,6 +8832,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
}
gcc_assert (map_idx == map_cnt);
+ if (is_oacc_parallel (ctx))
+ DECL_ARGUMENTS (child_fn) = nreverse (decl_args);
DECL_INITIAL (TREE_VEC_ELT (t, 1))
= build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 1)), vsize);
@@ -8717,9 +8872,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
{
t = build_fold_addr_expr_loc (loc, ctx->sender_decl);
/* fixup_child_record_type might have changed receiver_decl's type. */
- t = fold_convert_loc (loc, TREE_TYPE (ctx->receiver_decl), t);
- gimple_seq_add_stmt (&new_body,
- gimple_build_assign (ctx->receiver_decl, t));
+ if (!is_oacc_parallel (ctx))
+ {
+ t = fold_convert_loc (loc, TREE_TYPE (ctx->receiver_decl), t);
+ gimple_seq_add_stmt (&new_body,
+ gimple_build_assign (ctx->receiver_decl, t));
+ }
}
gimple_seq_add_seq (&new_body, fplist);
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index aab6821e792..c23ddeb9c86 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4618,6 +4618,7 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
case BUILT_IN_GOMP_PARALLEL:
case BUILT_IN_GOACC_PARALLEL:
{
+ bool oacc_parallel = false;
if (in_ipa_mode)
{
unsigned int fnpos, argpos;
@@ -4631,13 +4632,17 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
case BUILT_IN_GOACC_PARALLEL:
/* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs,
sizes, kinds, ...). */
- fnpos = 1;
- argpos = 3;
+ fnpos = 2;
+ argpos = 4;
+ oacc_parallel = gimple_call_arg (t, 1) == integer_one_node;
break;
default:
gcc_unreachable ();
}
+ if (oacc_parallel)
+ break;
+
tree fnarg = gimple_call_arg (t, fnpos);
gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
tree fndecl = TREE_OPERAND (fnarg, 0);
@@ -5195,6 +5200,7 @@ find_func_clobbers (struct function *fn, gimple *origt)
unsigned int fnpos, argpos;
unsigned int implicit_use_args[2];
unsigned int num_implicit_use_args = 0;
+ bool oacc_parallel = false;
switch (DECL_FUNCTION_CODE (decl))
{
case BUILT_IN_GOMP_PARALLEL:
@@ -5205,15 +5211,19 @@ find_func_clobbers (struct function *fn, gimple *origt)
case BUILT_IN_GOACC_PARALLEL:
/* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs,
sizes, kinds, ...). */
- fnpos = 1;
- argpos = 3;
- implicit_use_args[num_implicit_use_args++] = 4;
+ fnpos = 2;
+ argpos = 4;
implicit_use_args[num_implicit_use_args++] = 5;
+ implicit_use_args[num_implicit_use_args++] = 6;
+ oacc_parallel = gimple_call_arg (t, 1) == integer_one_node;
break;
default:
gcc_unreachable ();
}
+ if (oacc_parallel)
+ break;
+
tree fnarg = gimple_call_arg (t, fnpos);
gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
tree fndecl = TREE_OPERAND (fnarg, 0);
@@ -7968,7 +7978,7 @@ ipa_pta_execute (void)
if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
called_decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
- called_decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
+ called_decl = TREE_OPERAND (gimple_call_arg (stmt, 2), 0);
if (called_decl != NULL_TREE
&& !fndecl_maybe_in_other_partition (called_decl))
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 99ad2fd456d..4de30914d3d 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -13,9 +13,16 @@ search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
+LIBFFI = @LIBFFI@
+LIBFFIINCS = @LIBFFIINCS@
+
+if USE_LIBFFI
+libgomp_la_LIBADD = $(LIBFFI)
+endif
+
vpath % $(strip $(search_path))
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) $(LIBFFIINCS)
AM_CFLAGS = $(XCFLAGS)
AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 7a84b5681e1..617615d4d52 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -171,7 +171,6 @@ libgomp_plugin_nvptx_la_LINK = $(LIBTOOL) --tag=CC \
$(libgomp_plugin_nvptx_la_LDFLAGS) $(LDFLAGS) -o $@
@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_rpath = -rpath \
@PLUGIN_NVPTX_TRUE@ $(toolexeclibdir)
-libgomp_la_LIBADD =
@USE_FORTRAN_TRUE@am__objects_1 = openacc.lo
am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
env.lo error.lo icv.lo icv-device.lo iter.lo iter_ull.lo \
@@ -279,6 +278,8 @@ INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LD = @LD@
LDFLAGS = @LDFLAGS@
+LIBFFI = @LIBFFI@
+LIBFFIINCS = @LIBFFIINCS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
@@ -410,7 +411,8 @@ search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+libgomp_la_LIBADD = $(LIBFFI)
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) $(LIBFFIINCS)
AM_CFLAGS = $(XCFLAGS)
AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 2f45aa74bbe..65e01c5376a 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -189,5 +189,8 @@
/* Define to 1 if the target use emutls for thread-local storage. */
#undef USE_EMUTLS
+/* Define to 1 if the target requires libffi to call the offloaded funtions. */
+#undef USE_LIBFFI
+
/* Version number of package */
#undef VERSION
diff --git a/libgomp/configure b/libgomp/configure
index 11f5b0b1e1c..cc24a81372e 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -649,6 +649,10 @@ PLUGIN_NVPTX
CUDA_DRIVER_LIB
CUDA_DRIVER_INCLUDE
offload_targets
+USE_LIBFFI_FALSE
+USE_LIBFFI_TRUE
+LIBFFIINCS
+LIBFFI
libtool_VERSION
ac_ct_FC
FCFLAGS
@@ -2655,7 +2659,6 @@ else
fi
-
# -------
# -------
@@ -11155,7 +11158,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 11158 "configure"
+#line 11161 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@@ -11261,7 +11264,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 11264 "configure"
+#line 11267 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@@ -15137,6 +15140,28 @@ $as_echo "#define LIBGOMP_OFFLOADED_ONLY 1" >>confdefs.h
fi
+# Prepare libffi when necessary.
+
+LIBFFI=
+LIBFFIINCS=
+if test -d ../libffi; then
+
+$as_echo "#define USE_LIBFFI 1" >>confdefs.h
+
+ LIBFFI=../libffi/libffi_convenience.la
+ LIBFFIINCS='-I$(top_srcdir)/../libffi/include -I../libffi/include'
+fi
+
+
+ if test -d ../libffi; then
+ USE_LIBFFI_TRUE=
+ USE_LIBFFI_FALSE='#'
+else
+ USE_LIBFFI_TRUE='#'
+ USE_LIBFFI_FALSE=
+fi
+
+
# Plugins for offload execution, configure.ac fragment. -*- mode: autoconf -*-
#
# Copyright (C) 2014-2017 Free Software Foundation, Inc.
@@ -16960,6 +16985,10 @@ if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
as_fn_error "conditional \"MAINTAINER_MODE\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
fi
+if test -z "${USE_LIBFFI_TRUE}" && test -z "${USE_LIBFFI_FALSE}"; then
+ as_fn_error "conditional \"USE_LIBFFI\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index a42d4f08b4b..aa49577537e 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -28,7 +28,6 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
AC_MSG_RESULT($enable_generated_files_in_srcdir)
AM_CONDITIONAL(GENINSRC, test "$enable_generated_files_in_srcdir" = yes)
-
# -------
# -------
@@ -215,6 +214,19 @@ if test x$libgomp_offloaded_only = xyes; then
[Define to 1 if building libgomp for an accelerator-only target.])
fi
+# Prepare libffi when necessary.
+
+LIBFFI=
+LIBFFIINCS=
+if test -d ../libffi; then
+ AC_DEFINE(USE_LIBFFI, 1, [Define if we're to use libffi.])
+ LIBFFI=../libffi/libffi_convenience.la
+ LIBFFIINCS='-I$(top_srcdir)/../libffi/include -I../libffi/include'
+fi
+AC_SUBST(LIBFFI)
+AC_SUBST(LIBFFIINCS)
+AM_CONDITIONAL([USE_LIBFFI], [test -d ../libffi])
+
m4_include([plugin/configfrag.ac])
# Check for functions needed.
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index c025069b457..44097cfd56a 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -119,6 +119,13 @@ extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void **,
void **, unsigned *, void *,
struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_exec_params (void (*) (void *), size_t,
+ void **, void **, unsigned *,
+ void *);
+extern void GOMP_OFFLOAD_openacc_async_exec_params (void (*) (void *), size_t,
+ void **, void **,
+ unsigned *, void *,
+ struct goacc_asyncqueue *);
extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 59e7ca8b8c8..a31c83cc656 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -885,6 +885,7 @@ typedef struct acc_dispatch_t
/* Execute. */
__typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
+ __typeof (GOMP_OFFLOAD_openacc_exec_params) *exec_params_func;
struct {
gomp_mutex_t lock;
@@ -900,6 +901,7 @@ typedef struct acc_dispatch_t
__typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
__typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
+ __typeof (GOMP_OFFLOAD_openacc_async_exec_params) *exec_params_func;
__typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
__typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
} async;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 546ac929a0e..7a49acc1dfe 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -461,8 +461,10 @@ GOACC_2.0.1 {
GOACC_2.0.GOMP_4_BRANCH {
global:
GOMP_set_offload_targets;
+ GOACC_parallel_keyed_v2;
} GOACC_2.0.1;
+
GOMP_PLUGIN_1.0 {
global:
GOMP_PLUGIN_malloc;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 958ca6e9cc3..c40e67f2e80 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -298,6 +298,8 @@ extern void GOMP_teams (unsigned int, unsigned int);
extern void GOACC_parallel_keyed (int, void (*) (void *), size_t,
void **, size_t *, unsigned short *, ...);
+extern void GOACC_parallel_keyed_v2 (int, int, void (*) (void *), size_t,
+ void **, size_t *, unsigned short *, ...);
extern void GOACC_parallel (int, void (*) (void *), size_t, void **, size_t *,
unsigned short *, int, int, int, int, int, ...);
extern void GOACC_data_start (int, size_t, void **, size_t *,
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 3b2cafb2c55..5b4e34d7190 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -158,6 +158,30 @@ host_openacc_async_exec (void (*fn) (void *),
fn (hostaddrs);
}
+static void
+host_openacc_exec_params (void (*fn) (void *),
+ size_t mapnum __attribute__ ((unused)),
+ void **hostaddrs,
+ void **devaddrs __attribute__ ((unused)),
+ unsigned *dims __attribute__ ((unused)),
+ void *targ_mem_desc __attribute__ ((unused)))
+{
+ fn (hostaddrs);
+}
+
+static void
+host_openacc_async_exec_params (void (*fn) (void *),
+ size_t mapnum __attribute__ ((unused)),
+ void **hostaddrs,
+ void **devaddrs __attribute__ ((unused)),
+ unsigned *dims __attribute__ ((unused)),
+ void *targ_mem_desc __attribute__ ((unused)),
+ struct goacc_asyncqueue *aq __attribute__ ((unused)))
+{
+ fn (hostaddrs);
+}
+
+
static int
host_openacc_async_test (struct goacc_asyncqueue *aq __attribute__ ((unused)))
{
@@ -265,6 +289,7 @@ static struct gomp_device_descr host_dispatch =
.data_environ = NULL,
.exec_func = host_openacc_exec,
+ .exec_params_func = host_openacc_exec_params,
.async = {
.construct_func = host_openacc_async_construct,
@@ -274,6 +299,7 @@ static struct gomp_device_descr host_dispatch =
.serialize_func = host_openacc_async_serialize,
.queue_callback_func = host_openacc_async_queue_callback,
.exec_func = host_openacc_async_exec,
+ .exec_params_func = host_openacc_async_exec_params,
.dev2host_func = host_openacc_async_dev2host,
.host2dev_func = host_openacc_async_host2dev,
},
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 1172d739ec7..3c5aa24b5f5 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -31,6 +31,9 @@
#include "libgomp_g.h"
#include "gomp-constants.h"
#include "oacc-int.h"
+#if USE_LIBFFI
+# include "ffi.h"
+#endif
#ifdef HAVE_INTTYPES_H
# include <inttypes.h> /* For PRIu64. */
#endif
@@ -104,19 +107,47 @@ handle_ftn_pointers (size_t mapnum, void **hostaddrs, size_t *sizes,
static void goacc_wait (int async, int num_waits, va_list *ap);
+static void
+goacc_call_host_fn (void (*fn) (void *), size_t mapnum, void **hostaddrs,
+ int params)
+{
+#ifdef USE_LIBFFI
+ ffi_cif cif;
+ ffi_type *arg_types[mapnum];
+ void *arg_values[mapnum];
+ ffi_arg result;
+ int i;
+
+ if (params)
+ {
+ for (i = 0; i < mapnum; i++)
+ {
+ arg_types[i] = &ffi_type_pointer;
+ arg_values[i] = &hostaddrs[i];
+ }
+
+ if (ffi_prep_cif (&cif, FFI_DEFAULT_ABI, mapnum,
+ &ffi_type_void, arg_types) == FFI_OK)
+ ffi_call (&cif, FFI_FN (fn), &result, arg_values);
+ else
+ abort ();
+ }
+ else
+#endif
+ fn (hostaddrs);
+}
/* Launch a possibly offloaded function on DEVICE. FN is the host fn
address. MAPNUM, HOSTADDRS, SIZES & KINDS describe the memory
blocks to be copied to/from the device. Varadic arguments are
keyed optional parameters terminated with a zero. */
-void
-GOACC_parallel_keyed (int device, void (*fn) (void *),
- size_t mapnum, void **hostaddrs, size_t *sizes,
- unsigned short *kinds, ...)
+static void
+GOACC_parallel_keyed_internal (int device, int params, void (*fn) (void *),
+ size_t mapnum, void **hostaddrs, size_t *sizes,
+ unsigned short *kinds, va_list *ap)
{
bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
- va_list ap;
struct goacc_thread *thr;
struct gomp_device_descr *acc_dev;
struct target_mem_desc *tgt;
@@ -206,13 +237,13 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
prof_info.device_type = acc_device_host;
api_info.device_type = prof_info.device_type;
goacc_save_and_set_bind (acc_device_host);
- fn (hostaddrs);
+ goacc_call_host_fn (fn, mapnum, hostaddrs, params);
goacc_restore_bind ();
goto out;
}
else if (acc_device_type (acc_dev->type) == acc_device_host)
{
- fn (hostaddrs);
+ goacc_call_host_fn (fn, mapnum, hostaddrs, params);
goto out;
}
else if (profiling_dispatch_p)
@@ -222,9 +253,8 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
for (i = 0; i != GOMP_DIM_MAX; i++)
dims[i] = 0;
- va_start (ap, kinds);
/* TODO: This will need amending when device_type is implemented. */
- while ((tag = va_arg (ap, unsigned)) != 0)
+ while ((tag = va_arg (*ap, unsigned)) != 0)
{
if (GOMP_LAUNCH_DEVICE (tag))
gomp_fatal ("device_type '%d' offload parameters, libgomp is too old",
@@ -238,7 +268,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
for (i = 0; i != GOMP_DIM_MAX; i++)
if (mask & GOMP_DIM_MASK (i))
- dims[i] = va_arg (ap, unsigned);
+ dims[i] = va_arg (*ap, unsigned);
}
break;
@@ -248,7 +278,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
async = GOMP_LAUNCH_OP (tag);
if (async == GOMP_LAUNCH_OP_MAX)
- async = va_arg (ap, unsigned);
+ async = va_arg (*ap, unsigned);
if (profiling_dispatch_p)
{
@@ -267,7 +297,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
int num_waits = ((signed short) GOMP_LAUNCH_OP (tag));
if (num_waits > 0)
- goacc_wait (async, num_waits, &ap);
+ goacc_wait (async, num_waits, ap);
else if (num_waits == acc_async_noval)
acc_wait_all_async (async);
break;
@@ -278,7 +308,6 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
" libgomp is too old", GOMP_LAUNCH_CODE (tag));
}
}
- va_end (ap);
if (!(acc_dev->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC))
{
@@ -338,8 +367,12 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
if (aq == NULL)
{
- acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
- dims, tgt);
+ if (params)
+ acc_dev->openacc.exec_params_func (tgt_fn, mapnum, hostaddrs, devaddrs,
+ dims, tgt);
+ else
+ acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
+ dims, tgt);
if (profiling_dispatch_p)
{
prof_info.event_type = acc_ev_exit_data_start;
@@ -362,8 +395,12 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
}
else
{
- acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
- dims, tgt, aq);
+ if (params)
+ acc_dev->openacc.async.exec_params_func (tgt_fn, mapnum, hostaddrs,
+ devaddrs, dims, tgt, aq);
+ else
+ acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs,
+ devaddrs, dims, tgt, aq);
goacc_async_copyout_unmap_vars (tgt, aq);
}
@@ -381,6 +418,30 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
}
}
+void
+GOACC_parallel_keyed (int device, void (*fn) (void *),
+ size_t mapnum, void **hostaddrs, size_t *sizes,
+ unsigned short *kinds, ...)
+{
+ va_list ap;
+ va_start (ap, kinds);
+ GOACC_parallel_keyed_internal (device, 0, fn, mapnum, hostaddrs, sizes,
+ kinds, &ap);
+ va_end (ap);
+}
+
+void
+GOACC_parallel_keyed_v2 (int device, int args, void (*fn) (void *),
+ size_t mapnum, void **hostaddrs, size_t *sizes,
+ unsigned short *kinds, ...)
+{
+ va_list ap;
+ va_start (ap, kinds);
+ GOACC_parallel_keyed_internal (device, args, fn, mapnum, hostaddrs, sizes,
+ kinds, &ap);
+ va_end (ap);
+}
+
/* Legacy entry point, only provide host execution. */
void
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 94abfe2036f..bdc0c30e1f5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -697,12 +697,11 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
static void
nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
unsigned *dims, void *targ_mem_desc,
- CUdeviceptr dp, CUstream stream)
+ void **kargs, CUstream stream)
{
struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
CUfunction function;
int i;
- void *kargs[1];
int cpu_size = nvptx_thread ()->ptx_dev->max_threads_per_multiprocessor;
int block_size = nvptx_thread ()->ptx_dev->max_threads_per_block;
int dev_size = nvptx_thread ()->ptx_dev->num_sms;
@@ -888,7 +887,6 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
api_info);
}
- kargs[0] = &dp;
CUDA_CALL_ASSERT (cuLaunchKernel, function,
dims[GOMP_DIM_GANG], 1, 1,
dims[GOMP_DIM_VECTOR], dims[GOMP_DIM_WORKER], 1,
@@ -1293,22 +1291,29 @@ GOMP_OFFLOAD_free (int ord, void *ptr)
&& nvptx_free (ptr, ptx_devices[ord]));
}
-void
-GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
- void **hostaddrs, void **devaddrs,
- unsigned *dims, void *targ_mem_desc)
+static void
+openacc_exec_internal (void (*fn) (void *), int params, size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc)
{
GOMP_PLUGIN_debug (0, " %s: prepare mappings\n", __FUNCTION__);
- void **hp = NULL;
+ void **hp = alloca (mapnum * sizeof (void *));
CUdeviceptr dp = 0;
if (mapnum > 0)
{
- hp = alloca (mapnum * sizeof (void *));
- for (int i = 0; i < mapnum; i++)
- hp[i] = (devaddrs[i] ? devaddrs[i] : hostaddrs[i]);
- CUDA_CALL_ASSERT (cuMemAlloc, &dp, mapnum * sizeof (void *));
+ if (params)
+ {
+ for (int i = 0; i < mapnum; i++)
+ hp[i] = (devaddrs[i] ? &devaddrs[i] : &hostaddrs[i]);
+ }
+ else
+ {
+ for (int i = 0; i < mapnum; i++)
+ hp[i] = (devaddrs[i] ? devaddrs[i] : hostaddrs[i]);
+ CUDA_CALL_ASSERT (cuMemAlloc, &dp, mapnum * sizeof (void *));
+ }
}
/* Copy the (device) pointers to arguments to the device (dp and hp might in
@@ -1333,7 +1338,8 @@ GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
data_event_info.data_event.var_name = NULL; //TODO
data_event_info.data_event.bytes = mapnum * sizeof (void *);
data_event_info.data_event.host_ptr = hp;
- data_event_info.data_event.device_ptr = (void *) dp;
+ if (!params)
+ data_event_info.data_event.device_ptr = (void *) dp;
api_info->device_api = acc_device_api_cuda;
@@ -1341,7 +1347,7 @@ GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
api_info);
}
- if (mapnum > 0)
+ if (!params && mapnum > 0)
CUDA_CALL_ASSERT (cuMemcpyHtoD, dp, (void *) hp,
mapnum * sizeof (void *));
@@ -1353,8 +1359,15 @@ GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
api_info);
}
- nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
- dp, NULL);
+ if (params)
+ nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
+ hp, NULL);
+ else
+ {
+ void *kargs[1] = { &dp };
+ nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
+ kargs, NULL);
+ }
CUresult r = cuStreamSynchronize (NULL);
const char *maybe_abort_msg = "(perhaps abort was called)";
@@ -1363,7 +1376,27 @@ GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
maybe_abort_msg);
else if (r != CUDA_SUCCESS)
GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
- CUDA_CALL_ASSERT (cuMemFree, dp);
+
+ if (!params)
+ CUDA_CALL_ASSERT (cuMemFree, dp);
+}
+
+void
+GOMP_OFFLOAD_openacc_exec_params (void (*fn) (void *), size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc)
+{
+ openacc_exec_internal (fn, 1, mapnum, hostaddrs, devaddrs, dims,
+ targ_mem_desc);
+}
+
+void
+GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc)
+{
+ openacc_exec_internal (fn, 0, mapnum, hostaddrs, devaddrs, dims,
+ targ_mem_desc);
}
static void
@@ -1374,11 +1407,11 @@ cuda_free_argmem (void *ptr)
free (block);
}
-void
-GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
- void **hostaddrs, void **devaddrs,
- unsigned *dims, void *targ_mem_desc,
- struct goacc_asyncqueue *aq)
+static void
+openacc_async_exec_internal (void (*fn) (void *), int params, size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc,
+ struct goacc_asyncqueue *aq)
{
GOMP_PLUGIN_debug (0, " %s: prepare mappings\n", __FUNCTION__);
@@ -1388,11 +1421,20 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
if (mapnum > 0)
{
- block = (void **) GOMP_PLUGIN_malloc ((mapnum + 2) * sizeof (void *));
- hp = block + 2;
- for (int i = 0; i < mapnum; i++)
- hp[i] = (devaddrs[i] ? devaddrs[i] : hostaddrs[i]);
- CUDA_CALL_ASSERT (cuMemAlloc, &dp, mapnum * sizeof (void *));
+ if (params)
+ {
+ hp = alloca (sizeof (void *) * mapnum);
+ for (int i = 0; i < mapnum; i++)
+ hp[i] = (devaddrs[i] ? &devaddrs[i] : &hostaddrs[i]);
+ }
+ else
+ {
+ block = (void **) GOMP_PLUGIN_malloc ((mapnum + 2) * sizeof (void *));
+ hp = block + 2;
+ for (int i = 0; i < mapnum; i++)
+ hp[i] = (devaddrs[i] ? devaddrs[i] : hostaddrs[i]);
+ CUDA_CALL_ASSERT (cuMemAlloc, &dp, mapnum * sizeof (void *));
+ }
}
/* Copy the (device) pointers to arguments to the device (dp and hp might in
@@ -1417,7 +1459,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
data_event_info.data_event.var_name = NULL; //TODO
data_event_info.data_event.bytes = mapnum * sizeof (void *);
data_event_info.data_event.host_ptr = hp;
- data_event_info.data_event.device_ptr = (void *) dp;
+ if (!params)
+ data_event_info.data_event.device_ptr = (void *) dp;
api_info->device_api = acc_device_api_cuda;
@@ -1425,7 +1468,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
api_info);
}
- if (mapnum > 0)
+ if (!params && mapnum > 0)
{
CUDA_CALL_ASSERT (cuMemcpyHtoDAsync, dp, (void *) hp,
mapnum * sizeof (void *), aq->cuda_stream);
@@ -1443,14 +1486,42 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &data_event_info,
api_info);
}
-
- nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
- dp, aq->cuda_stream);
- if (mapnum > 0)
+ if (params)
+ nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
+ hp, aq->cuda_stream);
+ else
+ {
+ void *kargs[1] = { &dp };
+ nvptx_exec (fn, mapnum, hostaddrs, devaddrs, dims, targ_mem_desc,
+ kargs, aq->cuda_stream);
+ }
+
+ if (!params && mapnum > 0)
GOMP_OFFLOAD_openacc_async_queue_callback (aq, cuda_free_argmem, block);
}
+void
+GOMP_OFFLOAD_openacc_async_exec_params (void (*fn) (void *), size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc,
+ struct goacc_asyncqueue *aq)
+{
+ openacc_async_exec_internal (fn, 1, mapnum, hostaddrs, devaddrs, dims,
+ targ_mem_desc, aq);
+}
+
+void
+GOMP_OFFLOAD_openacc_async_exec (void (*fn) (void *), size_t mapnum,
+ void **hostaddrs, void **devaddrs,
+ unsigned *dims, void *targ_mem_desc,
+ struct goacc_asyncqueue *aq)
+{
+ openacc_async_exec_internal (fn, 0, mapnum, hostaddrs, devaddrs, dims,
+ targ_mem_desc, aq);
+}
+
+
void *
GOMP_OFFLOAD_openacc_create_thread_data (int ord)
{
diff --git a/libgomp/target.c b/libgomp/target.c
index 336581d2196..10c5e34f378 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2908,6 +2908,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
{
if (!DLSYM_OPT (openacc.exec, openacc_exec)
+ || !DLSYM_OPT (openacc.exec_params, openacc_exec_params)
|| !DLSYM_OPT (openacc.create_thread_data,
openacc_create_thread_data)
|| !DLSYM_OPT (openacc.destroy_thread_data,
@@ -2920,6 +2921,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
|| !DLSYM_OPT (openacc.async.queue_callback,
openacc_async_queue_callback)
|| !DLSYM_OPT (openacc.async.exec, openacc_async_exec)
+ || !DLSYM_OPT (openacc.async.exec_params, openacc_async_exec_params)
|| !DLSYM_OPT (openacc.async.dev2host, openacc_async_dev2host)
|| !DLSYM_OPT (openacc.async.host2dev, openacc_async_host2dev))
{
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 6edb7ae7ade..4d7f43abe3d 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -120,6 +120,8 @@ INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LD = @LD@
LDFLAGS = @LDFLAGS@
+LIBFFI = @LIBFFI@
+LIBFFIINCS = @LIBFFIINCS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
index dad6d13eb60..c6abc1d724a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
@@ -1,6 +1,11 @@
/* This test exercises combined directives. */
+/* This test falls back to host execution because struct alias
+ analysis is deactivated on OpenACC parallel regions. Consequently,
+ parloops can no longer disambiguate arrays a and b. */
+
/* { dg-do run } */
+/* { dg-xfail-if "n/a" { openacc_nvidia_accel_selected } { "-O2" } { "" } } */
#include <stdlib.h>