[gcc/devel/omp/gcc-14] Use OpenACC code to process OpenMP target regions
Paul-Antoine Arras
parras@gcc.gnu.org
Fri Jun 28 09:53:25 GMT 2024
https://gcc.gnu.org/g:027cbe929314ee40bcca49fce3265f3ecbf72ed5
commit 027cbe929314ee40bcca49fce3265f3ecbf72ed5
Author: Chung-Lin Tang <cltang@codesourcery.com>
Date: Fri May 19 12:14:04 2023 -0700
Use OpenACC code to process OpenMP target regions
(forward ported from devel/omp/gcc-12)
This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html
This patch implements '-fopenmp-target=acc', which makes the compiler handle a
subset of OpenMP target regions internally as OpenACC parallel regions. The
subset covers the target, teams, parallel, distribute, and for/do constructs,
plus atomics. The internal kinds are adjusted to their OpenACC counterparts so
that the OpenACC code paths handle them, with the needed adjustments throughout
the middle-end and the nvptx backend. In this "OMPACC" mode, when a target
region contains something the patch doesn't handle, the compiler issues a
warning and reverts to normal OpenMP processing for that region.
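For illustration only, a minimal sketch of the kind of region this mode is
aimed at: a combined target/teams/distribute/parallel/for loop. The function
name saxpy_target and the map clauses are assumptions made for this example and
are not taken from the patch or its testsuite. Whether a given clause is
accepted is decided by the patch itself; anything unsupported should trigger
the warning and the fallback described above.

```c
#include <assert.h>

/* Hypothetical example, not part of the patch: y[i] += a * x[i].
   When built without -fopenmp the pragma is ignored and the loop
   simply runs sequentially on the host.  */
void
saxpy_target (int n, float a, const float *x, float *y)
{
  assert (n >= 0);

  /* Combined construct built only from the constructs the commit
     message lists (target, teams, distribute, parallel, for).  */
#pragma omp target teams distribute parallel for \
  map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

Assuming a GCC build with nvptx offloading configured, such a file would
presumably be compiled with something like
'gcc -fopenmp -fopenmp-target=acc -foffload=nvptx-none'. Per the opts.cc
change below, '-fopenmp-target=' is only accepted together with '-fopenmp'
and without '-fopenacc'.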
gcc/ChangeLog:
* builtins.cc (expand_builtin_omp_builtins): New function.
(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
* cgraphunit.cc (analyze_functions): Add call to
omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
* common.opt (fopenmp-target=): Add new option and enums.
* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
prototype.
(nvptx_mem_shared_p): Likewise.
* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
symbol for number of threads in team.
(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
(need_omp_num_threads): New bool for if any function references
omp_num_threads_sym.
(nvptx_option_override): Initialize omp_num_threads_sym/align.
(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
(nvptx_declare_function_name): Disable shim function under OMPACC mode.
Disable soft-stack under OMPACC mode. Add generation of neutering init
code under OMPACC mode.
(nvptx_output_set_softstack): Return "" under OMPACC mode.
(nvptx_expand_call): Set parallelism to vector for function calls with
"ompacc for" attached.
(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
(nvptx_expand_oacc_join): Likewise.
(nvptx_expand_omp_get_num_threads): New function.
(nvptx_mem_shared_p): New function.
(nvptx_mach_max_workers): Return 1 under OMPACC mode.
(nvptx_mach_vector_length): Return 32 under OMPACC mode.
(nvptx_single): Add adjustments for OMPACC mode, which have
parallel-construct fork/joins, and regions of code where neutering is
dynamically determined.
(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
attribute is attached to function. Disable uniform-simt when under
OMPACC mode.
(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.
* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.
* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.
* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
mode cases.
libgomp/ChangeLog:
* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.
(cherry picked from commit 5f881613fa9128edae5bbfa4e19f9752809e4bd7)
Diff:
---
gcc/builtins.cc | 71 ++++++
gcc/cgraphunit.cc | 7 +-
gcc/common.opt | 13 +
gcc/config/nvptx/mkoffload.cc | 13 +
gcc/config/nvptx/nvptx-protos.h | 2 +
gcc/config/nvptx/nvptx.cc | 269 +++++++++++++++++++--
gcc/config/nvptx/nvptx.h | 3 +
gcc/config/nvptx/nvptx.md | 68 ++++++
gcc/expr.cc | 3 +-
gcc/flag-types.h | 6 +
gcc/gimplify.cc | 33 +++
gcc/lto-wrapper.cc | 1 +
gcc/omp-builtins.def | 4 +-
gcc/omp-expand.cc | 67 +++++-
gcc/omp-general.cc | 11 +-
gcc/omp-low.cc | 145 +++++++++++-
gcc/omp-offload.cc | 302 +++++++++++++++++++++++-
gcc/omp-offload.h | 1 +
gcc/opts.cc | 8 +
gcc/target-insns.def | 5 +
gcc/tree-core.h | 4 +
gcc/tree-nested.cc | 2 +
gcc/tree-pretty-print.cc | 6 +
gcc/tree.cc | 2 +
gcc/tree.h | 3 +
libgomp/config/nvptx/team.c | 3 +
libgomp/testsuite/libgomp.c-c++-common/for-17.c | 69 ++++++
libgomp/testsuite/libgomp.c-c++-common/for-18.c | 5 +
28 files changed, 1071 insertions(+), 55 deletions(-)
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..6b74cc7be60 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -7520,6 +7520,62 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
return target;
}
+static rtx
+expand_builtin_omp_builtins (tree exp, rtx target, int ignore)
+{
+ rtx ret = NULL;
+ rtx_insn *(*gen_fn) (rtx) = NULL;
+
+ switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp)))
+ {
+ case BUILT_IN_GOMP_BARRIER:
+ if (targetm.have_gomp_barrier ())
+ {
+ emit_insn (targetm.gen_gomp_barrier ());
+ return target;
+ }
+ break;
+
+ case BUILT_IN_OMP_GET_THREAD_NUM:
+ if (targetm.have_omp_get_thread_num ())
+ gen_fn = targetm.gen_omp_get_thread_num;
+ break;
+
+ case BUILT_IN_OMP_GET_NUM_THREADS:
+ if (targetm.have_omp_get_num_threads ())
+ gen_fn = targetm.gen_omp_get_num_threads;
+ break;
+
+ case BUILT_IN_OMP_GET_TEAM_NUM:
+ if (targetm.have_omp_get_team_num ())
+ gen_fn = targetm.gen_omp_get_team_num;
+ break;
+
+ case BUILT_IN_OMP_GET_NUM_TEAMS:
+ if (targetm.have_omp_get_num_teams ())
+ gen_fn = targetm.gen_omp_get_num_teams;
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+
+ if (ignore)
+ return const0_rtx;
+
+ if (gen_fn)
+ {
+ rtx reg = (MEM_P (target)
+ ? gen_reg_rtx (GET_MODE (target))
+ : target);
+ emit_insn (gen_fn (reg));
+ if (reg != target)
+ emit_move_insn (target, reg);
+ ret = target;
+ }
+ return ret;
+}
+
/* Expand a string compare operation using a sequence of char comparison
to get rid of the calling overhead, with result going to TARGET if
that's convenient.
@@ -8917,6 +8973,21 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
case BUILT_IN_GOACC_PARLEVEL_SIZE:
return expand_builtin_goacc_parlevel_id_size (exp, target, ignore);
+ case BUILT_IN_GOMP_BARRIER:
+ case BUILT_IN_OMP_GET_THREAD_NUM:
+ case BUILT_IN_OMP_GET_NUM_THREADS:
+ case BUILT_IN_OMP_GET_TEAM_NUM:
+ case BUILT_IN_OMP_GET_NUM_TEAMS:
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ {
+ target = expand_builtin_omp_builtins (exp, target, ignore);
+ if (target)
+ return target;
+ }
+ break;
+
case BUILT_IN_SPECULATION_SAFE_VALUE_PTR:
return expand_speculation_safe_value (VOIDmode, exp, target, ignore);
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index 2bd0289ffba..c4d222a4111 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -1184,7 +1184,12 @@ analyze_functions (bool first_time)
build_type_inheritance_graph ();
if (flag_openmp && first_time)
- omp_discover_implicit_declare_target ();
+ {
+ omp_discover_implicit_declare_target ();
+
+ if(flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ omp_ompacc_attribute_tagging ();
+ }
/* Analysis adds static variables that in turn adds references to new functions.
So we need to iterate the process until it stabilize. */
diff --git a/gcc/common.opt b/gcc/common.opt
index e19c4ef1166..98ba02b2f17 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2381,6 +2381,19 @@ Enum(target_simd_clone_device) String(nohost) Value(OMP_TARGET_SIMD_CLONE_NOHOST
EnumValue
Enum(target_simd_clone_device) String(any) Value(OMP_TARGET_SIMD_CLONE_ANY)
+fopenmp-target=
+Common Joined RejectNegative Enum(openmp_target) Var(flag_openmp_target) Init(OMP_TARGET_MODE_DEFAULT)
+Execution model used for OpenMP target regions.
+
+Enum
+Name(openmp_target) Type(int)
+
+EnumValue
+Enum(openmp_target) String(default) Value(OMP_TARGET_MODE_DEFAULT)
+
+EnumValue
+Enum(openmp_target) String(acc) Value(OMP_TARGET_MODE_OMPACC)
+
fopt-info
Common Var(flag_opt_info) Optimization
Enable all optimization info dumps on stderr.
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 503b1abcefd..19064deb622 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -704,6 +704,7 @@ main (int argc, char **argv)
/* Scan the argument vector. */
bool fopenmp = false;
+ bool fopenmp_target = false;
bool fopenacc = false;
bool fPIC = false;
bool fpic = false;
@@ -723,6 +724,9 @@ main (int argc, char **argv)
#undef STR
else if (strcmp (argv[i], "-fopenmp") == 0)
fopenmp = true;
+ else if (strncmp (argv[i], "-fopenmp-target=",
+ strlen ("-fopenmp-target=")) == 0)
+ fopenmp_target = true;
else if (strcmp (argv[i], "-fopenacc") == 0)
fopenacc = true;
else if (strcmp (argv[i], "-fPIC") == 0)
@@ -752,6 +756,15 @@ main (int argc, char **argv)
if (!(fopenacc ^ fopenmp))
fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> "
"must be set");
+ if (fopenmp_target)
+ {
+ if (fopenacc)
+ fatal_error (input_location, "%<-fopenacc%> not compatible with "
+ "%<-fopenmp-target=%>");
+ if (!fopenmp)
+ fatal_error (input_location, "%<-fopenmp-target=%> requires "
+ "%<-fopenmp%>");
+ }
struct obstack argv_obstack;
obstack_init (&argv_obstack);
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 3fc86c17bad..ed2ec0e3282 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -50,6 +50,7 @@ extern unsigned int ptx_version_to_number (enum ptx_version, bool);
extern void nvptx_expand_oacc_fork (unsigned);
extern void nvptx_expand_oacc_join (unsigned);
extern void nvptx_expand_call (rtx, rtx);
+extern void nvptx_expand_omp_get_num_threads (rtx);
extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind);
extern rtx nvptx_expand_compare (rtx);
extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
@@ -63,5 +64,6 @@ extern const char *nvptx_output_red_partition (rtx, rtx);
extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
extern bool nvptx_mem_local_p (rtx);
extern bool nvptx_mem_maybe_shared_p (const_rtx);
+extern bool nvptx_mem_shared_p (const_rtx);
#endif
#endif
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 3e52354bd12..9f77071a384 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -175,6 +175,9 @@ static unsigned gang_private_shared_align;
static GTY(()) rtx gang_private_shared_sym;
static hash_map<tree_decl_hash, unsigned int> gang_private_shared_hmap;
+static GTY(()) rtx omp_num_threads_sym;
+static unsigned omp_num_threads_align;
+
/* Global lock variable, needed for 128bit worker & gang reductions. */
static GTY(()) tree global_lock_var;
@@ -184,6 +187,9 @@ static bool need_softstack_decl;
/* True if any function references __nvptx_uni. */
static bool need_unisimt_decl;
+/* True if any function references __nvptx_omp_num_threads. */
+static bool need_omp_num_threads;
+
static int nvptx_mach_max_workers ();
/* Allocate a new, cleared machine_function structure. */
@@ -391,6 +397,10 @@ nvptx_option_override (void)
SET_SYMBOL_DATA_AREA (gang_private_shared_sym, DATA_AREA_SHARED);
gang_private_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+ omp_num_threads_sym = gen_rtx_SYMBOL_REF (Pmode, "__nvptx_omp_num_threads");
+ SET_SYMBOL_DATA_AREA (omp_num_threads_sym, DATA_AREA_SHARED);
+ omp_num_threads_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -959,7 +969,8 @@ write_as_kernel (tree attrs)
{
return (lookup_attribute ("kernel", attrs) != NULL_TREE
|| (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE
- && lookup_attribute ("oacc function", attrs) != NULL_TREE));
+ && (lookup_attribute ("oacc function", attrs) != NULL_TREE
+ || lookup_attribute ("ompacc", attrs) != NULL_TREE)));
/* For OpenMP target regions, the corresponding kernel entry is emitted from
write_omp_entry as a separate function. */
}
@@ -1493,6 +1504,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
DECL_ATTRIBUTES (decl)))
force_public = true;
if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
+ && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
{
char *buf = (char *) alloca (strlen (name) + sizeof ("$impl"));
@@ -1546,7 +1558,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
HOST_WIDE_INT sz = get_frame_size ();
bool need_frameptr = sz || cfun->machine->has_chain;
int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT;
- if (!TARGET_SOFT_STACK)
+ if (!TARGET_SOFT_STACK || lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)))
{
/* Declare a local var for outgoing varargs. */
if (cfun->machine->has_variadic)
@@ -1617,6 +1629,45 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
nvptx_init_unisimt_predicate (file);
if (cfun->machine->bcast_partition || cfun->machine->sync_bar)
nvptx_init_oacc_workers (file);
+
+ if (offloading_function_p ((tree) decl)
+ && lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
+ && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl)))
+ {
+ int nthr_regno = REGNO (cfun->machine->omp_fn_entry_num_threads_reg);
+ if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)))
+ {
+ fprintf (file, "\t{\n");
+ if (cfun->machine->omp_parallel_predicate)
+ {
+ /* Borrow num-threads regno as temp register. */
+ fprintf (file, "\t\tmov.u32 %%r%d, %%tid.x;\n", nthr_regno);
+ fprintf (file, "\t\tsetp.ne.u32 %%r%d, %%r%d, 0;\n",
+ REGNO (cfun->machine->omp_parallel_predicate), nthr_regno);
+ }
+ fprintf (file, "\t\tmov.u32 %%r%d, 1;\n", nthr_regno);
+ fprintf (file, "\t\tst.shared.u32 [__nvptx_omp_num_threads], %%r%d;\n", nthr_regno);
+ fprintf (file, "\t}\n");
+ need_omp_num_threads = true;
+ }
+ else
+ {
+ fprintf (file, "\t\tld.shared.u32 %%r%d, [__nvptx_omp_num_threads];\n", nthr_regno);
+ if (cfun->machine->omp_parallel_predicate)
+ {
+ fprintf (file, "\t{\n");
+ fprintf (file, "\t\t.reg.u32 %%tmp1;\n");
+ fprintf (file, "\t\t.reg.pred %%not_parallel_mode, %%v1_lane;\n");
+ fprintf (file, "\t\tsetp.eq.u32 %%not_parallel_mode, %%r%d, 1;\n", nthr_regno);
+ fprintf (file, "\t\tmov.u32 %%tmp1, %%tid.x;\n");
+ fprintf (file, "\t\tsetp.ne.u32 %%v1_lane, %%tmp1, 0;\n");
+ fprintf (file, "\t\tand.pred %%r%d, %%not_parallel_mode, %%v1_lane;\n",
+ REGNO (cfun->machine->omp_parallel_predicate));
+ fprintf (file, "\t}\n");
+ need_omp_num_threads = true;
+ }
+ }
+ }
}
/* Output code for switching uniform-simt state. ENTERING indicates whether
@@ -1734,6 +1785,10 @@ nvptx_output_simt_exit (rtx src)
const char *
nvptx_output_set_softstack (unsigned src_regno)
{
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ return "";
if (cfun->machine->has_softstack && !crtl->is_leaf)
{
fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ",
@@ -1852,20 +1907,29 @@ nvptx_expand_call (rtx retval, rtx address)
if (DECL_STATIC_CHAIN (decl))
cfun->machine->has_chain = true;
- tree attr = oacc_get_fn_attrib (decl);
- if (attr)
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
{
- tree dims = TREE_VALUE (attr);
-
- parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1;
- for (int ix = 0; ix != GOMP_DIM_MAX; ix++)
+ if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
+ && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl)))
+ parallel = GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+ }
+ else
+ {
+ tree attr = oacc_get_fn_attrib (decl);
+ if (attr)
{
- if (TREE_PURPOSE (dims)
- && !integer_zerop (TREE_PURPOSE (dims)))
- break;
- /* Not on this axis. */
- parallel ^= GOMP_DIM_MASK (ix);
- dims = TREE_CHAIN (dims);
+ tree dims = TREE_VALUE (attr);
+
+ parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1;
+ for (int ix = 0; ix != GOMP_DIM_MAX; ix++)
+ {
+ if (TREE_PURPOSE (dims)
+ && !integer_zerop (TREE_PURPOSE (dims)))
+ break;
+ /* Not on this axis. */
+ parallel ^= GOMP_DIM_MASK (ix);
+ dims = TREE_CHAIN (dims);
+ }
}
}
}
@@ -1928,15 +1992,27 @@ nvptx_expand_compare (rtx compare)
void
nvptx_expand_oacc_fork (unsigned mode)
{
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ mode = GOMP_DIM_VECTOR;
nvptx_emit_forking (GOMP_DIM_MASK (mode), false);
}
void
nvptx_expand_oacc_join (unsigned mode)
{
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ mode = GOMP_DIM_VECTOR;
nvptx_emit_joining (GOMP_DIM_MASK (mode), false);
}
+void
+nvptx_expand_omp_get_num_threads (rtx target)
+{
+ rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+ emit_insn (gen_rtx_SET (target, mem));
+ need_omp_num_threads = true;
+}
+
/* Generate instruction(s) to unpack a 64 bit object into 2 32 bit
objects. */
@@ -2870,6 +2946,13 @@ nvptx_mem_maybe_shared_p (const_rtx x)
return area == DATA_AREA_SHARED || area == DATA_AREA_GENERIC;
}
+bool
+nvptx_mem_shared_p (const_rtx x)
+{
+ nvptx_data_area area = nvptx_mem_data_area (x);
+ return area == DATA_AREA_SHARED;
+}
+
/* Print an operand, X, to FILE, with an optional modifier in CODE.
Meaning of CODE:
@@ -3474,6 +3557,11 @@ init_axis_dim (void)
static int ATTRIBUTE_UNUSED
nvptx_mach_max_workers ()
{
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ return 1;
+
if (!cfun->machine->axis_dim_init_p)
init_axis_dim ();
return cfun->machine->axis_dim[MACH_MAX_WORKERS];
@@ -3482,6 +3570,11 @@ nvptx_mach_max_workers ()
static int ATTRIBUTE_UNUSED
nvptx_mach_vector_length ()
{
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ return 32;
+
if (!cfun->machine->axis_dim_init_p)
init_axis_dim ();
return cfun->machine->axis_dim[MACH_VECTOR_LENGTH];
@@ -4864,11 +4957,27 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
rtx_insn *tail = BB_END (to);
unsigned skip_mask = mask;
+ rtx_insn *join = NULL;
+ rtx_insn *fork = NULL;
+
while (true)
{
/* Find first insn of from block. */
- while (head != BB_END (from) && !needs_neutering_p (head))
- head = NEXT_INSN (head);
+ while (true)
+ {
+ if (INSN_P (head)
+ && recog_memoized (head) == CODE_FOR_nvptx_join)
+ {
+ /* Record join if we see it. */
+ gcc_assert (!join);
+ join = head;
+ }
+
+ if (head != BB_END (from) && !needs_neutering_p (head))
+ head = NEXT_INSN (head);
+ else
+ break;
+ }
if (from == to)
break;
@@ -4886,8 +4995,46 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
/* Find last insn of to block */
rtx_insn *limit = from == to ? head : BB_HEAD (to);
- while (tail != limit && !INSN_P (tail) && !LABEL_P (tail))
- tail = PREV_INSN (tail);
+ while (true)
+ {
+ if (INSN_P (tail)
+ && recog_memoized (tail) == CODE_FOR_nvptx_fork)
+ {
+ /* Record fork if we see it. */
+ gcc_assert (!fork);
+ fork = tail;
+ }
+
+ if (tail != limit && !INSN_P (tail) && !LABEL_P (tail))
+ tail = PREV_INSN (tail);
+ else
+ break;
+ }
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ if (join
+ /* We do not set/restore parallel state across function calls. */
+ && !(INTVAL (XVECEXP (PATTERN (join), 0, 0)) & (1 << GOMP_DIM_MAX)))
+ {
+ rtx reg = cfun->machine->omp_fn_entry_num_threads_reg;
+ rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+ emit_insn_before (gen_nvptx_omp_parallel_join (mem, reg), head);
+ need_omp_num_threads = true;
+ head = PREV_INSN (head);
+ }
+
+ if (fork
+ /* We do not set/restore parallel state across function calls. */
+ && !(INTVAL (XVECEXP (PATTERN (fork), 0, 0)) & (1 << GOMP_DIM_MAX)))
+ {
+ rtx reg = gen_reg_rtx (SImode);
+ rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+ emit_insn_before (gen_get_ntid (reg), tail);
+ emit_insn_before (gen_nvptx_omp_parallel_fork (mem, reg), tail);
+ need_omp_num_threads = true;
+ }
+ }
/* Detect if tail is a branch. */
rtx tail_branch = NULL_RTX;
@@ -4934,16 +5081,31 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
if (GOMP_DIM_MASK (mode) & skip_mask)
{
rtx_code_label *label = gen_label_rtx ();
- rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
rtx_insn **mode_jump
= mode == GOMP_DIM_VECTOR ? &vector_jump : &worker_jump;
rtx_insn **mode_label
= mode == GOMP_DIM_VECTOR ? &vector_label : &worker_label;
- if (!pred)
+ rtx pred;
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && mode == GOMP_DIM_VECTOR)
+ {
+ pred = cfun->machine->omp_parallel_predicate;
+ if (!pred)
+ {
+ pred = gen_reg_rtx (BImode);
+ cfun->machine->omp_parallel_predicate = pred;
+ }
+ }
+ else
{
- pred = gen_reg_rtx (BImode);
- cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
+ pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
+ if (!pred)
+ {
+ pred = gen_reg_rtx (BImode);
+ cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
+ }
}
rtx br;
@@ -5058,7 +5220,38 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
rtx tmp = gen_reg_rtx (BImode);
emit_insn_before (gen_movbi (tmp, const0_rtx),
bb_first_real_insn (from));
- emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+
+ if(flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ rtx nthr = cfun->machine->omp_fn_entry_num_threads_reg;
+ rtx single_p = gen_reg_rtx (BImode);
+
+ rtx_code_label *lbl_copy_tmp_pvar = gen_label_rtx ();
+ LABEL_NUSES (lbl_copy_tmp_pvar) = 1;
+
+ rtx_insn *lbl_fallthru = NEXT_INSN (tail);
+ gcc_assert (lbl_fallthru);
+ if (!LABEL_P (lbl_fallthru))
+ {
+ rtx_code_label *nlbl = gen_label_rtx ();
+ LABEL_NUSES (nlbl) = 1;
+ emit_label_before (nlbl, lbl_fallthru);
+ lbl_fallthru = nlbl;
+ }
+ emit_insn_before
+ (gen_rtx_SET (single_p,
+ gen_rtx_EQ (BImode, nthr, GEN_INT (1))),
+ label);
+ emit_insn_before
+ (gen_br_true (single_p, lbl_copy_tmp_pvar), label);
+ emit_jump_insn_before (copy_rtx (tail_branch), label);
+ emit_insn_before (gen_jump (lbl_fallthru), label);
+ emit_label_before (lbl_copy_tmp_pvar, label);
+ emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+ }
+ else
+ emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+
emit_insn_before (gen_rtx_SET (pvar, tmp), tail);
#endif
emit_insn_before (nvptx_gen_warp_bcast (pvar), tail);
@@ -5817,10 +6010,29 @@ nvptx_reorg (void)
delete pars;
}
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && offloading_function_p (current_function_decl)
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl))
+ && !lookup_attribute ("ompacc seq",
+ DECL_ATTRIBUTES (current_function_decl)))
+ {
+ cfun->machine->omp_fn_entry_num_threads_reg = gen_reg_rtx (SImode);
+
+ /* Discover & process partitioned regions. */
+ parallel *pars = nvptx_discover_pars (&bb_insn_map);
+ nvptx_process_pars (pars);
+ nvptx_neuter_pars (pars, GOMP_DIM_MASK (GOMP_DIM_VECTOR), 0);
+ delete pars;
+ }
+
/* Replace subregs. */
nvptx_reorg_subreg ();
- if (TARGET_UNIFORM_SIMT)
+ if (TARGET_UNIFORM_SIMT
+ && (flag_openmp_target != OMP_TARGET_MODE_OMPACC
+ || !lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl))))
nvptx_reorg_uniform_simt ();
#if WORKAROUND_PTXJIT_BUG_2
@@ -6071,6 +6283,12 @@ nvptx_file_end (void)
write_var_marker (asm_out_file, false, true, "__nvptx_uni");
fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n");
}
+ if (need_omp_num_threads)
+ {
+ write_var_marker (asm_out_file, false, true, "__nvptx_omp_num_threads");
+ fprintf (asm_out_file,
+ ".extern .shared .u32 __nvptx_omp_num_threads;\n");
+ }
}
/* Expander for the shuffle builtins. */
@@ -6758,6 +6976,9 @@ nvptx_goacc_fork_join (gcall *call, const int dims[],
tree arg = gimple_call_arg (call, 2);
unsigned axis = TREE_INT_CST_LOW (arg);
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ return true;
+
/* We only care about worker and vector partitioning. */
if (axis < GOMP_DIM_WORKER)
return false;
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 74f4a68924c..cadc6fb4ab1 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -235,6 +235,9 @@ struct GTY(()) machine_function
for per-lane storage in OpenMP SIMD regions. */
unsigned HOST_WIDE_INT simt_stack_size;
unsigned HOST_WIDE_INT simt_stack_align;
+
+ rtx omp_parallel_predicate;
+ rtx omp_fn_entry_num_threads_reg;
};
#endif
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4c32a20176a..872f4341899 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -78,6 +78,14 @@
UNSPECV_SIMT_EXIT
UNSPECV_RED_PART
+
+ UNSPECV_GET_TID
+ UNSPECV_GET_NTID
+ UNSPECV_GET_CTAID
+ UNSPECV_GET_NCTAID
+
+ UNSPECV_OMP_PARALLEL_FORK
+ UNSPECV_OMP_PARALLEL_JOIN
])
(define_attr "subregs_ok" "false,true"
@@ -121,6 +129,12 @@
: immediate_operand (op, mode));
})
+(define_predicate "nvptx_shared_mem_operand"
+ (match_code "mem")
+{
+ return nvptx_mem_shared_p (op);
+})
+
(define_predicate "const0_operand"
(and (match_code "const_int")
(match_test "op == const0_rtx")))
@@ -1771,6 +1785,60 @@
return asms[INTVAL (operands[1])];
})
+(define_expand "gomp_barrier"
+ [(const_int 1)]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+{
+ emit_insn (gen_nvptx_barsync (GEN_INT (0), GEN_INT (0)));
+ DONE;
+})
+
+(define_expand "omp_get_num_threads"
+ [(match_operand 0 "nvptx_register_operand" "=R")]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+{
+ nvptx_expand_omp_get_num_threads (operands[0]);
+ DONE;
+})
+
+(define_insn "omp_get_num_teams"
+ [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+ (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NCTAID))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tmov.u32\\t%0, %%nctaid.x;")
+
+(define_insn "omp_get_thread_num"
+ [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+ (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_TID))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tmov.u32\\t%0, %%tid.x;")
+
+(define_insn "omp_get_team_num"
+ [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+ (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_CTAID))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tmov.u32\\t%0, %%ctaid.x;")
+
+(define_insn "get_ntid"
+ [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+ (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NTID))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tmov.u32\\t%0, %%ntid.x;")
+
+(define_insn "nvptx_omp_parallel_fork"
+ [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m")
+ (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")]
+ UNSPECV_OMP_PARALLEL_FORK))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tst.shared.u32\\t%0, %1; //omp parallel fork")
+
+(define_insn "nvptx_omp_parallel_join"
+ [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m")
+ (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")]
+ UNSPECV_OMP_PARALLEL_JOIN))]
+ "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+ "%.\\tst.shared.u32\\t%0, %1; //omp parallel join")
+
(define_insn "nvptx_fork"
[(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
UNSPECV_FORK)]
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d4414e242cb..ec7a4f82137 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -11296,7 +11296,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
/* Allow accel compiler to handle variables that require special
treatment, e.g. if they have been modified in some way earlier in
compilation by the adjust_private_decl OpenACC hook. */
- if (flag_openacc && targetm.goacc.expand_var_decl)
+ if ((flag_openacc || flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ && targetm.goacc.expand_var_decl)
{
temp = targetm.goacc.expand_var_decl (exp);
if (temp)
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5062f59bc8f..283b1ddfcba 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -522,6 +522,12 @@ enum omp_target_simd_clone_device_kind
OMP_TARGET_SIMD_CLONE_ANY = 3
};
+enum omp_target_mode_kind
+{
+ OMP_TARGET_MODE_DEFAULT = 0,
+ OMP_TARGET_MODE_OMPACC = 1
+};
+
#endif
#endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 87d83ec39f1..b4b70b43db9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -260,6 +260,7 @@ struct gimplify_omp_ctx
bool order_concurrent;
bool has_depend;
bool in_for_exprs;
+ bool ompacc;
int defaultmap[5];
};
@@ -13210,6 +13211,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
case OMP_CLAUSE_FINALIZE:
break;
+ case OMP_CLAUSE__OMPACC_:
+ ctx->ompacc = true;
+ break;
+
case OMP_CLAUSE_ORDER:
ctx->order_concurrent = true;
break;
@@ -14711,6 +14716,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
case OMP_CLAUSE_INCLUSIVE:
case OMP_CLAUSE_EXCLUSIVE:
case OMP_CLAUSE_USES_ALLOCATORS:
+ case OMP_CLAUSE__OMPACC_:
break;
case OMP_CLAUSE_NOHOST:
@@ -15434,6 +15440,21 @@ gimplify_omp_loop_xform (tree *expr_p, gimple_seq *pre_p)
return GS_ALL_DONE;
}
+/* Return true if in an omp_context in OMPACC mode. */
+static bool
+gimplify_omp_ctx_ompacc_p (void)
+{
+ if (cgraph_node::get (current_function_decl)->offloadable
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ return true;
+ struct gimplify_omp_ctx *ctx;
+ for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
+ if (ctx->ompacc)
+ return true;
+ return false;
+}
+
/* Gimplify the gross structure of an OMP_FOR statement. */
static enum gimplify_status
@@ -15465,6 +15486,18 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
*expr_p = NULL_TREE;
return GS_ERROR;
}
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && gimplify_omp_ctx_ompacc_p ())
+ {
+ gcc_assert (inner_for_stmt && TREE_CODE (for_stmt) == OMP_DISTRIBUTE);
+ *expr_p = OMP_FOR_BODY (for_stmt);
+ tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_GANG);
+ OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (inner_for_stmt);
+ OMP_FOR_CLAUSES (inner_for_stmt) = c;
+ return GS_OK;
+ }
+
gcc_assert (inner_for_stmt == *data[3]);
omp_maybe_apply_loop_xforms (data[3],
data[2]
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 02579951569..c356698a1f9 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -738,6 +738,7 @@ append_compiler_options (obstack *argv_obstack, vec<cl_decoded_option> opts)
case OPT_fcommon:
case OPT_fgnu_tm:
case OPT_fopenmp:
+ case OPT_fopenmp_target_:
case OPT_fopenacc:
case OPT_fopenacc_dim_:
case OPT_foffload_abi_:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f9ce137d0b4..65393ab3210 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -71,9 +71,9 @@ DEF_GOACC_BUILTIN_ONLY (BUILT_IN_GOACC_SINGLE_COPY_END, "GOACC_single_copy_end",
DEF_GOMP_BUILTIN (BUILT_IN_OMP_IS_INITIAL_DEVICE, "omp_is_initial_device",
BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_THREAD_NUM, "omp_get_thread_num",
- BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+ BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_THREADS, "omp_get_num_threads",
- BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+ BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_TEAM_NUM, "omp_get_team_num",
BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_TEAMS, "omp_get_num_teams",
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 3f5acca95ec..102f1e988d5 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -1047,11 +1047,16 @@ remove_exit_barrier (struct omp_region *region)
from within current function (this would be easy to check)
or from some function it calls and gets passed an address
of such a variable. */
+ gomp_parallel *parallel_stmt
+ = as_a <gomp_parallel *> (last_nondebug_stmt (region->entry));
+ tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt);
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && child_fun == NULL_TREE)
+ any_addressable_vars = 0;
+
if (any_addressable_vars < 0)
{
- gomp_parallel *parallel_stmt
- = as_a <gomp_parallel *> (last_nondebug_stmt (region->entry));
- tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt);
tree local_decls, block, decl;
unsigned ix;
@@ -7732,6 +7737,17 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
/* The SSA parallelizer does gang parallelism. */
gwv = build_int_cst (integer_type_node, GOMP_DIM_MASK (GOMP_DIM_GANG));
}
+ else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ tree clauses = gimple_omp_for_clauses (for_stmt);
+ int omp_mask = 0;
+ if (omp_find_clause (clauses, OMP_CLAUSE_GANG))
+ omp_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG);
+ if (omp_find_clause (clauses, OMP_CLAUSE_VECTOR))
+ omp_mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+ gcc_assert (omp_mask);
+ gwv = build_int_cst (integer_type_node, omp_mask);
+ }
if (fd->collapse > 1 || fd->tiling)
{
@@ -9759,6 +9775,13 @@ get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c);
else
t = integer_minus_one_node;
+
+ /* Currently, OMPACC mode has a limitation of only one warp thread. */
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && lookup_attribute
+ ("ompacc", DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt_stmt))))
+ t = integer_one_node;
+
push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
GOMP_TARGET_ARG_THREAD_LIMIT, t,
&args);
@@ -10656,6 +10679,44 @@ expand_omp (struct omp_region *region)
switch (region->type)
{
case GIMPLE_OMP_PARALLEL:
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ struct omp_region *r;
+ for (r = region->outer; r; r = r->outer)
+ if (r->type == GIMPLE_OMP_TARGET)
+ {
+ gomp_target *tgt
+ = as_a <gomp_target *> (last_nondebug_stmt (r->entry));
+ tree tgtfn_attrs
+ = DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt));
+ if (!lookup_attribute ("ompacc", tgtfn_attrs))
+ r = NULL;
+ break;
+ }
+ if (r != NULL
+ || (lookup_attribute
+ ("ompacc", DECL_ATTRIBUTES (current_function_decl))))
+ {
+ gimple_stmt_iterator gsi;
+ gsi = gsi_last_nondebug_bb (region->entry);
+ gcc_assert (!gsi_end_p (gsi)
+ && gimple_code
+ (gsi_stmt (gsi)) == GIMPLE_OMP_PARALLEL);
+ gsi_remove (&gsi, true);
+
+ if (region->exit)
+ {
+ gsi = gsi_last_nondebug_bb (region->exit);
+ gcc_assert (!gsi_end_p (gsi)
+ && gimple_code
+ (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+ gsi_remove (&gsi, true);
+ }
+ break;
+ }
+ }
+ /* Fallthrough. */
+
case GIMPLE_OMP_TASK:
expand_omp_taskreg (region);
break;
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index d7b09eae5ff..190130e16a3 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -213,8 +213,12 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
struct omp_for_data_loop dummy_loop;
location_t loc = gimple_location (for_stmt);
bool simd = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_SIMD;
- bool distribute = gimple_omp_for_kind (for_stmt)
- == GF_OMP_FOR_KIND_DISTRIBUTE;
+ bool distribute =
+ (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_DISTRIBUTE
+ || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP
+ && omp_find_clause (gimple_omp_for_clauses (for_stmt),
+ OMP_CLAUSE_GANG)));
bool taskloop = gimple_omp_for_kind (for_stmt)
== GF_OMP_FOR_KIND_TASKLOOP;
bool order_reproducible = false;
@@ -453,7 +457,8 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
loop->n2 = gimple_omp_for_final (for_stmt, i);
gcc_assert (loop->cond_code != NE_EXPR
|| (gimple_omp_for_kind (for_stmt)
- != GF_OMP_FOR_KIND_OACC_LOOP));
+ != GF_OMP_FOR_KIND_OACC_LOOP)
+ || flag_openmp_target == OMP_TARGET_MODE_OMPACC);
if (TREE_CODE (loop->n2) == TREE_VEC)
{
if (loop->outer)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 3f6d97f88f4..6152750b5b8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -181,6 +181,10 @@ struct omp_context
than teams is strictly nested in it. */
bool nonteams_nested_p;
+ /* Indicates that context is in OMPACC mode, set after _ompacc_ internal
+ clauses are removed. */
+ bool ompacc_p;
+
/* Candidates for adjusting OpenACC privatization level. */
vec<tree> oacc_privatization_candidates;
};
@@ -1957,6 +1961,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_TASK_REDUCTION:
case OMP_CLAUSE_ALLOCATE:
case OMP_CLAUSE_USES_ALLOCATORS:
+ case OMP_CLAUSE__OMPACC_:
break;
case OMP_CLAUSE_ALIGNED:
@@ -2176,6 +2181,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FILTER:
case OMP_CLAUSE__CONDTEMP_:
case OMP_CLAUSE_USES_ALLOCATORS:
+ case OMP_CLAUSE__OMPACC_:
break;
case OMP_CLAUSE__CACHE_:
@@ -2245,6 +2251,21 @@ omp_maybe_offloaded_ctx (omp_context *ctx)
return false;
}
+static bool
+ompacc_ctx_p (omp_context *ctx)
+{
+ if (cgraph_node::get (current_function_decl)->offloadable
+ && lookup_attribute ("ompacc",
+ DECL_ATTRIBUTES (current_function_decl)))
+ return true;
+ for (; ctx; ctx = ctx->outer)
+ if (is_gimple_omp_offloaded (ctx->stmt))
+ return (ctx->ompacc_p
+ || omp_find_clause (gimple_omp_target_clauses (ctx->stmt),
+ OMP_CLAUSE__OMPACC_));
+ return false;
+}
+
/* Build a decl for the omp child function. It'll not contain a body
yet, just the bare decl. */
@@ -2550,8 +2571,28 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx)
DECL_NAMELESS (name) = 1;
TYPE_NAME (ctx->record_type) = name;
TYPE_ARTIFICIAL (ctx->record_type) = 1;
- create_omp_child_function (ctx, false);
- gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && ompacc_ctx_p (ctx))
+ {
+ tree data_name = get_identifier (".omp_data_i_par");
+ tree t = build_decl (gimple_location (stmt), VAR_DECL, data_name,
+ ptr_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_NAMELESS (t) = 1;
+ DECL_CONTEXT (t) = current_function_decl;
+ DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
+ DECL_CHAIN (t) = ctx->block_vars;
+ ctx->block_vars = t;
+ TREE_USED (t) = 1;
+ TREE_READONLY (t) = 1;
+ ctx->receiver_decl = t;
+ }
+ else
+ {
+ create_omp_child_function (ctx, false);
+ gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+ }
scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx);
scan_omp (gimple_omp_body_ptr (stmt), ctx);
@@ -3382,6 +3423,24 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
scan_sharing_clauses (clauses, ctx);
scan_omp (gimple_omp_body_ptr (stmt), ctx);
+ if (offloaded && flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ for (tree *cp = gimple_omp_target_clauses_ptr (stmt); *cp;
+ cp = &OMP_CLAUSE_CHAIN (*cp))
+ if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE__OMPACC_)
+ {
+ DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt))
+ = tree_cons (get_identifier ("ompacc"), NULL_TREE,
+ DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt)));
+ /* Unlink and remove. */
+ *cp = OMP_CLAUSE_CHAIN (*cp);
+
+ /* Set to true. */
+ ctx->ompacc_p = true;
+ break;
+ }
+ }
+
if (TYPE_FIELDS (ctx->record_type) == NULL)
ctx->record_type = ctx->receiver_decl = NULL;
else
@@ -8612,6 +8671,9 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
gcc_unreachable ();
else if (is_oacc_kernels_decomposed_part (tgt))
;
+ else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && is_omp_target (tgt->stmt))
+ ;
else
gcc_unreachable ();
@@ -8629,7 +8691,13 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
gcc_assert (!(tag & OLF_AUTO));
}
- if (tag & OLF_TILE)
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && gimple_code (ctx->stmt) == GIMPLE_OMP_PARALLEL
+ && tgt
+ && ompacc_ctx_p (tgt))
+ levels = 1;
+ else
+ if (tag & OLF_TILE)
/* Tiling could use all 3 levels. */
levels = 3;
else
@@ -11893,6 +11961,23 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
push_gimplify_context ();
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))
+ {
+ enum omp_clause_code code = OMP_CLAUSE_ERROR;
+ if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR)
+ code = OMP_CLAUSE_VECTOR;
+ else if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_DISTRIBUTE)
+ code = OMP_CLAUSE_GANG;
+ if (code)
+ {
+ /* Adjust into OACC loop kind with vector/gang clause. */
+ gimple_omp_for_set_kind (stmt, GF_OMP_FOR_KIND_OACC_LOOP);
+ tree c = build_omp_clause (UNKNOWN_LOCATION, code);
+ OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (stmt);
+ gimple_omp_for_set_clauses (stmt, c);
+ }
+ }
+
if (is_gimple_omp_oacc (ctx->stmt))
oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
@@ -11914,7 +11999,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
gbind *inner_bind
= as_a <gbind *> (gimple_seq_first_stmt (omp_for_body));
tree vars = gimple_bind_vars (inner_bind);
- if (is_gimple_omp_oacc (ctx->stmt))
+ if (is_gimple_omp_oacc (ctx->stmt)
+ || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && ompacc_ctx_p (ctx)))
oacc_privatization_scan_decl_chain (ctx, vars);
gimple_bind_append_vars (new_stmt, vars);
/* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't
@@ -12030,7 +12117,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
lower_omp (gimple_omp_body_ptr (stmt), ctx);
gcall *private_marker = NULL;
- if (is_gimple_omp_oacc (ctx->stmt)
+ if ((is_gimple_omp_oacc (ctx->stmt)
+ || (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)))
&& !gimple_seq_empty_p (omp_for_body))
private_marker = lower_oacc_private_marker (ctx);
@@ -12085,11 +12173,13 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
/* Once lowered, extract the bounds and clauses. */
omp_extract_for_data (stmt, &fd, NULL);
- if (is_gimple_omp_oacc (ctx->stmt)
- && !ctx_in_oacc_kernels_region (ctx))
- lower_oacc_head_tail (gimple_location (stmt),
- gimple_omp_for_clauses (stmt), private_marker,
- &oacc_head, &oacc_tail, ctx);
+ if (flag_openacc)
+ {
+ if (is_gimple_omp_oacc (ctx->stmt) && !ctx_in_oacc_kernels_region (ctx))
+ lower_oacc_head_tail (gimple_location (stmt),
+ gimple_omp_for_clauses (stmt), private_marker,
+ &oacc_head, &oacc_tail, ctx);
+ }
/* Add OpenACC partitioning and reduction markers just before the loop. */
if (oacc_head)
@@ -12873,9 +12963,20 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
else
bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
+
+ gimple_seq oacc_head = NULL, oacc_tail = NULL;
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && gimple_code (stmt) == GIMPLE_OMP_PARALLEL
+ && ompacc_ctx_p (ctx))
+ lower_oacc_head_tail (gimple_location (stmt), clauses,
+ NULL, &oacc_head, &oacc_tail,
+ ctx);
+
gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
gimple_bind_add_seq (bind, ilist);
+ gimple_bind_add_seq (bind, oacc_head);
gimple_bind_add_stmt (bind, stmt);
+ gimple_bind_add_seq (bind, oacc_tail);
gimple_bind_add_seq (bind, olist);
pop_gimplify_context (NULL);
@@ -14731,7 +14832,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
gimple_seq fork_seq = NULL;
gimple_seq join_seq = NULL;
- if (offloaded && is_gimple_omp_oacc (ctx->stmt))
+ if (offloaded && (is_gimple_omp_oacc (ctx->stmt)
+ || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && ompacc_ctx_p (ctx))))
{
/* If there are reductions on the offloaded region itself, treat
them as a dummy GANG loop. */
@@ -14854,6 +14957,22 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
lower_omp (gimple_omp_body_ptr (teams_stmt), ctx);
lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist,
NULL, ctx);
+
+ if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))
+ {
+ /* Forward the team/gang-wide variables to outer target region. */
+ struct omp_context *tgt = ctx;
+ while (tgt && !is_gimple_omp_offloaded (tgt->stmt))
+ tgt = tgt->outer;
+ if (tgt)
+ {
+ int i;
+ tree decl;
+ FOR_EACH_VEC_ELT (ctx->oacc_privatization_candidates, i, decl)
+ tgt->oacc_privatization_candidates.safe_push (decl);
+ }
+ }
+
gimple_seq_add_stmt (&bind_body, teams_stmt);
gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt));
@@ -15021,7 +15140,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
ctx);
break;
case GIMPLE_BIND:
- if (ctx && is_gimple_omp_oacc (ctx->stmt))
+ if (ctx && (is_gimple_omp_oacc (ctx->stmt)
+ || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+ && ompacc_ctx_p (ctx))))
{
tree vars = gimple_bind_vars (as_a <gbind *> (stmt));
oacc_privatization_scan_decl_chain (ctx, vars);
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 6c652387a07..6371177d054 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -391,6 +391,268 @@ omp_discover_implicit_declare_target (void)
lang_hooks.decls.omp_finish_decl_inits ();
}
+static bool ompacc_supported_clauses_p (tree clauses)
+{
+ for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ switch (OMP_CLAUSE_CODE (c))
+ {
+ case OMP_CLAUSE_COLLAPSE:
+ case OMP_CLAUSE_NOWAIT:
+ continue;
+ default:
+ return false;
+ }
+ return true;
+}
+
+struct target_region_data
+{
+ tree func_decl;
+ bool has_omp_for;
+ bool has_omp_parallel;
+ bool ompacc_invalid;
+ auto_vec<const char *> warning_msgs;
+ auto_vec<location_t> warning_locs;
+ target_region_data (void)
+ : func_decl (NULL_TREE),
+ has_omp_for (false), has_omp_parallel (false), ompacc_invalid (false),
+ warning_msgs (), warning_locs () {}
+};
+
+static tree scan_omp_target_region_r (tree *, int *, void *);
+
+static void
+scan_fndecl_for_ompacc (tree decl, target_region_data *tgtdata)
+{
+ target_region_data td;
+ td.func_decl = decl;
+ walk_tree_without_duplicates (&DECL_SAVED_TREE (decl),
+ scan_omp_target_region_r, &td);
+ tree v;
+ if ((v = lookup_attribute ("omp declare variant base",
+ DECL_ATTRIBUTES (decl)))
+ || (v = lookup_attribute ("omp declare variant variant",
+ DECL_ATTRIBUTES (decl))))
+ {
+ td.ompacc_invalid = true;
+ td.warning_msgs.safe_push ("declare variant not supported for OMPACC");
+ td.warning_locs.safe_push (EXPR_LOCATION (v));
+ }
+ if (tgtdata)
+ {
+ tgtdata->has_omp_for |= td.has_omp_for;
+ tgtdata->has_omp_parallel |= td.has_omp_parallel;
+ tgtdata->ompacc_invalid |= td.ompacc_invalid;
+ for (unsigned i = 0; i < td.warning_msgs.length (); i++)
+ tgtdata->warning_msgs.safe_push (td.warning_msgs[i]);
+ for (unsigned i = 0; i < td.warning_locs.length (); i++)
+ tgtdata->warning_locs.safe_push (td.warning_locs[i]);
+ }
+
+ if (!td.ompacc_invalid
+ && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)))
+ {
+ DECL_ATTRIBUTES (decl)
+ = tree_cons (get_identifier ("ompacc"), NULL_TREE,
+ DECL_ATTRIBUTES (decl));
+ if (!td.has_omp_parallel)
+ DECL_ATTRIBUTES (decl)
+ = tree_cons (get_identifier ("ompacc seq"), NULL_TREE,
+ DECL_ATTRIBUTES (decl));
+ }
+}
+
+static tree
+scan_omp_target_region_r (tree *tp, int *walk_subtrees, void *data)
+{
+ target_region_data *tgtdata = (target_region_data *) data;
+
+ if (TREE_CODE (*tp) == FUNCTION_DECL
+ && !(fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_THREAD_NUM)
+ || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_THREADS)
+ || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_TEAM_NUM)
+ || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_TEAMS)
+ || id_equal (DECL_NAME (*tp), "omp_get_thread_num")
+ || id_equal (DECL_NAME (*tp), "omp_get_num_threads")
+ || id_equal (DECL_NAME (*tp), "omp_get_team_num")
+ || id_equal (DECL_NAME (*tp), "omp_get_num_teams"))
+ && *tp != tgtdata->func_decl)
+ {
+ tree decl = *tp;
+ symtab_node *node = symtab_node::get (*tp);
+ if (node)
+ {
+ node = node->ultimate_alias_target ();
+ decl = node->decl;
+ }
+
+ if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
+ {
+ scan_fndecl_for_ompacc (decl, tgtdata);
+ }
+ else
+ {
+ tgtdata->warning_msgs.safe_push ("referencing external function");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ tgtdata->ompacc_invalid = true;
+ }
+ *walk_subtrees = 0;
+ return NULL_TREE;
+ }
+
+ switch (TREE_CODE (*tp))
+ {
+ case OMP_FOR:
+ if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+ {
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("clauses not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ }
+ else if (OMP_FOR_NON_RECTANGULAR (*tp))
+ {
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("non-rectangular loops not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ }
+ else
+ tgtdata->has_omp_for = true;
+ break;
+
+ case OMP_PARALLEL:
+ if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+ {
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("clauses not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ }
+ else
+ tgtdata->has_omp_parallel = true;
+ break;
+
+ case OMP_DISTRIBUTE:
+ case OMP_TEAMS:
+ if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+ {
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("clauses not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ }
+ /* Fallthru. */
+
+ case OMP_ATOMIC:
+ case OMP_ATOMIC_READ:
+ case OMP_ATOMIC_CAPTURE_OLD:
+ case OMP_ATOMIC_CAPTURE_NEW:
+ break;
+
+ case OMP_SIMD:
+ case OMP_TASK:
+ case OMP_LOOP:
+ case OMP_TASKLOOP:
+ case OMP_TASKGROUP:
+ case OMP_SECTION:
+ case OMP_MASTER:
+ case OMP_MASKED:
+ case OMP_ORDERED:
+ case OMP_CRITICAL:
+ case OMP_SCAN:
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("construct not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ *walk_subtrees = 0;
+ break;
+
+ case OMP_TARGET:
+ tgtdata->ompacc_invalid = true;
+ tgtdata->warning_msgs.safe_push ("nested target/reverse offload "
+ "not supported");
+ tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+ *walk_subtrees = 0;
+ break;
+
+ default:
+ break;
+ }
+ return NULL_TREE;
+}
+
+static tree
+scan_omp_target_construct_r (tree *tp, int *walk_subtrees,
+ void *data)
+{
+ if (TREE_CODE (*tp) == OMP_TARGET)
+ {
+ target_region_data td;
+ td.func_decl = (tree) data;
+ walk_tree_without_duplicates (&OMP_TARGET_BODY (*tp),
+ scan_omp_target_region_r, &td);
+ for (tree c = OMP_TARGET_CLAUSES (*tp); c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ switch (OMP_CLAUSE_CODE (c))
+ {
+ case OMP_CLAUSE_MAP:
+ continue;
+ default:
+ td.ompacc_invalid = true;
+ td.warning_msgs.safe_push ("clause not supported");
+ td.warning_locs.safe_push (EXPR_LOCATION (c));
+ break;
+ }
+ break;
+ }
+ if (!td.ompacc_invalid)
+ {
+ tree c = build_omp_clause (EXPR_LOCATION (*tp), OMP_CLAUSE__OMPACC_);
+ if (!td.has_omp_parallel)
+ OMP_CLAUSE__OMPACC__SEQ (c) = 1;
+ OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (*tp);
+ OMP_TARGET_CLAUSES (*tp) = c;
+ }
+ else
+ {
+ warning_at (EXPR_LOCATION (*tp), 0, "Target region not suitable for "
+ "OMPACC mode");
+ for (unsigned i = 0; i < td.warning_locs.length (); i++)
+ warning_at (td.warning_locs[i], 0, td.warning_msgs[i]);
+ }
+ *walk_subtrees = 0;
+ }
+ return NULL_TREE;
+}
+
+void
+omp_ompacc_attribute_tagging (void)
+{
+ cgraph_node *node;
+ FOR_EACH_DEFINED_FUNCTION (node)
+ if (DECL_SAVED_TREE (node->decl))
+ {
+ if (DECL_STRUCT_FUNCTION (node->decl)
+ && DECL_STRUCT_FUNCTION (node->decl)->has_omp_target)
+ walk_tree_without_duplicates (&DECL_SAVED_TREE (node->decl),
+ scan_omp_target_construct_r,
+ node->decl);
+
+ for (cgraph_node *cgn = first_nested_function (node);
+ cgn; cgn = next_nested_function (cgn))
+ if (omp_declare_target_fn_p (cgn->decl))
+ {
+ scan_fndecl_for_ompacc (cgn->decl, NULL);
+
+ if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (cgn->decl))
+ && !lookup_attribute ("noinline", DECL_ATTRIBUTES (cgn->decl)))
+ {
+ DECL_ATTRIBUTES (cgn->decl)
+ = tree_cons (get_identifier ("noinline"),
+ NULL, DECL_ATTRIBUTES (cgn->decl));
+ DECL_ATTRIBUTES (cgn->decl)
+ = tree_cons (get_identifier ("noipa"),
+ NULL, DECL_ATTRIBUTES (cgn->decl));
+ }
+ }
+ }
+}
/* Create new symbols containing (address, size) pairs for global variables,
marked with "omp declare target" attribute, as well as addresses for the
@@ -509,6 +771,22 @@ omp_finish_file (void)
static tree
oacc_dim_call (bool pos, int dim, gimple_seq *seq)
{
+ if (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+ {
+ enum built_in_function fn;
+ if (dim == GOMP_DIM_VECTOR)
+ fn = pos ? BUILT_IN_OMP_GET_THREAD_NUM : BUILT_IN_OMP_GET_NUM_THREADS;
+ else if (dim == GOMP_DIM_GANG)
+ fn = pos ? BUILT_IN_OMP_GET_TEAM_NUM : BUILT_IN_OMP_GET_NUM_TEAMS;
+ else
+ gcc_unreachable ();
+ tree size = create_tmp_var (integer_type_node);
+ gimple *call = gimple_build_call (builtin_decl_explicit (fn), 0);
+ gimple_call_set_lhs (call, size);
+ gimple_seq_add_stmt (seq, call);
+ return size;
+ }
+
tree arg = build_int_cst (unsigned_type_node, dim);
tree size = create_tmp_var (integer_type_node);
enum internal_fn fn = pos ? IFN_GOACC_DIM_POS : IFN_GOACC_DIM_SIZE;
@@ -2252,15 +2530,19 @@ execute_oacc_loop_designation ()
static unsigned int
execute_oacc_device_lower ()
{
- tree attrs = oacc_get_fn_attrib (current_function_decl);
+ tree attrs;
+ int dims[GOMP_DIM_MAX];
- if (!attrs)
- /* Not an offloaded function. */
- return 0;
+ if (flag_openacc)
+ {
+ attrs = oacc_get_fn_attrib (current_function_decl);
+ if (!attrs)
+ /* Not an offloaded function. */
+ return 0;
- int dims[GOMP_DIM_MAX];
- for (unsigned i = 0; i < GOMP_DIM_MAX; i++)
- dims[i] = oacc_get_fn_dim_size (current_function_decl, i);
+ for (unsigned i = 0; i < GOMP_DIM_MAX; i++)
+ dims[i] = oacc_get_fn_dim_size (current_function_decl, i);
+ }
hash_map<tree, tree> adjusted_vars;
@@ -2329,7 +2611,8 @@ execute_oacc_device_lower ()
case IFN_UNIQUE_OACC_FORK:
case IFN_UNIQUE_OACC_JOIN:
- if (integer_minus_onep (gimple_call_arg (call, 2)))
+ if (flag_openacc
+ && integer_minus_onep (gimple_call_arg (call, 2)))
remove = true;
else if (!targetm.goacc.fork_join
(call, dims, kind == IFN_UNIQUE_OACC_FORK))
@@ -2616,7 +2899,8 @@ public:
{}
/* opt_pass methods: */
- bool gate (function *) final override { return flag_openacc; };
+ bool gate (function *) final override
+ { return flag_openacc || (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC); };
unsigned int execute (function *) final override
{
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index d972bb7eafd..92d0231d04d 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -32,5 +32,6 @@ extern GTY(()) vec<tree, va_gc> *offload_ind_funcs;
extern void omp_finish_file (void);
extern void omp_discover_implicit_declare_target (void);
+extern void omp_ompacc_attribute_tagging (void);
#endif /* GCC_OMP_DEVICE_H */
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 3333600e0ea..badd1f3e445 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1461,6 +1461,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
" %<-fstrict-flex-arrays%> is not present");
}
+ if (opts_set->x_flag_openmp_target)
+ {
+ if (opts->x_flag_openacc)
+ error ("%<-fopenacc%> not compatible with %<-fopenmp-target=%>");
+ if (!opts->x_flag_openmp)
+ error ("%<-fopenmp-target=%> requires %<-fopenmp%> setting");
+ }
+
diagnose_options (opts, opts_set, loc);
}
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 74efb0a70c1..2b5f1202a33 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -68,6 +68,11 @@ DEF_TARGET_INSN (oacc_dim_pos, (rtx x0, rtx x1))
DEF_TARGET_INSN (oacc_dim_size, (rtx x0, rtx x1))
DEF_TARGET_INSN (oacc_fork, (rtx x0, rtx x1, rtx x2))
DEF_TARGET_INSN (oacc_join, (rtx x0, rtx x1, rtx x2))
+DEF_TARGET_INSN (gomp_barrier, (void))
+DEF_TARGET_INSN (omp_get_thread_num, (rtx x0))
+DEF_TARGET_INSN (omp_get_num_threads, (rtx x0))
+DEF_TARGET_INSN (omp_get_team_num, (rtx x0))
+DEF_TARGET_INSN (omp_get_num_teams, (rtx x0))
DEF_TARGET_INSN (omp_simt_enter, (rtx x0, rtx x1, rtx x2))
DEF_TARGET_INSN (omp_simt_exit, (rtx x0))
DEF_TARGET_INSN (omp_simt_lane, (rtx x0))
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 749341e7782..1ca18257316 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -515,6 +515,10 @@ enum omp_clause_code {
loop or not. */
OMP_CLAUSE__SIMT_,
+ /* Internally used only clause, flag whether this is an "ompacc"
+ target region or not. */
+ OMP_CLAUSE__OMPACC_,
+
/* OpenACC clause: independent. */
OMP_CLAUSE_INDEPENDENT,
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 4e5f3be7676..b13c036e2b0 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -1518,6 +1518,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
case OMP_CLAUSE_BIND:
case OMP_CLAUSE__CONDTEMP_:
case OMP_CLAUSE__SCANTEMP_:
+ case OMP_CLAUSE__OMPACC_:
break;
/* The following clause belongs to the OpenACC cache directive, which
@@ -2303,6 +2304,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
case OMP_CLAUSE_BIND:
case OMP_CLAUSE__CONDTEMP_:
case OMP_CLAUSE__SCANTEMP_:
+ case OMP_CLAUSE__OMPACC_:
break;
/* The following clause belongs to the OpenACC cache directive, which
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index f7439e2f597..f7be9347de5 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -1386,6 +1386,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
pp_string (pp, "_simt_");
break;
+ case OMP_CLAUSE__OMPACC_:
+ pp_string (pp, "_ompacc_");
+ if (OMP_CLAUSE__OMPACC__SEQ (clause))
+ pp_string (pp, "(seq)");
+ break;
+
case OMP_CLAUSE_GANG:
pp_string (pp, "gang");
if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE)
diff --git a/gcc/tree.cc b/gcc/tree.cc
index e234d4a936a..c8cf45e3fc1 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -321,6 +321,7 @@ unsigned const char omp_clause_num_ops[] =
1, /* OMP_CLAUSE_SIZES */
1, /* OMP_CLAUSE__SIMDUID_ */
0, /* OMP_CLAUSE__SIMT_ */
+ 0, /* OMP_CLAUSE__OMPACC_ */
0, /* OMP_CLAUSE_INDEPENDENT */
1, /* OMP_CLAUSE_WORKER */
1, /* OMP_CLAUSE_VECTOR */
@@ -418,6 +419,7 @@ const char * const omp_clause_code_name[] =
"sizes",
"_simduid_",
"_simt_",
+ "_ompacc_",
"independent",
"worker",
"vector",
diff --git a/gcc/tree.h b/gcc/tree.h
index aacdbc8b078..7dfdc289f14 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2025,6 +2025,9 @@ class auto_suppress_location_wrappers
#define OMP_CLAUSE__SIMDUID__DECL(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__SIMDUID_), 0)
+#define OMP_CLAUSE__OMPACC__SEQ(NODE) \
+ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__OMPACC_)->base.public_flag)
+
#define OMP_CLAUSE_SCHEDULE_KIND(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->omp_clause.subcode.schedule_kind)
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index d5361917a24..82dec209f6e 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -34,6 +34,9 @@
struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
int __gomp_team_num __attribute__((shared,nocommon));
+/* Number of active target threads in team, used in ACC mode. */
+unsigned int __nvptx_omp_num_threads __attribute__((shared,nocommon));
+
static void gomp_thread_start (struct gomp_thread_pool *);
extern void build_indirect_map (void);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-17.c b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
new file mode 100644
index 00000000000..9771aaf2ab5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
@@ -0,0 +1,69 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" { target c } } */
+
+#define M(x, y, z) O(x, y, z)
+#define O(x, y, z) x ## _ ## y ## _ ## z
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#undef OMPFROM
+#undef OMPTO
+#define OMPFROM(v) DO_PRAGMA (omp target update from(v))
+#define OMPTO(v) DO_PRAGMA (omp target update to(v))
+
+#pragma omp declare target
+
+#define OMPTGT DO_PRAGMA (omp target)
+#define F parallel for
+#define G pf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+#undef OMPTGT
+
+#pragma omp end declare target
+
+#define F target parallel for
+#define G tpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute
+#define G ttd
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute parallel for
+#define G ttdpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+int
+main ()
+{
+ if (test_pf_ompacc ()
+ || test_tpf_ompacc ()
+ || test_ttd_ompacc ()
+ || test_ttdpf_ompacc ())
+ __builtin_abort ();
+ return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-18.c b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
new file mode 100644
index 00000000000..2486d3aa665
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
@@ -0,0 +1,5 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" {target c } } */
+
+#define CONDNE
+#include "for-17.c"
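
As a rough illustration of the kind of region this patch targets, the sketch below is not part of the commit or its testsuite. It uses only `map` clauses on the target construct and no clauses on the loop itself, which is what `scan_omp_target_construct_r` and `ompacc_supported_clauses_p` above appear to accept. The names `run_saxpy_ompacc`, `saxpy_x`, `saxpy_y` and `SAXPY_N` are hypothetical. Whether a given region is really handled in OMPACC mode depends on those checks; per the commit message, an unsupported region gets a warning and falls back to normal processing.

```c
/* Illustrative sketch only; not taken from the commit.
   All identifiers here are hypothetical.  */

#define SAXPY_N 1024

static float saxpy_x[SAXPY_N];
static float saxpy_y[SAXPY_N];

/* Compute y = 2*x + y in a combined target region and return the
   number of mismatching elements (0 on success).  */
int
run_saxpy_ompacc (void)
{
  for (int i = 0; i < SAXPY_N; i++)
    {
      saxpy_x[i] = (float) i;
      saxpy_y[i] = 1.0f;
    }

  /* Only map clauses are used on the construct.  Without -fopenmp the
     pragma is ignored and the loop simply runs on the host.  */
#pragma omp target teams distribute parallel for map(to: saxpy_x) map(tofrom: saxpy_y)
  for (int i = 0; i < SAXPY_N; i++)
    saxpy_y[i] = 2.0f * saxpy_x[i] + saxpy_y[i];

  /* The expected values are small integers, so exact float
     comparison is safe here.  */
  int errors = 0;
  for (int i = 0; i < SAXPY_N; i++)
    if (saxpy_y[i] != 2.0f * (float) i + 1.0f)
      errors++;
  return errors;
}
```

Assuming a compiler built from this branch with nvptx offloading configured, the function would be called from `main` and built with `-fopenmp -fopenmp-target=acc`; the `finish_options` hunk above rejects `-fopenmp-target=` without `-fopenmp` and in combination with `-fopenacc`.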