Jakub Jelinek [Tue, 5 Jul 2022 08:01:09 +0000 (10:01 +0200)]
openmp: Add support for firstprivate and allocate clauses on scope construct
OpenMP 5.2 adds support for firstprivate and allocate clauses on the scope
construct and this patch adds that support to GCC.
5.2 unfortunately (IMNSHO mistakenly) marked scope construct as worksharing,
which implies that it isn't possible to nest inside of it other scope,
worksharing loop, sections, explicit barriers, single etc. which would
make scope far less useful. I'm not implementing that part, keeping the
5.1 behavior here, and will file an issue to revert that for OpenMP 6.0.
But, for firstprivate it keeps the restriction that is now implied from
worksharing construct that listed var can't be private in outer context,
where for reduction 5.1 had similar restriction explicit even for scope
and 5.2 has it implicitly through worksharing construct.
2022-05-31 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-low.cc (build_outer_var_ref): For code == OMP_CLAUSE_ALLOCATE
allow var to be private in the outer context.
(lower_private_allocate): Pass OMP_CLAUSE_ALLOCATE as last argument
to build_outer_var_ref.
gcc/c/
* c-parser.cc (OMP_SCOPE_CLAUSE_MASK): Add firstprivate and allocate
clauses.
gcc/cp/
* parser.cc (OMP_SCOPE_CLAUSE_MASK): Add firstprivate and allocate
clauses.
gcc/testsuite/
* c-c++-common/gomp/scope-5.c: New test.
* c-c++-common/gomp/scope-6.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Add firstprivate and allocate clauses
to scope construct.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/allocate-1.c (foo): Add testcase for
scope construct with allocate clause.
* testsuite/libgomp.c-c++-common/allocate-3.c (foo): Likewise.
* testsuite/libgomp.c-c++-common/scope-2.c: New test.
* libgomp.texi (OpenMP 5.2): Mark 'enter' clause as supported.
* testsuite/libgomp.fortran/declare-target-1.f90: Extend to test
explicit 'to' and 'enter' clause.
* testsuite/libgomp.fortran/declare-target-2.f90: Update accordingly.
Jakub Jelinek [Tue, 5 Jul 2022 07:29:23 +0000 (09:29 +0200)]
openmp: Adjust diagnostics about same variable in link and to/enter clauses
On Fri, May 27, 2022 at 04:52:17PM +0200, Tobias Burnus wrote:
> The 'TO'/'ENTER' usage is first stored in a linked list – and
> then as attribute to the symbol. I am not sure how to handle it best.
This reminds me I've left the C/C++ FE diagnostics about mixing link and
to/enter on the same variable in separate directives as is, so it always
talked about mixing link and to clauses.
This patch adjusts it, so that if link is first, it talks about the
clause actually used and if link is later, uses to or enter together
in the wording.
2022-05-28 Jakub Jelinek <jakub@redhat.com>
gcc/c/
* c-parser.cc (c_parser_omp_declare_target): If OMP_CLAUSE_LINK was
seen first, use "%<to%>" or "%<enter%>" depending on
OMP_CLAUSE_ENTER_TO of the current clause, otherwise use
"%<to%> or %<enter%>" wording.
gcc/cp/
* parser.cc (handle_omp_declare_target_clause): If OMP_CLAUSE_LINK was
seen first, use "%<to%>" or "%<enter%>" depending on
OMP_CLAUSE_ENTER_TO of the current clause, otherwise use
"%<to%> or %<enter%>" wording.
gcc/testsuite/
* c-c++-common/gomp/declare-target-2.c: Add further tests for mixing of
link and to/enter clauses on separate directives.
Jakub Jelinek [Tue, 5 Jul 2022 07:27:36 +0000 (09:27 +0200)]
openmp: Add support for enter clause on declare target
OpenMP 5.1 and earlier had 2 different uses of to clause, one for target
update construct with one semantics, and one for declare target directive
with a different semantics.
Under the hood we were using OMP_CLAUSE_TO_DECLARE to represent the latter.
OpenMP 5.2 renamed the declare target clause to to enter, the old one is
kept as a deprecated alias.
As we are far from having full OpenMP 5.2 support, this patch adds support
for the enter clause (and renames OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER
with a flag to tell the spelling of the clause for better diagnostics),
but doesn't deprecate the to clause on declare target just yet (that
should be done as one of the last steps in 5.2 support).
2022-05-27 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TO_DECLARE
to OMP_CLAUSE_ENTER.
* tree.h (OMP_CLAUSE_ENTER_TO): Define.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Rename
OMP_CLAUSE_TO_DECLARE to OMP_CLAUSE_ENTER.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ENTER
instead of OMP_CLAUSE_TO_DECLARE, if OMP_CLAUSE_ENTER_TO, print
"to" instead of "enter".
* tree-nested.cc (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Handle OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
gcc/c-family/
* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ENTER.
gcc/c/
* c-parser.cc (c_parser_omp_clause_name): Parse enter clause.
(c_parser_omp_all_clauses): For to clause on declare target, use
OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
OMP_CLAUSE_TO_DECLARE clause. Handle PRAGMA_OMP_CLAUSE_ENTER.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
(c_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
name in diagnostics instead of
omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/cp/
* parser.cc (cp_parser_omp_clause_name): Parse enter clause.
(cp_parser_omp_all_clauses): For to clause on declare target, use
OMP_CLAUSE_ENTER clause with OMP_CLAUSE_ENTER_TO instead of
OMP_CLAUSE_TO_DECLARE clause. Handle PRAGMA_OMP_CLAUSE_ENTER.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add enter clause.
(cp_parser_omp_declare_target): Use OMP_CLAUSE_ENTER instead of
OMP_CLAUSE_TO_DECLARE.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_ENTER instead
of OMP_CLAUSE_TO_DECLARE, to OMP_CLAUSE_ENTER_TO use "to" as clause
name in diagnostics instead of
omp_clause_code_name[OMP_CLAUSE_CODE (c)].
gcc/testsuite/
* c-c++-common/gomp/clauses-3.c: Add tests with enter clause instead
of to or modify some existing to clauses to enter.
* c-c++-common/gomp/declare-target-1.c: Likewise.
* c-c++-common/gomp/declare-target-2.c: Likewise.
* c-c++-common/gomp/declare-target-3.c: Likewise.
* g++.dg/gomp/attrs-9.C: Likewise.
* g++.dg/gomp/declare-target-1.C: Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/target-40.c: Modify some existing to
clauses to enter.
* testsuite/libgomp.c/target-41.c: Likewise.
Jakub Jelinek [Tue, 5 Jul 2022 07:19:47 +0000 (09:19 +0200)]
libgomp: Fix occassional hangs with taskwait nowait depend
Richi reported occassional hangs with taskwait-depend-nowait-1.*
tests and I've finally manged to reproduce. The problem is if
taskwait depend without nowait is encountered soon after
taskwait depend nowait and the former depends on the latter and there
is no other work to do, the taskwait depend without nowait is put
to sleep, but the empty_task optimization in
gomp_task_run_post_handle_dependers wouldn't wake it up in that
case. gomp_task_run_post_handle_dependers normally does some wakeups
because it schedules more work (another task), which is not the
case of empty_task, but we need to do the wakeups that would be done
upon task completion so that we awake sleeping threads when the
last child is done.
So, the taskwait-depend-nowait-1.* testcase is fixed with the
else if (__builtin_expect (task->parent_depends_on, 0) part of
the patch.
The new testcase can hang on another problem, if the empty task
is the last task of a taskgroup, we need to use atomic store
like elsewhere to decrease the counter to 0, and wake up taskgroup
end if needed.
Yet another spot which can sleep is normal taskwait (without depend),
but I believe nothing needs to be done for that - in that case we
await solely until the children's queue has no tasks, tasks still
waiting for dependencies aren't accounted in that, but the reason
is that if taskwait should wait for something, there needs to be at least
one active child doing something (in the children queue), which then
possibly awakes some of its siblings when the dependencies are met,
or in the empty task case awakes further dependencies, but in any
case the child that finished is still handled as active child and
will awake taskwait at the end if there is nothing further to
do.
Last sleeping case are barriers, but that is handled by ++ret and
awaking the barrier.
2022-05-25 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_task_run_post_handle_dependers): If empty_task
is the last task taskwait depend depends on, wake it up.
Similarly if it is the last child of a taskgroup, use atomic
store instead of decrement and awak taskgroup wait if any.
* testsuite/libgomp.c-c++-common/taskwait-depend-nowait-2.c: New test.
Jakub Jelinek [Tue, 5 Jul 2022 07:11:46 +0000 (09:11 +0200)]
openmp: Add taskwait nowait depend support [PR105378]
This patch adds support for (so far C/C++)
#pragma omp taskwait nowait depend(...)
directive, which is like
#pragma omp task depend(...)
;
but slightly optimized on the library side, so that it creates
the task only for the purpose of dependency tracking and doesn't actually
schedule it and wait for it when the dependencies are satisfied, instead
makes its dependencies satisfied right away.
2022-05-24 Jakub Jelinek <jakub@redhat.com>
PR c/105378
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TASKWAIT_DEPEND_NOWAIT): New
builtin.
* gimplify.cc (gimplify_omp_task): Diagnose taskwait with nowait
clause but no depend clauses.
* omp-expand.cc (expand_taskwait_call): Use
BUILT_IN_GOMP_TASKWAIT_DEPEND_NOWAIT rather than
BUILT_IN_GOMP_TASKWAIT_DEPEND if nowait clause is present.
gcc/c/
* c-parser.cc (OMP_TASKWAIT_CLAUSE_MASK): Add nowait clause.
gcc/cp/
* parser.cc (OMP_TASKWAIT_CLAUSE_MASK): Add nowait clause.
gcc/testsuite/
* c-c++-common/gomp/taskwait-depend-nowait-1.c: New test.
libgomp/
* libgomp_g.h (GOMP_taskwait_depend_nowait): Declare.
* libgomp.map (GOMP_taskwait_depend_nowait): Export at GOMP_5.1.1.
* task.c (empty_task): New function.
(gomp_task_run_post_handle_depend_hash): Declare earlier.
(gomp_task_run_post_handle_depend): Declare.
(GOMP_task): Optimize fn == empty_task if there is nothing to wait
for.
(gomp_task_run_post_handle_dependers): Optimize task->fn == empty_task.
(GOMP_taskwait_depend_nowait): New function.
* testsuite/libgomp.c-c++-common/taskwait-depend-nowait-1.c: New test.
libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async
This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a new created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
Jakub Jelinek [Mon, 4 Jul 2022 19:32:48 +0000 (21:32 +0200)]
openmp: Add support for inoutset depend-kind
This patch adds support for inoutset depend-kind in depend
clauses. It is very similar to the in depend-kind in that
a task with a dependency with that depend-kind is dependent
on all previously created sibling tasks with matching address
unless they have the same depend-kind.
In the in depend-kind case everything is dependent except
for in -> in dependency, for inoutset everything is
dependent except for inoutset -> inoutset dependency.
mutexinoutset is also similar (everything is dependent except
for mutexinoutset -> mutexinoutset dependency), but there is
also the additional restriction that only one task with
mutexinoutset for each address can be scheduled at once (i.e.
mutual exclusitivty). For now we support mutexinoutset
the same as inout/out, but the inoutset support is full.
In order not to bump the ABI for dependencies each time
(we've bumped it already once, the old ABI supports only
inout/out and in depend-kind, the new ABI supports
inout/out, mutexinoutset, in and depobj), this patch arranges
for inoutset to be at least for the time being always handled
as if it was specified through depobj even when it is not.
So it uses the new ABI for that and inoutset are represented
like depobj - pointer to a pair of pointers where the first one
will be the actual address of the object mentioned in depend
clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET.
2022-05-17 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum omp_clause_depend_kind): Add
OMP_CLAUSE_DEPEND_INOUTSET.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
* gimplify.cc (gimplify_omp_depend): Likewise.
* omp-low.cc (lower_depend_clauses): Likewise.
gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/c/
* c-parser.cc (c_parser_omp_clause_depend): Parse
inoutset depend-kind.
(c_parser_omp_depobj): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_clause_depend): Parse
inoutset depend-kind.
(cp_parser_omp_depobj): Likewise.
* cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c (boo): Add test with
inoutset depend-kind.
* c-c++-common/gomp/all-memory-2.c (boo): Likewise.
* c-c++-common/gomp/depobj-1.c (f1): Likewise.
(f2): Adjusted expected diagnostics.
* g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics.
include/
* gomp-constants.h (GOMP_DEPEND_INOUTSET): Define.
libgomp/
* libgomp.h (struct gomp_task_depend_entry): Change is_in type
from bool to unsigned char.
* task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where
task->depend[i].is_in && task->depend[i].is_in == ent->is_in
rather than just task->depend[i].is_in && ent->is_in. Remember
whether GOMP_DEPEND_IN loop is needed and guard the loop with that
conditional.
(gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where elem.is_in && elem.is_in == ent->is_in
rather than just elem.is_in && ent->is_in.
* testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with
inoutset depend-kind.
* testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.
* libgomp.texi (OpenMP 5.1): Set omp_all_memory to 'Y'.
* testsuite/libgomp.fortran/depend-5.f90: New test.
* testsuite/libgomp.fortran/depend-6.f90: New test.
* testsuite/libgomp.fortran/depend-7.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/all-memory-1.f90: New test.
* gfortran.dg/gomp/all-memory-2.f90: New test.
* gfortran.dg/gomp/all-memory-3.f90: New test.
* testsuite/libgomp.c++/target-has-device-addr-7.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-8.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-9.C: New test.
Tobias Burnus [Fri, 13 May 2022 18:00:34 +0000 (20:00 +0200)]
OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
For a non-descriptor array, map(A(n:m)) was mapped as
map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...])
with this patch, it is changed to
map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...])
The latter avoids an alloc - and also avoids the race condition with
nowait in the enclosed testcase. (Note: predantically, the testcase is
invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
GOMP_MAP_POINTER for the pointer attachment.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
Andrew Stubbs [Mon, 20 Jun 2022 14:51:15 +0000 (15:51 +0100)]
amdgcn: libgomp plugin USM implementation
Implement the Unified Shared Memory API calls in the GCN plugin.
The allocate and free are pretty straight-forward because all "target" memory
allocations are compatible with USM, on the right hardware. However, there's
no known way to check what memory region was used, after the fact, so we use a
splay tree to record allocations so we can answer "is_usm_ptr" later.
Andrew Stubbs [Fri, 17 Jun 2022 12:07:11 +0000 (13:07 +0100)]
amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
This patch sets a new attribute on OpenMP offload functions to ensure that the
information is passed all the way to the backend. The backend then places a
marker in the assembler code for mkoffload to find. Finally mkoffload places a
constructor function into the final program to ensure that the HSA_XNACK
environment variable passes the correct mode to the GPU.
The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.
Andrew Stubbs [Fri, 10 Jun 2022 14:15:49 +0000 (15:15 +0100)]
amdgcn: Support XNACK mode
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate meta-data and
set the load instructions to early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
(ASM_SPEC): Use XNACKOPT.
* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
(enum hsaco_attr_type): ... this, and generalize the names.
(TARGET_XNACK): New macro.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.c (gcn_option_override): Permit -mxnack for devices
other than Fiji.
(gcn_expand_epilogue): Remove early-clobber problems.
(output_file_start): Emit xnack attributes.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
(sram_ecc_type): Rename to ...
(hsaco_attr_type: ... this.)
* config/gcn/mkoffload.c (SET_XNACK_ANY): New macro.
(TEST_XNACK): Delete.
(TEST_XNACK_ANY): New macro.
(TEST_XNACK_ON): New macro.
(main): Support the new -mxnack=on/off/any syntax.
Jason Merrill [Wed, 11 May 2022 18:53:26 +0000 (14:53 -0400)]
c++: lambda template in requires [PR105541]
Since the patch for PR103408, the template parameters for the lambda in this
test have level 1 instead of 2, and we were treating null template args as 1
level of arguments, so tsubst_template_parms decided it had nothing to do.
Fixed by distinguishing between <> and no args at all, which is what we have
in our "substitution" in a requires-expression.
PR c++/105541
gcc/cp/ChangeLog:
* cp-tree.h (TMPL_ARGS_DEPTH): 0 for null args.
* parser.cc (cp_parser_enclosed_template_argument_list):
Use 0-length TREE_VEC for <>.
We were wrongly looking up the generic lambda op() in a dependent scope, and
then trying to look up its instantiation at substitution time, but lambdas
aren't instantiated, so we crashed. The fix is to not look into dependent
class scopes.
But this created trouble with wrongly trying to use a template from the
enclosing scope when we aren't actually looking at a template-argument-list,
in template/lookup18.C, so let's avoid that.
Harald Anlauf [Tue, 21 Jun 2022 21:20:18 +0000 (23:20 +0200)]
Fortran: fix simplification of INDEX(str1,str2) [PR105691]
gcc/fortran/ChangeLog:
PR fortran/105691
* simplify.cc (gfc_simplify_index): Replace old simplification
code by the equivalent of the runtime library implementation. Use
HOST_WIDE_INT instead of int for string index, length variables.
gcc/testsuite/ChangeLog:
PR fortran/105691
* gfortran.dg/index_6.f90: New test.
Harald Anlauf [Fri, 24 Jun 2022 20:21:39 +0000 (22:21 +0200)]
Fortran: fix checking of arguments to UNPACK when MASK is a variable [PR105813]
gcc/fortran/ChangeLog:
PR fortran/105813
* check.cc (gfc_check_unpack): Try to simplify MASK argument to
UNPACK so that checking of the VECTOR argument can work when MASK
is a variable.
gcc/testsuite/ChangeLog:
PR fortran/105813
* gfortran.dg/unpack_vector_1.f90: New test.
gcc/
* omp-low.cc (usm_transform): Remove unused function argument.
(This should be a fixup to 54c2d861ac62e30ebf34a4e62ee0d55478a742b9: 'Build
fix for 'openmp: Use libgomp memory allocation functions with unified shared
memory')
Tobias Burnus [Thu, 30 Jun 2022 06:30:48 +0000 (08:30 +0200)]
Build fix for 'openmp: allow requires unified_shared_memory'
OG12 commit fa65fc45972d27f2fd79a44eaba1978348177ee9 added an
error diagnostic (moved around in later commits); this diagnostic
caused bootstrap fails as %<...%> were missing. This commit adds
them.
gcc/c/
* c-parser.cc (c_parser_omp_requires): Add missing %<...%> in error.
gcc/cp/
* parser.cc (cp_parser_omp_requires): Add missing %<...%> in error.
Lulu Cheng [Mon, 27 Jun 2022 08:26:25 +0000 (16:26 +0800)]
LoongArch: Remove undefined behavior from code [PR 106097]
C++2017 and previous standard description:
The value of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are zero-filled. If E1 has an unsigned type,
the value of the result is E1×2E2, reduced modulo one more
than the maximum value representable inthe result type.
Otherwise, if E1 has a signed type and non-negative value,
and E1×2E2 is representablein the corresponding unsigned
type of the result type, then that value, converted to the
result type, is the resulting value; otherwise, the behavior
is undefined.
The value of E1 >> E2 is E1 right-shifted E2 bit positions.
If E1 has an unsigned type or if E1 has a signed type and
a non-negative value, the value of the result is the integral
part of the quotient of E1/2E2. If E1 has a signed type and
a negative value, the resulting value is implementation-defined.
gcc/ChangeLog:
PR target/106097
* config/loongarch/loongarch.cc (loongarch_build_integer):
Remove undefined behavior from code.
Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c
There is an interaction between the commits "OpenACC profiling-interface
fixes for asynchronous operations" and "[OpenACC] Clarify sequencing of
'async' data copying vs. profiling events in
'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'", leading to an
execution test failure and hang in these two tests.
This reverts changes to the tests made by the second patch, allowing the
tests to pass.
The arguments to gfc_build_array_ref were recently updated in the commit
'fortran: Use pointer arithmetic to index arrays [PR102043]', but a call
from gfc_conv_array_ref used the old function signature. This went
unnoticed due to the use of default arguments.
This patch should be merged into 'Fortran: delinearize multi-dimensional
array accesses'.
The ICE occurs during Gimple verification after the ompexp stage because
one of the arguments to the generated builtin call is of a Gimple reg type,
but isn't a Gimple value (because it is marked addressable).
This appears to be fallout from the commit "OpenACC 'kernels' decomposition:
Mark variables used in synthesized data clauses as addressable [PR100280]".
The launch dimensions have been added to the arguments of a builtin call
by oacc_set_fn_attrib, but one of the dimensions has been marked addressable.
Fixed by forcing the added arguments to be re-gimplified.
The final tail mark has no LHS, causing code that assumes its presence to
segfault. The LHS and the assignment appear to have been removed as dead
code by the cddce1 stage.
Fixed by checking for the presence of the LHS before using it.
A change that was present in the OG11 version of
'openmp: in_reduction clause support on target construct' but
not in the mainline version resulted in non-contiguous
arrays being accepted in cache clauses, only to ICE later.
Thomas Schwinge [Thu, 5 May 2022 21:01:36 +0000 (23:01 +0200)]
Refactor '-ldl' handling for libgomp proper and plugins
Instead of implicit global 'LIBS="-ldl $LIBS"' via 'AC_CHECK_LIB', make
'-ldl' explicit for libgomp proper, and clean up 'PLUGIN_GCN_LIBS',
'PLUGIN_NVPTX_LIBS' accordingly.
Jakub Jelinek [Thu, 12 May 2022 06:31:20 +0000 (08:31 +0200)]
openmp: Add omp_all_memory support (C/C++ only so far)
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier
which isn't allowed to be used anywhere but in the depend clause, this is
against how everything else has been handled in OpenMP so far (where
some identifiers could have special meaning in some OpenMP clauses or
pragmas but not elsewhere).
The patch handles it by making it a conditional keyword (for -fopenmp
only) and emitting a better diagnostics when it is used in a primary
expression. Having a nicer diagnostics when e.g. trying to do
int omp_all_memory;
or
int *omp_all_memory[10];
etc. would mean changing too many spots and hooking into name lookups
to reject declaring any such symbols would be too ugly and I'm afraid
there are way too many spots where one can introduce a name
(variables, functions, namespaces, struct, enum, enumerators, template
arguments, ...).
Otherwise, the handling is quite simple, normal depend clauses lower
into addresses of variables being handed over to the library, for
omp_all_memory I'm using NULL pointers. omp_all_memory can only be
used with inout or out depend kinds and means that a task is dependent
on all previously created sibling tasks that have any dependency (of
any depend kind) and that any later created sibling tasks will be
dependent on it if they have any dependency.
2022-05-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr
if null_pointer_node.
(gimplify_scan_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node
as omp_all_memory.
gcc/c-family/
* c-common.h (enum rid): Add RID_OMP_ALL_MEMORY.
* c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr
if null_pointer_node.
gcc/c/
* c-parser.cc (c_parse_init): Register omp_all_memory as keyword
if flag_openmp.
(c_parser_postfix_expression): Diagnose uses of omp_all_memory
in postfix expressions.
(c_parser_omp_variable_list): Handle omp_all_memory in depend
clause.
* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
gcc/cp/
* lex.cc (init_reswords): Register omp_all_memory as keyword
if flag_openmp.
* parser.cc (cp_parser_primary_expression): Diagnose uses of
omp_all_memory in postfix expressions.
(cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend
clause.
* semantics.cc (finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
* pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c: New test.
* c-c++-common/gomp/all-memory-2.c: New test.
* c-c++-common/gomp/all-memory-3.c: New test.
* g++.dg/gomp/all-memory-1.C: New test.
* g++.dg/gomp/all-memory-2.C: New test.
libgomp/
* libgomp.h (struct gomp_task): Add depend_all_memory member.
* task.c (gomp_init_task): Initialize depend_all_memory.
(gomp_task_handle_depend): Handle omp_all_memory.
(gomp_task_run_post_handle_depend_hash): Clear
parent->depend_all_memory if equal to current task.
(gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory.
* testsuite/libgomp.c-c++-common/depend-1.c: New test.
* testsuite/libgomp.c-c++-common/depend-2.c: New test.
* testsuite/libgomp.c-c++-common/depend-3.c: New test.
With recent commit 2e309a4eff80e55b53d32d26926a2a94eabfea21 "libgomp testsuite:
Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library",
and commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056 "libgomp GCN plugin:
Clean up unused references to system-provided HSA Runtime library", the last
uses of '--with-hsa-runtime' etc. are gone.
Thomas Schwinge [Wed, 6 Apr 2022 10:15:28 +0000 (12:15 +0200)]
libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'
After recent commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056
"libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime
library", these aren't set anymore.
Thomas Schwinge [Wed, 6 Apr 2022 09:31:45 +0000 (11:31 +0200)]
libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.
Originally changed for the libgomp HSA plugin in
commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, these are no longer built against system-provided HSA Runtime library.
Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header
files, and at run time does 'dlopen("libhsa-runtime64.so.1")'. It thus doesn't
make sense to consider references to system-provided HSA Runtime library during
libgomp GCN plugin build.
Thomas Schwinge [Wed, 6 Apr 2022 08:39:56 +0000 (10:39 +0200)]
libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.
# For build-tree testing, also consider the library paths used for builing.
# For installed testing, we assume all that to be provided in the sysroot.
if { $blddir != "" } {
[...]
global hsa_runtime_lib
if { $hsa_runtime_lib != "" } {
append always_ld_library_path ":$hsa_runtime_lib"
}
}
However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing". It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.
Fortran: Add support for OMP non-rectangular loops.
This patch adds support for OMP 5.1 "canonical loop nest form" to the
Fortran front end, marks non-rectangular loops for processing
by the middle end, and implements missing checks in the gimplifier
for additional prohibitions on non-rectangular loops.
Note that the OMP spec also prohibits non-rectangular loops with the TILE
construct; that construct hasn't been implemented yet, so that error will
need to be filled in later.
gcc/fortran/
* gfortran.h (struct gfc_omp_clauses): Add non_rectangular bit.
* openmp.cc (is_outer_iteration_variable): New function.
(expr_is_invariant): New function.
(bound_expr_is_canonical): New function.
(resolve_omp_do): Replace existing non-rectangularity error with
check for canonical form and setting non_rectangular bit.
* trans-openmp.cc (gfc_trans_omp_do): Transfer non_rectangular
flag to generated tree structure.
gcc/
* gimplify.cc (gimplify_omp_for): Update messages for SCHEDULED
and ORDERED clause conflict errors. Add check for GRAINSIZE and
NUM_TASKS on TASKLOOP.
gcc/testsuite/
* c-c++-common/gomp/loop-6.c (f3): New function to test TASKLOOP
diagnostics.
* gfortran.dg/gomp/collapse1.f90: Update expected messages.
* gfortran.dg/gomp/pr85313.f90: Remove dg-error on non-rectangular
loops that are now accepted.
* gfortran.dg/gomp/non-rectangular-loop.f90: New file.
* gfortran.dg/gomp/canonical-loop-1.f90: New file.
* gfortran.dg/gomp/canonical-loop-2.f90: New file.
Chung-Lin Tang [Fri, 17 Jun 2022 14:22:25 +0000 (22:22 +0800)]
openmp: Implement uses_allocators clause
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596412.html
For user defined allocator handles, this allows target regions to assign
memory space and traits to allocators, and automatically calls
omp_init/destroy_allocator() in the beginning/end of the target region.
For pre-defined allocators (i.e. omp_..._mem_alloc names), this is a no-op,
such clauses are not created.
Asides from the front-end portions, the target region transforms are
done in gimplify_omp_workshare.
This patch also includes added changes to enforce the "allocate allocator
must be also in a uses_allocator clause". This is done during
gimplify_scan_omp_clauses.
* c-parser.cc (c_parser_omp_clause_name): Add case for uses_allocators
clause.
(c_parser_omp_clause_uses_allocators): New function.
(c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* c-typeck.cc (c_finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Add case for uses_allocators
clause.
(cp_parser_omp_clause_uses_allocators): New function.
(cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* semantics.cc (finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.
fortran/ChangeLog:
* gfortran.h (struct gfc_omp_namelist): Add memspace_sym, traits_sym
fields.
(OMP_LIST_USES_ALLOCATORS): New list enum.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_USES_ALLOCATORS.
(gfc_match_omp_clause_uses_allocators): New function.
(gfc_match_omp_clauses): Add case to handle OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Add OMP_CLAUSE_USES_ALLOCATORS.
(resolve_omp_clauses): Add "USES_ALLOCATORS" to clause_names[].
* dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_USES_ALLOCATORS.
(show_omp_clauses): Likewise.
* trans-array.cc (gfc_conv_array_initializer): Adjust array index
to always be a created tree expression instead of NULL_TREE when zero.
* trans-openmp.cc (gfc_trans_omp_clauses): For ALLOCATE clause, handle
using gfc_trans_omp_variable for EXPR_VARIABLE exprs.
Add handling of OMP_LIST_USES_ALLOCATORS case.
* types.def (BT_FN_VOID_PTRMODE): Define.
(BT_FN_PTRMODE_PTRMODE_INT_PTR): Define.
* gimplify.cc (gimplify_scan_omp_clauses): Add checking of OpenMP target
region allocate clauses, to require a uses_allocators clause to exist
for allocators.
(gimplify_omp_workshare): Add handling of OMP_CLAUSE_USES_ALLOCATORS
for OpenMP target regions; create calls of omp_init/destroy_allocator
around target region body.
* omp-low.cc (lower_private_allocate): Adjust receiving of allocator.
(lower_rec_input_clauses): Likewise.
(create_task_copyfn): Add dereference for allocator if needed.
* system.h (startswith): New function.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/uses_allocators-1.c: New test.
* c-c++-common/gomp/uses_allocators-2.c: New test.
* c-c++-common/gomp/uses_allocators-3.c: New test.
* gfortran.dg/gomp/allocate-1.f90: Adjust testcase.
* gfortran.dg/gomp/uses_allocators-1.f90: New test.
* gfortran.dg/gomp/uses_allocators-2.f90: New test.
* gfortran.dg/gomp/uses_allocators-3.f90: New test.
Andrew Stubbs [Thu, 24 Feb 2022 17:16:13 +0000 (17:16 +0000)]
amdgcn: Add gfx90a support
This adds architecture options and multilibs for the AMD GFX90a GPUs.
It also tidies up some of the ISA selection code, and corrects a few small
mistake in the gfx908 naming.
gcc/ChangeLog:
* config.gcc (amdgcn): Accept --with-arch=gfx908 and gfx90a.
* config/gcn/gcn-opts.h (enum gcn_isa): New.
(TARGET_GCN3): Use enum gcn_isa.
(TARGET_GCN3_PLUS): Likewise.
(TARGET_GCN5): Likewise.
(TARGET_GCN5_PLUS): Likewise.
(TARGET_CDNA1): New.
(TARGET_CDNA1_PLUS): New.
(TARGET_CDNA2): New.
(TARGET_CDNA2_PLUS): New.
(TARGET_M0_LDS_LIMIT): New.
(TARGET_PACKED_WORK_ITEMS): New.
* config/gcn/gcn.cc (gcn_isa): Change to enum gcn_isa.
(gcn_option_override): Recognise CDNA ISA variants.
(gcn_omp_device_kind_arch_isa): Support gfx90a.
(gcn_expand_prologue): Make m0 init optional.
Add support for packed work items.
(output_file_start): Support gfx90a.
(gcn_hsa_declare_function_name): Support gfx90a metadata.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS):Add __CDNA1__ and
__CDNA2__.
* config/gcn/gcn.md (<su>mulsi3_highpart): Use TARGET_GCN5_PLUS.
(<su>mulsi3_highpart_imm): Likewise.
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx90a.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90a): New.
(main): Support gfx90a.
* config/gcn/t-gcn-hsa: Add gfx90a multilib.
* config/gcn/t-omp-device: Add gfx90a isa.
Tobias Burnus [Mon, 23 May 2022 08:54:32 +0000 (10:54 +0200)]
OpenMP: Handle descriptors in target's firstprivate [PR104949]
For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.
As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.
gcc/fortran/ChangeLog:
PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.
PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
Tobias Burnus [Thu, 12 May 2022 08:39:58 +0000 (10:39 +0200)]
Fortran: Fix proc pointer as elemental arg handling
The vtab's _callback function calls the elemental 'cb'
cb (var(:)%comp, comp_types_vtable._callback);
which gets called in a scalarization loop as 'var' might be a
nonscalar. Without the patch, that got translated as:
D.1234 = &comp_types_vtable._callback
...
cb (&(*D.4060)[S.3 + D.4071], &D.1234);
where 'D.1234' is function_type. With the patch, it remains a pointer;
i.e. D.1234 = comp... and 'cb (..., D.1234)', avoiding ME ICE.
Note: Fortran (F2018, C15100) requires that dummy arguments are
dummy data objects, which rules out dummy procs/proc-pointer dummies,
which is enforced in resolve_fl_procedure.
Thus, this change only affects the internally generated code.
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_scalar_elemental_arg_saved_as_reference):
Return true for attr.proc_pointer expressions.
gcc/testsuite/ChangeLog:
* gfortran.dg/finalize_38.f90: Compile with -Ofast.
* gfortran.dg/abstract_type_6.f03: Remove dg-error as
now hidden by other errors; copy to ...
* gfortran.dg/abstract_type_6a.f03: ... here; remove
some error to diagnose the error.
* gfortran.dg/finalize_39.f90: New test.
Fortran: Fix finalization resolution with deep copy
Follow-up patch to
"Fortran/OpenMP: Support mapping of DT with allocatable components"
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html
gcc/fortran/ChangeLog:
* resolve.cc (gfc_resolve_finalizers): Also resolve allocatable comps.
Tobias Burnus [Wed, 4 May 2022 16:18:44 +0000 (18:18 +0200)]
OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg
For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
item that is in an outer data-sharing clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
Andrew Stubbs [Wed, 13 Apr 2022 15:55:47 +0000 (16:55 +0100)]
openmp: unified_address support
This makes "requires unified_address" work by making it eqivalent to
"requires unified_shared_memory". This is more than is strictly necessary,
but should be standard compliant.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/ChangeLog:
* omp-low.cc: Do USM transformations for "unified_address".
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/usm-4.c: New test.
* gfortran.dg/gomp/usm-4.f90: New test.
Fix a crash due to mismatch of free and GOMP_alloc.
With allocate directive, we replace the malloc calls to GOMP_alloc if
it is associated with the allocate statement. The memory was supposed
to be free-d by the implicitely generated free calls which also get
replaced. But if user explicitely deallocated the memory using the
deallocate statement, it can cause a mismatch. This commit handles
that case and also replaces the free call generated for deallocate
clause.
Also added deallocate in the testcase and tidied it up a bit.
gcc/ChangeLog:
* omp-low.cc (lower_omp_allocate): Move allocate declaration
inside loop. Set it to false at the end of condition.
Andrew Stubbs [Fri, 11 Mar 2022 12:58:38 +0000 (12:58 +0000)]
openmp: -foffload-memory=pinned
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning. It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.
In this mode the ompx_pinned_memory_alloc feature is disabled as it is not
needed and may conflict.
Backport of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591354.html
gcc/ChangeLog:
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.
libgomp/ChangeLog:
* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map (GOMP_5.1.1): New version space with
GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
openmp: Use libgomp memory allocation functions with unified shared memory.
This patches changes calls to malloc/free/calloc/realloc/aligned_alloc and
operator new to memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc. This helps existing code to benefit
from the unified shared memory. The libgomp does the correct thing with all
the mapping constructs and there is no memory copies if the pointer is pointing
to unified shared memory.
We only replace replacable new operator and not the class member or placement new.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591353.html
gcc/ChangeLog:
* omp-low.cc (usm_transform): New function.
(make_pass_usm_transform): Likewise.
(class pass_usm_transform): New.
* passes.def: Add pass_usm_transform.
* tree-pass.h (make_pass_usm_transform): New declaration.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/usm-2.c: New test.
* c-c++-common/gomp/usm-3.c: New test.
* g++.dg/gomp/usm-1.C: New test.
* g++.dg/gomp/usm-2.C: New test.
* g++.dg/gomp/usm-3.C: New test.
* gfortran.dg/gomp/usm-2.f90: New test.
* gfortran.dg/gomp/usm-3.f90: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c/usm-6.c: New test.
* testsuite/libgomp.c++/usm-1.C: Likewise.
Andrew Stubbs [Fri, 11 Mar 2022 12:37:58 +0000 (12:37 +0000)]
openmp, nvptx: ompx_unified_shared_mem_alloc
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).
The nvptx plugin is modified to make the necessary Cuda calls, and libgomp
is modified to switch to shared-memory mode for USM allocated mappings.
Backport of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591352.html
libgomp/ChangeLog:
* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Don't fallback ompx_host_mem_alloc.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
(linux_memspace_calloc): Handle USM.
(linux_memspace_free): Handle USM.
(linux_memspace_realloc): Handle USM.
* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
ompx_host_mem_alloc.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_realloc): Likewise.
* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
(GOMP_OFFLOAD_usm_free): New prototype.
(GOMP_OFFLOAD_is_usm_ptr): New prototype.
* libgomp.h (gomp_usm_alloc): New prototype.
(gomp_usm_free): New prototype.
(gomp_is_usm_ptr): New prototype.
(struct gomp_device_descr): Add USM functions.
* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
and ompx_host_mem_space.
(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc.
* omp_lib.f90.in: Likewise.
* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
Call cuMemAllocManaged as appropriate.
(GOMP_OFFLOAD_alloc): Move internals to ...
(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
(GOMP_OFFLOAD_usm_alloc): New function.
(GOMP_OFFLOAD_usm_free): New function.
(GOMP_OFFLOAD_is_usm_ptr): New function.
* target.c (gomp_map_vars_internal): Add USM support.
(gomp_usm_alloc): New function.
(gomp_usm_free): New function.
(gomp_load_plugin_for_device): New function.
* testsuite/libgomp.c/usm-1.c: New test.
* testsuite/libgomp.c/usm-2.c: New test.
* testsuite/libgomp.c/usm-3.c: New test.
* testsuite/libgomp.c/usm-4.c: New test.
* testsuite/libgomp.c/usm-5.c: New test.
Andrew Stubbs [Fri, 11 Mar 2022 14:33:11 +0000 (14:33 +0000)]
openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589355.html
libgomp/ChangeLog:
* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
* testsuite/libgomp.c/allocators-7.c: New test.
Andrew Stubbs [Fri, 11 Mar 2022 12:33:06 +0000 (12:33 +0000)]
libgomp, openmp: Add ompx_pinned_mem_alloc
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588951.html
* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
Andrew Stubbs [Fri, 11 Mar 2022 12:12:39 +0000 (12:12 +0000)]
libgomp: pinned memory
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588360.html
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(xmlock): New function.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588372.html
This patch looks for malloc/free calls that were generated by allocate statement
that is associated with allocate directive and replaces them with GOMP_alloc
and GOMP_free.
gcc/ChangeLog:
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
(scan_omp_1_stmt): Call it.
(lower_omp_allocate): New function.
(lower_omp_1): Call it.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-6.f90: Add tests.
* gfortran.dg/gomp/allocate-7.f90: New test.
* gfortran.dg/gomp/allocate-8.f90: New test.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-2.f90: New test.
Handle cleanup of omp allocated variables (OpenMP 5.0).
Currently we are only handling omp allocate directive that is associated
with an allocate statement. This statement results in malloc and free calls.
The malloc calls are easy to get to as they are in the same block as allocate
directive. But the free calls come in a separate cleanup block. To help any
later passes finding them, an allocate directive is generated in the
cleanup block with kind=free. The normal allocate directive is given
kind=allocate.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588370.html
gcc/fortran/ChangeLog:
* gfortran.h (struct access_ref): Declare new members
omp_allocated and omp_allocated_end.
* openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to
NULL.
(prepare_omp_allocated_var_list_for_cleanup): New function.
(gfc_resolve_omp_allocate): Call it.
* trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated.
* trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt
generated for allocate directive.
This is backport of a patch posted in
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590597.html
An allocate clause in target region must specify an allocator
unless the compilation unit has requires construct with
dynamic_allocators clause. Current implementation of the allocate
clause did not check for this restriction. This patch fills that
gap.
gcc/ChangeLog:
* omp-low.cc (omp_maybe_offloaded_ctx): New prototype.
(scan_sharing_clauses): Check a restriction on allocate clause.
Tobias Burnus [Tue, 1 Mar 2022 15:35:08 +0000 (16:35 +0100)]
Fortran/OpenMP: Support mapping of DT with allocatable components
gcc/fortran/ChangeLog:
* class.cc (finalization_scalarizer): Mark syms as artificial.
(generate_callback_wrapper): New.
(gfc_find_derived_vtab): Call it, add _callback comp.
* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
LANG_HOOKS_OMP_DEEP_MAPPING_P,
LANG_HOOKS_OMP_DEEP_MAPPING_CNT): Redeinfe
* gfortran.h (gfc_import_iso_c_binding_module,
GFC_CLASS_CALLBACK_DEFAULT_FLAG, GFC_CLASS_CALLBACK_VTABLE_FLAG,
GFC_CLASS_CB_ALLOCATABLE, GFC_CLASS_CB_POINTER,
GFC_CLASS_CB_PROC_POINTER, GFC_CLASS_CB_VTABLE,
GFC_CLASS_CB_VPTR): New.
* match.cc (select_type_set_tmp): Propagate allocatable property.
* module.cc (MOD_VERSION): Bump due to vtab change.
(import_iso_c_binding_module): New import_all arg.
(gfc_import_iso_c_binding_module): New.
(gfc_use_module): Update call.
* openmp.cc (resolve_omp_clauses): Accept DT with alloc comps.
* resolve.cc (gfc_resolve_formal_arglist, gfc_resolve_intrinsic,
resolve_fl_procedure, resolve_types): Permit some violations
for internal code.
* trans-array.cc (gfc_conv_descriptor_stride_get,
gfc_tree_array_size, gfc_full_array_size): Update
for GFC_TYPE_ARRAY_AKIND change.
(gfc_conv_expr_descriptor): Likewise; permit calling with tree code.
* trans-expr.cc (VTABLE_CALLBACK_FIELD): Add.
(VTAB_GET_FIELD_GEN): Use it.
(VTABLE_DEALLOCATE_FIELD): Undef at the end.
(gfc_conv_expr_reference): Fixes; avoid unneccessary temp var.
* trans-intrinsic.cc (gfc_conv_intrinsic_sizeof,
gfc_conv_associated): Fix class and comp-ref handling.
(conv_isocbinding_function): Remove buggy code.
* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok arg.
(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
gfc_omp_clause_assign_op, gfc_omp_clause_dtor,
(gfc_omp_finish_clause): Update call.
(GFC_MAP_TOKEN_DATA, GFC_MAP_TOKEN_SIZES, GFC_MAP_TOKEN_KINDS,
GFC_MAP_TOKEN_DATA_OFFSET, GFC_MAP_TOKEN_OFFSET,
GFC_MAP_TOKEN_FLAGS, GFC_MAP_TOKEN_DETACH): Define.
(gfc_omp_get_token_data, gfc_omp_get_token_sizes,
gfc_omp_get_token_kinds, gfc_omp_get_token_offset_data,
gfc_omp_get_token_offset, gfc_omp_get_token_flags,
gfc_omp_get_token_detach, gfc_omp_get_map_token_type,
gfc_omp_get_cb_type, gfc_omp_gen_deep_map_fn,
gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
gfc_omp_get_array_size, gfc_omp_elmental_loop,
gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do),
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
(gfc_trans_omp_array_section): Save clause decl to survive gimplifying.
(gfc_trans_omp_clauses): Likewise; fixes.
* trans-types.cc (gfc_build_array_type, gfc_get_derived_type,
gfc_get_array_descr_info): Update array kind to distinguish
different assumed-rank arrays.
* trans.h (gfc_class_vtab_callback_get, gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New prototypes.
(enum gfc_array_kind): Additional GFC_ARRAY_ASSUMED_RANK_* entries.
gcc/ChangeLog:
* langhooks-def.h (lhd_omp_deep_mapping_p,
lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
LANG_HOOKS_OMP_DEEP_MAPPING): Define.
(LANG_HOOKS_DECLS): Use it.
* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
lhd_omp_deep_mapping): New stubs.
* langhooks.h (struct lang_hooks_for_decls): Add new hooks
* omp-expand.cc (expand_omp_target): Handle dynamic-size
addr/sizes/kinds arrays.
* omp-low.cc (build_sender_ref, fixup_child_record_type,
scan_sharing_clauses, lower_omp_target): Update to handle
new hooks and dynamic-size addr/sizes/kinds arrays.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
Chung-Lin Tang [Thu, 24 Feb 2022 09:07:48 +0000 (01:07 -0800)]
openmp: Handle C/C++ array reference base-pointers in array sections
In cases where a program constructs its own deep-copying for arrays-of-pointers,
e.g:
#pragma omp target enter data map(to:level->vectors[:N])
for (i = 0; i < N; i++)
#pragma omp target enter data map(to:level->vectors[i][:N])
We need to treat the part of the array reference before the array section
as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment
behavior.
This patch adds this inside handle_omp_array_sections(), tracing the whole
sequence of array dimensions, creating a whole base-pointer reference
iteratively using build_array_ref(). The conditions are that each of the
"absorbed" dimensions must be length==1, and the final reference must be
of pointer-type (so that pointer attachment makes sense).