OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]
Thomas Schwinge
thomas@codesourcery.com
Tue Mar 1 16:46:20 GMT 2022
Hi!
Jakub, need your review/approval here, please:
On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown <julian@codesourcery.com> wrote:
>> - The "addressable" bit is set during the kernels conversion pass for
>> variables that have "create" (alloc) clauses created for them in the
>> synthesised outer data region (instead of in the front-end, etc.,
>> where it can't be done accurately). Such variables actually have
>> their address taken during transformations made in a later pass
>> (omp-low, I think), but there's a phase-ordering problem that means
>> the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287". (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.) I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]"
> ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
>
> 13073 else if (is_gimple_reg (var))
> 13074 {
> 13075 gcc_assert (offloaded);
> 13076 tree avar = create_tmp_var (TREE_TYPE (var));
> 13077 mark_addressable (avar);
>
> ..., which (a) is only implemented for actualy *offloaded* regions (but not
> data regions), and (b) the subsequently synthesized code for writing to and
> later reading back from the temporary fundamentally conflicts with OpenACC
> 'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial
> to make work, so let's just avoid this case.
> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
> /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> - variable. If INNER_CLEANUP is present, add a try-finally statement with
> + variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> + If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block. Return the new data region, or
> the original BODY if no data region was needed. */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> + /* See <https://gcc.gnu.org/PR100280>. */
> + TREE_ADDRESSABLE (v) = 1;
> }
> }
So, that's too simple. ;-) ... and gives rise to workaround patches like
we have on the og11 development branch:
- "Avoid introducing 'create' mapping clauses for loop index variables in kernels regions",
- "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously",
- "Fix for is_gimple_reg vars to 'data kernels'"
We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
because that may easily violate GIMPLE invariants, leading to ICEs later.
There are a few open PRs, which my following changes are addressing. To
make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
discussion of <https://gcc.gnu.org/PR102330>. (PR102330 turned out to be
unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a
different patch for it.)
I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
above (plus one or two additional ones to be introduced in later
patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
'make_addressable_vars' for later fix-up.
(In reply to Jakub Jelinek from comment #9)
> Whether you can use the same bitmap or need to add another bitmap next to
> task_shared_vars is something hard to guess without diving into it deeply.
Per my understanding of the code, the only place where I had doubts is
'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that
what this is doing is either a no-op in the
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the
original 'task_shared_vars' handling has been. Either way: I couldn't
come up with a way (test case) that we'd actually run into this case;
you'd have to have the relevant OpenMP constructs inside an OpenACC
'kernels' region, which isn't permitted per
'gcc/omp-low.cc:check_omp_nesting_restrictions'.
OK to proceed in this way?
Grüße
Thomas
--- gcc/omp-low.cc
+++ gcc/omp-low.cc
@@ -188,7 +188,7 @@ struct omp_context
static splay_tree all_contexts;
static int taskreg_nesting_level;
static int target_nesting_level;
-static bitmap task_shared_vars;
+static bitmap make_addressable_vars;
static bitmap global_nonaddressable_vars;
static vec<omp_context *> taskreg_contexts;
static vec<gomp_task *> task_cpyfns;
@@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
/* Taking address of OUTER in lower_send_shared_vars
might need regimplification of everything that uses the
variable. */
- if (!task_shared_vars)
- task_shared_vars = BITMAP_ALLOC (NULL);
- bitmap_set_bit (task_shared_vars, DECL_UID (outer));
+ if (!make_addressable_vars)
+ make_addressable_vars = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (make_addressable_vars, DECL_UID (outer));
TREE_ADDRESSABLE (outer) = 1;
}
return true;
@@ -601,13 +601,13 @@ omp_copy_decl_2 (tree var, tree name, tree type, omp_context *ctx)
else
record_vars (copy);
- /* If VAR is listed in task_shared_vars, it means it wasn't
- originally addressable and is just because task needs to take
- it's address. But we don't need to take address of privatizations
+ /* If VAR is listed in make_addressable_vars, it wasn't
+ originally addressable, but was only later made so.
+ We don't need to take address of privatizations
from that var. */
if (TREE_ADDRESSABLE (var)
- && ((task_shared_vars
- && bitmap_bit_p (task_shared_vars, DECL_UID (var)))
+ && ((make_addressable_vars
+ && bitmap_bit_p (make_addressable_vars, DECL_UID (var)))
|| (global_nonaddressable_vars
&& bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var)))))
TREE_ADDRESSABLE (copy) = 0;
@@ -1495,6 +1495,21 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
if (ctx->outer)
scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
decl = OMP_CLAUSE_DECL (c);
+ /* If requested, make 'decl' addressable. */
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+ && OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c))
+ {
+ gcc_checking_assert (DECL_P (decl));
+
+ gcc_checking_assert (!TREE_ADDRESSABLE (decl));
+ if (!make_addressable_vars)
+ make_addressable_vars = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (make_addressable_vars, DECL_UID (decl));
+ TREE_ADDRESSABLE (decl) = 1;
+
+ /* Done. */
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c) = 0;
+ }
/* Global variables with "omp declare target" attribute
don't need to be copied, the receiver side will use them
directly. However, global variables with "omp declare target link"
@@ -2371,11 +2405,11 @@ finish_taskreg_scan (omp_context *ctx)
if (ctx->record_type == NULL_TREE)
return;
- /* If any task_shared_vars were needed, verify all
+ /* If any make_addressable_vars were needed, verify all
OMP_CLAUSE_SHARED clauses on GIMPLE_OMP_{PARALLEL,TASK,TEAMS}
statements if use_pointer_for_field hasn't changed
because of that. If it did, update field types now. */
- if (task_shared_vars)
+ if (make_addressable_vars)
{
tree c;
@@ -2390,7 +2424,7 @@ finish_taskreg_scan (omp_context *ctx)
the receiver side will use them directly. */
if (is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx)))
continue;
- if (!bitmap_bit_p (task_shared_vars, DECL_UID (decl))
+ if (!bitmap_bit_p (make_addressable_vars, DECL_UID (decl))
|| !use_pointer_for_field (decl, ctx))
continue;
tree field = lookup_field (decl, ctx);
@@ -14040,7 +14074,7 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
/* Callback for lower_omp_1. Return non-NULL if *tp needs to be
regimplified. If DATA is non-NULL, lower_omp_1 is outside
- of OMP context, but with task_shared_vars set. */
+ of OMP context, but with make_addressable_vars set. */
static tree
lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
@@ -14054,9 +14088,9 @@ lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
&& DECL_HAS_VALUE_EXPR_P (t))
return t;
- if (task_shared_vars
+ if (make_addressable_vars
&& DECL_P (t)
- && bitmap_bit_p (task_shared_vars, DECL_UID (t)))
+ && bitmap_bit_p (make_addressable_vars, DECL_UID (t)))
return t;
/* If a global variable has been privatized, TREE_CONSTANT on
@@ -14141,7 +14175,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (gimple_has_location (stmt))
input_location = gimple_location (stmt);
- if (task_shared_vars)
+ if (make_addressable_vars)
memset (&wi, '\0', sizeof (wi));
/* If we have issued syntax errors, avoid doing any heavy lifting.
@@ -14158,7 +14192,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
case GIMPLE_COND:
{
gcond *cond_stmt = as_a <gcond *> (stmt);
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& (walk_tree (gimple_cond_lhs_ptr (cond_stmt),
lower_omp_regimplify_p,
ctx ? NULL : &wi, NULL)
@@ -14250,7 +14284,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
lower_omp_critical (gsi_p, ctx);
break;
case GIMPLE_OMP_ATOMIC_LOAD:
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& walk_tree (gimple_omp_atomic_load_rhs_ptr (
as_a <gomp_atomic_load *> (stmt)),
lower_omp_regimplify_p, ctx ? NULL : &wi, NULL))
@@ -14371,7 +14405,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
default:
regimplify:
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& walk_gimple_op (stmt, lower_omp_regimplify_p,
ctx ? NULL : &wi))
{
@@ -14435,10 +14469,10 @@ execute_lower_omp (void)
if (all_contexts->root)
{
- if (task_shared_vars)
+ if (make_addressable_vars)
push_gimplify_context ();
lower_omp (&body, NULL);
- if (task_shared_vars)
+ if (make_addressable_vars)
pop_gimplify_context (NULL);
}
@@ -14447,7 +14481,7 @@ execute_lower_omp (void)
splay_tree_delete (all_contexts);
all_contexts = NULL;
}
- BITMAP_FREE (task_shared_vars);
+ BITMAP_FREE (make_addressable_vars);
BITMAP_FREE (global_nonaddressable_vars);
/* If current function is a method, remove artificial dummy VAR_DECL created
--- gcc/omp-oacc-kernels-decompose.cc
+++ gcc/omp-oacc-kernels-decompose.cc
@@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
prev_mapped_var = v;
/* See <https://gcc.gnu.org/PR100280>. */
- TREE_ADDRESSABLE (v) = 1;
+ if (!TREE_ADDRESSABLE (v))
+ {
+ /* Request that OMP lowering make 'v' addressable. */
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
+ }
}
}
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -1145,6 +1145,9 @@ struct GTY(()) tree_base {
PREDICT_EXPR_OUTCOME in
PREDICT_EXPR
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE in
+ OMP_CLAUSE
+
static_flag:
TREE_STATIC in
--- gcc/tree.h
+++ gcc/tree.h
@@ -1695,6 +1695,11 @@ class auto_suppress_location_wrappers
#define OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag)
+/* Flag that 'OMP_CLAUSE_DECL (NODE)' is to be made addressable during OMP
+ lowering. */
+#define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
+ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
+
/* True on an OMP_CLAUSE_USE_DEVICE_PTR with an OpenACC 'if_present'
clause. */
#define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
More information about the Gcc-patches
mailing list