OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

Thomas Schwinge thomas@codesourcery.com
Tue Mar 1 16:46:20 GMT 2022


Hi!

Jakub, need your review/approval here, please:

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown <julian@codesourcery.com> wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]"

> ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
>
>     13073                     else if (is_gimple_reg (var))
>     13074                       {
>     13075                         gcc_assert (offloaded);
>     13076                         tree avar = create_tmp_var (TREE_TYPE (var));
>     13077                         mark_addressable (avar);
>
> ..., which (a) is only implemented for actualy *offloaded* regions (but not
> data regions), and (b) the subsequently synthesized code for writing to and
> later reading back from the temporary fundamentally conflicts with OpenACC
> 'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
> to make work, so let's just avoid this case.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
>     location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
>     this cleanup code in the finally block.  Return the new data region, or
>     the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>         inner_data_clauses = new_clause;
>
>         prev_mapped_var = v;
> +
> +       /* See <https://gcc.gnu.org/PR100280>.  */
> +       TREE_ADDRESSABLE (v) = 1;
>       }
>      }

So, that's too simple.  ;-) ... and gives rise to workaround patches like
we have on the og11 development branch:
  - "Avoid introducing 'create' mapping clauses for loop index variables in kernels regions",
  - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously",
  - "Fix for is_gimple_reg vars to 'data kernels'"

We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
because that may easily violate GIMPLE invariants, leading to ICEs later.
There are a few open PRs, which my following changes are addressing.  To
make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
discussion of <https://gcc.gnu.org/PR102330>.  (PR102330 turned out to be
unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a
different patch for it.)

I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
above (plus one or two additional ones to be introduced in later
patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
'make_addressable_vars' for later fix-up.

(In reply to Jakub Jelinek from comment #9)
> Whether you can use the same bitmap or need to add another bitmap next to
> task_shared_vars is something hard to guess without diving into it deeply.

Per my understanding of the code, the only place where I had doubts is
'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that
what this is doing is either a no-op in the
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the
original 'task_shared_vars' handling has been.  Either way: I couldn't
come up with a way (test case) that we'd actually run into this case;
you'd have to have the relevant OpenMP constructs inside an OpenACC
'kernels' region, which isn't permitted per
'gcc/omp-low.cc:check_omp_nesting_restrictions'.

OK to proceed in this way?


Grüße
 Thomas


--- gcc/omp-low.cc
+++ gcc/omp-low.cc
@@ -188,7 +188,7 @@ struct omp_context
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
 static int target_nesting_level;
-static bitmap task_shared_vars;
+static bitmap make_addressable_vars;
 static bitmap global_nonaddressable_vars;
 static vec<omp_context *> taskreg_contexts;
 static vec<gomp_task *> task_cpyfns;
@@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
              /* Taking address of OUTER in lower_send_shared_vars
                 might need regimplification of everything that uses the
                 variable.  */
-             if (!task_shared_vars)
-               task_shared_vars = BITMAP_ALLOC (NULL);
-             bitmap_set_bit (task_shared_vars, DECL_UID (outer));
+             if (!make_addressable_vars)
+               make_addressable_vars = BITMAP_ALLOC (NULL);
+             bitmap_set_bit (make_addressable_vars, DECL_UID (outer));
              TREE_ADDRESSABLE (outer) = 1;
            }
          return true;
@@ -601,13 +601,13 @@ omp_copy_decl_2 (tree var, tree name, tree type, omp_context *ctx)
   else
     record_vars (copy);

-  /* If VAR is listed in task_shared_vars, it means it wasn't
-     originally addressable and is just because task needs to take
-     it's address.  But we don't need to take address of privatizations
+  /* If VAR is listed in make_addressable_vars, it wasn't
+     originally addressable, but was only later made so.
+     We don't need to take address of privatizations
      from that var.  */
   if (TREE_ADDRESSABLE (var)
-      && ((task_shared_vars
-          && bitmap_bit_p (task_shared_vars, DECL_UID (var)))
+      && ((make_addressable_vars
+          && bitmap_bit_p (make_addressable_vars, DECL_UID (var)))
          || (global_nonaddressable_vars
              && bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var)))))
     TREE_ADDRESSABLE (copy) = 0;
@@ -1495,6 +1495,21 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
          if (ctx->outer)
            scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
          decl = OMP_CLAUSE_DECL (c);
+         /* If requested, make 'decl' addressable.  */
+         if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+             && OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c))
+           {
+             gcc_checking_assert (DECL_P (decl));
+
+             gcc_checking_assert (!TREE_ADDRESSABLE (decl));
+             if (!make_addressable_vars)
+               make_addressable_vars = BITMAP_ALLOC (NULL);
+             bitmap_set_bit (make_addressable_vars, DECL_UID (decl));
+             TREE_ADDRESSABLE (decl) = 1;
+
+             /* Done.  */
+             OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c) = 0;
+           }
          /* Global variables with "omp declare target" attribute
             don't need to be copied, the receiver side will use them
             directly.  However, global variables with "omp declare target link"
@@ -2371,11 +2405,11 @@ finish_taskreg_scan (omp_context *ctx)
   if (ctx->record_type == NULL_TREE)
     return;

-  /* If any task_shared_vars were needed, verify all
+  /* If any make_addressable_vars were needed, verify all
      OMP_CLAUSE_SHARED clauses on GIMPLE_OMP_{PARALLEL,TASK,TEAMS}
      statements if use_pointer_for_field hasn't changed
      because of that.  If it did, update field types now.  */
-  if (task_shared_vars)
+  if (make_addressable_vars)
     {
       tree c;

@@ -2390,7 +2424,7 @@ finish_taskreg_scan (omp_context *ctx)
               the receiver side will use them directly.  */
            if (is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx)))
              continue;
-           if (!bitmap_bit_p (task_shared_vars, DECL_UID (decl))
+           if (!bitmap_bit_p (make_addressable_vars, DECL_UID (decl))
                || !use_pointer_for_field (decl, ctx))
              continue;
            tree field = lookup_field (decl, ctx);
@@ -14040,7 +14074,7 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)

 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
-   of OMP context, but with task_shared_vars set.  */
+   of OMP context, but with make_addressable_vars set.  */

 static tree
 lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
@@ -14054,9 +14088,9 @@ lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
       && DECL_HAS_VALUE_EXPR_P (t))
     return t;

-  if (task_shared_vars
+  if (make_addressable_vars
       && DECL_P (t)
-      && bitmap_bit_p (task_shared_vars, DECL_UID (t)))
+      && bitmap_bit_p (make_addressable_vars, DECL_UID (t)))
     return t;

   /* If a global variable has been privatized, TREE_CONSTANT on
@@ -14141,7 +14175,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (gimple_has_location (stmt))
     input_location = gimple_location (stmt);

-  if (task_shared_vars)
+  if (make_addressable_vars)
     memset (&wi, '\0', sizeof (wi));

   /* If we have issued syntax errors, avoid doing any heavy lifting.
@@ -14158,7 +14192,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GIMPLE_COND:
       {
        gcond *cond_stmt = as_a <gcond *> (stmt);
-       if ((ctx || task_shared_vars)
+       if ((ctx || make_addressable_vars)
            && (walk_tree (gimple_cond_lhs_ptr (cond_stmt),
                           lower_omp_regimplify_p,
                           ctx ? NULL : &wi, NULL)
@@ -14250,7 +14284,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       lower_omp_critical (gsi_p, ctx);
       break;
     case GIMPLE_OMP_ATOMIC_LOAD:
-      if ((ctx || task_shared_vars)
+      if ((ctx || make_addressable_vars)
          && walk_tree (gimple_omp_atomic_load_rhs_ptr (
                          as_a <gomp_atomic_load *> (stmt)),
                        lower_omp_regimplify_p, ctx ? NULL : &wi, NULL))
@@ -14371,7 +14405,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)

     default:
     regimplify:
-      if ((ctx || task_shared_vars)
+      if ((ctx || make_addressable_vars)
          && walk_gimple_op (stmt, lower_omp_regimplify_p,
                             ctx ? NULL : &wi))
        {
@@ -14435,10 +14469,10 @@ execute_lower_omp (void)

   if (all_contexts->root)
     {
-      if (task_shared_vars)
+      if (make_addressable_vars)
        push_gimplify_context ();
       lower_omp (&body, NULL);
-      if (task_shared_vars)
+      if (make_addressable_vars)
        pop_gimplify_context (NULL);
     }

@@ -14447,7 +14481,7 @@ execute_lower_omp (void)
       splay_tree_delete (all_contexts);
       all_contexts = NULL;
     }
-  BITMAP_FREE (task_shared_vars);
+  BITMAP_FREE (make_addressable_vars);
   BITMAP_FREE (global_nonaddressable_vars);

   /* If current function is a method, remove artificial dummy VAR_DECL created
--- gcc/omp-oacc-kernels-decompose.cc
+++ gcc/omp-oacc-kernels-decompose.cc
@@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
          prev_mapped_var = v;

          /* See <https://gcc.gnu.org/PR100280>.  */
-         TREE_ADDRESSABLE (v) = 1;
+         if (!TREE_ADDRESSABLE (v))
+           {
+             /* Request that OMP lowering make 'v' addressable.  */
+             OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
+           }
        }
     }

--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -1145,6 +1145,9 @@ struct GTY(()) tree_base {
        PREDICT_EXPR_OUTCOME in
           PREDICT_EXPR

+       OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE in
+          OMP_CLAUSE
+
    static_flag:

        TREE_STATIC in
--- gcc/tree.h
+++ gcc/tree.h
@@ -1695,6 +1695,11 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag)

+/* Flag that 'OMP_CLAUSE_DECL (NODE)' is to be made addressable during OMP
+   lowering.  */
+#define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
+
 /* True on an OMP_CLAUSE_USE_DEVICE_PTR with an OpenACC 'if_present'
    clause.  */
 #define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955


More information about the Gcc-patches mailing list