[gomp4] [WIP] OpenACC bind, nohost clauses

Cesar Philippidis cesar@codesourcery.com
Mon Dec 14 20:36:00 GMT 2015


On 12/08/2015 11:55 AM, Thomas Schwinge wrote:

Just for clarification, we're implementing the bind clause with the
semantics defined in OpenACC 2.5, correct? The 2.0a semantics aren't clear.

> On Sat, 14 Nov 2015 09:36:36 +0100, I wrote:
>> Initial support for the OpenACC bind and nohost clauses (routine
>> directive) for C, C++.  Fortran to follow.  Middle end handling and more
>> complete testsuite coverage also to follow once we got a few details
>> clarified.  OK for trunk?
> 
> (Has not yet been reviewed.)  Meanwhile, I continued working on the
> implementation, focussing on C.  See also my question "How to rewrite
> call targets (OpenACC bind clause)",
> <http://news.gmane.org/find-root.php?message_id=%3C877fkq482i.fsf%40hertz.schwinge.homeip.net%3E>.
> 
> To enable Cesar to help with the C++ and Fortran front ends (thanks!), in
> r231423, I just committed "[WIP] OpenACC bind, nohost clauses" to
> gomp-4_0-branch.  (There has already been initial support, parsing only,
> on gomp-4_0-branch.)  I'll try to make progress with the generic middle
> end bits, but will appreciate any review comments, so before inlining the
> complete patch, first a few questions/comments:
> 
> In the OpenACC bind(Y) clause attached to a routine(X) directive, Y can
> be an identifier or a string.  In the front ends, I canonicalize that
> into a string, as we -- at least currently -- don't have any use for the
> identifier (or decl?) later on:
> 
>     --- gcc/tree-core.h
>     +++ gcc/tree-core.h
>     @@ -461,7 +461,7 @@ enum omp_clause_code {
>     -  /* OpenACC clause: bind ( identifer | string ).  */
>     +  /* OpenACC clause: bind (string).  */
>        OMP_CLAUSE_BIND,

So what happens in c++ then? E.g. Say that we have a function sum which
is overloaded as follows:

  int sum (int a, int b) { return a + b; }
  double sum (double a, double b) { return a + b; }

  #pragma acc routine (sum) bind (cuda_sum)

First of all, does this bind apply to both int sum and double sum, or
just the double sum? Second, if the identifier gets canonicalized as a
string, will that prevent the name from being mangled, and hence disable
function overloading?

Also, while I'm asking about c++, is it possible apply bind individually
to an overloaded function. E.g.

 #pragma acc routine (sum) bind (cuda_sum_int)
 int sum (int a, int b) { return a + b; }

 #pragma acc routine (sum) bind (cuda_sum_double)
 double sum (double a, double b) { return a + b; }

> All the following are unreachable for OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST;
> document that to make it obvious/expected:
> 
>     --- gcc/cp/pt.c
>     +++ gcc/cp/pt.c
>     @@ -14501,6 +14501,8 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
>                   }
>               }
>               break;
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     --- gcc/gimplify.c
>     +++ gcc/gimplify.c
>     @@ -7413,6 +7413,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>               ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     @@ -8104,6 +8106,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
>             case OMP_CLAUSE_DEVICE_TYPE:
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     --- gcc/omp-low.c
>     +++ gcc/omp-low.c
>     @@ -2279,6 +2279,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>               sorry ("Clause not supported yet");
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     @@ -2453,6 +2455,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>               sorry ("Clause not supported yet");
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     --- gcc/tree-nested.c
>     +++ gcc/tree-nested.c
>     @@ -1200,6 +1200,8 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
>             case OMP_CLAUSE_SEQ:
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }
>     @@ -1882,6 +1884,8 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
>             case OMP_CLAUSE_SEQ:
>               break;
>      
>     +       case OMP_CLAUSE_BIND:
>     +       case OMP_CLAUSE_NOHOST:
>             default:
>               gcc_unreachable ();
>             }

Those changes look reasonable.

> C front end:
> 
>     --- gcc/c/c-parser.c
>     +++ gcc/c/c-parser.c
>     @@ -11607,6 +11607,8 @@ c_parser_oacc_clause_async (c_parser *parser, tree list)
>      static tree
>      c_parser_oacc_clause_bind (c_parser *parser, tree list)
>      {
>     +  check_no_duplicate_clause (list, OMP_CLAUSE_BIND, "bind");
>     +
>        location_t loc = c_parser_peek_token (parser)->location;
>      
>        parser->lex_untranslated_string = true;
>     @@ -11615,20 +11617,43 @@ c_parser_oacc_clause_bind (c_parser *parser, tree list)
>            parser->lex_untranslated_string = false;
>            return list;
>          }
>     -  if (c_parser_next_token_is (parser, CPP_NAME)
>     -      || c_parser_next_token_is (parser, CPP_STRING))
>     +  tree name = error_mark_node;
>     +  c_token *token = c_parser_peek_token (parser);
>     +  if (c_parser_next_token_is (parser, CPP_NAME))
>          {
>     -      tree t = c_parser_peek_token (parser)->value;
>     +      tree decl = lookup_name (token->value);
>     +      if (!decl)
>     +       error_at (token->location, "%qE has not been declared",
>     +                 token->value);
>     +      else if (TREE_CODE (decl) != FUNCTION_DECL)
>     +       error_at (token->location, "%qE does not refer to a function",
>     +                 token->value);
> 
> Quite possibly we'll want to add more error checking (matching signature
> of X and Y, for example).

Good idea, but I wonder if that would be too strict. Should we allow
integer promotion in the bind function arguments?

>     +      else
>     +       {
>     +         //TODO? TREE_USED (decl) = 1;
>     +         tree name_id = DECL_NAME (decl);
>     +         name = build_string (IDENTIFIER_LENGTH (name_id),
>     +                              IDENTIFIER_POINTER (name_id));
>     +       }
>     +      c_parser_consume_token (parser);
>     +    }
> 
> Should I set TREE_USED after having looked up the identifier?
> 
>     +  else if (c_parser_next_token_is (parser, CPP_STRING))
>     +    {
>     +      name = token->value;
>            c_parser_consume_token (parser);
>     -      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
>     -      OMP_CLAUSE_BIND_NAME (c) = t;
>     -      OMP_CLAUSE_CHAIN (c) = list;
>     -      list = c;
>          }
>        else
>     -    c_parser_error (parser, "expected identifier or character string literal");
>     +    c_parser_error (parser,
>     +                   "expected identifier or character string literal");
>        parser->lex_untranslated_string = false;
>        c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>");
>     +  if (name != error_mark_node)
>     +    {
>     +      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
>     +      OMP_CLAUSE_BIND_NAME (c) = name;
>     +      OMP_CLAUSE_CHAIN (c) = list;
>     +      list = c;
>     +    }
>        return list;
>      }
>      
>     @@ -13977,10 +14002,10 @@ static void
>      c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
>      {
>        tree decl = NULL_TREE;
>     -  /* Create a dummy claue, to record location.  */
>     +  /* Create a dummy clause, to record the location.  */
>        tree c_head = build_omp_clause (c_parser_peek_token (parser)->location,
>     -                                 OMP_CLAUSE_SEQ);
>     -  
>     +                                 OMP_CLAUSE_ERROR);
> 
> I don't know why somebody chose OMP_CLAUSE_SEQ for this; changed to a
> distinctive OMP_CLAUSE_ERROR.  In the following, handling of c_head and
> generally the clauses seemed unnecessarily complicated to me, so I
> simplified that as follows:

I think that was me. As the comment states, I was using a dummy clause
to save the location for error reporting. OMP_CLAUSE_SEQ was chosen
because it's default level of parallelism for routines. Your changes are
ok though.

>     @@ -14018,9 +14043,9 @@ c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
>        tree clauses = c_parser_oacc_all_clauses
>          (parser, OACC_ROUTINE_CLAUSE_MASK, "#pragma acc routine",
>           OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK);
>     -
>     -  /* Force clauses to be non-null, by attaching context to it.  */
>     -  clauses = tree_cons (c_head, clauses, NULL_TREE);
>     +  /* Prepend the dummy clause.  */
>     +  OMP_CLAUSE_CHAIN (c_head) = clauses;
>     +  clauses = c_head;
>        
>        if (decl)
>          c_finish_oacc_routine (parser, decl, clauses, true, true, false);
>     @@ -14040,7 +14065,9 @@ static void
>      c_finish_oacc_routine (c_parser *ARG_UNUSED (parser), tree fndecl,
>                            tree clauses, bool named, bool first, bool is_defn)
>      {
>     -  location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE (clauses));
>     +  location_t loc = OMP_CLAUSE_LOCATION (clauses);
>     +  /* Get rid of the dummy clause.  */
>     +  clauses = OMP_CLAUSE_CHAIN (clauses);
>      
>        if (!fndecl || TREE_CODE (fndecl) != FUNCTION_DECL || !first)
>          {
>     @@ -14059,13 +14086,12 @@ c_finish_oacc_routine (c_parser *ARG_UNUSED (parser), tree fndecl,
>                   TREE_USED (fndecl) ? "use" : "definition");
>      
>        /* Process for function attrib  */
>     -  tree dims = build_oacc_routine_dims (TREE_VALUE (clauses));
>     +  tree dims = build_oacc_routine_dims (clauses);
>        replace_oacc_fn_attrib (fndecl, dims);
>      
>     -  /* Also attach as a declare.  */
>     -  DECL_ATTRIBUTES (fndecl)
>     -    = tree_cons (get_identifier ("omp declare target"),
>     -                clauses, DECL_ATTRIBUTES (fndecl));
>     +  /* Also add an "omp declare target" attribute, with clauses.  */
>     +  DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("omp declare target"),
>     +                                       clauses, DECL_ATTRIBUTES (fndecl));
>      }
> 
> I don't know why somebody chose to attach the clauses to the "omp declare
> target" attribute in this way?  Especially given that so far there hasn't
> been any user of this information (I'm now adding such users).  Is that
> OK, or should we have a separate "omp clauses" attribute or similar?

That was probably me again. When I started working on routine, I didn't
think it was going to be necessary to have a separate attribute for acc
routines. Then I added an acc routine attribute for something (forgot
what exactly), but these routine clauses were never updated.

I like the idea of having an "omp clauses" attribute. Especially since
we're going to need to eventually chain a list of device_type clauses
together. It's probably easier to access the clauses by pulling them
from the "omp clauses" attribute.

> Again simplifying the c_head/clauses handling (snipped), the C++ front
> end changes are very similar to the C front end changes:
> 
>     --- gcc/cp/parser.c
>     +++ gcc/cp/parser.c
>     @@ -31539,42 +31538,76 @@ static tree
>      cp_parser_oacc_clause_bind (cp_parser *parser, tree list)
>      {
>     [...]
>     -  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
>     -      || cp_lexer_next_token_is (parser->lexer, CPP_STRING))
>     +  tree name = error_mark_node;
>     +  cp_token *token = cp_lexer_peek_token (parser->lexer);
>     +  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
> 
> I'm not particularly confident in the following lookup/error checking
> (which I copied a lot from C++ OpenACC routine parsing):
> 
>          {
>     -      tree t;
>     -
>     -      if (cp_lexer_peek_token (parser->lexer)->type == CPP_STRING)
>     -       {
>     -         t = cp_lexer_peek_token (parser->lexer)->u.value;
>     -         cp_lexer_consume_token (parser->lexer);
>     +      //TODO
>     +      tree id = cp_parser_id_expression (parser, /*template_p=*/false,
>     +                                        /*check_dependency_p=*/true,
>     +                                        /*template_p=*/NULL,
>     +                                        /*declarator_p=*/false,
>     +                                        /*optional_p=*/false);
>     +      tree decl = cp_parser_lookup_name_simple (parser, id, token->location);
>     +      if (id != error_mark_node && decl == error_mark_node)
>     +       cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
>     +                                    token->location);
>     +      if (/* TODO */ !decl || decl == error_mark_node)
>     +       error_at (token->location, "%qE has not been declared",
>     +                 token->u.value);
>     +      else if (/* TODO */ is_overloaded_fn (decl)
>     +              && (TREE_CODE (decl) != FUNCTION_DECL
>     +                  || DECL_FUNCTION_TEMPLATE_P (decl)))
>     +       error_at (token->location, "%qE names a set of overloads",
>     +                 token->u.value);
>     +      else if (/* TODO */ !DECL_NAMESPACE_SCOPE_P (decl))
>     +       {
>     +         /* Perhaps we should use the same rule as declarations in different
>     +            namespaces?  */
>     +         error_at (token->location,
>     +                   "%qE does not refer to a namespace scope function",
>     +                   token->u.value);
>             }
>     +      else if (TREE_CODE (decl) != FUNCTION_DECL)
>     +       error_at (token->location,
>     +                 "%qE does not refer to a function",
>     +                 token->u.value);
> 
> ... also we'll want to add a lot more testsuite coverage for this.  (Also
> for the OpenACC routine directive itself.)

I'll look into this.

>            else
>     -       t = cp_parser_id_expression (parser, /*template_p=*/false,
>     -                                    /*check_dependency_p=*/true,
>     -                                    /*template_p=*/NULL,
>     -                                    /*declarator_p=*/false,
>     -                                    /*optional_p=*/false);
>     -      if (t == error_mark_node)
>     -       return t;
>     -
>     -      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
>     -      OMP_CLAUSE_BIND_NAME (c) = t;
>     -      OMP_CLAUSE_CHAIN (c) = list;
>     -      list = c;
>     +       {
>     +         //TODO? TREE_USED (decl) = 1;
>     +         tree name_id = DECL_NAME (decl);
>     +         name = build_string (IDENTIFIER_LENGTH (name_id),
>     +                              IDENTIFIER_POINTER (name_id));
> 
> We probably need to apply C++ name mangling here?  How to do that?
> 
>     +       }
>     +      //cp_lexer_consume_token (parser->lexer);
>     +    }
>     +  else if (cp_lexer_next_token_is (parser->lexer, CPP_STRING))
>     +    {
>     +      name = token->u.value;
>     +      cp_lexer_consume_token (parser->lexer);
>          }
>        else
>     -    cp_parser_error (parser, "expected identifier or character string literal");
>     +    cp_parser_error (parser,
>     +                    "expected identifier or character string literal");
>        parser->translate_strings_p = save_translate_strings_p;
>        cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
>     +  if (name != error_mark_node)
>     +    {
>     +      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
>     +      OMP_CLAUSE_BIND_NAME (c) = name;
>     +      OMP_CLAUSE_CHAIN (c) = list;
>     +      list = c;
>     +    }
>        return list;
>      }
> 
> What I changed in the Fortran front end is just a quick hack.  Also I
> have not spent any effort on updating the existing OpenACC bind clause
> support: the name is (only) parsed into routine_bind, but then not
> handled any further?  Also needs testsuite coverage, obviously.
> 
>     --- gcc/fortran/gfortran.h
>     +++ gcc/fortran/gfortran.h
>     @@ -850,6 +850,7 @@ typedef struct
>      
>        /* This is an OpenACC acclerator function at level N - 1  */
>        unsigned oacc_function:3;
>     +  unsigned oacc_function_nohost:1;
>      
>        /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
>        unsigned ext_attr:EXT_ATTR_NUM;
>     --- gcc/fortran/openmp.c
>     +++ gcc/fortran/openmp.c
>     @@ -1884,6 +1884,8 @@ gfc_match_oacc_routine (void)
>             goto cleanup;
>            gfc_current_ns->proc_name->attr.oacc_function
>             = gfc_oacc_routine_dims (c) + 1;
>     +      gfc_current_ns->proc_name->attr.oacc_function_nohost
>     +       = c ? c->nohost : false;
>          }
>      
>        if (n)
>     --- gcc/fortran/trans-decl.c
>     +++ gcc/fortran/trans-decl.c
>     @@ -1309,8 +1309,13 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>            || sym_attr.oacc_declare_device_resident
>      #endif
>            )
>     -    list = tree_cons (get_identifier ("omp declare target"),
>     -                     NULL_TREE, list);
>     +    {
>     +      tree c = NULL_TREE;
>     +      if (sym_attr.oacc_function_nohost)
>     +       c = build_omp_clause (/* TODO */ input_location,
>     +                             OMP_CLAUSE_NOHOST);
>     +      list = tree_cons (get_identifier ("omp declare target"), c, list);
>     +    }
>      #if 0 /* TODO */
>        if (sym_attr.oacc_declare_link)
>          list = tree_cons (get_identifier ("omp declare target link"),
> 
> I guess add_attributes_to_decl is the correct place to be doning this?
> 
>     --- gcc/fortran/trans-openmp.c
>     +++ gcc/fortran/trans-openmp.c
>     @@ -2644,6 +2644,13 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
>               OMP_CLAUSE_GANG_STATIC_EXPR (c) = arg;
>             }
>          }
>     +  if (clauses->nohost)
>     +    {
>     +      c = build_omp_clause (where.lb->location, OMP_CLAUSE_NOHOST);
>     +      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
>     +      //TODO
>     +      gcc_unreachable();
>     +    }
> 
> Probably we can generally just put a gcc_unreachable call here, with a
> source code comment added.  Again, this is to make sure that the reader
> of that code doesn't wonder why "clauses->nohost" has been forgotten to
> be handled here.
> 
>        return nreverse (omp_clauses);
>      }

That'll go on my todo list too.

> Middle end.  In the LTO wrapper, at the end of read_cgraph_and_symbols,
> for ACCEL_COMPILERs handle OpenACC bind clauses:
> 
>     --- gcc/lto/lto.c
>     +++ gcc/lto/lto.c
>     @@ -2942,6 +2944,36 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
>      
>        ggc_free (all_file_decl_data);
>        all_file_decl_data = NULL;
>     +
>     +#ifdef ACCEL_COMPILER
>     +  /* In an offload compiler, redirect calls to any function X that is tagged
>     +     with an OpenACC bind(Y) clause to call Y instead of X.  */
>     +  FOR_EACH_SYMBOL (snode)
>     +  {
>     +    tree decl = snode->decl;
>     +    tree attr = lookup_attribute ("omp declare target",
>     +                                 DECL_ATTRIBUTES (decl));
>     +    if (attr)
>     +      {
>     +       tree clauses = TREE_VALUE (attr);
>     +       /* TODO: device_type handling.  */
>     +       tree clause_bind = find_omp_clause (clauses, OMP_CLAUSE_BIND);
>     +       if (clause_bind)
>     +         {
>     +           tree clause_bind_name = OMP_CLAUSE_BIND_NAME (clause_bind);
>     +           const char *bind_name = TREE_STRING_POINTER(clause_bind_name);
>     +           if (symtab->dump_file)
>     +             fprintf (symtab->dump_file,
>     +                      "Applying \"bind(%s)\" clause to declaration of "
>     +                      "function \"%s\".\n",
>     +                      bind_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>     +           //TODO: Use gcc/varasm.c:set_user_assembler_name instead?
>     +           symtab->change_decl_assembler_name (decl,
>     +                                               get_identifier (bind_name));
>     +         }
>     +      }
>     +  }
>     +#endif /* ACCEL_COMPILER */
>      }
> 
> Probably that should be put into a separate function (in gcc/omp-low.c,
> even?).  Is the end of read_cgraph_and_symbols the correct place to
> put/call this?  Per my "How to rewrite call targets (OpenACC bind
> clause)" email,
> <http://news.gmane.org/find-root.php?message_id=%3C877fkq482i.fsf%40hertz.schwinge.homeip.net%3E>,
> it's still not clear to me whether just setting the decl's assembler name
> here is the right (and sufficient) thing to do (but it seems to work,
> with -fno-inline at least...).

I don't think the placement matters too much. It's a minor detail that
can be changed later.

> Joseph once pointed out that we'll need to add user_label_prefix to the
> bind_name -- but only if an indentifier has been used for Y in the
> bind(Y) clause, and not when a string has been used?
> 
> Then, the following handling in execute_oacc_device_lower (correct
> position in the pipeline -- as early as possible after the LTO front end,
> I guess?), for ACCEL_COMPILERs handle OpenACC bind clauses, and for
> non-ACCEL_COMPILERs handle OpenACC nohost clauses.  In both cases, use
> the new TODO_discard_function,
> <http://news.gmane.org/find-root.php?message_id=%3C563A3791.7020001%40suse.cz%3E>,
> that has recently been added.  :-)
> 
>     --- gcc/omp-low.c
>     +++ gcc/omp-low.c
>     @@ -19853,14 +19857,76 @@ default_goacc_reduction (gcall *call)
>      static unsigned int
>      execute_oacc_device_lower ()
>      {
>     -  tree attrs = get_oacc_fn_attrib (current_function_decl);
>     -  int dims[GOMP_DIM_MAX];
>     -  
>     -  if (!attrs)
>     +  /* There are offloaded functions without an "omp declare target" attribute,
>     +     so we'll not handle these here, but on the other hand, OpenACC bind and
>     +     nohost clauses can only be generated in the front ends, and an "omp
>     +     declare target" attribute will then also always have been set there, so
>     +     this is not a problem in practice.  */
>     +  tree attr = lookup_attribute ("omp declare target",
>     +                               DECL_ATTRIBUTES (current_function_decl));
>     +
>     +#if defined(ACCEL_COMPILER)
>     +  /* In an offload compiler, discard any offloaded function X that is tagged
>     +     with an OpenACC bind(Y) clause: all references to X have been rewritten to
>     +     refer to Y; X is unreachable, do not compile it.  */
>     +  if (attr)
>     +    {
>     +      tree clauses = TREE_VALUE (attr);
>     +      /* TODO: device_type handling.  */
>     +      tree clause_bind = find_omp_clause (clauses, OMP_CLAUSE_BIND);
>     +      if (clause_bind)
>     +       {
>     +         tree clause_bind_name = OMP_CLAUSE_BIND_NAME (clause_bind);
>     +         const char *bind_name = TREE_STRING_POINTER(clause_bind_name);
>     +         if (dump_file)
>     +           fprintf (dump_file,
>     +                    "Discarding function \"%s\" with \"bind(%s)\" clause.\n",
>     +                    IDENTIFIER_POINTER (DECL_NAME (current_function_decl)),
>     +                    bind_name);
>     +         TREE_ASM_WRITTEN (current_function_decl) = 1;
>     +         return TODO_discard_function;
>     +       }
>     +    }
>     +#endif /* ACCEL_COMPILER */
>     +#if !defined(ACCEL_COMPILER)
>     +  /* In the host compiler, discard any offloaded function that is tagged with
>     +     an OpenACC nohost clause.  */
>     +  if (attr)
>     +    {
>     +      tree clauses = TREE_VALUE (attr);
>     +      if (find_omp_clause (clauses, OMP_CLAUSE_NOHOST))
>     +       {
>     +         /* There are no construct/clause combinations that could make this
>     +            happen, but play it safe, and verify that we never discard a
>     +            function that is stored in offload_funcs, used for target/offload
>     +            function mapping.  */
>     +         if (flag_checking)
>     +           {
>     +             bool found = false;
>     +             for (unsigned i = 0;
>     +                  !found && i < vec_safe_length (offload_funcs);
>     +                  i++)
>     +               if ((*offload_funcs)[i] == current_function_decl)
>     +                 found = true;
>     +             gcc_assert (!found);
>     +           }
>     +
>     +         if (dump_file)
>     +           fprintf (dump_file,
>     +                    "Discarding function \"%s\" with \"nohost\" clause.\n",
>     +                    IDENTIFIER_POINTER (DECL_NAME (current_function_decl)));
>     +         TREE_ASM_WRITTEN (current_function_decl) = 1;
>     +         return TODO_discard_function;

I don't think this is a good idea. If you have a nohost function,
wounldn't that prevent the code from linking?

Perhaps nohost should kind of implement a reverse bind on the host. E.g.
discard the function defintion and replace it with an asm alias to some
libgomp function like goacc_nohost_fallback. That way, the program will
still link and the runtime will provide the end user with a sensible
error when things go wrong.

>     +       }
>     +    }
>     +#endif /* !ACCEL_COMPILER */
>     +
>     +  attr = get_oacc_fn_attrib (current_function_decl);
>     +  if (!attr)
>          /* Not an offloaded function.  */
>          return 0;
>     -
>     -  int fn_level = oacc_validate_dims (current_function_decl, attrs, dims);
>     +  int dims[GOMP_DIM_MAX];
>     +  int fn_level = oacc_validate_dims (current_function_decl, attr, dims);
>      
>        /* Discover, partition and process the loops.  */
>        oacc_loop *loops = oacc_loop_discovery ();
> 
> Initial testsuite updates:
> 
>     --- gcc/testsuite/c-c++-common/goacc/routine-2.c
>     +++ gcc/testsuite/c-c++-common/goacc/routine-2.c
>     @@ -1,21 +1,40 @@
>     +/* Test invalid use of clauses with routine.  */
>     [...]
>     +extern void a(void), b(void);
>     +
>     +#pragma acc routine bind(a) bind(b) /* { dg-error "too many .bind. clauses" } */
>     +extern void bind_1 (void);
> 
> This diagnostic does make sense (can't bind to a and b at the same time),
> but this will need re-visiting for device_type clause support.
> 
>     +#pragma acc routine nohost nohost /* { dg-error "too many .nohost. clauses" } */
>     +extern void nohost (void);
> 
> But I'm not too sure about this one.  After all, there is no harm in
> specifying multiple such clauses.  However, GCC generally (also for
> "simple" OpenMP clauses?) seems to diagnose such usage, so it's probably
> a good idea to be consistent?

I think so. If the user wants to duplicate nohost, then nohost should go
into a device_type.

>     --- /dev/null
>     +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-bind-nohost-1.c
>     @@ -0,0 +1,105 @@
>     +/* Test the bind and nohost clauses for OpenACC routine directive.  */
>     +
>     +/* TODO.  Function inlining and the OpenACC bind clause do not yet get on well
>     +   with one another.
>     +   { dg-additional-options "-fno-inline" } */
> 
> TODO.
> 
>     +/* TODO.  C works, but for C++ we get: "lto1: internal compiler error: in
>     +   ipa_propagate_frequency".
>     +   { dg-xfail-if "TODO" { *-*-* } } */
> 
> TODO.  Perhaps related to missing C++ name mangling (see above), perhaps
> something else.
> 
>     +#include <openacc.h>
>     +
>     +/* "MINUS_TWO" is the device variant for function "TWO".  Similar for "THREE",
>     +   and "FOUR".  Exercising different variants for declaring routines.  */
>     +
>     +#pragma acc routine nohost
>     +extern int MINUS_TWO(void);
>     +
>     +int MINUS_TWO(void)
>     +{
>     +  if (!acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return -2;
>     +}
>     +
>     +extern int TWO(void);
>     +#pragma acc routine (TWO) bind(MINUS_TWO)
>     +
>     +int TWO(void)
>     +{
>     +  if (acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return 2;
>     +}
>     +
>     +
>     +#pragma acc routine nohost
>     +int MINUS_THREE(void)
>     +{
>     +  if (!acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return -3;
>     +}
>     +
>     +#pragma acc routine bind(MINUS_THREE)
>     +extern int THREE(void);
>     +
>     +int THREE(void)
>     +{
>     +  if (acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return 3;
>     +}
>     +
>     +
>     +/* Due to using a string in the bind clause, we don't need "MINUS_FOUR" in
>     +   scope here.  */
>     +#pragma acc routine bind("MINUS_FOUR")
>     +int FOUR(void)
>     +{
>     +  if (acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return 4;
>     +}
>     +
>     +extern int MINUS_FOUR(void);
>     +#pragma acc routine (MINUS_FOUR) nohost
>     +
>     +int MINUS_FOUR(void)
>     +{
>     +  if (!acc_on_device(acc_device_not_host))
>     +    __builtin_abort();
>     +  return -4;
>     +}
>     +
>     +
>     +int main()
>     +{
>     +  int x2, x3, x4;
>     +
>     +#pragma acc parallel copyout(x2, x3, x4) if(0)
>     +  {
>     +    x2 = TWO();
>     +    x3 = THREE();
>     +    x4 = FOUR();
>     +  }
>     +  if (x2 != 2 || x3 != 3 || x4 != 4)
>     +    __builtin_abort();
>     +
>     +#pragma acc parallel copyout(x2, x3, x4)
>     +  {
>     +    x2 = TWO();
>     +    x3 = THREE();
>     +    x4 = FOUR();
>     +  }
>     +#ifdef ACC_DEVICE_TYPE_host
>     +  if (x2 != 2 || x3 != 3 || x4 != 4)
>     +    __builtin_abort();
>     +#else
>     +  if (x2 != -2 || x3 != -3 || x4 != -4)
>     +    __builtin_abort();
>     +#endif
>     +
>     +  return 0;
>     +}
> 
> I'd also like to add test cases where the host and device function
> definitions are in separate files, so I'll try to figure out how to do
> that in the libgomp testsuite.

I thought we're using lto, so being in separate files doens't really
matter in the end.

>     --- /dev/null
>     +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
>     @@ -0,0 +1,18 @@
>     +/* { dg-do link } */
>     +
>     +extern int three (void);
>     +
>     +#pragma acc routine (three) nohost
>     +__attribute__((noinline))
>     +int three(void)
>     +{
>     +  return 3;
>     +}
>     +
>     +int main(void)
>     +{
>     +  return (three() == 3) ? 0 : 1;
>     +}
>     +
>     +/* Expecting link to fail; "undefined reference to `three'" (or similar).
>     +   { dg-excess-errors "" } */
> 
> This results in an XFAIL, which is not nice.  Is there a mechanism in the
> GCC testsuite/DejaGnu to check for an expected link failure (due to a
> missing symbol)?  I guess we could cook up something that instead
> triggers a link failure for a duplicate or incompatible symbol
> definition?

This is an interesting test case. So what's supposed to happen if a
nohost routine is called outside of an acc context? Should it still work
or not?

As mentioned above, I don't think there should be a missing symbol
error. Maybe check for a "LIBGOMP: invalid call to nohost function".

>     --- libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90
>     +++ libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90
>     @@ -1,5 +1,5 @@
>      ! { dg-do run }
>     -! { dg-xfail-if "not found" { openacc_host_selected } }
>     +! { dg-xfail-if "TODO" { *-*-* } }
> 
> TODO.  ICE, if I remember correctly.

Cesar



More information about the Gcc-patches mailing list