[gomp4] [WIP] OpenACC bind, nohost clauses
Cesar Philippidis
cesar@codesourcery.com
Mon Dec 14 20:36:00 GMT 2015
On 12/08/2015 11:55 AM, Thomas Schwinge wrote:
Just for clarification, we're implementing the bind clause with the
semantics defined in OpenACC 2.5, correct? The 2.0a semantics aren't clear.
> On Sat, 14 Nov 2015 09:36:36 +0100, I wrote:
>> Initial support for the OpenACC bind and nohost clauses (routine
>> directive) for C, C++. Fortran to follow. Middle end handling and more
>> complete testsuite coverage also to follow once we got a few details
>> clarified. OK for trunk?
>
> (Has not yet been reviewed.) Meanwhile, I continued working on the
> implementation, focussing on C. See also my question "How to rewrite
> call targets (OpenACC bind clause)",
> <http://news.gmane.org/find-root.php?message_id=%3C877fkq482i.fsf%40hertz.schwinge.homeip.net%3E>.
>
> To enable Cesar to help with the C++ and Fortran front ends (thanks!), in
> r231423, I just committed "[WIP] OpenACC bind, nohost clauses" to
> gomp-4_0-branch. (There has already been initial support, parsing only,
> on gomp-4_0-branch.) I'll try to make progress with the generic middle
> end bits, but will appreciate any review comments, so before inlining the
> complete patch, first a few questions/comments:
>
> In the OpenACC bind(Y) clause attached to a routine(X) directive, Y can
> be an identifier or a string. In the front ends, I canonicalize that
> into a string, as we -- at least currently -- don't have any use for the
> identifier (or decl?) later on:
>
> --- gcc/tree-core.h
> +++ gcc/tree-core.h
> @@ -461,7 +461,7 @@ enum omp_clause_code {
> - /* OpenACC clause: bind ( identifer | string ). */
> + /* OpenACC clause: bind (string). */
> OMP_CLAUSE_BIND,
So what happens in c++ then? E.g. Say that we have a function sum which
is overloaded as follows:
int sum (int a, int b) { return a + b; }
double sum (double a, double b) { return a + b; }
#pragma acc routine (sum) bind (cuda_sum)
First of all, does this bind apply to both int sum and double sum, or
just the double sum? Second, if the identifier gets canonicalized as a
string, will that prevent the name from being mangled, and hence disable
function overloading?
Also, while I'm asking about c++, is it possible apply bind individually
to an overloaded function. E.g.
#pragma acc routine (sum) bind (cuda_sum_int)
int sum (int a, int b) { return a + b; }
#pragma acc routine (sum) bind (cuda_sum_double)
double sum (double a, double b) { return a + b; }
> All the following are unreachable for OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST;
> document that to make it obvious/expected:
>
> --- gcc/cp/pt.c
> +++ gcc/cp/pt.c
> @@ -14501,6 +14501,8 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
> }
> }
> break;
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -7413,6 +7413,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
> ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> @@ -8104,6 +8106,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
> case OMP_CLAUSE_DEVICE_TYPE:
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -2279,6 +2279,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> sorry ("Clause not supported yet");
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> @@ -2453,6 +2455,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> sorry ("Clause not supported yet");
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> --- gcc/tree-nested.c
> +++ gcc/tree-nested.c
> @@ -1200,6 +1200,8 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
> case OMP_CLAUSE_SEQ:
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
> @@ -1882,6 +1884,8 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
> case OMP_CLAUSE_SEQ:
> break;
>
> + case OMP_CLAUSE_BIND:
> + case OMP_CLAUSE_NOHOST:
> default:
> gcc_unreachable ();
> }
Those changes look reasonable.
> C front end:
>
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -11607,6 +11607,8 @@ c_parser_oacc_clause_async (c_parser *parser, tree list)
> static tree
> c_parser_oacc_clause_bind (c_parser *parser, tree list)
> {
> + check_no_duplicate_clause (list, OMP_CLAUSE_BIND, "bind");
> +
> location_t loc = c_parser_peek_token (parser)->location;
>
> parser->lex_untranslated_string = true;
> @@ -11615,20 +11617,43 @@ c_parser_oacc_clause_bind (c_parser *parser, tree list)
> parser->lex_untranslated_string = false;
> return list;
> }
> - if (c_parser_next_token_is (parser, CPP_NAME)
> - || c_parser_next_token_is (parser, CPP_STRING))
> + tree name = error_mark_node;
> + c_token *token = c_parser_peek_token (parser);
> + if (c_parser_next_token_is (parser, CPP_NAME))
> {
> - tree t = c_parser_peek_token (parser)->value;
> + tree decl = lookup_name (token->value);
> + if (!decl)
> + error_at (token->location, "%qE has not been declared",
> + token->value);
> + else if (TREE_CODE (decl) != FUNCTION_DECL)
> + error_at (token->location, "%qE does not refer to a function",
> + token->value);
>
> Quite possibly we'll want to add more error checking (matching signature
> of X and Y, for example).
Good idea, but I wonder if that would be too strict. Should we allow
integer promotion in the bind function arguments?
> + else
> + {
> + //TODO? TREE_USED (decl) = 1;
> + tree name_id = DECL_NAME (decl);
> + name = build_string (IDENTIFIER_LENGTH (name_id),
> + IDENTIFIER_POINTER (name_id));
> + }
> + c_parser_consume_token (parser);
> + }
>
> Should I set TREE_USED after having looked up the identifier?
>
> + else if (c_parser_next_token_is (parser, CPP_STRING))
> + {
> + name = token->value;
> c_parser_consume_token (parser);
> - tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
> - OMP_CLAUSE_BIND_NAME (c) = t;
> - OMP_CLAUSE_CHAIN (c) = list;
> - list = c;
> }
> else
> - c_parser_error (parser, "expected identifier or character string literal");
> + c_parser_error (parser,
> + "expected identifier or character string literal");
> parser->lex_untranslated_string = false;
> c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>");
> + if (name != error_mark_node)
> + {
> + tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
> + OMP_CLAUSE_BIND_NAME (c) = name;
> + OMP_CLAUSE_CHAIN (c) = list;
> + list = c;
> + }
> return list;
> }
>
> @@ -13977,10 +14002,10 @@ static void
> c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
> {
> tree decl = NULL_TREE;
> - /* Create a dummy claue, to record location. */
> + /* Create a dummy clause, to record the location. */
> tree c_head = build_omp_clause (c_parser_peek_token (parser)->location,
> - OMP_CLAUSE_SEQ);
> -
> + OMP_CLAUSE_ERROR);
>
> I don't know why somebody chose OMP_CLAUSE_SEQ for this; changed to a
> distinctive OMP_CLAUSE_ERROR. In the following, handling of c_head and
> generally the clauses seemed unnecessarily complicated to me, so I
> simplified that as follows:
I think that was me. As the comment states, I was using a dummy clause
to save the location for error reporting. OMP_CLAUSE_SEQ was chosen
because it's default level of parallelism for routines. Your changes are
ok though.
> @@ -14018,9 +14043,9 @@ c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
> tree clauses = c_parser_oacc_all_clauses
> (parser, OACC_ROUTINE_CLAUSE_MASK, "#pragma acc routine",
> OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK);
> -
> - /* Force clauses to be non-null, by attaching context to it. */
> - clauses = tree_cons (c_head, clauses, NULL_TREE);
> + /* Prepend the dummy clause. */
> + OMP_CLAUSE_CHAIN (c_head) = clauses;
> + clauses = c_head;
>
> if (decl)
> c_finish_oacc_routine (parser, decl, clauses, true, true, false);
> @@ -14040,7 +14065,9 @@ static void
> c_finish_oacc_routine (c_parser *ARG_UNUSED (parser), tree fndecl,
> tree clauses, bool named, bool first, bool is_defn)
> {
> - location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE (clauses));
> + location_t loc = OMP_CLAUSE_LOCATION (clauses);
> + /* Get rid of the dummy clause. */
> + clauses = OMP_CLAUSE_CHAIN (clauses);
>
> if (!fndecl || TREE_CODE (fndecl) != FUNCTION_DECL || !first)
> {
> @@ -14059,13 +14086,12 @@ c_finish_oacc_routine (c_parser *ARG_UNUSED (parser), tree fndecl,
> TREE_USED (fndecl) ? "use" : "definition");
>
> /* Process for function attrib */
> - tree dims = build_oacc_routine_dims (TREE_VALUE (clauses));
> + tree dims = build_oacc_routine_dims (clauses);
> replace_oacc_fn_attrib (fndecl, dims);
>
> - /* Also attach as a declare. */
> - DECL_ATTRIBUTES (fndecl)
> - = tree_cons (get_identifier ("omp declare target"),
> - clauses, DECL_ATTRIBUTES (fndecl));
> + /* Also add an "omp declare target" attribute, with clauses. */
> + DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("omp declare target"),
> + clauses, DECL_ATTRIBUTES (fndecl));
> }
>
> I don't know why somebody chose to attach the clauses to the "omp declare
> target" attribute in this way? Especially given that so far there hasn't
> been any user of this information (I'm now adding such users). Is that
> OK, or should we have a separate "omp clauses" attribute or similar?
That was probably me again. When I started working on routine, I didn't
think it was going to be necessary to have a separate attribute for acc
routines. Then I added an acc routine attribute for something (forgot
what exactly), but these routine clauses were never updated.
I like the idea of having an "omp clauses" attribute. Especially since
we're going to need to eventually chain a list of device_type clauses
together. It's probably easier to access the clauses by pulling them
from the "omp clauses" attribute.
> Again simplifying the c_head/clauses handling (snipped), the C++ front
> end changes are very similar to the C front end changes:
>
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -31539,42 +31538,76 @@ static tree
> cp_parser_oacc_clause_bind (cp_parser *parser, tree list)
> {
> [...]
> - if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
> - || cp_lexer_next_token_is (parser->lexer, CPP_STRING))
> + tree name = error_mark_node;
> + cp_token *token = cp_lexer_peek_token (parser->lexer);
> + if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
>
> I'm not particularly confident in the following lookup/error checking
> (which I copied a lot from C++ OpenACC routine parsing):
>
> {
> - tree t;
> -
> - if (cp_lexer_peek_token (parser->lexer)->type == CPP_STRING)
> - {
> - t = cp_lexer_peek_token (parser->lexer)->u.value;
> - cp_lexer_consume_token (parser->lexer);
> + //TODO
> + tree id = cp_parser_id_expression (parser, /*template_p=*/false,
> + /*check_dependency_p=*/true,
> + /*template_p=*/NULL,
> + /*declarator_p=*/false,
> + /*optional_p=*/false);
> + tree decl = cp_parser_lookup_name_simple (parser, id, token->location);
> + if (id != error_mark_node && decl == error_mark_node)
> + cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
> + token->location);
> + if (/* TODO */ !decl || decl == error_mark_node)
> + error_at (token->location, "%qE has not been declared",
> + token->u.value);
> + else if (/* TODO */ is_overloaded_fn (decl)
> + && (TREE_CODE (decl) != FUNCTION_DECL
> + || DECL_FUNCTION_TEMPLATE_P (decl)))
> + error_at (token->location, "%qE names a set of overloads",
> + token->u.value);
> + else if (/* TODO */ !DECL_NAMESPACE_SCOPE_P (decl))
> + {
> + /* Perhaps we should use the same rule as declarations in different
> + namespaces? */
> + error_at (token->location,
> + "%qE does not refer to a namespace scope function",
> + token->u.value);
> }
> + else if (TREE_CODE (decl) != FUNCTION_DECL)
> + error_at (token->location,
> + "%qE does not refer to a function",
> + token->u.value);
>
> ... also we'll want to add a lot more testsuite coverage for this. (Also
> for the OpenACC routine directive itself.)
I'll look into this.
> else
> - t = cp_parser_id_expression (parser, /*template_p=*/false,
> - /*check_dependency_p=*/true,
> - /*template_p=*/NULL,
> - /*declarator_p=*/false,
> - /*optional_p=*/false);
> - if (t == error_mark_node)
> - return t;
> -
> - tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
> - OMP_CLAUSE_BIND_NAME (c) = t;
> - OMP_CLAUSE_CHAIN (c) = list;
> - list = c;
> + {
> + //TODO? TREE_USED (decl) = 1;
> + tree name_id = DECL_NAME (decl);
> + name = build_string (IDENTIFIER_LENGTH (name_id),
> + IDENTIFIER_POINTER (name_id));
>
> We probably need to apply C++ name mangling here? How to do that?
>
> + }
> + //cp_lexer_consume_token (parser->lexer);
> + }
> + else if (cp_lexer_next_token_is (parser->lexer, CPP_STRING))
> + {
> + name = token->u.value;
> + cp_lexer_consume_token (parser->lexer);
> }
> else
> - cp_parser_error (parser, "expected identifier or character string literal");
> + cp_parser_error (parser,
> + "expected identifier or character string literal");
> parser->translate_strings_p = save_translate_strings_p;
> cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
> + if (name != error_mark_node)
> + {
> + tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
> + OMP_CLAUSE_BIND_NAME (c) = name;
> + OMP_CLAUSE_CHAIN (c) = list;
> + list = c;
> + }
> return list;
> }
>
> What I changed in the Fortran front end is just a quick hack. Also I
> have not spent any effort on updating the existing OpenACC bind clause
> support: the name is (only) parsed into routine_bind, but then not
> handled any further? Also needs testsuite coverage, obviously.
>
> --- gcc/fortran/gfortran.h
> +++ gcc/fortran/gfortran.h
> @@ -850,6 +850,7 @@ typedef struct
>
> /* This is an OpenACC acclerator function at level N - 1 */
> unsigned oacc_function:3;
> + unsigned oacc_function_nohost:1;
>
> /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES). */
> unsigned ext_attr:EXT_ATTR_NUM;
> --- gcc/fortran/openmp.c
> +++ gcc/fortran/openmp.c
> @@ -1884,6 +1884,8 @@ gfc_match_oacc_routine (void)
> goto cleanup;
> gfc_current_ns->proc_name->attr.oacc_function
> = gfc_oacc_routine_dims (c) + 1;
> + gfc_current_ns->proc_name->attr.oacc_function_nohost
> + = c ? c->nohost : false;
> }
>
> if (n)
> --- gcc/fortran/trans-decl.c
> +++ gcc/fortran/trans-decl.c
> @@ -1309,8 +1309,13 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> || sym_attr.oacc_declare_device_resident
> #endif
> )
> - list = tree_cons (get_identifier ("omp declare target"),
> - NULL_TREE, list);
> + {
> + tree c = NULL_TREE;
> + if (sym_attr.oacc_function_nohost)
> + c = build_omp_clause (/* TODO */ input_location,
> + OMP_CLAUSE_NOHOST);
> + list = tree_cons (get_identifier ("omp declare target"), c, list);
> + }
> #if 0 /* TODO */
> if (sym_attr.oacc_declare_link)
> list = tree_cons (get_identifier ("omp declare target link"),
>
> I guess add_attributes_to_decl is the correct place to be doning this?
>
> --- gcc/fortran/trans-openmp.c
> +++ gcc/fortran/trans-openmp.c
> @@ -2644,6 +2644,13 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
> OMP_CLAUSE_GANG_STATIC_EXPR (c) = arg;
> }
> }
> + if (clauses->nohost)
> + {
> + c = build_omp_clause (where.lb->location, OMP_CLAUSE_NOHOST);
> + omp_clauses = gfc_trans_add_clause (c, omp_clauses);
> + //TODO
> + gcc_unreachable();
> + }
>
> Probably we can generally just put a gcc_unreachable call here, with a
> source code comment added. Again, this is to make sure that the reader
> of that code doesn't wonder why "clauses->nohost" has been forgotten to
> be handled here.
>
> return nreverse (omp_clauses);
> }
That'll go on my todo list too.
> Middle end. In the LTO wrapper, at the end of read_cgraph_and_symbols,
> for ACCEL_COMPILERs handle OpenACC bind clauses:
>
> --- gcc/lto/lto.c
> +++ gcc/lto/lto.c
> @@ -2942,6 +2944,36 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
>
> ggc_free (all_file_decl_data);
> all_file_decl_data = NULL;
> +
> +#ifdef ACCEL_COMPILER
> + /* In an offload compiler, redirect calls to any function X that is tagged
> + with an OpenACC bind(Y) clause to call Y instead of X. */
> + FOR_EACH_SYMBOL (snode)
> + {
> + tree decl = snode->decl;
> + tree attr = lookup_attribute ("omp declare target",
> + DECL_ATTRIBUTES (decl));
> + if (attr)
> + {
> + tree clauses = TREE_VALUE (attr);
> + /* TODO: device_type handling. */
> + tree clause_bind = find_omp_clause (clauses, OMP_CLAUSE_BIND);
> + if (clause_bind)
> + {
> + tree clause_bind_name = OMP_CLAUSE_BIND_NAME (clause_bind);
> + const char *bind_name = TREE_STRING_POINTER(clause_bind_name);
> + if (symtab->dump_file)
> + fprintf (symtab->dump_file,
> + "Applying \"bind(%s)\" clause to declaration of "
> + "function \"%s\".\n",
> + bind_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
> + //TODO: Use gcc/varasm.c:set_user_assembler_name instead?
> + symtab->change_decl_assembler_name (decl,
> + get_identifier (bind_name));
> + }
> + }
> + }
> +#endif /* ACCEL_COMPILER */
> }
>
> Probably that should be put into a separate function (in gcc/omp-low.c,
> even?). Is the end of read_cgraph_and_symbols the correct place to
> put/call this? Per my "How to rewrite call targets (OpenACC bind
> clause)" email,
> <http://news.gmane.org/find-root.php?message_id=%3C877fkq482i.fsf%40hertz.schwinge.homeip.net%3E>,
> it's still not clear to me whether just setting the decl's assembler name
> here is the right (and sufficient) thing to do (but it seems to work,
> with -fno-inline at least...).
I don't think the placement matters too much. It's a minor detail that
can be changed later.
> Joseph once pointed out that we'll need to add user_label_prefix to the
> bind_name -- but only if an indentifier has been used for Y in the
> bind(Y) clause, and not when a string has been used?
>
> Then, the following handling in execute_oacc_device_lower (correct
> position in the pipeline -- as early as possible after the LTO front end,
> I guess?), for ACCEL_COMPILERs handle OpenACC bind clauses, and for
> non-ACCEL_COMPILERs handle OpenACC nohost clauses. In both cases, use
> the new TODO_discard_function,
> <http://news.gmane.org/find-root.php?message_id=%3C563A3791.7020001%40suse.cz%3E>,
> that has recently been added. :-)
>
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -19853,14 +19857,76 @@ default_goacc_reduction (gcall *call)
> static unsigned int
> execute_oacc_device_lower ()
> {
> - tree attrs = get_oacc_fn_attrib (current_function_decl);
> - int dims[GOMP_DIM_MAX];
> -
> - if (!attrs)
> + /* There are offloaded functions without an "omp declare target" attribute,
> + so we'll not handle these here, but on the other hand, OpenACC bind and
> + nohost clauses can only be generated in the front ends, and an "omp
> + declare target" attribute will then also always have been set there, so
> + this is not a problem in practice. */
> + tree attr = lookup_attribute ("omp declare target",
> + DECL_ATTRIBUTES (current_function_decl));
> +
> +#if defined(ACCEL_COMPILER)
> + /* In an offload compiler, discard any offloaded function X that is tagged
> + with an OpenACC bind(Y) clause: all references to X have been rewritten to
> + refer to Y; X is unreachable, do not compile it. */
> + if (attr)
> + {
> + tree clauses = TREE_VALUE (attr);
> + /* TODO: device_type handling. */
> + tree clause_bind = find_omp_clause (clauses, OMP_CLAUSE_BIND);
> + if (clause_bind)
> + {
> + tree clause_bind_name = OMP_CLAUSE_BIND_NAME (clause_bind);
> + const char *bind_name = TREE_STRING_POINTER(clause_bind_name);
> + if (dump_file)
> + fprintf (dump_file,
> + "Discarding function \"%s\" with \"bind(%s)\" clause.\n",
> + IDENTIFIER_POINTER (DECL_NAME (current_function_decl)),
> + bind_name);
> + TREE_ASM_WRITTEN (current_function_decl) = 1;
> + return TODO_discard_function;
> + }
> + }
> +#endif /* ACCEL_COMPILER */
> +#if !defined(ACCEL_COMPILER)
> + /* In the host compiler, discard any offloaded function that is tagged with
> + an OpenACC nohost clause. */
> + if (attr)
> + {
> + tree clauses = TREE_VALUE (attr);
> + if (find_omp_clause (clauses, OMP_CLAUSE_NOHOST))
> + {
> + /* There are no construct/clause combinations that could make this
> + happen, but play it safe, and verify that we never discard a
> + function that is stored in offload_funcs, used for target/offload
> + function mapping. */
> + if (flag_checking)
> + {
> + bool found = false;
> + for (unsigned i = 0;
> + !found && i < vec_safe_length (offload_funcs);
> + i++)
> + if ((*offload_funcs)[i] == current_function_decl)
> + found = true;
> + gcc_assert (!found);
> + }
> +
> + if (dump_file)
> + fprintf (dump_file,
> + "Discarding function \"%s\" with \"nohost\" clause.\n",
> + IDENTIFIER_POINTER (DECL_NAME (current_function_decl)));
> + TREE_ASM_WRITTEN (current_function_decl) = 1;
> + return TODO_discard_function;
I don't think this is a good idea. If you have a nohost function,
wounldn't that prevent the code from linking?
Perhaps nohost should kind of implement a reverse bind on the host. E.g.
discard the function defintion and replace it with an asm alias to some
libgomp function like goacc_nohost_fallback. That way, the program will
still link and the runtime will provide the end user with a sensible
error when things go wrong.
> + }
> + }
> +#endif /* !ACCEL_COMPILER */
> +
> + attr = get_oacc_fn_attrib (current_function_decl);
> + if (!attr)
> /* Not an offloaded function. */
> return 0;
> -
> - int fn_level = oacc_validate_dims (current_function_decl, attrs, dims);
> + int dims[GOMP_DIM_MAX];
> + int fn_level = oacc_validate_dims (current_function_decl, attr, dims);
>
> /* Discover, partition and process the loops. */
> oacc_loop *loops = oacc_loop_discovery ();
>
> Initial testsuite updates:
>
> --- gcc/testsuite/c-c++-common/goacc/routine-2.c
> +++ gcc/testsuite/c-c++-common/goacc/routine-2.c
> @@ -1,21 +1,40 @@
> +/* Test invalid use of clauses with routine. */
> [...]
> +extern void a(void), b(void);
> +
> +#pragma acc routine bind(a) bind(b) /* { dg-error "too many .bind. clauses" } */
> +extern void bind_1 (void);
>
> This diagnostic does make sense (can't bind to a and b at the same time),
> but this will need re-visiting for device_type clause support.
>
> +#pragma acc routine nohost nohost /* { dg-error "too many .nohost. clauses" } */
> +extern void nohost (void);
>
> But I'm not too sure about this one. After all, there is no harm in
> specifying multiple such clauses. However, GCC generally (also for
> "simple" OpenMP clauses?) seems to diagnose such usage, so it's probably
> a good idea to be consistent?
I think so. If the user wants to duplicate nohost, then nohost should go
into a device_type.
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-bind-nohost-1.c
> @@ -0,0 +1,105 @@
> +/* Test the bind and nohost clauses for OpenACC routine directive. */
> +
> +/* TODO. Function inlining and the OpenACC bind clause do not yet get on well
> + with one another.
> + { dg-additional-options "-fno-inline" } */
>
> TODO.
>
> +/* TODO. C works, but for C++ we get: "lto1: internal compiler error: in
> + ipa_propagate_frequency".
> + { dg-xfail-if "TODO" { *-*-* } } */
>
> TODO. Perhaps related to missing C++ name mangling (see above), perhaps
> something else.
>
> +#include <openacc.h>
> +
> +/* "MINUS_TWO" is the device variant for function "TWO". Similar for "THREE",
> + and "FOUR". Exercising different variants for declaring routines. */
> +
> +#pragma acc routine nohost
> +extern int MINUS_TWO(void);
> +
> +int MINUS_TWO(void)
> +{
> + if (!acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return -2;
> +}
> +
> +extern int TWO(void);
> +#pragma acc routine (TWO) bind(MINUS_TWO)
> +
> +int TWO(void)
> +{
> + if (acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return 2;
> +}
> +
> +
> +#pragma acc routine nohost
> +int MINUS_THREE(void)
> +{
> + if (!acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return -3;
> +}
> +
> +#pragma acc routine bind(MINUS_THREE)
> +extern int THREE(void);
> +
> +int THREE(void)
> +{
> + if (acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return 3;
> +}
> +
> +
> +/* Due to using a string in the bind clause, we don't need "MINUS_FOUR" in
> + scope here. */
> +#pragma acc routine bind("MINUS_FOUR")
> +int FOUR(void)
> +{
> + if (acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return 4;
> +}
> +
> +extern int MINUS_FOUR(void);
> +#pragma acc routine (MINUS_FOUR) nohost
> +
> +int MINUS_FOUR(void)
> +{
> + if (!acc_on_device(acc_device_not_host))
> + __builtin_abort();
> + return -4;
> +}
> +
> +
> +int main()
> +{
> + int x2, x3, x4;
> +
> +#pragma acc parallel copyout(x2, x3, x4) if(0)
> + {
> + x2 = TWO();
> + x3 = THREE();
> + x4 = FOUR();
> + }
> + if (x2 != 2 || x3 != 3 || x4 != 4)
> + __builtin_abort();
> +
> +#pragma acc parallel copyout(x2, x3, x4)
> + {
> + x2 = TWO();
> + x3 = THREE();
> + x4 = FOUR();
> + }
> +#ifdef ACC_DEVICE_TYPE_host
> + if (x2 != 2 || x3 != 3 || x4 != 4)
> + __builtin_abort();
> +#else
> + if (x2 != -2 || x3 != -3 || x4 != -4)
> + __builtin_abort();
> +#endif
> +
> + return 0;
> +}
>
> I'd also like to add test cases where the host and device function
> definitions are in separate files, so I'll try to figure out how to do
> that in the libgomp testsuite.
I thought we're using lto, so being in separate files doens't really
matter in the end.
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do link } */
> +
> +extern int three (void);
> +
> +#pragma acc routine (three) nohost
> +__attribute__((noinline))
> +int three(void)
> +{
> + return 3;
> +}
> +
> +int main(void)
> +{
> + return (three() == 3) ? 0 : 1;
> +}
> +
> +/* Expecting link to fail; "undefined reference to `three'" (or similar).
> + { dg-excess-errors "" } */
>
> This results in an XFAIL, which is not nice. Is there a mechanism in the
> GCC testsuite/DejaGnu to check for an expected link failure (due to a
> missing symbol)? I guess we could cook up something that instead
> triggers a link failure for a duplicate or incompatible symbol
> definition?
This is an interesting test case. So what's supposed to happen if a
nohost routine is called outside of an acc context? Should it still work
or not?
As mentioned above, I don't think there should be a missing symbol
error. Maybe check for a "LIBGOMP: invalid call to nohost function".
> --- libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90
> @@ -1,5 +1,5 @@
> ! { dg-do run }
> -! { dg-xfail-if "not found" { openacc_host_selected } }
> +! { dg-xfail-if "TODO" { *-*-* } }
>
> TODO. ICE, if I remember correctly.
Cesar
More information about the Gcc-patches
mailing list