Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

Thomas Schwinge thomas@codesourcery.com
Wed Feb 10 11:51:00 GMT 2016


Hi!

Ping.

On Thu, 04 Feb 2016 15:47:25 +0100, I wrote:
> Here is the patch re-worked for trunk.  Instead of passing
> -foffload-force in the affected libgomp test cases, I instead chose to
> have them expect the warning.  This way, we're testing more in line to
> what users will be doing, and we'll notice how the OpenACC kernels
> handling improves, when parloops gets able to parallelize more offloaded
> code (and the "avoid offloading" handling will no longer trigger).  OK to
> commit?
> 
> commit acd66946777671486a0f69706b25a3ec5f877306
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Tue Feb 2 20:41:42 2016 +0100
> 
>     Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
>     
>     	gcc/
>     	* common.opt: Add -foffload-force.
>     	* lto-wrapper.c (merge_and_complain, append_compiler_options):
>     	Handle it.
>     	* doc/invoke.texi: Document it.
>     	* config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
>     	(record_id): Parse, and set it.
>     	(process): Use it.
>     	* config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
>     	offloading".
>     	(nvptx_record_offload_symbol): Use it.
>     	(nvptx_goacc_validate_dims): Set it.
>     	libgomp/
>     	* libgomp.h (gomp_offload_target_available_p): New function
>     	declaration.
>     	* target.c (gomp_offload_target_available_p): New function
>     	definition.
>     	(GOMP_offload_register_ver, GOMP_offload_unregister_ver)
>     	(gomp_init_device, gomp_unload_device): Handle and document "avoid
>     	offloading" flag ("host_table == NULL").
>     	(resolve_device): Document "avoid offloading".
>     	* oacc-init.c (resolve_device): Likewise.
>     	* libgomp.texi (Enabling OpenACC): Likewise.
>     	* testsuite/lib/libgomp.exp
>     	(check_effective_target_nvptx_offloading_configured): New proc
>     	definition.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
>     	file.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Expect warning.
>     	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
>     
>     	libgomp/
>     	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: Set
>     	"-ftree-parallelize-loops=32".
>     	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
> ---
>  gcc/common.opt                                     |    4 +
>  gcc/config/nvptx/mkoffload.c                       |   73 +++++++++++-
>  gcc/config/nvptx/nvptx.c                           |   42 ++++++-
>  gcc/doc/invoke.texi                                |   12 +-
>  gcc/lto-wrapper.c                                  |    2 +
>  libgomp/libgomp.h                                  |    1 +
>  libgomp/libgomp.texi                               |    8 ++
>  libgomp/oacc-init.c                                |   19 ++-
>  libgomp/target.c                                   |  122 ++++++++++++++++----
>  libgomp/testsuite/lib/libgomp.exp                  |   10 ++
>  .../testsuite/libgomp.oacc-c-c++-common/abort-3.c  |    4 +-
>  .../testsuite/libgomp.oacc-c-c++-common/abort-4.c  |    4 +-
>  .../libgomp.oacc-c-c++-common/avoid-offloading-1.c |   28 +++++
>  .../libgomp.oacc-c-c++-common/avoid-offloading-2.c |   38 ++++++
>  .../libgomp.oacc-c-c++-common/avoid-offloading-3.c |   29 +++++
>  .../combined-directives-1.c                        |    4 +-
>  .../libgomp.oacc-c-c++-common/default-1.c          |    4 +-
>  .../libgomp.oacc-c-c++-common/deviceptr-1.c        |    4 +-
>  .../libgomp.oacc-c-c++-common/host_data-1.c        |    1 +
>  .../libgomp.oacc-c-c++-common/kernels-1.c          |   10 +-
>  .../kernels-alias-ipa-pta-2.c                      |    4 +-
>  .../kernels-alias-ipa-pta-3.c                      |    4 +-
>  .../kernels-alias-ipa-pta.c                        |    4 +-
>  .../libgomp.oacc-c-c++-common/kernels-empty.c      |    2 +-
>  .../kernels-loop-and-seq-2.c                       |    3 +-
>  .../kernels-loop-and-seq-3.c                       |    4 +-
>  .../kernels-loop-and-seq-4.c                       |    3 +-
>  .../kernels-loop-and-seq-5.c                       |    3 +-
>  .../kernels-loop-and-seq-6.c                       |    3 +-
>  .../kernels-loop-and-seq.c                         |    4 +-
>  .../kernels-loop-collapse.c                        |    3 +-
>  .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |    2 +-
>  .../libgomp.oacc-fortran/avoid-offloading-1.f      |   32 +++++
>  .../libgomp.oacc-fortran/avoid-offloading-2.f      |   41 +++++++
>  .../libgomp.oacc-fortran/avoid-offloading-3.f      |   31 +++++
>  .../libgomp.oacc-fortran/combined-directives-1.f90 |    5 +-
>  .../libgomp.oacc-fortran/non-scalar-data.f90       |    5 +-
>  37 files changed, 494 insertions(+), 78 deletions(-)
> 
> diff --git gcc/common.opt gcc/common.opt
> index 520fa9c..2cf798d 100644
> --- gcc/common.opt
> +++ gcc/common.opt
> @@ -1779,6 +1779,10 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
>  EnumValue
>  Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
>  
> +foffload-force
> +Common Var(flag_offload_force)
> +Force offloading if the compiler wanted to avoid it.
> +
>  fomit-frame-pointer
>  Common Report Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
> diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
> index c8eed45..586ee8b 100644
> --- gcc/config/nvptx/mkoffload.c
> +++ gcc/config/nvptx/mkoffload.c
> @@ -41,9 +41,19 @@ const char tool_name[] = "nvptx mkoffload";
>  
>  #define COMMENT_PREFIX "#"
>  
> +enum id_map_flag
> +  {
> +    /* All clear.  */
> +    ID_MAP_FLAG_NONE = 0,
> +    /* Avoid offloading.  For example, because there is no sufficient
> +       parallelism.  */
> +    ID_MAP_FLAG_AVOID_OFFLOADING = 1
> +  };
> +
>  struct id_map
>  {
>    id_map *next;
> +  int flags;
>    char *ptx_name;
>  };
>  
> @@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where)
>      fatal_error (input_location, "malformed ptx file");
>  
>    id_map *v = XNEW (id_map);
> +
> +  /* Do we have any flags?  */
> +  v->flags = ID_MAP_FLAG_NONE;
> +  if (p1[0] == '(')
> +    {
> +      /* Current flag.  */
> +      const char *cur = p1 + 1;
> +
> +      /* Seek to the beginning of ") ".  */
> +      p1 = strchr (cur, ')');
> +      if (!p1 || p1 > end || p1[1] != ' ')
> +	fatal_error (input_location, "malformed ptx file: "
> +		     "expected \") \" at \"%s\"", cur);
> +
> +      while (cur < p1)
> +	{
> +	  const char *next = strchr (cur, ',');
> +	  if (!next || next > p1)
> +	    next = p1;
> +
> +	  if (strncmp (cur, "avoid offloading", next - cur - 1) == 0)
> +	    v->flags |= ID_MAP_FLAG_AVOID_OFFLOADING;
> +	  else
> +	    fatal_error (input_location, "malformed ptx file: "
> +			 "unknown flag at \"%s\"", cur);
> +
> +	  cur = next;
> +	}
> +
> +      /* Skip past ") ".  */
> +      p1 += 2;
> +    }
>    size_t len = end - p1;
>    v->ptx_name = XNEWVEC (char, len + 1);
>    memcpy (v->ptx_name, p1, len);
> @@ -296,12 +338,17 @@ process (FILE *in, FILE *out)
>    fprintf (out, "\n};\n\n");
>  
>    /* Dump out function idents.  */
> +  bool avoid_offloading_p = false;
>    fprintf (out, "static const struct nvptx_fn {\n"
>  	   "  const char *name;\n"
>  	   "  unsigned short dim[%d];\n"
>  	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
>    for (comma = "", id = func_ids; id; comma = ",", id = id->next)
> -    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
> +    {
> +      if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING)
> +	avoid_offloading_p = true;
> +      fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
> +    }
>    fprintf (out, "\n};\n\n");
>  
>    fprintf (out,
> @@ -318,7 +365,11 @@ process (FILE *in, FILE *out)
>  	   "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
>  	   "  func_mappings,"
>  	   "  sizeof (func_mappings) / sizeof (func_mappings[0])\n"
> -	   "};\n\n");
> +	   "};\n");
> +  if (avoid_offloading_p)
> +    /* Need a unique handle for target_data.  */
> +    fprintf (out, "static int target_data_avoid_offloading;\n");
> +  fprintf (out, "\n");
>  
>    fprintf (out, "#ifdef __cplusplus\n"
>  	   "extern \"C\" {\n"
> @@ -338,18 +389,28 @@ process (FILE *in, FILE *out)
>    fprintf (out, "static __attribute__((constructor)) void init (void)\n"
>  	   "{\n"
>  	   "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
> -	   "%d/*NVIDIA_PTX*/, &target_data);\n"
> -	   "};\n",
> +	   "%d/*NVIDIA_PTX*/, &target_data);\n",
>  	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
>  	   GOMP_DEVICE_NVIDIA_PTX);
> +  if (avoid_offloading_p)
> +    fprintf (out, "  GOMP_offload_register_ver (%#x, (void *) 0,"
> +	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
> +	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
> +	     GOMP_DEVICE_NVIDIA_PTX);
> +  fprintf (out, "};\n");
>  
>    fprintf (out, "static __attribute__((destructor)) void fini (void)\n"
>  	   "{\n"
>  	   "  GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__,"
> -	   "%d/*NVIDIA_PTX*/, &target_data);\n"
> -	   "};\n",
> +	   "%d/*NVIDIA_PTX*/, &target_data);\n",
>  	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
>  	   GOMP_DEVICE_NVIDIA_PTX);
> +  if (avoid_offloading_p)
> +    fprintf (out, "  GOMP_offload_unregister_ver (%#x, (void *) 0,"
> +	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
> +	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
> +	     GOMP_DEVICE_NVIDIA_PTX);
> +  fprintf (out, "};\n");
>  }
>  
>  static void
> diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
> index 78614f8..fe28154 100644
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -3803,6 +3803,9 @@ static const struct attribute_spec nvptx_attribute_table[] =
>    /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>         affects_type_identity } */
>    { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
> +  /* Avoid offloading.  For example, because there is no sufficient
> +     parallelism.  */
> +  { "omp avoid offloading", 0, 0, true, false, false, NULL, false },
>    { NULL, 0, 0, false, false, false, NULL, false }
>  };
>  

> @@ -3867,7 +3870,10 @@ nvptx_record_offload_symbol (tree decl)
>  	tree dims = TREE_VALUE (attr);
>  	unsigned ix;
>  
> -	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
> +	fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"",
> +		 (lookup_attribute ("omp avoid offloading",
> +				    DECL_ATTRIBUTES (decl))
> +		  ? "(avoid offloading) " : ""),
>  		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
>  
>  	for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
> @@ -4124,6 +4130,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
>  static bool
>  nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
>  {
> +  /* Detect if a function is unsuitable for offloading.  */
> +  if (!flag_offload_force && decl)
> +    {
> +      tree oacc_function_attr = get_oacc_fn_attrib (decl);
> +      if (oacc_function_attr
> +	  && oacc_fn_attrib_kernels_p (oacc_function_attr))
> +	{
> +	  bool avoid_offloading_p = true;
> +	  for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
> +	    {
> +	      if (dims[ix] > 1)
> +		{
> +		  avoid_offloading_p = false;
> +		  break;
> +		}
> +	    }
> +	  if (avoid_offloading_p)
> +	    {
> +	      /* OpenACC kernels constructs will never be parallelized for
> +		 optimization levels smaller than -O2; avoid the diagnostic in
> +		 this case.  */
> +	      if (optimize >= 2)
> +		warning_at (DECL_SOURCE_LOCATION (decl), 0,
> +			    "OpenACC kernels construct will be executed "
> +			    "sequentially; will by default avoid offloading "
> +			    "to prevent data copy penalty");
> +	      DECL_ATTRIBUTES (decl)
> +		= tree_cons (get_identifier ("omp avoid offloading"),
> +			     NULL_TREE, DECL_ATTRIBUTES (decl));
> +
> +	    }
> +	}
> +    }
> +
>    bool changed = false;
>  
>    /* The vector size must be 32, unless this is a SEQ routine.  */
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index fcc404e..c09fbc5 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -180,7 +180,8 @@ in the following sections.
>  @gccoptlist{-ansi  -std=@var{standard}  -fgnu89-inline @gol
>  -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol
>  -fno-asm  -fno-builtin  -fno-builtin-@var{function} @gol
> --fhosted  -ffreestanding -fopenacc -fopenmp -fopenmp-simd @gol
> +-fhosted  -ffreestanding @gol
> +-foffload-force -fopenacc -fopenacc-dim=@var{geom} -fopenmp -fopenmp-simd @gol
>  -fms-extensions -fplan9-extensions -fsso-struct=@var{endianness}
>  -fallow-single-precision  -fcond-mismatch -flax-vector-conversions @gol
>  -fsigned-bitfields  -fsigned-char @gol
> @@ -1953,6 +1954,15 @@ This is equivalent to @option{-fno-hosted}.
>  @xref{Standards,,Language Standards Supported by GCC}, for details of
>  freestanding and hosted environments.
>  
> +@item -foffload-force
> +@opindex -foffload-force
> +The option @option{-foffload-force} forces offloading if the compiler
> +wanted to avoid it.  For example, when there isn't sufficient
> +parallelism in certain offloading constructs, the compiler may come to
> +the conclusion that offloading incurs too much overhead (for data
> +transfers, for example), and unless overridden with this flag, it then
> +suggests to the runtime (libgomp) to avoid offloading.
> +
>  @item -fopenacc
>  @opindex fopenacc
>  @cindex OpenACC accelerator programming
> diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
> index ced6f2f..702ae47 100644
> --- gcc/lto-wrapper.c
> +++ gcc/lto-wrapper.c
> @@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
>  	case OPT_fsigned_zeros:
>  	case OPT_ftrapping_math:
>  	case OPT_fwrapv:
> +	case OPT_foffload_force:
>  	case OPT_fopenmp:
>  	case OPT_fopenacc:
>  	case OPT_fcilkplus:
> @@ -517,6 +518,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
>  	case OPT_fsigned_zeros:
>  	case OPT_ftrapping_math:
>  	case OPT_fwrapv:
> +	case OPT_foffload_force:
>  	case OPT_fopenmp:
>  	case OPT_fopenacc:
>  	case OPT_fopenacc_dim_:
> diff --git libgomp/libgomp.h libgomp/libgomp.h
> index 7108a6d..8747b72 100644
> --- libgomp/libgomp.h
> +++ libgomp/libgomp.h
> @@ -984,6 +984,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, bool);
>  extern void gomp_init_device (struct gomp_device_descr *);
>  extern void gomp_free_memmap (struct splay_tree_s *);
>  extern void gomp_unload_device (struct gomp_device_descr *);
> +extern bool gomp_offload_target_available_p (int);
>  
>  /* work.c */
>  
> diff --git libgomp/libgomp.texi libgomp/libgomp.texi
> index 987ee5f..5795c00 100644
> --- libgomp/libgomp.texi
> +++ libgomp/libgomp.texi
> @@ -1815,6 +1815,14 @@ flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
>  arranges for automatic linking of the OpenACC runtime library 
>  (@ref{OpenACC Runtime Library Routines}).
>  
> +Offloading is enabled by default.  In some cases, the compiler may
> +come to the conclusion that offloading incurs too much overhead, and
> +suggest to the runtime to avoid it.  To counteract that, you can use
> +the option @option{-foffload-force} to force offloading in such cases.
> +Alternatively, offloading is also enabled if a specific device type is
> +requested, in a call to @code{acc_init} or by setting the
> +@env{ACC_DEVICE_TYPE} environment variable, for example.
> +
>  A complete description of all OpenACC directives accepted may be found in 
>  the @uref{http://www.openacc.org/, OpenACC} Application Programming
>  Interface manual, version 2.0.
> diff --git libgomp/oacc-init.c libgomp/oacc-init.c
> index 42d005d..2f053f3 100644
> --- libgomp/oacc-init.c
> +++ libgomp/oacc-init.c
> @@ -122,7 +122,10 @@ resolve_device (acc_device_t d, bool fail_is_error)
>        {
>  	if (goacc_device_type)
>  	  {
> -	    /* Lookup the named device.  */
> +	    /* Lookup the device that has been explicitly named, so do not pay
> +	       attention to gomp_offload_target_available_p.  (That is,
> +	       enforced usage even with an "avoid offloading" flag set, and
> +	       hard error if not actually available.)  */
>  	    while (++d != _ACC_device_hwm)
>  	      if (dispatchers[d]
>  		  && !strcasecmp (goacc_device_type,
> @@ -148,8 +151,15 @@ resolve_device (acc_device_t d, bool fail_is_error)
>      case acc_device_not_host:
>        /* Find the first available device after acc_device_not_host.  */
>        while (++d != _ACC_device_hwm)
> -	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
> +	if (dispatchers[d]
> +	    && dispatchers[d]->get_num_devices_func () > 0
> +	    /* No device has been explicitly named, so pay attention to
> +	       gomp_offload_target_available_p, to not decide on an offload
> +	       target that we don't have offload data available for, or have an
> +	       "avoid offloading" flag set for.  */
> +	    && gomp_offload_target_available_p (dispatchers[d]->type))
>  	  goto found;
> +      /* No non-host device found.  */
>        if (d_arg == acc_device_default)
>  	{
>  	  d = acc_device_host;
> @@ -168,7 +178,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>        break;
>  
>      default:
> -      if (d > _ACC_device_hwm)
> +      if (d >= _ACC_device_hwm)
>  	{
>  	  if (fail_is_error)
>  	    goto unsupported_device;
> @@ -181,7 +191,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
>  
>    assert (d != acc_device_none
>  	  && d != acc_device_default
> -	  && d != acc_device_not_host);
> +	  && d != acc_device_not_host
> +	  && d < _ACC_device_hwm);
>  
>    if (dispatchers[d] == NULL && fail_is_error)
>      {
> diff --git libgomp/target.c libgomp/target.c
> index 96fe3d5..afcbedb 100644
> --- libgomp/target.c
> +++ libgomp/target.c
> @@ -1165,12 +1165,19 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
>  
>  /* This function should be called from every offload image while loading.
>     It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +   the target, and TARGET_DATA needed by target plugin.
> +
> +   If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid
> +   offloading" flag, and the TARGET_TYPE will not be considered by default
> +   until this image gets unregistered.  */
>  
>  void
>  GOMP_offload_register_ver (unsigned version, const void *host_table,
>  			   int target_type, const void *target_data)
>  {
> +  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
> +	      version, host_table, target_type, target_data);
> +
>    int i;
>  
>    if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
> @@ -1179,16 +1186,19 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
>    
>    gomp_mutex_lock (&register_lock);
>  
> -  /* Load image to all initialized devices.  */
> -  for (i = 0; i < num_devices; i++)
> +  if (host_table != NULL)
>      {
> -      struct gomp_device_descr *devicep = &devices[i];
> -      gomp_mutex_lock (&devicep->lock);
> -      if (devicep->type == target_type
> -	  && devicep->state == GOMP_DEVICE_INITIALIZED)
> -	gomp_load_image_to_device (devicep, version,
> -				   host_table, target_data, true);
> -      gomp_mutex_unlock (&devicep->lock);
> +      /* Load image to all initialized devices.  */
> +      for (i = 0; i < num_devices; i++)
> +	{
> +	  struct gomp_device_descr *devicep = &devices[i];
> +	  gomp_mutex_lock (&devicep->lock);
> +	  if (devicep->type == target_type
> +	      && devicep->state == GOMP_DEVICE_INITIALIZED)
> +	    gomp_load_image_to_device (devicep, version,
> +				       host_table, target_data, true);
> +	  gomp_mutex_unlock (&devicep->lock);
> +	}
>      }
>  
>    /* Insert image to array of pending images.  */
> @@ -1214,26 +1224,36 @@ GOMP_offload_register (const void *host_table, int target_type,
>  
>  /* This function should be called from every offload image while unloading.
>     It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +   the target, and TARGET_DATA needed by target plugin.
> +
> +   If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for this
> +   image (TARGET_DATA), and this TARGET_TYPE may again be considered by
> +   default.  */
>  
>  void
>  GOMP_offload_unregister_ver (unsigned version, const void *host_table,
>  			     int target_type, const void *target_data)
>  {
> +  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
> +	      version, host_table, target_type, target_data);
> +
>    int i;
>  
>    gomp_mutex_lock (&register_lock);
>  
> -  /* Unload image from all initialized devices.  */
> -  for (i = 0; i < num_devices; i++)
> +  if (host_table != NULL)
>      {
> -      struct gomp_device_descr *devicep = &devices[i];
> -      gomp_mutex_lock (&devicep->lock);
> -      if (devicep->type == target_type
> -	  && devicep->state == GOMP_DEVICE_INITIALIZED)
> -	gomp_unload_image_from_device (devicep, version,
> -				       host_table, target_data);
> -      gomp_mutex_unlock (&devicep->lock);
> +      /* Unload image from all initialized devices.  */
> +      for (i = 0; i < num_devices; i++)
> +	{
> +	  struct gomp_device_descr *devicep = &devices[i];
> +	  gomp_mutex_lock (&devicep->lock);
> +	  if (devicep->type == target_type
> +	      && devicep->state == GOMP_DEVICE_INITIALIZED)
> +	    gomp_unload_image_from_device (devicep, version,
> +					   host_table, target_data);
> +	  gomp_mutex_unlock (&devicep->lock);
> +	}
>      }
>  
>    /* Remove image from array of pending images.  */
> @@ -1267,7 +1287,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
>    for (i = 0; i < num_offload_images; i++)
>      {
>        struct offload_image_descr *image = &offload_images[i];
> -      if (image->type == devicep->type)
> +      if (image->type == devicep->type
> +	  && image->host_table != NULL)
>  	gomp_load_image_to_device (devicep, image->version,
>  				   image->host_table, image->target_data,
>  				   false);
> @@ -1287,7 +1308,8 @@ gomp_unload_device (struct gomp_device_descr *devicep)
>        for (i = 0; i < num_offload_images; i++)
>  	{
>  	  struct offload_image_descr *image = &offload_images[i];
> -	  if (image->type == devicep->type)
> +	  if (image->type == devicep->type
> +	      && image->host_table != NULL)
>  	    gomp_unload_image_from_device (devicep, image->version,
>  					   image->host_table,
>  					   image->target_data);
> @@ -1311,6 +1333,62 @@ gomp_free_memmap (struct splay_tree_s *mem_map)
>      }
>  }
>  
> +/* Do we have offload data available for the given offload target type?
> +   Instead of verifying that *all* offload data is available that could
> +   possibly be required, we instead just look for *any*.  If we later find any
> +   offload data missing, that's user error.  If any offload data of this target
> +   type is tagged with an "avoid offloading" flag, do not consider this target
> +   type available unless it has been initialized already.  */
> +
> +attribute_hidden bool
> +gomp_offload_target_available_p (int type)
> +{
> +  bool available = false;
> +
> +  /* Has the offload target type already been initialized?  */
> +  for (int i = 0; !available && i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      gomp_mutex_lock (&devicep->lock);
> +      if (devicep->type == type
> +	  && devicep->state == GOMP_DEVICE_INITIALIZED)
> +	available = true;
> +      gomp_mutex_unlock (&devicep->lock);
> +    }
> +
> +  /* If the offload target type has been initialized already, we ignore "avoid
> +     offloading" flags.  This is important, because data/state may be present
> +     on the device, that we must continue to use.  */
> +  if (!available)
> +    {
> +      gomp_mutex_lock (&register_lock);
> +      if (num_offload_images == 0)
> +	/* If there is no offload data available at all, there is no way to
> +	   later fail to find any of it for a specific offload target type.
> +	   This is the case where there are no offloaded code regions in user
> +	   code, but the target type can be initialized successfully, and
> +	   executable directqives be used, or runtime library calls be
> +	   made.  */
> +	available = true;
> +      else
> +	{
> +	  /* Can the offload target be initialized?  */
> +	  for (int i = 0; !available && i < num_offload_images; i++)
> +	    if (offload_images[i].type == type
> +		&& offload_images[i].host_table != NULL)
> +	      available = true;
> +	  /* If yes, is an "avoid offloading" flag set?  */
> +	  for (int i = 0; available && i < num_offload_images; i++)
> +	    if (offload_images[i].type == type
> +		&& offload_images[i].host_table == NULL)
> +	      available = false;
> +	}
> +      gomp_mutex_unlock (&register_lock);
> +    }
> +
> +  return available;
> +}
> +
>  /* Host fallback for GOMP_target{,_ext} routines.  */
>  
>  static void
> diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
> index a4c9d83..8d2be80 100644
> --- libgomp/testsuite/lib/libgomp.exp
> +++ libgomp/testsuite/lib/libgomp.exp
> @@ -344,6 +344,16 @@ proc check_effective_target_offload_device_nonshared_as { } {
>      } ]
>  }
>  
> +# Return 1 if the compiler has been configured for nvptx offloading.
> +
> +proc check_effective_target_nvptx_offloading_configured { } {
> +    # PR libgomp/65099: Currently, we only support offloading in 64-bit
> +    # configurations.
> +    global offload_targets
> +    return [expr [string match "*,nvptx,*" ",$offload_targets,"] \
> +		&& [is-effective-target lp64] ]
> +}
> +
>  # Return 1 if at least one nvidia board is present.
>  
>  proc check_effective_target_openacc_nvidia_accel_present { } {
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> index bca425e..23156d8 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> @@ -1,5 +1,3 @@
> -/* { dg-do run } */
> -
>  #include <stdio.h>
>  #include <stdlib.h>
>  
> @@ -7,7 +5,7 @@ int
>  main (void)
>  {
>    fprintf (stderr, "CheCKpOInT\n");
> -#pragma acc kernels
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      abort ();
>    }
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> index c29ca3f..f4d6a07 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> @@ -1,12 +1,10 @@
> -/* { dg-do run } */
> -
>  #include <stdlib.h>
>  
>  int
>  main (int argc, char **argv)
>  {
>  
> -#pragma acc kernels
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      if (argc != 1)
>        abort ();
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> new file mode 100644
> index 0000000..08745fc
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> @@ -0,0 +1,28 @@
> +/* Test that the compiler decides to "avoid offloading".  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> +/* The ACC_DEVICE_TYPE environment variable gets set in the testing
> +   framework, and that overrides the "avoid offloading" flag at run time.
> +   { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
> +  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
> +
> +  if (x != 33)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
> +  if (y != 1)
> +    __builtin_abort();
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> new file mode 100644
> index 0000000..724228a
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> @@ -0,0 +1,38 @@
> +/* Test that a user can override the compiler's "avoid offloading"
> +   decision at run time.  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  /* Override the compiler's "avoid offloading" decision.  */
> +  acc_device_t d;
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  d = acc_device_nvidia;
> +#elif defined ACC_DEVICE_TYPE_host
> +  d = acc_device_host;
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +  acc_init (d);
> +
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
> +  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
> +
> +  if (x != 33)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  if (y != 0)
> +    __builtin_abort();
> +#else
> +  if (y != 1)
> +    __builtin_abort();
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> new file mode 100644
> index 0000000..2fb5196
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> @@ -0,0 +1,29 @@
> +/* Test that a user can override the compiler's "avoid offloading"
> +   decision at compile time.  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels
> +  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
> +
> +  if (x != 33)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  if (y != 0)
> +    __builtin_abort();
> +#else
> +  if (y != 1)
> +    __builtin_abort();
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> index dad6d13..87ca378 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> @@ -1,6 +1,6 @@
>  /* This test exercises combined directives.  */
>  
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
>  
> @@ -33,7 +33,7 @@ main (int argc, char **argv)
>  	abort ();
>      }
>  
> -#pragma acc kernels loop copy (a[0:N]) copy (b[0:N])
> +#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    for (i = 0; i < N; i++)
>      {
>        b[i] = 3.0;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> index 1ac0b95..8f0144c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include  <openacc.h>
>  
> @@ -51,7 +51,7 @@ int test_kernels ()
>      ary[i] = ~0;
>  
>    /* val defaults to copy, ary defaults to copy.  */
> -#pragma acc kernels copy(ondev)
> +#pragma acc kernels copy(ondev) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      ondev = acc_on_device (acc_device_not_host);
>  #pragma acc loop 
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> index e271a37..9a5f7b1 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> @@ -1,5 +1,3 @@
> -/* { dg-do run } */
> -
>  #include <stdlib.h>
>  
>  int main (void)
> @@ -10,7 +8,7 @@ int main (void)
>    a = A;
>  
>  #pragma acc data copyout (a_1, a_2)
> -#pragma acc kernels deviceptr (a)
> +#pragma acc kernels deviceptr (a) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a_1 = a;
>      a_2 = &a;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> index 51745ba..3ef6f9b 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
>  
>  #include <stdlib.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> index 3acfdf5..614ad33 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
>  
> @@ -73,7 +73,7 @@ int main (void)
>    i = -1;
>    j = -2;
>    v = 0;
> -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
> +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      if (i != -1 || j != -2)
>        abort ();
> @@ -96,7 +96,7 @@ int main (void)
>    i = -1;
>    j = -2;
>    v = 0;
> -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
> +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      i = 2;
>      j = 1;
> @@ -110,7 +110,7 @@ int main (void)
>    i = -1;
>    j = -2;
>    v = 0;
> -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j)
> +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      if (i != -1 || j != -2)
>        abort ();
> @@ -126,7 +126,7 @@ int main (void)
>    i = -1;
>    j = -2;
>    v = 0;
> -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, j)
> +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      i = 2;
>      j = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> index 0f323c8..8d5101d 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-additional-options "-O2 -fipa-pta" } */
> +/* { dg-additional-options "-fipa-pta" } */
>  
>  #include <stdlib.h>
>  
> @@ -11,7 +11,7 @@ main (void)
>    unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int));
>    unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
>  
> -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
> +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a[0] = 0;
>      b[0] = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> index 654e750..3726b0c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> @@ -1,4 +1,4 @@
> -/* { dg-additional-options "-O2 -fipa-pta" } */
> +/* { dg-additional-options "-fipa-pta" } */
>  
>  #include <stdlib.h>
>  
> @@ -11,7 +11,7 @@ main (void)
>    unsigned int *b = a;
>    unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
>  
> -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
> +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a[0] = 0;
>      b[0] = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> index 44d4fd2..eea4f76 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> @@ -1,4 +1,4 @@
> -/* { dg-additional-options "-O2 -fipa-pta" } */
> +/* { dg-additional-options "-fipa-pta" } */
>  
>  #include <stdlib.h>
>  
> @@ -11,7 +11,7 @@ main (void)
>    unsigned int b[N];
>    unsigned int c[N];
>  
> -#pragma acc kernels pcopyout (a, b, c)
> +#pragma acc kernels pcopyout (a, b, c) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a[0] = 0;
>      b[0] = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> index a68a7cd..860b6da 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> @@ -1,6 +1,6 @@
>  int
>  main (void)
>  {
> -#pragma acc kernels
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    ;
>  }
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> index 2e4100f..5cdc200 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -8,7 +7,7 @@
>  unsigned int
>  foo (int n, unsigned int *a)
>  {
> -#pragma acc kernels copy (a[0:N])
> +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a[0] = a[0] + 1;
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> index b3e736b..2e4d4d2 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -8,8 +7,7 @@
>  unsigned int
>  foo (int n, unsigned int *a)
>  {
> -
> -#pragma acc kernels copy (a[0:N])
> +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      for (int i = 0; i < n; i++)
>        a[i] = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> index 8b9affa..5bf00db 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -8,7 +7,7 @@
>  unsigned int
>  foo (int n, unsigned int *a)
>  {
> -#pragma acc kernels copy (a[0:N])
> +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      a[0] = 2;
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> index 83d4e7f..d39b667 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -9,7 +8,7 @@ unsigned int
>  foo (int n, unsigned int *a)
>  {
>    int r;
> -#pragma acc kernels copyout(r) copy (a[0:N])
> +#pragma acc kernels copyout(r) copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      r = a[0];
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> index 01d5e5e..bb2e85b 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -8,7 +7,7 @@
>  unsigned int
>  foo (int n, unsigned int *a)
>  {
> -#pragma acc kernels copy (a[0:N])
> +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      int r = a[0];
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> index 61d1283..e513827 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -8,8 +7,7 @@
>  unsigned int
>  foo (int n, unsigned int *a)
>  {
> -
> -#pragma acc kernels copy (a[0:N])
> +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>      for (int i = 0; i < n; i++)
>        a[i] = 1;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> index f7f04cb..c4791a4 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
> @@ -11,7 +10,7 @@ void __attribute__((noinline, noclone))
>  foo (int m, int n)
>  {
>    int i, j;
> -  #pragma acc kernels
> +  #pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } } */
>    {
>  #pragma acc loop collapse(2)
>      for (i = 0; i < m; i++)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> index c164598..94a5ae2 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> new file mode 100644
> index 0000000..5f18b94
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> @@ -0,0 +1,32 @@
> +! Test that the compiler decides to "avoid offloading".
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops=32" }
> +! The "avoid offloading" warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
> +! The ACC_DEVICE_TYPE environment variable gets set in the testing
> +! framework, and that overrides the "avoid offloading" flag at run time.
> +! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } }
> +      X = 33
> +      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST);
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
> +      IF (.NOT. Y) CALL ABORT
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> new file mode 100644
> index 0000000..51801ad
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> @@ -0,0 +1,41 @@
> +! Test that a user can override the compiler's "avoid offloading"
> +! decision at run time.
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops=32" }
> +! The "avoid offloading" warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER :: D
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!     Override the compiler's "avoid offloading" decision.
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      D = ACC_DEVICE_NVIDIA
> +#elif defined ACC_DEVICE_TYPE_host
> +      D = ACC_DEVICE_HOST
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +      CALL ACC_INIT (D)
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } }
> +      X = 33
> +      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      IF (Y) CALL ABORT
> +#else
> +      IF (.NOT. Y) CALL ABORT
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> new file mode 100644
> index 0000000..bea6ab8
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> @@ -0,0 +1,31 @@
> +! Test that a user can override the compiler's "avoid offloading"
> +! decision at compile time.
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops=32" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER :: D
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS
> +      X = 33
> +      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      IF (Y) CALL ABORT
> +#else
> +      IF (.NOT. Y) CALL ABORT
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> index 94100b2..4b52579 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> @@ -1,6 +1,9 @@
>  ! This test exercises combined directives.
>  
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops=32" }
> +! The "avoid offloading" warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>  
>  program main
>    integer, parameter :: n = 32
> @@ -27,7 +30,7 @@ program main
>    !$acc kernels loop copy (a(1:n)) copy (b(1:n))
>    do i = 1, n
>      b(i) = 3.0;
> -    a(i) = a(i) + b(i)
> +    a(i) = a(i) + b(i) ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } }
>    end do
>  
>    do i = 1, n
> diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> index 4afb562..b9298c7 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> @@ -2,6 +2,9 @@
>  ! offloaded regions are properly mapped using present_or_copy.
>  
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops=32" }
> +! The "avoid offloading" warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>  
>  program main
>    implicit none
> @@ -30,7 +33,7 @@ subroutine kernels (array, n)
>    integer, dimension (n) :: array
>    integer :: n, i
>  
> -  !$acc kernels
> +  !$acc kernels ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target nvptx_offloading_configured } }
>    do i = 1, n
>       array(i) = i
>    end do


Grüße
 Thomas



More information about the Gcc-patches mailing list