This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading" (was: [PATCH] Add fopt-info-oacc)


Hi!

On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> This patch introduces an option fopt-info-oacc.
> 
> When using the option like this with a kernels region in kernels-loop.c 
> that parloops does not manage to parallelize:
> ...
> $ gcc kernels-loop.c -S -O2 -fopenacc -fopt-info-oacc-all
> ...
> 
> we get a message:
> ...
> kernels-loop.c:23:9: note: kernels region executed sequentially. 
> Consider mapping it to host execution, to avoid data copy penalty.
> ...

Yay for helping the user understand what the compiler is doing!

> Any comments?

Telling from real-world code that we've been having a look at, when the
above situation happens, we're -- in the vast majority of all cases -- in
a situation where we generally want to avoid offloading (unless
explicitly requested), "to avoid data copy penalty" as well as typically
much slower single-threaded execution on the GPU.  Obviously, that will
have to be revisited as parloops (or any other mechanism in GCC) is able
to better understand/use the parallelism in OpenACC kernels constructs.

So, building upon Tom's patch, I have implemented an "avoid offloading"
flag given the presence of one un-parallelized OpenACC kernels construct.
This is currently only enabled for OpenACC kernels constructs, in
combination with nvptx offloading, but I think the general scheme will be
useful also for other constructs as well as other (non-shared memory)
offloading targets.

Also, "avoid offloading" is just a default: if a user explicitly
requested the use of, for example, a Nvidia GPU (with an
acc_init(acc_device_nvidia) call, or by setting the
ACC_DEVICE_TYPE=nvidia environemnt variable, for example), then we cannot
apply host-fallback execution, because in this case the user can
rightfully assume Nvidia GPU semantics (async clause works, and so on).


The new warning (very similar to the one that Tom proposed) also
uncovered a bunch of OpenACC kernels test cases in libgomp that did not
have OpenACC kernels processing enabled (-ftree-parallelize-loops), but
which parloops can handle fine once that is enabled -- and also a bunch
of OpenACC kernels test cases that parloops doesn't handle but it looked
as they were meant to be.  (Maybe I'm wrong about that, though.)  Anyway,
Tom, would you please make a note to audit all use of -foffload-force in
the libgomp testsuite?  (It is appropriate for all test cases that
parloops truely is not meant to handle, but for all others, that flag
should probably be removed and instead an XFAILed dg-bogus directive
added, so that we will notice (XPASS) once it does handle them.)


I've also added a new command-line option, -foffload-force, that restores
the current behavior, inhibits the "avoid offloading" handling.  This is
primarily meant for GCC (libgomp) testsuite usage, but could occasionally
also be useful for users.  Considering alternatives (that can be applied
in a more fine-grained way, case by case per OpenACC kernels construct):

1) a new GCC-specific pragma, for example:

    #pragma GCC force offloading
    #pragma acc kernels
      [un-parallelizable stuff]

2) a new GCC-specific clause, for example in the implementation
namespace, starting with "_":

    #pragma acc kernels _force_offloading
      [un-parallelizable stuff]

..., the -foffload-force flag was the simplest solution.  (Because, if
you're going to alter the sources anyway, you might as well just remove
the one offending OpenACC kernels construct...)


Committed to gomp-4_0-branch in r232709:

commit 41a76d233e714fd7b79dc1f40823f607c38306ba
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jan 21 21:52:50 2016 +0000

    Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
    
    	gcc/
    	* common.opt: Add -foffload-force.
    	* lto-wrapper.c (merge_and_complain, append_compiler_options):
    	Handle it.
    	* doc/invoke.texi: Document it.
    	* config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
    	(record_id): Parse, and set it.
    	(process): Use it.
    	* config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
    	offloading".
    	(nvptx_record_offload_symbol): Use it.
    	(nvptx_goacc_validate_dims): Set it.
    	libgomp/
    	* target.c (GOMP_offload_register_ver)
    	(GOMP_offload_unregister_ver, gomp_init_device)
    	(gomp_unload_device, gomp_offload_target_available_p): Handle and
    	document "avoid offloading" ("host_table == NULL").
    	(resolve_device): Document "avoid offloading".
    	* oacc-init.c (resolve_device): Likewise.
    	* libgomp.texi (Enabling OpenACC): Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
    	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Set
    	"-foffload-force".
    	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
    	Likewise.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Set
    	"-ftree-parallelize-loops=32".
    	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232709 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |  14 ++
 gcc/common.opt                                     |   4 +
 gcc/config/nvptx/mkoffload.c                       |  73 +++++++++-
 gcc/config/nvptx/nvptx.c                           |  42 +++++-
 gcc/doc/invoke.texi                                |  11 +-
 gcc/lto-wrapper.c                                  |   2 +
 libgomp/ChangeLog.gomp                             | 150 +++++++++++++++++++++
 libgomp/libgomp.texi                               |   8 ++
 libgomp/oacc-init.c                                |   8 +-
 libgomp/target.c                                   |  86 ++++++++----
 .../testsuite/libgomp.oacc-c++/non-scalar-data.C   |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/abort-3.c  |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/abort-4.c  |   3 +-
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        |   1 +
 .../libgomp.oacc-c-c++-common/avoid-offloading-1.c |  25 ++++
 .../libgomp.oacc-c-c++-common/avoid-offloading-2.c |  38 ++++++
 .../libgomp.oacc-c-c++-common/avoid-offloading-3.c |  29 ++++
 .../combined-directives-1.c                        |   2 +-
 .../libgomp.oacc-c-c++-common/default-1.c          |   4 +-
 .../libgomp.oacc-c-c++-common/deviceptr-1.c        |   4 +-
 .../libgomp.oacc-c-c++-common/host_data-1.c        |   1 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c |   2 +-
 .../libgomp.oacc-c-c++-common/kernels-1.c          |   4 +-
 .../kernels-alias-ipa-pta-2.c                      |   5 +-
 .../kernels-alias-ipa-pta-3.c                      |   5 +-
 .../kernels-alias-ipa-pta.c                        |   5 +-
 .../libgomp.oacc-c-c++-common/kernels-empty.c      |   3 +
 .../kernels-loop-and-seq-2.c                       |   3 +-
 .../kernels-loop-and-seq-5.c                       |   3 +-
 .../kernels-loop-and-seq-6.c                       |   3 +-
 .../kernels-loop-and-seq.c                         |   3 +-
 .../kernels-loop-collapse.c                        |   3 +-
 .../kernels-private-vars-local-worker-1.c          |   3 +-
 .../kernels-private-vars-local-worker-2.c          |   3 +-
 .../kernels-private-vars-local-worker-3.c          |   3 +-
 .../kernels-private-vars-local-worker-4.c          |   3 +-
 .../kernels-private-vars-local-worker-5.c          |   3 +-
 .../kernels-private-vars-loop-gang-1.c             |   3 +-
 .../kernels-private-vars-loop-gang-2.c             |   3 +-
 .../kernels-private-vars-loop-gang-3.c             |   3 +-
 .../kernels-private-vars-loop-gang-4.c             |   3 +-
 .../kernels-private-vars-loop-gang-5.c             |   3 +-
 .../kernels-private-vars-loop-gang-6.c             |   4 +
 .../kernels-private-vars-loop-vector-1.c           |   3 +-
 .../kernels-private-vars-loop-vector-2.c           |   3 +-
 .../kernels-private-vars-loop-worker-1.c           |   3 +-
 .../kernels-private-vars-loop-worker-2.c           |   3 +-
 .../kernels-private-vars-loop-worker-3.c           |   3 +-
 .../kernels-private-vars-loop-worker-4.c           |   3 +-
 .../kernels-private-vars-loop-worker-5.c           |   3 +-
 .../kernels-private-vars-loop-worker-6.c           |   3 +-
 .../kernels-private-vars-loop-worker-7.c           |   3 +-
 .../kernels-reduction-1.c                          |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |   2 +-
 .../testsuite/libgomp.oacc-fortran/asyncwait-1.f90 |   1 +
 .../testsuite/libgomp.oacc-fortran/asyncwait-2.f90 |   1 +
 .../testsuite/libgomp.oacc-fortran/asyncwait-3.f90 |   1 +
 .../libgomp.oacc-fortran/avoid-offloading-1.f      |  29 ++++
 .../libgomp.oacc-fortran/avoid-offloading-2.f      |  40 ++++++
 .../libgomp.oacc-fortran/avoid-offloading-3.f      |  30 +++++
 .../libgomp.oacc-fortran/combined-directives-1.f90 |   1 +
 .../testsuite/libgomp.oacc-fortran/default-1.f90   |   3 +
 .../testsuite/libgomp.oacc-fortran/deviceptr-1.f90 |   5 +-
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90    |   5 +-
 .../kernels-acc-loop-reduction-2.f90               |   5 +
 .../kernels-acc-loop-reduction.f90                 |   5 +
 .../libgomp.oacc-fortran/kernels-collapse-3.f90    |   2 +
 .../libgomp.oacc-fortran/kernels-collapse-4.f90    |   2 +
 .../libgomp.oacc-fortran/kernels-independent.f90   |   2 +-
 .../libgomp.oacc-fortran/kernels-map-1.f90         |   3 +
 .../kernels-private-vars-loop-gang-2.f90           |   2 +
 .../kernels-private-vars-loop-gang-3.f90           |   2 +
 .../kernels-private-vars-loop-gang-6.f90           |   2 +
 .../kernels-private-vars-loop-vector-1.f90         |   2 +
 .../kernels-private-vars-loop-vector-2.f90         |   2 +
 .../kernels-private-vars-loop-worker-1.f90         |   2 +
 .../kernels-private-vars-loop-worker-2.f90         |   2 +
 .../kernels-private-vars-loop-worker-3.f90         |   2 +
 .../kernels-private-vars-loop-worker-4.f90         |   2 +
 .../kernels-private-vars-loop-worker-5.f90         |   2 +
 .../kernels-private-vars-loop-worker-6.f90         |   2 +
 .../kernels-private-vars-loop-worker-7.f90         |   2 +
 .../libgomp.oacc-fortran/kernels-reduction-1.f90   |   2 +
 .../libgomp.oacc-fortran/non-scalar-data.f90       |   1 +
 84 files changed, 700 insertions(+), 78 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index cdd279b..f991b91 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,17 @@
+2016-01-21  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* common.opt: Add -foffload-force.
+	* lto-wrapper.c (merge_and_complain, append_compiler_options):
+	Handle it.
+	* doc/invoke.texi: Document it.
+	* config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
+	(record_id): Parse, and set it.
+	(process): Use it.
+	* config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
+	offloading".
+	(nvptx_record_offload_symbol): Use it.
+	(nvptx_goacc_validate_dims): Set it.
+
 2016-01-20  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* gimplify.c (gimplify_scan_omp_clauses):  Consider OACC_{DATA,
diff --git gcc/common.opt gcc/common.opt
index 793a062..c905f71 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1786,6 +1786,10 @@ Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
 EnumValue
 Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
 
+foffload-force
+Common Var(flag_offload_force)
+Force offloading if the compiler wanted to avoid it.
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
index cce562d..de6a8ad 100644
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -41,9 +41,19 @@ const char tool_name[] = "nvptx mkoffload";
 
 #define COMMENT_PREFIX "#"
 
+enum id_map_flag
+  {
+    /* All clear.  */
+    ID_MAP_FLAG_NONE = 0,
+    /* Avoid offloading.  For example, because there is no sufficient
+       parallelism.  */
+    ID_MAP_FLAG_AVOID_OFFLOADING = 1
+  };
+
 struct id_map
 {
   id_map *next;
+  int flags;
   char *ptx_name;
 };
 
@@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
+
+  /* Do we have any flags?  */
+  v->flags = ID_MAP_FLAG_NONE;
+  if (p1[0] == '(')
+    {
+      /* Current flag.  */
+      const char *cur = p1 + 1;
+
+      /* Seek to the beginning of ") ".  */
+      p1 = strchr (cur, ')');
+      if (!p1 || p1 > end || p1[1] != ' ')
+	fatal_error (input_location, "malformed ptx file: "
+		     "expected \") \" at \"%s\"", cur);
+
+      while (cur < p1)
+	{
+	  const char *next = strchr (cur, ',');
+	  if (!next || next > p1)
+	    next = p1;
+
+	  if (strncmp (cur, "avoid offloading", next - cur - 1) == 0)
+	    v->flags |= ID_MAP_FLAG_AVOID_OFFLOADING;
+	  else
+	    fatal_error (input_location, "malformed ptx file: "
+			 "unknown flag at \"%s\"", cur);
+
+	  cur = next;
+	}
+
+      /* Skip past ") ".  */
+      p1 += 2;
+    }
   size_t len = end - p1;
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
@@ -296,12 +338,17 @@ process (FILE *in, FILE *out)
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
+  bool avoid_offloading_p = false;
   fprintf (out, "static const struct nvptx_fn {\n"
 	   "  const char *name;\n"
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    {
+      if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING)
+	avoid_offloading_p = true;
+      fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    }
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
@@ -318,7 +365,11 @@ process (FILE *in, FILE *out)
 	   "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
 	   "  func_mappings,"
 	   "  sizeof (func_mappings) / sizeof (func_mappings[0])\n"
-	   "};\n\n");
+	   "};\n");
+  if (avoid_offloading_p)
+    /* Need a unique handle for target_data.  */
+    fprintf (out, "static int target_data_avoid_offloading;\n");
+  fprintf (out, "\n");
 
   fprintf (out, "#ifdef __cplusplus\n"
 	   "extern \"C\" {\n"
@@ -338,18 +389,28 @@ process (FILE *in, FILE *out)
   fprintf (out, "static __attribute__((constructor)) void init (void)\n"
 	   "{\n"
 	   "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
-	   "%d/*NVIDIA_PTX*/, &target_data);\n"
-	   "};\n",
+	   "%d/*NVIDIA_PTX*/, &target_data);\n",
 	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
 	   GOMP_DEVICE_NVIDIA_PTX);
+  if (avoid_offloading_p)
+    fprintf (out, "  GOMP_offload_register_ver (%#x, (void *) 0,"
+	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
+	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
+	     GOMP_DEVICE_NVIDIA_PTX);
+  fprintf (out, "};\n");
 
   fprintf (out, "static __attribute__((destructor)) void fini (void)\n"
 	   "{\n"
 	   "  GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__,"
-	   "%d/*NVIDIA_PTX*/, &target_data);\n"
-	   "};\n",
+	   "%d/*NVIDIA_PTX*/, &target_data);\n",
 	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
 	   GOMP_DEVICE_NVIDIA_PTX);
+  if (avoid_offloading_p)
+    fprintf (out, "  GOMP_offload_unregister_ver (%#x, (void *) 0,"
+	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
+	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
+	     GOMP_DEVICE_NVIDIA_PTX);
+  fprintf (out, "};\n");
 }
 
 static void
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index dfbdcfb..3faacd5 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -3811,6 +3811,9 @@ static const struct attribute_spec nvptx_attribute_table[] =
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
        affects_type_identity } */
   { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
+  /* Avoid offloading.  For example, because there is no sufficient
+     parallelism.  */
+  { "omp avoid offloading", 0, 0, true, false, false, NULL, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -3875,7 +3878,10 @@ nvptx_record_offload_symbol (tree decl)
 	tree dims = TREE_VALUE (attr);
 	unsigned ix;
 
-	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+	fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"",
+		 (lookup_attribute ("omp avoid offloading",
+				    DECL_ATTRIBUTES (decl))
+		  ? "(avoid offloading) " : ""),
 		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
 
 	for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
@@ -4135,6 +4141,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 static bool
 nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
+  /* Detect if a function is unsuitable for offloading.  */
+  if (!flag_offload_force && decl)
+    {
+      tree oacc_function_attr = get_oacc_fn_attrib (decl);
+      if (oacc_function_attr
+	  && oacc_fn_attrib_kernels_p (oacc_function_attr))
+	{
+	  bool avoid_offloading_p = true;
+	  for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
+	    {
+	      if (dims[ix] > 1)
+		{
+		  avoid_offloading_p = false;
+		  break;
+		}
+	    }
+	  if (avoid_offloading_p)
+	    {
+	      /* OpenACC kernels constructs will never be parallelized for
+		 optimization levels smaller than -O2; avoid the diagnostic in
+		 this case.  */
+	      if (optimize >= 2)
+		warning_at (DECL_SOURCE_LOCATION (decl), 0,
+			    "OpenACC kernels construct will be executed "
+			    "sequentially; will by default avoid offloading "
+			    "to prevent data copy penalty");
+	      DECL_ATTRIBUTES (decl)
+		= tree_cons (get_identifier ("omp avoid offloading"),
+			     NULL_TREE, DECL_ATTRIBUTES (decl));
+
+	    }
+	}
+    }
+
   bool changed = false;
 
   /* The vector size must be 32, unless this is a SEQ routine.  */
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index c608a36..c9c79fc 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -1153,7 +1153,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
--foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]} @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]}  -foffload-force @gol
 -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
@@ -24230,6 +24230,15 @@ objects references in an offload region do not alias.  The option
 aliasing in offload regions.  The default value is
 @option{-foffload-alias=none}.
 
+@item -foffload-force
+@opindex -foffload-force
+The option @option{-foffload-force} forces offloading if the compiler
+wanted to avoid it.  For example, when there isn't sufficient
+parallelism in certain offloading constructs, the compiler may come to
+the conclusion that offloading incurs too much overhead (for data
+transfers, for example), and unless overridden with this flag, it then
+suggests to the runtime (libgomp) to avoid offloading.
+
 @item -fexceptions
 @opindex fexceptions
 Enable exception handling.  Generates extra code needed to propagate
diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
index 91bb1e8..5e03544 100644
--- gcc/lto-wrapper.c
+++ gcc/lto-wrapper.c
@@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
 	case OPT_fsigned_zeros:
 	case OPT_ftrapping_math:
 	case OPT_fwrapv:
+	case OPT_foffload_force:
 	case OPT_fopenmp:
 	case OPT_fopenacc:
 	case OPT_fcheck_pointer_bounds:
@@ -516,6 +517,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
 	case OPT_fsigned_zeros:
 	case OPT_ftrapping_math:
 	case OPT_fwrapv:
+	case OPT_foffload_force:
 	case OPT_fopenmp:
 	case OPT_fopenacc:
 	case OPT_fopenacc_dim_:
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 2003a8a..b089e27 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,153 @@
+2016-01-21  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* target.c (GOMP_offload_register_ver)
+	(GOMP_offload_unregister_ver, gomp_init_device)
+	(gomp_unload_device, gomp_offload_target_available_p): Handle and
+	document "avoid offloading" ("host_table == NULL").
+	(resolve_device): Document "avoid offloading".
+	* oacc-init.c (resolve_device): Likewise.
+	* libgomp.texi (Enabling OpenACC): Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
+	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Set
+	"-foffload-force".
+	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
+	Likewise.
+
+	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Set
+	"-ftree-parallelize-loops=32".
+	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
+
 2016-01-20  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c++/non-scalar-data.C: New test.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 8870084..2841b2e 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -1818,6 +1818,14 @@ flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
 arranges for automatic linking of the OpenACC runtime library 
 (@ref{OpenACC Runtime Library Routines}).
 
+Offloading is enabled by default.  In some cases, the compiler may
+come to the conclusion that offloading incurs too much overhead, and
+suggest to the runtime to avoid it.  To counteract that, you can use
+the option @option{-foffload-force} to force offloading in such cases.
+Alternatively, offloading is also enabled if a specific device type is
+requested, in a call to @code{acc_init} or by setting the
+@env{ACC_DEVICE_TYPE} environment variable, for example.
+
 A complete description of all OpenACC directives accepted may be found in 
 the @uref{http://www.openacc.org/, OpenACC} Application Programming
 Interface manual, version 2.0.
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index a90732d..b3d13a8 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -123,8 +123,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
 	if (goacc_device_type)
 	  {
 	    /* Lookup the device that has been explicitly named, so do not pay
-	       attention to gomp_offload_target_available_p.  (That is, hard
-	       error if not actually available.)  */
+	       attention to gomp_offload_target_available_p.  (That is,
+	       enforced usage even with an "avoid offloading" flag set, and
+	       hard error if not actually available.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -154,7 +155,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 	    && dispatchers[d]->get_num_devices_func () > 0
 	    /* No device has been explicitly named, so pay attention to
 	       gomp_offload_target_available_p, to not decide on an offload
-	       target that we don't have offload data available for.  */
+	       target that we don't have offload data available for, or have an
+	       "avoid offloading" flag set for.  */
 	    && gomp_offload_target_available_p (dispatchers[d]->type))
 	  goto found;
       /* No non-host device found.  */
diff --git libgomp/target.c libgomp/target.c
index 7adc4d0..c60e52a 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -130,8 +130,9 @@ resolve_device (int device)
     }
   gomp_mutex_unlock (&devices[device_id].lock);
 
-  /* If the device-var ICV does not actually have offload data available, don't
-     try use it (which will fail), and use host fallback instead.  */
+  /* Use host fallback instead of the device-var ICV if the latter doesn't
+     actually have offload data available (offloading will fail), or has an
+     "avoid offloading" flag set.  */
   if (device == GOMP_DEVICE_ICV
       && !gomp_offload_target_available_p (devices[device_id].type))
     return NULL;
@@ -1139,12 +1140,19 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
 
 /* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+   the target, and TARGET_DATA needed by target plugin.
+
+   If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid
+   offloading" flag, and the TARGET_TYPE will not be considered by default
+   until this image gets unregistered.  */
 
 void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
 			   int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
+	      version, host_table, target_type, target_data);
+
   int i;
 
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
@@ -1153,16 +1161,19 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
   
   gomp_mutex_lock (&register_lock);
 
-  /* Load image to all initialized devices.  */
-  for (i = 0; i < num_devices; i++)
+  if (host_table != NULL)
     {
-      struct gomp_device_descr *devicep = &devices[i];
-      gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type
-	  && devicep->state == GOMP_DEVICE_INITIALIZED)
-	gomp_load_image_to_device (devicep, version,
-				   host_table, target_data, true);
-      gomp_mutex_unlock (&devicep->lock);
+      /* Load image to all initialized devices.  */
+      for (i = 0; i < num_devices; i++)
+	{
+	  struct gomp_device_descr *devicep = &devices[i];
+	  gomp_mutex_lock (&devicep->lock);
+	  if (devicep->type == target_type
+	      && devicep->state == GOMP_DEVICE_INITIALIZED)
+	    gomp_load_image_to_device (devicep, version,
+				       host_table, target_data, true);
+	  gomp_mutex_unlock (&devicep->lock);
+	}
     }
 
   /* Insert image to array of pending images.  */
@@ -1188,26 +1199,36 @@ GOMP_offload_register (const void *host_table, int target_type,
 
 /* This function should be called from every offload image while unloading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+   the target, and TARGET_DATA needed by target plugin.
+
+   If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for this
+   image (TARGET_DATA), and this TARGET_TYPE may again be considered by
+   default.  */
 
 void
 GOMP_offload_unregister_ver (unsigned version, const void *host_table,
 			     int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
+	      version, host_table, target_type, target_data);
+
   int i;
 
   gomp_mutex_lock (&register_lock);
 
-  /* Unload image from all initialized devices.  */
-  for (i = 0; i < num_devices; i++)
+  if (host_table != NULL)
     {
-      struct gomp_device_descr *devicep = &devices[i];
-      gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type
-	  && devicep->state == GOMP_DEVICE_INITIALIZED)
-	gomp_unload_image_from_device (devicep, version,
-				       host_table, target_data);
-      gomp_mutex_unlock (&devicep->lock);
+      /* Unload image from all initialized devices.  */
+      for (i = 0; i < num_devices; i++)
+	{
+	  struct gomp_device_descr *devicep = &devices[i];
+	  gomp_mutex_lock (&devicep->lock);
+	  if (devicep->type == target_type
+	      && devicep->state == GOMP_DEVICE_INITIALIZED)
+	    gomp_unload_image_from_device (devicep, version,
+					   host_table, target_data);
+	  gomp_mutex_unlock (&devicep->lock);
+	}
     }
 
   /* Remove image from array of pending images.  */
@@ -1241,7 +1262,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
   for (i = 0; i < num_offload_images; i++)
     {
       struct offload_image_descr *image = &offload_images[i];
-      if (image->type == devicep->type)
+      if (image->type == devicep->type
+	  && image->host_table != NULL)
 	gomp_load_image_to_device (devicep, image->version,
 				   image->host_table, image->target_data,
 				   false);
@@ -1261,7 +1283,8 @@ gomp_unload_device (struct gomp_device_descr *devicep)
       for (i = 0; i < num_offload_images; i++)
 	{
 	  struct offload_image_descr *image = &offload_images[i];
-	  if (image->type == devicep->type)
+	  if (image->type == devicep->type
+	      && image->host_table != NULL)
 	    gomp_unload_image_from_device (devicep, image->version,
 					   image->host_table,
 					   image->target_data);
@@ -1272,7 +1295,9 @@ gomp_unload_device (struct gomp_device_descr *devicep)
 /* Do we have offload data available for the given offload target type?
    Instead of verifying that *all* offload data is available that could
    possibly be required, we instead just look for *any*.  If we later find any
-   offload data missing, that's user error.  */
+   offload data missing, that's user error.  If any offload data of this target
+   type is tagged with an "avoid offloading" flag, do not consider this target
+   type available unless it has been initialized already.  */
 
 attribute_hidden bool
 gomp_offload_target_available_p (int type)
@@ -1290,6 +1315,9 @@ gomp_offload_target_available_p (int type)
       gomp_mutex_unlock (&devicep->lock);
     }
 
+  /* If the offload target has been initialized already, we ignore "avoid
+     offloading" flags.  This is important, because data/state may be present
+     on the device, that we must continue to use.  */
   if (!available)
     {
       gomp_mutex_lock (&register_lock);
@@ -1303,8 +1331,14 @@ gomp_offload_target_available_p (int type)
 
       /* Can the offload target be initialized?  */
       for (int i = 0; !available && i < num_offload_images; i++)
-	if (offload_images[i].type == type)
+	if (offload_images[i].type == type
+	    && offload_images[i].host_table != NULL)
 	  available = true;
+      /* If yes, is an "avoid offloading" flag set?  */
+      for (int i = 0; available && i < num_offload_images; i++)
+	if (offload_images[i].type == type
+	    && offload_images[i].host_table == NULL)
+	  available = false;
 
       gomp_mutex_unlock (&register_lock);
     }
diff --git libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
index 180e86f..fe919c8 100644
--- libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
+++ libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
@@ -1,7 +1,8 @@
 // Ensure that a non-scalar dummy arguments which are implicitly used inside
 // offloaded regions are properly mapped using present_or_copy.
 
-// { dg-do run }
+// Override the compiler's "avoid offloading" decision.
+// { dg-additional-options "-foffload-force" }
 
 #include <cassert>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
index bca425e..b0da8b7 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdio.h>
 #include <stdlib.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
index c29ca3f..3079b78 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
index f3b490a..02e43af 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* <http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
    { dg-xfail-run-if "TODO" { *-*-* } } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-lcuda" } */
 
 #include <openacc.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
new file mode 100644
index 0000000..e614785
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -0,0 +1,25 @@
+/* Test that the compiler decides to "avoid offloading".  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+  if (y != 1)
+    __builtin_abort();
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
new file mode 100644
index 0000000..c13436f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
@@ -0,0 +1,38 @@
+/* Test that a user can override the compiler's "avoid offloading"
+   decision.  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  /* Override the compiler's "avoid offloading" decision.  */
+  acc_device_t d;
+#if defined ACC_DEVICE_TYPE_nvidia
+  d = acc_device_nvidia;
+#elif defined ACC_DEVICE_TYPE_host
+  d = acc_device_host;
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+  acc_init (d);
+
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_nvidia
+  if (y != 0)
+    __builtin_abort();
+#else
+  if (y != 1)
+    __builtin_abort();
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
new file mode 100644
index 0000000..e2301e6
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
@@ -0,0 +1,29 @@
+/* Test that a user can override the compiler's "avoid offloading"
+   decision.  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_nvidia
+  if (y != 0)
+    __builtin_abort();
+#else
+  if (y != 1)
+    __builtin_abort();
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
index dad6d13..f8ebbb1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
@@ -1,6 +1,6 @@
 /* This test exercises combined directives.  */
 
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
index 1ac0b95..e512fcf 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
@@ -1,4 +1,6 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include  <openacc.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
index e62c315..b5c29ab 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
@@ -1,4 +1,6 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
index 51745ba..3ef6f9b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
 
 #include <stdlib.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
index 2887f66f..7b09917 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <openacc.h>
 #include <stdlib.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
index aeb0142..a90c9466 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
@@ -1,4 +1,6 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
index 0f323c8..1dc0402 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
@@ -1,4 +1,7 @@
-/* { dg-additional-options "-O2 -fipa-pta" } */
+/* { dg-additional-options "-fipa-pta" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
index 17a0f3d..baf6662 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
@@ -1,4 +1,7 @@
-/* { dg-additional-options "-O2 -foffload-alias=all -fipa-pta" } */
+/* { dg-additional-options "-foffload-alias=all -fipa-pta" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
index 44d4fd2..efbe43a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
@@ -1,4 +1,7 @@
-/* { dg-additional-options "-O2 -fipa-pta" } */
+/* { dg-additional-options "-fipa-pta" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
index a68a7cd..d527e14 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
@@ -1,3 +1,6 @@
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
+
 int
 main (void)
 {
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index 2e4100f..6b561e4 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index 83d4e7f..d965348 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index 01d5e5e..9548cd6 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index 61d1283..237d56c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index f7f04cb..67e75cd 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
index 2e920cd..195b2c5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
index 72249cc..f182a2c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
index 1b0a7cc..4da360c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
index bbe6b3c..1a8fc9c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
index 18e5676..a3f2fb9 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
index e424739..eac168c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
index a12e36e..0c0f1e1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
index f8ec543..0ee0a95 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
index 73561b3..e54873a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
index 3334830..9660c14 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
index 88ab245..e4d1437 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
@@ -1,3 +1,7 @@
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
+
 #include <assert.h>
 
 /* Test of gang-private aggregate variable declared on loop directive, with
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
index 3f7062d..83f52de 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
index dada424..25ceab5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
index 8d649d1..ac5f24a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
index a67f90e..a3d18a1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
index 465a800..3944399 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
index a08ba69..d6dd81b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
index 1f76345..53293a3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
index fe2e23a..63b5b51 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
index 12c17e4..65089de 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <assert.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
index 3a2a5b5..ab38f91 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
@@ -1,8 +1,9 @@
 /* Verify that a simple, explicit acc loop reduction works inside
  a kernels region.  */
 
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
index c164598..94a5ae2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
index 01728bd..bc1210e 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program asyncwait
   integer, parameter :: N = 64
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
index fe131b6..2dfed6a 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program asyncwait
   integer, parameter :: N = 64
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
index fa96a01..2c33c0f 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program asyncwait
   integer, parameter :: N = 64
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
new file mode 100644
index 0000000..0f4edb1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
@@ -0,0 +1,29 @@
+! Test that the compiler decides to "avoid offloading".
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST);
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+      IF (.NOT. Y) CALL ABORT
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
new file mode 100644
index 0000000..4c8ceac
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
@@ -0,0 +1,40 @@
+! Test that a user can override the compiler's "avoid offloading" decision.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER :: D
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!     Override the compiler's "avoid offloading" decision.
+#if defined ACC_DEVICE_TYPE_nvidia
+      D = ACC_DEVICE_NVIDIA
+#elif defined ACC_DEVICE_TYPE_host
+      D = ACC_DEVICE_HOST
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+      CALL ACC_INIT (D)
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_nvidia
+      IF (Y) CALL ABORT
+#else
+      IF (.NOT. Y) CALL ABORT
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
new file mode 100644
index 0000000..5f669b7
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
@@ -0,0 +1,30 @@
+! Test that a user can override the compiler's "avoid offloading" decision.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+!     Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER :: D
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_nvidia
+      IF (Y) CALL ABORT
+#else
+      IF (.NOT. Y) CALL ABORT
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
index 94100b2..3081e7a 100644
--- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
@@ -1,6 +1,7 @@
 ! This test exercises combined directives.
 
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program main
   integer, parameter :: n = 32
diff --git libgomp/testsuite/libgomp.oacc-fortran/default-1.f90 libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
index 1059089..07c1e74 100644
--- libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
@@ -1,4 +1,7 @@
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   implicit none
diff --git libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
index 276a172..4646be9 100644
--- libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
@@ -1,9 +1,10 @@
-! { dg-do run }
-
 ! Test the deviceptr clause with various directives
 ! and in combination with other directives where
 ! the deviceptr variable is implied.
 
+! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+
 subroutine subr1 (a, b)
   implicit none
   integer, parameter :: N = 8
diff --git libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
index e54c1b2..784f8a1 100644
--- libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -1,5 +1,8 @@
-! { dg-do run } */
+! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   use openacc
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
index fdf9409..854fe9c 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
@@ -1,3 +1,8 @@
+! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
+
 program foo
 
   IMPLICIT NONE
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
index 912a22b..b120b66 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
@@ -1,3 +1,8 @@
+! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
+
 program foo
   IMPLICIT NONE
   INTEGER :: vol = 0
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
index 9378b12..1aafefa 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program collapse3
   integer :: a(3,3,3), k, kk, kkk, l, ll, lll
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
index dfd9cd2..1f2cf97 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program collapse4
   integer :: i, j, k, a(1:7, -3:5, 12:19), b(1:7, -3:5, 12:19)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
index 9f17308..f6b2255 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
@@ -1,4 +1,4 @@
-! { dg-do run } */
+! { dg-do run }
 ! { dg-additional-options "-cpp" }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
index 01d62f8..14e14ab 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
@@ -1,6 +1,9 @@
 ! Test the copy, copyin, copyout, pcopy, pcopyin, pcopyout, and pcreate
 ! clauses on kernels constructs.
 
+! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+
 program map
   integer, parameter     :: n = 20, c = 10
   integer                :: i, a(n), b(n), d(n)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
index 43a1988..51a57b2 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, arr(0:32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
index e5806ee..948f811 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, arr(0:32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
index 7d19bba..6be2692 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   type vec3
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
index 379bb3a..0312ee7 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, k, idx, arr(0:32*32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
index 8873efe..7ce7f1b 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
index f513ec2..50d13e4 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, arr(0:32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
index e7652d9..328a6b4 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, k, idx, arr(0:32*32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
index c82ced7..a96221d 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, k, idx, arr(0:32*32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
index e30de70..d2b30dd 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: x, i, j, k, idx, arr(0:32*32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
index 20f8579..3cfcbb4 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: i, j, k, idx, arr(0:32*32*32)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
index 48c3bfd..5f65926 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   type vec2
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
index ca63796..27d1b27 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
@@ -3,6 +3,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program main
   integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
index e894b6d..dcabe02 100644
--- libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
@@ -2,6 +2,8 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
 
 program reduction
   integer, parameter     :: n = 20
diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
index 4afb562..cae39ac 100644
--- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
@@ -2,6 +2,7 @@
 ! offloaded regions are properly mapped using present_or_copy.
 
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program main
   implicit none


GrÃÃe
 Thomas


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]