This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading" (was: [PATCH] Add fopt-info-oacc)


This patch has proven to be effective at warning users when the compiler
falls back to host execution due to insufficient parallelism (at least
from parloop's perspective) inside kernels regions. At the moment, acc
kernels are restricted to gang-level parallelism. Consequently, if
parloops fails to detects any parallelism, the kernels region will only
execute on a single GPU thread, which is several orders of magnitude
slower than a single CPU thread.

Before I rebase this patch, are these changes conceptually OK for GCC7?
Looking back at the old email thread, there were some disagreements on
these types of warnings. One significant example where the parloops
fails to detect any parallelism is in SPEC_ACCEL. A test that should
take an hour or so, takes longer than a week with the host fallback. At
least this warning provides the user with some feedback that his/her
program might run slow.

Cesar

On 01/21/2016 01:54 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> This patch introduces an option fopt-info-oacc.
>>
>> When using the option like this with a kernels region in kernels-loop.c 
>> that parloops does not manage to parallelize:
>> ...
>> $ gcc kernels-loop.c -S -O2 -fopenacc -fopt-info-oacc-all
>> ...
>>
>> we get a message:
>> ...
>> kernels-loop.c:23:9: note: kernels region executed sequentially. 
>> Consider mapping it to host execution, to avoid data copy penalty.
>> ...
> 
> Yay for helping the user understand what the compiler is doing!
> 
>> Any comments?
> 
> Telling from real-world code that we've been having a look at, when the
> above situation happens, we're -- in the vast majority of all cases -- in
> a situation where we generally want to avoid offloading (unless
> explicitly requested), "to avoid data copy penalty" as well as typically
> much slower single-threaded execution on the GPU.  Obviously, that will
> have to be revisited as parloops (or any other mechanism in GCC) is able
> to better understand/use the parallelism in OpenACC kernels constructs.
> 
> So, building upon Tom's patch, I have implemented an "avoid offloading"
> flag given the presence of one un-parallelized OpenACC kernels construct.
> This is currently only enabled for OpenACC kernels constructs, in
> combination with nvptx offloading, but I think the general scheme will be
> useful also for other constructs as well as other (non-shared memory)
> offloading targets.
> 
> Also, "avoid offloading" is just a default: if a user explicitly
> requested the use of, for example, a Nvidia GPU (with an
> acc_init(acc_device_nvidia) call, or by setting the
> ACC_DEVICE_TYPE=idia environemnt variable, for example), then we cannot
> apply host-fallback execution, because in this case the user can
> rightfully assume Nvidia GPU semantics (async clause works, and so on).
> 
> 
> The new warning (very similar to the one that Tom proposed) also
> uncovered a bunch of OpenACC kernels test cases in libgomp that did not
> have OpenACC kernels processing enabled (-ftree-parallelize-loops), but
> which parloops can handle fine once that is enabled -- and also a bunch
> of OpenACC kernels test cases that parloops doesn't handle but it looked
> as they were meant to be.  (Maybe I'm wrong about that, though.)  Anyway,
> Tom, would you please make a note to audit all use of -foffload-force in
> the libgomp testsuite?  (It is appropriate for all test cases that
> parloops truely is not meant to handle, but for all others, that flag
> should probably be removed and instead an XFAILed dg-bogus directive
> added, so that we will notice (XPASS) once it does handle them.)
> 
> 
> I've also added a new command-line option, -foffload-force, that restores
> the current behavior, inhibits the "avoid offloading" handling.  This is
> primarily meant for GCC (libgomp) testsuite usage, but could occasionally
> also be useful for users.  Considering alternatives (that can be applied
> in a more fine-grained way, case by case per OpenACC kernels construct):
> 
> 1) a new GCC-specific pragma, for example:
> 
>     #pragma GCC force offloading
>     #pragma acc kernels
>       [un-parallelizable stuff]
> 
> 2) a new GCC-specific clause, for example in the implementation
> namespace, starting with "_":
> 
>     #pragma acc kernels _force_offloading
>       [un-parallelizable stuff]
> 
> ..., the -foffload-force flag was the simplest solution.  (Because, if
> you're going to alter the sources anyway, you might as well just remove
> the one offending OpenACC kernels construct...)
> 
> 
> Committed to gomp-4_0-branch in r232709:
> 
> commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Jan 21 21:52:50 2016 +0000
> 
>     Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
>     
>     	gcc/
>     	* common.opt: Add -foffload-force.
>     	* lto-wrapper.c (merge_and_complain, append_compiler_options):
>     	Handle it.
>     	* doc/invoke.texi: Document it.
>     	* config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
>     	(record_id): Parse, and set it.
>     	(process): Use it.
>     	* config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
>     	offloading".
>     	(nvptx_record_offload_symbol): Use it.
>     	(nvptx_goacc_validate_dims): Set it.
>     	libgomp/
>     	* target.c (GOMP_offload_register_ver)
>     	(GOMP_offload_unregister_ver, gomp_init_device)
>     	(gomp_unload_device, gomp_offload_target_available_p): Handle and
>     	document "avoid offloading" ("host_table =NULL").
>     	(resolve_device): Document "avoid offloading".
>     	* oacc-init.c (resolve_device): Likewise.
>     	* libgomp.texi (Enabling OpenACC): Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
>     	file.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
>     	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
>     	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Set
>     	"-foffload-force".
>     	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
>     	Likewise.
>     
>     	libgomp/
>     	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Set
>     	"-ftree-parallelize-loops2".
>     	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
>     	Likewise.
>     	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
>     	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
>     	Likewise.
>     	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Likewise.
>     	* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232709 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp                                 |  14 ++
>  gcc/common.opt                                     |   4 +
>  gcc/config/nvptx/mkoffload.c                       |  73 +++++++++-
>  gcc/config/nvptx/nvptx.c                           |  42 +++++-
>  gcc/doc/invoke.texi                                |  11 +-
>  gcc/lto-wrapper.c                                  |   2 +
>  libgomp/ChangeLog.gomp                             | 150 +++++++++++++++++++++
>  libgomp/libgomp.texi                               |   8 ++
>  libgomp/oacc-init.c                                |   8 +-
>  libgomp/target.c                                   |  86 ++++++++----
>  .../testsuite/libgomp.oacc-c++/non-scalar-data.C   |   3 +-
>  .../testsuite/libgomp.oacc-c-c++-common/abort-3.c  |   3 +-
>  .../testsuite/libgomp.oacc-c-c++-common/abort-4.c  |   3 +-
>  .../libgomp.oacc-c-c++-common/asyncwait-1.c        |   1 +
>  .../libgomp.oacc-c-c++-common/avoid-offloading-1.c |  25 ++++
>  .../libgomp.oacc-c-c++-common/avoid-offloading-2.c |  38 ++++++
>  .../libgomp.oacc-c-c++-common/avoid-offloading-3.c |  29 ++++
>  .../combined-directives-1.c                        |   2 +-
>  .../libgomp.oacc-c-c++-common/default-1.c          |   4 +-
>  .../libgomp.oacc-c-c++-common/deviceptr-1.c        |   4 +-
>  .../libgomp.oacc-c-c++-common/host_data-1.c        |   1 +
>  libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c |   2 +-
>  .../libgomp.oacc-c-c++-common/kernels-1.c          |   4 +-
>  .../kernels-alias-ipa-pta-2.c                      |   5 +-
>  .../kernels-alias-ipa-pta-3.c                      |   5 +-
>  .../kernels-alias-ipa-pta.c                        |   5 +-
>  .../libgomp.oacc-c-c++-common/kernels-empty.c      |   3 +
>  .../kernels-loop-and-seq-2.c                       |   3 +-
>  .../kernels-loop-and-seq-5.c                       |   3 +-
>  .../kernels-loop-and-seq-6.c                       |   3 +-
>  .../kernels-loop-and-seq.c                         |   3 +-
>  .../kernels-loop-collapse.c                        |   3 +-
>  .../kernels-private-vars-local-worker-1.c          |   3 +-
>  .../kernels-private-vars-local-worker-2.c          |   3 +-
>  .../kernels-private-vars-local-worker-3.c          |   3 +-
>  .../kernels-private-vars-local-worker-4.c          |   3 +-
>  .../kernels-private-vars-local-worker-5.c          |   3 +-
>  .../kernels-private-vars-loop-gang-1.c             |   3 +-
>  .../kernels-private-vars-loop-gang-2.c             |   3 +-
>  .../kernels-private-vars-loop-gang-3.c             |   3 +-
>  .../kernels-private-vars-loop-gang-4.c             |   3 +-
>  .../kernels-private-vars-loop-gang-5.c             |   3 +-
>  .../kernels-private-vars-loop-gang-6.c             |   4 +
>  .../kernels-private-vars-loop-vector-1.c           |   3 +-
>  .../kernels-private-vars-loop-vector-2.c           |   3 +-
>  .../kernels-private-vars-loop-worker-1.c           |   3 +-
>  .../kernels-private-vars-loop-worker-2.c           |   3 +-
>  .../kernels-private-vars-loop-worker-3.c           |   3 +-
>  .../kernels-private-vars-loop-worker-4.c           |   3 +-
>  .../kernels-private-vars-loop-worker-5.c           |   3 +-
>  .../kernels-private-vars-loop-worker-6.c           |   3 +-
>  .../kernels-private-vars-loop-worker-7.c           |   3 +-
>  .../kernels-reduction-1.c                          |   3 +-
>  .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |   2 +-
>  .../testsuite/libgomp.oacc-fortran/asyncwait-1.f90 |   1 +
>  .../testsuite/libgomp.oacc-fortran/asyncwait-2.f90 |   1 +
>  .../testsuite/libgomp.oacc-fortran/asyncwait-3.f90 |   1 +
>  .../libgomp.oacc-fortran/avoid-offloading-1.f      |  29 ++++
>  .../libgomp.oacc-fortran/avoid-offloading-2.f      |  40 ++++++
>  .../libgomp.oacc-fortran/avoid-offloading-3.f      |  30 +++++
>  .../libgomp.oacc-fortran/combined-directives-1.f90 |   1 +
>  .../testsuite/libgomp.oacc-fortran/default-1.f90   |   3 +
>  .../testsuite/libgomp.oacc-fortran/deviceptr-1.f90 |   5 +-
>  libgomp/testsuite/libgomp.oacc-fortran/if-1.f90    |   5 +-
>  .../kernels-acc-loop-reduction-2.f90               |   5 +
>  .../kernels-acc-loop-reduction.f90                 |   5 +
>  .../libgomp.oacc-fortran/kernels-collapse-3.f90    |   2 +
>  .../libgomp.oacc-fortran/kernels-collapse-4.f90    |   2 +
>  .../libgomp.oacc-fortran/kernels-independent.f90   |   2 +-
>  .../libgomp.oacc-fortran/kernels-map-1.f90         |   3 +
>  .../kernels-private-vars-loop-gang-2.f90           |   2 +
>  .../kernels-private-vars-loop-gang-3.f90           |   2 +
>  .../kernels-private-vars-loop-gang-6.f90           |   2 +
>  .../kernels-private-vars-loop-vector-1.f90         |   2 +
>  .../kernels-private-vars-loop-vector-2.f90         |   2 +
>  .../kernels-private-vars-loop-worker-1.f90         |   2 +
>  .../kernels-private-vars-loop-worker-2.f90         |   2 +
>  .../kernels-private-vars-loop-worker-3.f90         |   2 +
>  .../kernels-private-vars-loop-worker-4.f90         |   2 +
>  .../kernels-private-vars-loop-worker-5.f90         |   2 +
>  .../kernels-private-vars-loop-worker-6.f90         |   2 +
>  .../kernels-private-vars-loop-worker-7.f90         |   2 +
>  .../libgomp.oacc-fortran/kernels-reduction-1.f90   |   2 +
>  .../libgomp.oacc-fortran/non-scalar-data.f90       |   1 +
>  84 files changed, 700 insertions(+), 78 deletions(-)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index cdd279b..f991b91 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,3 +1,17 @@
> +2016-01-21  Thomas Schwinge  <thomas@codesourcery.com>
> +
> +	* common.opt: Add -foffload-force.
> +	* lto-wrapper.c (merge_and_complain, append_compiler_options):
> +	Handle it.
> +	* doc/invoke.texi: Document it.
> +	* config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
> +	(record_id): Parse, and set it.
> +	(process): Use it.
> +	* config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
> +	offloading".
> +	(nvptx_record_offload_symbol): Use it.
> +	(nvptx_goacc_validate_dims): Set it.
> +
>  2016-01-20  Cesar Philippidis  <cesar@codesourcery.com>
>  
>  	* gimplify.c (gimplify_scan_omp_clauses):  Consider OACC_{DATA,
> diff --git gcc/common.opt gcc/common.opt
> index 793a062..c905f71 100644
> --- gcc/common.opt
> +++ gcc/common.opt
> @@ -1786,6 +1786,10 @@ Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
>  EnumValue
>  Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
>  
> +foffload-force
> +Common Var(flag_offload_force)
> +Force offloading if the compiler wanted to avoid it.
> +
>  fomit-frame-pointer
>  Common Report Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
> diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
> index cce562d..de6a8ad 100644
> --- gcc/config/nvptx/mkoffload.c
> +++ gcc/config/nvptx/mkoffload.c
> @@ -41,9 +41,19 @@ const char tool_name[] =nvptx mkoffload";
>  
>  #define COMMENT_PREFIX "#"
>  
> +enum id_map_flag
> +  {
> +    /* All clear.  */
> +    ID_MAP_FLAG_NONE =,
> +    /* Avoid offloading.  For example, because there is no sufficient
> +       parallelism.  */
> +    ID_MAP_FLAG_AVOID_OFFLOADING =
> +  };
> +
>  struct id_map
>  {
>    id_map *next;
> +  int flags;
>    char *ptx_name;
>  };
>  
> @@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where)
>      fatal_error (input_location, "malformed ptx file");
>  
>    id_map *v =NEW (id_map);
> +
> +  /* Do we have any flags?  */
> +  v->flags =D_MAP_FLAG_NONE;
> +  if (p1[0] ='(')
> +    {
> +      /* Current flag.  */
> +      const char *cur =1 + 1;
> +
> +      /* Seek to the beginning of ") ".  */
> +      p1 =trchr (cur, ')');
> +      if (!p1 || p1 > end || p1[1] != ')
> +	fatal_error (input_location, "malformed ptx file: "
> +		     "expected \") \" at \"%s\"", cur);
> +
> +      while (cur < p1)
> +	{
> +	  const char *next =trchr (cur, ',');
> +	  if (!next || next > p1)
> +	    next =1;
> +
> +	  if (strncmp (cur, "avoid offloading", next - cur - 1) =0)
> +	    v->flags |=D_MAP_FLAG_AVOID_OFFLOADING;
> +	  else
> +	    fatal_error (input_location, "malformed ptx file: "
> +			 "unknown flag at \"%s\"", cur);
> +
> +	  cur =ext;
> +	}
> +
> +      /* Skip past ") ".  */
> +      p1 +=;
> +    }
>    size_t len =nd - p1;
>    v->ptx_name =NEWVEC (char, len + 1);
>    memcpy (v->ptx_name, p1, len);
> @@ -296,12 +338,17 @@ process (FILE *in, FILE *out)
>    fprintf (out, "\n};\n\n");
>  
>    /* Dump out function idents.  */
> +  bool avoid_offloading_p =alse;
>    fprintf (out, "static const struct nvptx_fn {\n"
>  	   "  const char *name;\n"
>  	   "  unsigned short dim[%d];\n"
>  	   "} func_mappings[] =\n", GOMP_DIM_MAX);
>    for (comma =", id = func_ids; id; comma = ",", id = id->next)
> -    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
> +    {
> +      if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING)
> +	avoid_offloading_p =rue;
> +      fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
> +    }
>    fprintf (out, "\n};\n\n");
>  
>    fprintf (out,
> @@ -318,7 +365,11 @@ process (FILE *in, FILE *out)
>  	   "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
>  	   "  func_mappings,"
>  	   "  sizeof (func_mappings) / sizeof (func_mappings[0])\n"
> -	   "};\n\n");
> +	   "};\n");
> +  if (avoid_offloading_p)
> +    /* Need a unique handle for target_data.  */
> +    fprintf (out, "static int target_data_avoid_offloading;\n");
> +  fprintf (out, "\n");
>  
>    fprintf (out, "#ifdef __cplusplus\n"
>  	   "extern \"C\" {\n"
> @@ -338,18 +389,28 @@ process (FILE *in, FILE *out)
>    fprintf (out, "static __attribute__((constructor)) void init (void)\n"
>  	   "{\n"
>  	   "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
> -	   "%d/*NVIDIA_PTX*/, &target_data);\n"
> -	   "};\n",
> +	   "%d/*NVIDIA_PTX*/, &target_data);\n",
>  	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
>  	   GOMP_DEVICE_NVIDIA_PTX);
> +  if (avoid_offloading_p)
> +    fprintf (out, "  GOMP_offload_register_ver (%#x, (void *) 0,"
> +	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
> +	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
> +	     GOMP_DEVICE_NVIDIA_PTX);
> +  fprintf (out, "};\n");
>  
>    fprintf (out, "static __attribute__((destructor)) void fini (void)\n"
>  	   "{\n"
>  	   "  GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__,"
> -	   "%d/*NVIDIA_PTX*/, &target_data);\n"
> -	   "};\n",
> +	   "%d/*NVIDIA_PTX*/, &target_data);\n",
>  	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
>  	   GOMP_DEVICE_NVIDIA_PTX);
> +  if (avoid_offloading_p)
> +    fprintf (out, "  GOMP_offload_unregister_ver (%#x, (void *) 0,"
> +	     "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
> +	     GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
> +	     GOMP_DEVICE_NVIDIA_PTX);
> +  fprintf (out, "};\n");
>  }
>  
>  static void
> diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
> index dfbdcfb..3faacd5 100644
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -3811,6 +3811,9 @@ static const struct attribute_spec nvptx_attribute_table[]    /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>         affects_type_identity } */
>    { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
> +  /* Avoid offloading.  For example, because there is no sufficient
> +     parallelism.  */
> +  { "omp avoid offloading", 0, 0, true, false, false, NULL, false },
>    { NULL, 0, 0, false, false, false, NULL, false }
>  };
>  
> @@ -3875,7 +3878,10 @@ nvptx_record_offload_symbol (tree decl)
>  	tree dims =REE_VALUE (attr);
>  	unsigned ix;
>  
> -	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
> +	fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"",
> +		 (lookup_attribute ("omp avoid offloading",
> +				    DECL_ATTRIBUTES (decl))
> +		  ? "(avoid offloading) " : ""),
>  		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
>  
>  	for (ix =; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
> @@ -4135,6 +4141,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
>  static bool
>  nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
>  {
> +  /* Detect if a function is unsuitable for offloading.  */
> +  if (!flag_offload_force && decl)
> +    {
> +      tree oacc_function_attr =et_oacc_fn_attrib (decl);
> +      if (oacc_function_attr
> +	  && oacc_fn_attrib_kernels_p (oacc_function_attr))
> +	{
> +	  bool avoid_offloading_p =rue;
> +	  for (unsigned ix =; ix != GOMP_DIM_MAX; ix++)
> +	    {
> +	      if (dims[ix] > 1)
> +		{
> +		  avoid_offloading_p =alse;
> +		  break;
> +		}
> +	    }
> +	  if (avoid_offloading_p)
> +	    {
> +	      /* OpenACC kernels constructs will never be parallelized for
> +		 optimization levels smaller than -O2; avoid the diagnostic in
> +		 this case.  */
> +	      if (optimize >=)
> +		warning_at (DECL_SOURCE_LOCATION (decl), 0,
> +			    "OpenACC kernels construct will be executed "
> +			    "sequentially; will by default avoid offloading "
> +			    "to prevent data copy penalty");
> +	      DECL_ATTRIBUTES (decl)
> +		=ree_cons (get_identifier ("omp avoid offloading"),
> +			     NULL_TREE, DECL_ATTRIBUTES (decl));
> +
> +	    }
> +	}
> +    }
> +
>    bool changed =alse;
>  
>    /* The vector size must be 32, unless this is a SEQ routine.  */
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index c608a36..c9c79fc 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -1153,7 +1153,7 @@ See S/390 and zSeries Options.
>  -finstrument-functions-exclude-function-list=ar{sym},@var{sym},@dots{} @gol
>  -finstrument-functions-exclude-file-list=ar{file},@var{file},@dots{} @gol
>  -fno-common  -fno-ident @gol
> --foffload-alias={[}none@r{|}pointer@r{|}all@r{]} @gol
> +-foffload-alias={[}none@r{|}pointer@r{|}all@r{]}  -foffload-force @gol
>  -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
>  -fno-jump-tables @gol
>  -frecord-gcc-switches @gol
> @@ -24230,6 +24230,15 @@ objects references in an offload region do not alias.  The option
>  aliasing in offload regions.  The default value is
>  @option{-foffload-alias=ne}.
>  
> +@item -foffload-force
> +@opindex -foffload-force
> +The option @option{-foffload-force} forces offloading if the compiler
> +wanted to avoid it.  For example, when there isn't sufficient
> +parallelism in certain offloading constructs, the compiler may come to
> +the conclusion that offloading incurs too much overhead (for data
> +transfers, for example), and unless overridden with this flag, it then
> +suggests to the runtime (libgomp) to avoid offloading.
> +
>  @item -fexceptions
>  @opindex fexceptions
>  Enable exception handling.  Generates extra code needed to propagate
> diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
> index 91bb1e8..5e03544 100644
> --- gcc/lto-wrapper.c
> +++ gcc/lto-wrapper.c
> @@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options,
>  	case OPT_fsigned_zeros:
>  	case OPT_ftrapping_math:
>  	case OPT_fwrapv:
> +	case OPT_foffload_force:
>  	case OPT_fopenmp:
>  	case OPT_fopenacc:
>  	case OPT_fcheck_pointer_bounds:
> @@ -516,6 +517,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
>  	case OPT_fsigned_zeros:
>  	case OPT_ftrapping_math:
>  	case OPT_fwrapv:
> +	case OPT_foffload_force:
>  	case OPT_fopenmp:
>  	case OPT_fopenacc:
>  	case OPT_fopenacc_dim_:
> diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
> index 2003a8a..b089e27 100644
> --- libgomp/ChangeLog.gomp
> +++ libgomp/ChangeLog.gomp
> @@ -1,3 +1,153 @@
> +2016-01-21  Thomas Schwinge  <thomas@codesourcery.com>
> +
> +	* target.c (GOMP_offload_register_ver)
> +	(GOMP_offload_unregister_ver, gomp_init_device)
> +	(gomp_unload_device, gomp_offload_target_available_p): Handle and
> +	document "avoid offloading" ("host_table =NULL").
> +	(resolve_device): Document "avoid offloading".
> +	* oacc-init.c (resolve_device): Likewise.
> +	* libgomp.texi (Enabling OpenACC): Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
> +	file.
> +	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
> +	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
> +	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
> +	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Set
> +	"-foffload-force".
> +	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
> +	Likewise.
> +
> +	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Set
> +	"-ftree-parallelize-loops2".
> +	* testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
> +	Likewise.
> +	* testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
> +	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
> +	Likewise.
> +	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Likewise.
> +	* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
> +
>  2016-01-20  Cesar Philippidis  <cesar@codesourcery.com>
>  
>  	* testsuite/libgomp.oacc-c++/non-scalar-data.C: New test.
> diff --git libgomp/libgomp.texi libgomp/libgomp.texi
> index 8870084..2841b2e 100644
> --- libgomp/libgomp.texi
> +++ libgomp/libgomp.texi
> @@ -1818,6 +1818,14 @@ flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
>  arranges for automatic linking of the OpenACC runtime library 
>  (@ref{OpenACC Runtime Library Routines}).
>  
> +Offloading is enabled by default.  In some cases, the compiler may
> +come to the conclusion that offloading incurs too much overhead, and
> +suggest to the runtime to avoid it.  To counteract that, you can use
> +the option @option{-foffload-force} to force offloading in such cases.
> +Alternatively, offloading is also enabled if a specific device type is
> +requested, in a call to @code{acc_init} or by setting the
> +@env{ACC_DEVICE_TYPE} environment variable, for example.
> +
>  A complete description of all OpenACC directives accepted may be found in 
>  the @uref{http://www.openacc.org/, OpenACC} Application Programming
>  Interface manual, version 2.0.
> diff --git libgomp/oacc-init.c libgomp/oacc-init.c
> index a90732d..b3d13a8 100644
> --- libgomp/oacc-init.c
> +++ libgomp/oacc-init.c
> @@ -123,8 +123,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
>  	if (goacc_device_type)
>  	  {
>  	    /* Lookup the device that has been explicitly named, so do not pay
> -	       attention to gomp_offload_target_available_p.  (That is, hard
> -	       error if not actually available.)  */
> +	       attention to gomp_offload_target_available_p.  (That is,
> +	       enforced usage even with an "avoid offloading" flag set, and
> +	       hard error if not actually available.)  */
>  	    while (++d !=ACC_device_hwm)
>  	      if (dispatchers[d]
>  		  && !strcasecmp (goacc_device_type,
> @@ -154,7 +155,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
>  	    && dispatchers[d]->get_num_devices_func () > 0
>  	    /* No device has been explicitly named, so pay attention to
>  	       gomp_offload_target_available_p, to not decide on an offload
> -	       target that we don't have offload data available for.  */
> +	       target that we don't have offload data available for, or have an
> +	       "avoid offloading" flag set for.  */
>  	    && gomp_offload_target_available_p (dispatchers[d]->type))
>  	  goto found;
>        /* No non-host device found.  */
> diff --git libgomp/target.c libgomp/target.c
> index 7adc4d0..c60e52a 100644
> --- libgomp/target.c
> +++ libgomp/target.c
> @@ -130,8 +130,9 @@ resolve_device (int device)
>      }
>    gomp_mutex_unlock (&devices[device_id].lock);
>  
> -  /* If the device-var ICV does not actually have offload data available, don't
> -     try use it (which will fail), and use host fallback instead.  */
> +  /* Use host fallback instead of the device-var ICV if the latter doesn't
> +     actually have offload data available (offloading will fail), or has an
> +     "avoid offloading" flag set.  */
>    if (device =GOMP_DEVICE_ICV
>        && !gomp_offload_target_available_p (devices[device_id].type))
>      return NULL;
> @@ -1139,12 +1140,19 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
>  
>  /* This function should be called from every offload image while loading.
>     It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +   the target, and TARGET_DATA needed by target plugin.
> +
> +   If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid
> +   offloading" flag, and the TARGET_TYPE will not be considered by default
> +   until this image gets unregistered.  */
>  
>  void
>  GOMP_offload_register_ver (unsigned version, const void *host_table,
>  			   int target_type, const void *target_data)
>  {
> +  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
> +	      version, host_table, target_type, target_data);
> +
>    int i;
>  
>    if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
> @@ -1153,16 +1161,19 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
>    
>    gomp_mutex_lock (&register_lock);
>  
> -  /* Load image to all initialized devices.  */
> -  for (i =; i < num_devices; i++)
> +  if (host_table !=ULL)
>      {
> -      struct gomp_device_descr *devicep =devices[i];
> -      gomp_mutex_lock (&devicep->lock);
> -      if (devicep->type =target_type
> -	  && devicep->state =GOMP_DEVICE_INITIALIZED)
> -	gomp_load_image_to_device (devicep, version,
> -				   host_table, target_data, true);
> -      gomp_mutex_unlock (&devicep->lock);
> +      /* Load image to all initialized devices.  */
> +      for (i =; i < num_devices; i++)
> +	{
> +	  struct gomp_device_descr *devicep =devices[i];
> +	  gomp_mutex_lock (&devicep->lock);
> +	  if (devicep->type =target_type
> +	      && devicep->state =GOMP_DEVICE_INITIALIZED)
> +	    gomp_load_image_to_device (devicep, version,
> +				       host_table, target_data, true);
> +	  gomp_mutex_unlock (&devicep->lock);
> +	}
>      }
>  
>    /* Insert image to array of pending images.  */
> @@ -1188,26 +1199,36 @@ GOMP_offload_register (const void *host_table, int target_type,
>  
>  /* This function should be called from every offload image while unloading.
>     It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +   the target, and TARGET_DATA needed by target plugin.
> +
> +   If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for this
> +   image (TARGET_DATA), and this TARGET_TYPE may again be considered by
> +   default.  */
>  
>  void
>  GOMP_offload_unregister_ver (unsigned version, const void *host_table,
>  			     int target_type, const void *target_data)
>  {
> +  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
> +	      version, host_table, target_type, target_data);
> +
>    int i;
>  
>    gomp_mutex_lock (&register_lock);
>  
> -  /* Unload image from all initialized devices.  */
> -  for (i =; i < num_devices; i++)
> +  if (host_table !=ULL)
>      {
> -      struct gomp_device_descr *devicep =devices[i];
> -      gomp_mutex_lock (&devicep->lock);
> -      if (devicep->type =target_type
> -	  && devicep->state =GOMP_DEVICE_INITIALIZED)
> -	gomp_unload_image_from_device (devicep, version,
> -				       host_table, target_data);
> -      gomp_mutex_unlock (&devicep->lock);
> +      /* Unload image from all initialized devices.  */
> +      for (i =; i < num_devices; i++)
> +	{
> +	  struct gomp_device_descr *devicep =devices[i];
> +	  gomp_mutex_lock (&devicep->lock);
> +	  if (devicep->type =target_type
> +	      && devicep->state =GOMP_DEVICE_INITIALIZED)
> +	    gomp_unload_image_from_device (devicep, version,
> +					   host_table, target_data);
> +	  gomp_mutex_unlock (&devicep->lock);
> +	}
>      }
>  
>    /* Remove image from array of pending images.  */
> @@ -1241,7 +1262,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
>    for (i =; i < num_offload_images; i++)
>      {
>        struct offload_image_descr *image =offload_images[i];
> -      if (image->type =devicep->type)
> +      if (image->type =devicep->type
> +	  && image->host_table !=ULL)
>  	gomp_load_image_to_device (devicep, image->version,
>  				   image->host_table, image->target_data,
>  				   false);
> @@ -1261,7 +1283,8 @@ gomp_unload_device (struct gomp_device_descr *devicep)
>        for (i =; i < num_offload_images; i++)
>  	{
>  	  struct offload_image_descr *image =offload_images[i];
> -	  if (image->type =devicep->type)
> +	  if (image->type =devicep->type
> +	      && image->host_table !=ULL)
>  	    gomp_unload_image_from_device (devicep, image->version,
>  					   image->host_table,
>  					   image->target_data);
> @@ -1272,7 +1295,9 @@ gomp_unload_device (struct gomp_device_descr *devicep)
>  /* Do we have offload data available for the given offload target type?
>     Instead of verifying that *all* offload data is available that could
>     possibly be required, we instead just look for *any*.  If we later find any
> -   offload data missing, that's user error.  */
> +   offload data missing, that's user error.  If any offload data of this target
> +   type is tagged with an "avoid offloading" flag, do not consider this target
> +   type available unless it has been initialized already.  */
>  
>  attribute_hidden bool
>  gomp_offload_target_available_p (int type)
> @@ -1290,6 +1315,9 @@ gomp_offload_target_available_p (int type)
>        gomp_mutex_unlock (&devicep->lock);
>      }
>  
> +  /* If the offload target has been initialized already, we ignore "avoid
> +     offloading" flags.  This is important, because data/state may be present
> +     on the device, that we must continue to use.  */
>    if (!available)
>      {
>        gomp_mutex_lock (&register_lock);
> @@ -1303,8 +1331,14 @@ gomp_offload_target_available_p (int type)
>  
>        /* Can the offload target be initialized?  */
>        for (int i =; !available && i < num_offload_images; i++)
> -	if (offload_images[i].type =type)
> +	if (offload_images[i].type =type
> +	    && offload_images[i].host_table !=ULL)
>  	  available =rue;
> +      /* If yes, is an "avoid offloading" flag set?  */
> +      for (int i =; available && i < num_offload_images; i++)
> +	if (offload_images[i].type =type
> +	    && offload_images[i].host_table =NULL)
> +	  available =alse;
>  
>        gomp_mutex_unlock (&register_lock);
>      }
> diff --git libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
> index 180e86f..fe919c8 100644
> --- libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
> +++ libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C
> @@ -1,7 +1,8 @@
>  // Ensure that a non-scalar dummy arguments which are implicitly used inside
>  // offloaded regions are properly mapped using present_or_copy.
>  
> -// { dg-do run }
> +// Override the compiler's "avoid offloading" decision.
> +// { dg-additional-options "-foffload-force" }
>  
>  #include <cassert>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> index bca425e..b0da8b7 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdio.h>
>  #include <stdlib.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> index c29ca3f..3079b78 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> index f3b490a..02e43af 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> @@ -1,6 +1,7 @@
>  /* { dg-do run { target openacc_nvidia_accel_selected } } */
>  /* <http://news.gmane.org/find-root.php?message_id=C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
>     { dg-xfail-run-if "TODO" { *-*-* } } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
>  /* { dg-additional-options "-lcuda" } */
>  
>  #include <openacc.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> new file mode 100644
> index 0000000..e614785
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> @@ -0,0 +1,25 @@
> +/* Test that the compiler decides to "avoid offloading".  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
> +  *((volatile int *) &x) =3, y = acc_on_device (acc_device_host);
> +
> +  if (x !=3)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
> +  if (y !=)
> +    __builtin_abort();
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> new file mode 100644
> index 0000000..c13436f
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> @@ -0,0 +1,38 @@
> +/* Test that a user can override the compiler's "avoid offloading"
> +   decision.  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  /* Override the compiler's "avoid offloading" decision.  */
> +  acc_device_t d;
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  d =cc_device_nvidia;
> +#elif defined ACC_DEVICE_TYPE_host
> +  d =cc_device_host;
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +  acc_init (d);
> +
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
> +  *((volatile int *) &x) =3, y = acc_on_device (acc_device_host);
> +
> +  if (x !=3)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  if (y !=)
> +    __builtin_abort();
> +#else
> +  if (y !=)
> +    __builtin_abort();
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> new file mode 100644
> index 0000000..e2301e6
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> @@ -0,0 +1,29 @@
> +/* Test that a user can override the compiler's "avoid offloading"
> +   decision.  */
> +
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
> +
> +#include <openacc.h>
> +
> +int main(void)
> +{
> +  int x, y;
> +
> +#pragma acc data copyout(x, y)
> +#pragma acc kernels
> +  *((volatile int *) &x) =3, y = acc_on_device (acc_device_host);
> +
> +  if (x !=3)
> +    __builtin_abort();
> +#if defined ACC_DEVICE_TYPE_nvidia
> +  if (y !=)
> +    __builtin_abort();
> +#else
> +  if (y !=)
> +    __builtin_abort();
> +#endif
> +
> +  return 0;
> +}
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> index dad6d13..f8ebbb1 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> @@ -1,6 +1,6 @@
>  /* This test exercises combined directives.  */
>  
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> index 1ac0b95..e512fcf 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> @@ -1,4 +1,6 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include  <openacc.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> index e62c315..b5c29ab 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
> @@ -1,4 +1,6 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> index 51745ba..3ef6f9b 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
>  /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
>  
>  #include <stdlib.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> index 2887f66f..7b09917 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
>  
>  #include <openacc.h>
>  #include <stdlib.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> index aeb0142..a90c9466 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> @@ -1,4 +1,6 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> index 0f323c8..1dc0402 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
> @@ -1,4 +1,7 @@
> -/* { dg-additional-options "-O2 -fipa-pta" } */
> +/* { dg-additional-options "-fipa-pta" } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> index 17a0f3d..baf6662 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
> @@ -1,4 +1,7 @@
> -/* { dg-additional-options "-O2 -foffload-alias=l -fipa-pta" } */
> +/* { dg-additional-options "-foffload-alias=l -fipa-pta" } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> index 44d4fd2..efbe43a 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
> @@ -1,4 +1,7 @@
> -/* { dg-additional-options "-O2 -fipa-pta" } */
> +/* { dg-additional-options "-fipa-pta" } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> index a68a7cd..d527e14 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
> @@ -1,3 +1,6 @@
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
> +
>  int
>  main (void)
>  {
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> index 2e4100f..6b561e4 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> index 83d4e7f..d965348 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> index 01d5e5e..9548cd6 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> index 61d1283..237d56c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> index f7f04cb..67e75cd 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
> index 2e920cd..195b2c5 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
> index 72249cc..f182a2c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
> index 1b0a7cc..4da360c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
> index bbe6b3c..1a8fc9c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
> index 18e5676..a3f2fb9 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
> index e424739..eac168c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
> index a12e36e..0c0f1e1 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
> index f8ec543..0ee0a95 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
> index 73561b3..e54873a 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
> index 3334830..9660c14 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
> index 88ab245..e4d1437 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c
> @@ -1,3 +1,7 @@
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
> +
>  #include <assert.h>
>  
>  /* Test of gang-private aggregate variable declared on loop directive, with
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
> index 3f7062d..83f52de 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
> index dada424..25ceab5 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
> index 8d649d1..ac5f24a 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
> index a67f90e..a3d18a1 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
> index 465a800..3944399 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
> index a08ba69..d6dd81b 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
> index 1f76345..53293a3 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
> index fe2e23a..63b5b51 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
> index 12c17e4..65089de 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
> @@ -1,5 +1,6 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <assert.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
> index 3a2a5b5..ab38f91 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
> @@ -1,8 +1,9 @@
>  /* Verify that a simple, explicit acc loop reduction works inside
>   a kernels region.  */
>  
> -/* { dg-do run } */
>  /* { dg-additional-options "-ftree-parallelize-loops2" } */
> +/* Override the compiler's "avoid offloading" decision.
> +   { dg-additional-options "-foffload-force" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> index c164598..94a5ae2 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops2" } */
>  
>  #include <stdlib.h>
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
> index 01728bd..bc1210e 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
>  
>  program asyncwait
>    integer, parameter :: N =4
> diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
> index fe131b6..2dfed6a 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
>  
>  program asyncwait
>    integer, parameter :: N =4
> diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
> index fa96a01..2c33c0f 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
>  
>  program asyncwait
>    integer, parameter :: N =4
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> new file mode 100644
> index 0000000..0f4edb1
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> @@ -0,0 +1,29 @@
> +! Test that the compiler decides to "avoid offloading".
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! The warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
> +      X =3
> +      Y =CC_ON_DEVICE (ACC_DEVICE_HOST);
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
> +      IF (.NOT. Y) CALL ABORT
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> new file mode 100644
> index 0000000..4c8ceac
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> @@ -0,0 +1,40 @@
> +! Test that a user can override the compiler's "avoid offloading" decision.
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! The warning is only triggered for -O2 and higher.
> +! { dg-xfail-if "n/a" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER :: D
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!     Override the compiler's "avoid offloading" decision.
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      D =CC_DEVICE_NVIDIA
> +#elif defined ACC_DEVICE_TYPE_host
> +      D =CC_DEVICE_HOST
> +#else
> +# error Not ported to this ACC_DEVICE_TYPE
> +#endif
> +      CALL ACC_INIT (D)
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target openacc_nvidia_accel_selected } } */
> +      X =3
> +      Y =CC_ON_DEVICE (ACC_DEVICE_HOST)
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      IF (Y) CALL ABORT
> +#else
> +      IF (.NOT. Y) CALL ABORT
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> new file mode 100644
> index 0000000..5f669b7
> --- /dev/null
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> @@ -0,0 +1,30 @@
> +! Test that a user can override the compiler's "avoid offloading" decision.
> +
> +! { dg-do run }
> +! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +!     Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
> +
> +      IMPLICIT NONE
> +      INCLUDE "openacc_lib.h"
> +
> +      INTEGER :: D
> +      INTEGER, VOLATILE :: X
> +      LOGICAL :: Y
> +
> +!$ACC DATA COPYOUT(X, Y)
> +!$ACC KERNELS
> +      X =3
> +      Y =CC_ON_DEVICE (ACC_DEVICE_HOST)
> +!$ACC END KERNELS
> +!$ACC END DATA
> +
> +      IF (X .NE. 33) CALL ABORT
> +#if defined ACC_DEVICE_TYPE_nvidia
> +      IF (Y) CALL ABORT
> +#else
> +      IF (.NOT. Y) CALL ABORT
> +#endif
> +
> +      END
> diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> index 94100b2..3081e7a 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> @@ -1,6 +1,7 @@
>  ! This test exercises combined directives.
>  
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
>  
>  program main
>    integer, parameter :: n =2
> diff --git libgomp/testsuite/libgomp.oacc-fortran/default-1.f90 libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
> index 1059089..07c1e74 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
> @@ -1,4 +1,7 @@
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    implicit none
> diff --git libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90 libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
> index 276a172..4646be9 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/deviceptr-1.f90
> @@ -1,9 +1,10 @@
> -! { dg-do run }
> -
>  ! Test the deviceptr clause with various directives
>  ! and in combination with other directives where
>  ! the deviceptr variable is implied.
>  
> +! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +
>  subroutine subr1 (a, b)
>    implicit none
>    integer, parameter :: N =
> diff --git libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
> index e54c1b2..784f8a1 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
> @@ -1,5 +1,8 @@
> -! { dg-do run } */
> +! { dg-do run }
>  ! { dg-additional-options "-cpp" }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    use openacc
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
> index fdf9409..854fe9c 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
> @@ -1,3 +1,8 @@
> +! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
> +
>  program foo
>  
>    IMPLICIT NONE
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
> index 912a22b..b120b66 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
> @@ -1,3 +1,8 @@
> +! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
> +
>  program foo
>    IMPLICIT NONE
>    INTEGER :: vol =
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
> index 9378b12..1aafefa 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program collapse3
>    integer :: a(3,3,3), k, kk, kkk, l, ll, lll
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
> index dfd9cd2..1f2cf97 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program collapse4
>    integer :: i, j, k, a(1:7, -3:5, 12:19), b(1:7, -3:5, 12:19)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
> index 9f17308..f6b2255 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
> @@ -1,4 +1,4 @@
> -! { dg-do run } */
> +! { dg-do run }
>  ! { dg-additional-options "-cpp" }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
> index 01d62f8..14e14ab 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-map-1.f90
> @@ -1,6 +1,9 @@
>  ! Test the copy, copyin, copyout, pcopy, pcopyin, pcopyout, and pcreate
>  ! clauses on kernels constructs.
>  
> +! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
> +
>  program map
>    integer, parameter     :: n =0, c = 10
>    integer                :: i, a(n), b(n), d(n)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
> index 43a1988..51a57b2 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, arr(0:32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
> index e5806ee..948f811 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, arr(0:32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
> index 7d19bba..6be2692 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    type vec3
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
> index 379bb3a..0312ee7 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, k, idx, arr(0:32*32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
> index 8873efe..7ce7f1b 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
> index f513ec2..50d13e4 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, arr(0:32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
> index e7652d9..328a6b4 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, k, idx, arr(0:32*32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
> index c82ced7..a96221d 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, k, idx, arr(0:32*32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
> index e30de70..d2b30dd 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: x, i, j, k, idx, arr(0:32*32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
> index 20f8579..3cfcbb4 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: i, j, k, idx, arr(0:32*32*32)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
> index 48c3bfd..5f65926 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    type vec2
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
> index ca63796..27d1b27 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90
> @@ -3,6 +3,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program main
>    integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
> diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90 libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
> index e894b6d..dcabe02 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
> @@ -2,6 +2,8 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-ftree-parallelize-loops2" }
> +! Override the compiler's "avoid offloading" decision.
> +! { dg-additional-options "-foffload-force" }
>  
>  program reduction
>    integer, parameter     :: n =0
> diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> index 4afb562..cae39ac 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> @@ -2,6 +2,7 @@
>  ! offloaded regions are properly mapped using present_or_copy.
>  
>  ! { dg-do run }
> +! { dg-additional-options "-ftree-parallelize-loops2" }
>  
>  program main
>    implicit none
> 
> 
> Grüße
>  Thomas
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]