This is the mail archive of the gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions
- From: Tom de Vries <tdevries at suse dot de>
- To: Cesar Philippidis <cesar at codesourcery dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>, Thomas Schwinge <thomas at codesourcery dot com>
- Date: Thu, 26 Jul 2018 14:46:02 +0200
- Subject: Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions
>> Right, in fact there are two separate things you're trying to address
>> here: launch failure and occupancy heuristic, so split the patch.
> That hunk was small, so I included it with this patch. Although if you
> insist, I can remove it.
Please, for future reference, always assume that I insist instead of
asking me, unless you have an argument to present why that is not a good
idea. And just to be clear here: "small" is not such an argument.
Please keep in mind ( https://gcc.gnu.org/contribute.html#patches ):
Don't mix together changes made for different reasons. Send them
> + /* Check if the accelerator has sufficient hardware resources to
> + launch the offloaded kernel. */
> + if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
> + > targ_fn->max_threads_per_block)
> + GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
> + " launch '%s' with num_workers = %d and vector_length ="
> + " %d; recompile the program with 'num_workers = x and"
> + " vector_length = y' on that offloaded region or "
> + "'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
> + targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
> + dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
This is copied from the state on an openacc branch where vector-length
is variable, and the error message text doesn't make sense on current
trunk for that reason. Also, it suggests a syntax for fopenacc-dim
that's not supported on trunk.
Committed as attached.
[libgomp, nvptx] Add error with recompilation hint for launch failure
Currently, when a kernel is launched with too many workers, it results in a CUDA
launch failure. This is triggered, for instance, by parallel-loop-1.c at -O0 on a Quadro
This patch detects this situation, and errors out with a hint on how to fix it.
Build and reg-tested on x86_64 with nvptx accelerator.
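
As a hedged illustration of the hint described above, the source-level remedy is a `num_workers` clause on the offloaded region. The sketch below is not taken from the patch or the testsuite: the function name `scale` and the worker count 8 are hypothetical. When built without `-fopenacc`, the pragma is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical offloaded region.  The explicit num_workers clause caps the
   worker dimension; the value 8 is only an example.  */
void
scale (float *a, size_t n, float f)
{
#pragma acc parallel loop num_workers(8) copy(a[0:n])
  for (size_t i = 0; i < n; i++)
    a[i] *= f;
}
```

The command-line alternative named in the committed message has the form `-fopenacc-dim=:8`, which sets only the worker field of the default launch geometry.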
2018-07-26 Cesar Philippidis <email@example.com>
Tom de Vries <firstname.lastname@example.org>
* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
sufficient resources to launch a kernel, and give a hint on how to fix
libgomp/plugin/plugin-nvptx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5d9b5151e95..3a4077a1315 100644
@@ -1204,6 +1204,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
dims[i] = default_dims[i];
+  /* Check if the accelerator has sufficient hardware resources to
+     launch the offloaded kernel.  */
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+      > targ_fn->max_threads_per_block)
+    {
+      int suggest_workers
+        = targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR];
+      GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+                         " launch '%s' with num_workers = %d; recompile the"
+                         " program with 'num_workers = %d' on that offloaded"
+                         " region or '-fopenacc-dim=:%d'",
+                         targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+                         suggest_workers, suggest_workers);
+    }
/* This reserves a chunk of a pre-allocated page of memory mapped on both
the host and the device. HP is a host pointer to the new chunk, and DP is
the corresponding device pointer. */
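
The arithmetic behind the new check can be tried outside libgomp with a minimal standalone sketch. The `DIM_*` enum and the function `suggest_workers` are hypothetical stand-ins for the plugin's `GOMP_DIM_*` indices and `targ_fn->max_threads_per_block`; unlike the patch, the sketch returns a value instead of calling `GOMP_PLUGIN_fatal`.

```c
/* Hypothetical stand-ins for libgomp's GOMP_DIM_* indices.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Return the requested worker count if workers * vector length fits in one
   thread block.  Otherwise return the largest worker count that does fit,
   which is the value the error message suggests.  */
int
suggest_workers (const int dims[DIM_MAX], int max_threads_per_block)
{
  if (dims[DIM_WORKER] * dims[DIM_VECTOR] <= max_threads_per_block)
    return dims[DIM_WORKER];

  /* Integer division rounds down, so the result never exceeds the limit.  */
  return max_threads_per_block / dims[DIM_VECTOR];
}
```

For example, with 32 workers, a vector length of 32, and a kernel limited to 512 threads per block, the product 1024 exceeds the limit and the suggested worker count is 512 / 32 = 16.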