This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, 6/16] Add pass_oacc_kernels
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Richard Biener <rguenther at suse dot de>
- Date: Mon, 9 Nov 2015 18:39:19 +0100
- Subject: [PATCH, 6/16] Add pass_oacc_kernels
- Authentication-results: sourceware.org; auth=none
- References: <5640BD31 dot 2060602 at mentor dot com>
On 09/11/15 16:35, Tom de Vries wrote:
Hi,
this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.
The patch series contains these patches:
1 Insert new exit block only when needed in
transform_to_exit_first_loop_alt
2 Make create_parallel_loop return void
3 Ignore reduction clause on kernels directive
4 Implement -foffload-alias
5 Add in_oacc_kernels_region in struct loop
6 Add pass_oacc_kernels
7 Add pass_dominator_oacc_kernels
8 Add pass_ch_oacc_kernels
9 Add pass_parallelize_loops_oacc_kernels
10 Add pass_oacc_kernels pass group in passes.def
11 Update testcases after adding kernels pass group
12 Handle acc loop directive
13 Add c-c++-common/goacc/kernels-*.c
14 Add gfortran.dg/goacc/kernels-*.f95
15 Add libgomp.oacc-c-c++-common/kernels-*.c
16 Add libgomp.oacc-fortran/kernels-*.f95
The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.
Bootstrapped and reg-tested on x86_64.
Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
I'll post the individual patches in reply to this message.
this patchs add a pass group pass_oacc_kernels (which will be added to
the pass list as a whole in patch 10).
Atm, the parallelization behaviour for the kernels region is controlled
by flag_tree_parallelize_loops, which is also used to control generic
auto-parallelization by autopar using omp. That is not ideal, and we may
want a separate flag (or param) to control the behaviour for oacc
kernels, f.i. -foacc-kernels-gang-parallelize=<n>. I'm open to suggestions.
The purpose of the pass group as a whole is to massage the offloaded
function into a shape that parloops can deal with it, and then run
parloops on it.
Consider a testcase with a reduction, and a loop counter declared
outside the offload region:
...
unsigned int a[n];
unsigned int
foo (void)
{
int i;
unsigned int sum = 1;
#pragma acc kernels copyin (a[0:n]) copy (sum)
{
for (i = 0; i < n; ++i)
sum += a[i];
}
return sum;
}
...
After ealias, the loop body looks like this:
...
<bb 5>:
_8 = *.omp_data_i_3(D).a;
_9 = *.omp_data_i_3(D).i;
_10 = *_9;
_11 = *_8[_10];
_12 = *.omp_data_i_3(D).sum;
sum.0_13 = *_12;
sum.1_14 = _11 + sum.0_13;
_15 = *.omp_data_i_3(D).sum;
*_15 = sum.1_14;
_17 = *.omp_data_i_3(D).i;
_18 = *_17;
_19 = *.omp_data_i_3(D).i;
_20 = _18 + 1;
*_19 = _20;
goto <bb 6>;
...
In other words, the iteration variable is in memory, as is the reduction
variable, and the body contains lots of loop invariant loads.
At the end of the pass group, just before parloops, the body has been
rewritten to have a local iteration variable and a local reduction
variable, and all the loop invariant loads have been moved out of the loop:
...
<bb 4>:
# _27 = PHI <0(2), _20(5)>
# D__lsm.7_28 = PHI <D__lsm.7_29(2), sum.1_14(5)>
_11 = *_8[_27];
sum.1_14 = _11 + D__lsm.7_28;
_20 = _27 + 1;
if (_20 <= 9999)
goto <bb 5>;
else
goto <bb 3>;
...
Thanks,
- Tom
Add pass_oacc_kernels
2015-11-09 Tom de Vries <tom@codesourcery.com>
* tree-pass.h (make_pass_oacc_kernels): Declare.
* tree-ssa-loop.c (gate_oacc_kernels): New static function.
(pass_data_oacc_kernels): New pass_data.
(class pass_oacc_kernels): New pass.
(make_pass_oacc_kernels): New function.
---
gcc/tree-pass.h | 1 +
gcc/tree-ssa-loop.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 49e22a9..4ed8da6 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -463,6 +463,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
/* IPA Passes */
extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 8ecd140..b51cac2 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-inline.h"
#include "tree-scalar-evolution.h"
#include "tree-vectorizer.h"
+#include "omp-low.h"
/* A pass making sure loops are fixed up. */
@@ -141,6 +142,70 @@ make_pass_tree_loop (gcc::context *ctxt)
return new pass_tree_loop (ctxt);
}
+/* Gate for oacc kernels pass group. */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+ if (flag_tree_parallelize_loops <= 1)
+ return false;
+
+ tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
+ if (oacc_function_attr == NULL_TREE)
+ return false;
+
+ tree val = TREE_VALUE (oacc_function_attr);
+ while (val != NULL_TREE && TREE_VALUE (val) == NULL_TREE)
+ val = TREE_CHAIN (val);
+
+ if (val != NULL_TREE)
+ return false;
+
+ struct loop *loop;
+ FOR_EACH_LOOP (loop, 0)
+ if (loop->in_oacc_kernels_region)
+ return true;
+
+ return false;
+}
+
+/* The oacc kernels superpass. */
+
+namespace {
+
+const pass_data pass_data_oacc_kernels =
+{
+ GIMPLE_PASS, /* type */
+ "oacc_kernels", /* name */
+ OPTGROUP_LOOP, /* optinfo_flags */
+ TV_TREE_LOOP, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_oacc_kernels : public gimple_opt_pass
+{
+public:
+ pass_oacc_kernels (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_oacc_kernels, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual bool gate (function *fn) { return gate_oacc_kernels (fn); }
+
+}; // class pass_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_kernels (gcc::context *ctxt)
+{
+ return new pass_oacc_kernels (ctxt);
+}
+
/* The no-loop superpass. */
namespace {
--
1.9.1