This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Mon, 16 Nov 2015 12:55:06 +0100
- Subject: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
- Authentication-results: sourceware.org; auth=none
- References: <5640BD31 dot 2060602 at mentor dot com> <5640FB07 dot 6010008 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511111159040 dot 4884 at t29 dot fhfr dot qr>
On 11/11/15 12:02, Richard Biener wrote:
On Mon, 9 Nov 2015, Tom de Vries wrote:
On 09/11/15 16:35, Tom de Vries wrote:
Hi,
this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.
The patch series contains these patches:
1 Insert new exit block only when needed in transform_to_exit_first_loop_alt
2 Make create_parallel_loop return void
3 Ignore reduction clause on kernels directive
4 Implement -foffload-alias
5 Add in_oacc_kernels_region in struct loop
6 Add pass_oacc_kernels
7 Add pass_dominator_oacc_kernels
8 Add pass_ch_oacc_kernels
9 Add pass_parallelize_loops_oacc_kernels
10 Add pass_oacc_kernels pass group in passes.def
11 Update testcases after adding kernels pass group
12 Handle acc loop directive
13 Add c-c++-common/goacc/kernels-*.c
14 Add gfortran.dg/goacc/kernels-*.f95
15 Add libgomp.oacc-c-c++-common/kernels-*.c
16 Add libgomp.oacc-fortran/kernels-*.f95
The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.
Bootstrapped and reg-tested on x86_64.
Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
I'll post the individual patches in reply to this message.
This patch adds the pass_oacc_kernels pass group to the pass list in
passes.def.
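In passes.def, NEXT_PASS appends a pass, and the passes between PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) and POP_INSERT_PASSES () become sub-passes of pass_oacc_kernels. The pass manager only runs a pass's sub-passes when that pass's own gate returns true, so for functions without oacc kernels the whole group is skipped. The toy C model below sketches that gating; it is not GCC's pass manager, and all struct and function names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a function being compiled (hypothetical).  */
struct toy_function
{
  bool has_oacc_kernels;
};

/* Toy pass tree: NEXT points to the following sibling, SUB to the
   passes nested via PUSH_INSERT_PASSES_WITHIN (hypothetical names).  */
struct toy_pass
{
  const char *name;
  bool (*gate) (const struct toy_function *); /* NULL: always run */
  struct toy_pass *sub;
  struct toy_pass *next;
};

/* Names of executed passes, in execution order.  */
static const char *toy_log[16];
static size_t toy_log_len;

static void
toy_execute_list (struct toy_pass *pass, const struct toy_function *fn)
{
  for (; pass != NULL; pass = pass->next)
    {
      /* A false gate skips the pass together with all its sub-passes.  */
      if (pass->gate != NULL && !pass->gate (fn))
        continue;
      assert (toy_log_len < sizeof toy_log / sizeof toy_log[0]);
      toy_log[toy_log_len++] = pass->name;
      toy_execute_list (pass->sub, fn);
    }
}

/* Gate of the group: run only for functions with oacc kernels.  */
static bool
toy_oacc_kernels_gate (const struct toy_function *fn)
{
  return fn->has_oacc_kernels;
}

/* True if a pass called NAME was executed.  */
static bool
toy_ran (const char *name)
{
  for (size_t i = 0; i < toy_log_len; i++)
    if (strcmp (toy_log[i], name) == 0)
      return true;
  return false;
}
```

Building the tree fre, oacc_kernels { ch, lim }, merge_phi and running it on a function without kernels executes only fre and merge_phi; with kernels, the nested ch and lim run as well.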
Note the repetition of pass_lim/pass_copy_prop. The first pair is for an inner
loop in a loop nest, the second for an outer loop in a loop nest.
@@ -86,6 +86,27 @@ along with GCC; see the file COPYING3. If not see
/* pass_build_ealias is a dummy pass that ensures that we
execute TODO_rebuild_alias at this point. */
NEXT_PASS (pass_build_ealias);
+ /* Pass group that runs when there are oacc kernels in the
+ function. */
+ NEXT_PASS (pass_oacc_kernels);
+ PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_ch_oacc_kernels);
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);
iterate lim/copyprop twice?! Why's that needed?
I've managed to eliminate the last pass_copy_prop, but not pass_lim.
I've added a comment:
...
/* We use pass_lim to rewrite in-memory iteration and reduction
variable accesses in loops into local variable accesses.
However, a single pass instantiation manages to do this only for
one loop level, so we use pass_lim twice to at least be able to
handle a loop nest with a depth of two. */
NEXT_PASS (pass_lim);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_lim);
...
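To make "rewriting in-memory reduction variable accesses into local variable accesses" concrete, here is a source-level sketch of that store motion on a loop nest of depth two. It only models the effect described in the comment (one pass_lim instance handles one loop level), assumes SUM does not alias A, and shows only the reduction variable; the real pass works on GIMPLE, and the function names are hypothetical.

```c
#include <assert.h>

#define ROWS 4
#define COLS 3

/* Before LIM: every iteration loads and stores the reduction variable
   through the pointer SUM.  */
void
sum_in_memory (const int a[ROWS][COLS], int *sum)
{
  *sum = 0;
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      *sum += a[i][j];
}

/* After one LIM instance: the load/store of *SUM is moved out of the
   inner loop only; the outer loop still accesses memory.  */
void
sum_after_first_lim (const int a[ROWS][COLS], int *sum)
{
  *sum = 0;
  for (int i = 0; i < ROWS; i++)
    {
      int s = *sum;             /* load hoisted before the inner loop */
      for (int j = 0; j < COLS; j++)
        s += a[i][j];
      *sum = s;                 /* store sunk after the inner loop */
    }
}

/* After a second LIM instance: the accesses leave the outer loop too,
   so the whole nest operates on a local variable.  */
void
sum_after_second_lim (const int a[ROWS][COLS], int *sum)
{
  int s = 0;
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      s += a[i][j];
  *sum = s;
}

/* Run all three variants on the same input and check they agree.  */
void
check_lim_variants (const int a[ROWS][COLS])
{
  int s0, s1, s2;
  sum_in_memory (a, &s0);
  sum_after_first_lim (a, &s1);
  sum_after_second_lim (a, &s2);
  assert (s0 == s1 && s1 == s2);
}
```

Only the last form has no memory access to the reduction variable inside either loop, which is why a nest of depth two needs two instances under the one-level-per-instance behavior described above.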
+ NEXT_PASS (pass_scev_cprop);
What's that for? It's supposed to help removing loops - I don't
expect kernels to vanish.
I'm using pass_scev_cprop for the "final value replacement"
functionality. Added comment.
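Final value replacement, one of the things pass_scev_cprop does, replaces a use of a loop-computed value after the loop with a closed-form expression derived from the value's scalar evolution and the loop's iteration count. The functions below are a hypothetical source-level illustration of that effect; the pass itself operates on GIMPLE in loop-closed SSA form.

```c
#include <assert.h>
#include <stddef.h>

/* Before: K is live after the loop, so its exit value is only known
   by running the loop.  */
int
fill_before (int *out, int n)
{
  assert (n <= 0 || out != NULL);
  int k = 0;
  for (int i = 0; i < n; i++)
    {
      out[i] = k;
      k += 3;
    }
  return k;
}

/* After: K evolves as {0, +, 3} and the loop runs MAX (n, 0) times, so
   the use after the loop becomes 3 * MAX (n, 0).  K is still computed
   inside the loop for out[i], but no value flows out of the loop.  */
int
fill_after (int *out, int n)
{
  assert (n <= 0 || out != NULL);
  int k = 0;
  for (int i = 0; i < n; i++)
    {
      out[i] = k;
      k += 3;
    }
  return n > 0 ? 3 * n : 0;
}
```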
+ NEXT_PASS (pass_tree_loop_done);
+ NEXT_PASS (pass_dominator_oacc_kernels);
Three times DOM? No please. I wonder why you don't run oacc_kernels
after FRE and drop the initial DOM(s).
Done. There's just one pass_dominator_oacc_kernels left now.
+ NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+ NEXT_PASS (pass_expand_omp_ssa);
+ NEXT_PASS (pass_tree_loop_done);
The switches into/out of tree_loop also look odd to me, but well
(they'll be controlled by -ftree-loop-optimize).
I've eliminated all the uses of pass_tree_loop_init/pass_tree_loop_done
in the pass group. Instead, I've added conditional loop optimizer setup in:
- pass_lim and pass_scev_cprop (added in this patch), and
- pass_parallelize_loops_oacc_kernels (added in patch "Add
pass_parallelize_loops_oacc_kernels").
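The conditional setup in the tree_ssa_lim and pass_scev_cprop::execute hunks of the patch follows one pattern: check the loop state with loops_state_satisfies_p, which requires all requested flags to be set, and initialize loop structures and loop-closed SSA only if that check fails. The toy model below sketches why the pattern should be harmless inside pass_tree_loop (where the state is presumably already in place, so nothing happens) and does the setup in the kernels pass group. The flag values and all names are hypothetical, and scev_initialize is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's loop state flags.  */
enum
{
  TOY_LOOPS_NORMAL = 1u << 0,
  TOY_LOOPS_HAVE_RECORDED_EXITS = 1u << 1,
  TOY_LOOP_CLOSED_SSA = 1u << 2
};

struct toy_loops
{
  unsigned state;
  int setup_count;  /* how often the setup actually ran */
};

/* Like loops_state_satisfies_p: true only if ALL bits in FLAGS are set.  */
static bool
toy_state_satisfies_p (const struct toy_loops *loops, unsigned flags)
{
  return (loops->state & flags) == flags;
}

/* Set up loop structures only when they are not already in place.  */
static void
toy_ensure_loop_setup (struct toy_loops *loops)
{
  const unsigned required = TOY_LOOPS_NORMAL | TOY_LOOPS_HAVE_RECORDED_EXITS
                            | TOY_LOOP_CLOSED_SSA;
  if (!toy_state_satisfies_p (loops, required))
    {
      /* Stands in for loop_optimizer_init.  */
      loops->state |= TOY_LOOPS_NORMAL | TOY_LOOPS_HAVE_RECORDED_EXITS;
      /* Stands in for rewrite_into_loop_closed_ssa.  */
      loops->state |= TOY_LOOP_CLOSED_SSA;
      loops->setup_count++;
    }
  assert (toy_state_satisfies_p (loops, required));
}
```

A partially set up state (for example only TOY_LOOPS_NORMAL) still triggers the setup, because the check requires every flag.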
Thanks,
- Tom
Add pass_oacc_kernels pass group in passes.def
2015-11-09 Tom de Vries <tom@codesourcery.com>
* omp-low.c (pass_expand_omp_ssa::clone): New function.
* passes.def: Add pass_oacc_kernels pass group.
* tree-ssa-loop-ch.c (pass_ch::clone): New function.
* tree-ssa-loop-im.c (tree_ssa_lim): Allow running outside
pass_tree_loop.
* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
(pass_scev_cprop::execute): Allow running outside pass_tree_loop.
---
gcc/omp-low.c | 1 +
gcc/passes.def | 25 +++++++++++++++++++++++++
gcc/tree-ssa-loop-ch.c | 2 ++
gcc/tree-ssa-loop-im.c | 14 ++++++++++++++
gcc/tree-ssa-loop.c | 22 +++++++++++++++++++++-
5 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 9eae09a..8078afb 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13385,6 +13385,7 @@ public:
return !(fun->curr_properties & PROP_gimple_eomp);
}
virtual unsigned int execute (function *) { return execute_expand_omp (); }
+ opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
}; // class pass_expand_omp_ssa
diff --git a/gcc/passes.def b/gcc/passes.def
index db822d3..d76cfd3 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -87,6 +87,31 @@ along with GCC; see the file COPYING3. If not see
execute TODO_rebuild_alias at this point. */
NEXT_PASS (pass_build_ealias);
NEXT_PASS (pass_fre);
+ /* Pass group that runs when the function is an offloaded function
+ containing oacc kernels loops. */
+ NEXT_PASS (pass_oacc_kernels);
+ PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+ /* We need pass_ch here, because pass_lim has no effect on
+ exit-first loops (PR65442). Ideally we want to remove both
+ this pass instantiation, and the reverse transformation
+ transform_to_exit_first_loop_alt, which is done in
+ pass_parallelize_loops_oacc_kernels. */
+ NEXT_PASS (pass_ch);
+ /* We use pass_lim to rewrite in-memory iteration and reduction
+ variable accesses in loops into local variable accesses.
+ However, a single pass instantiation manages to do this only for
+ one loop level, so we use pass_lim twice to at least be able to
+ handle a loop nest with a depth of two. */
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_lim);
+ /* We use pass_scev_cprop here for final value replacement. */
+ NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+ NEXT_PASS (pass_expand_omp_ssa);
+ POP_INSERT_PASSES ()
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_dse);
NEXT_PASS (pass_cd_dce);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 7e618bf..6493fcc 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -165,6 +165,8 @@ public:
/* Initialize and finalize loop structures, copying headers inbetween. */
virtual unsigned int execute (function *);
+ opt_pass * clone () { return new pass_ch (m_ctxt); }
+
protected:
/* ch_base method: */
virtual bool process_loop_p (struct loop *loop);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 30b53ce..48810f3 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-ssa-propagate.h"
#include "trans-mem.h"
#include "gimple-fold.h"
+#include "tree-scalar-evolution.h"
/* TODO: Support for predicated code motion. I.e.
@@ -2501,6 +2502,19 @@ tree_ssa_lim (void)
{
unsigned int todo;
+ if (!loops_state_satisfies_p (LOOPS_NORMAL
+ | LOOPS_HAVE_RECORDED_EXITS
+ | LOOP_CLOSED_SSA))
+ {
+ loop_optimizer_init (LOOPS_NORMAL
+ | LOOPS_HAVE_RECORDED_EXITS);
+ rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
+ /* We might discover new loops, e.g. when turning irreducible
+ regions into reducible. */
+ scev_initialize ();
+ }
+
tree_ssa_lim_initialize ();
/* Gathers information about memory accesses in the loops. */
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index b51cac2..570406f 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -373,10 +373,30 @@ public:
/* opt_pass methods: */
virtual bool gate (function *) { return flag_tree_scev_cprop; }
- virtual unsigned int execute (function *) { return scev_const_prop (); }
+ virtual unsigned int execute (function *);
+ opt_pass * clone () { return new pass_scev_cprop (m_ctxt); }
}; // class pass_scev_cprop
+unsigned int
+pass_scev_cprop::execute (function *)
+{
+ if (!loops_state_satisfies_p (LOOPS_NORMAL
+ | LOOPS_HAVE_RECORDED_EXITS
+ | LOOP_CLOSED_SSA))
+ {
+ loop_optimizer_init (LOOPS_NORMAL
+ | LOOPS_HAVE_RECORDED_EXITS);
+ rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
+ /* We might discover new loops, e.g. when turning irreducible
+ regions into reducible. */
+ scev_initialize ();
+ }
+
+ return scev_const_prop ();
+}
+
} // anon namespace
gimple_opt_pass *