This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, 4/16] Implement -foffload-alias
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Jakub Jelinek <jakub at redhat dot com>, "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>
- Date: Fri, 27 Nov 2015 12:42:09 +0100
- Subject: Re: [PATCH, 4/16] Implement -foffload-alias
- Authentication-results: sourceware.org; auth=none
- References: <5640BD31 dot 2060602 at mentor dot com> <5640C560 dot 1000007 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511111150020 dot 4884 at t29 dot fhfr dot qr> <20151111110034 dot GF5675 at tucnak dot redhat dot com> <5644B84D dot 6050504 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511130944270 dot 4884 at t29 dot fhfr dot qr> <5645C33B dot 9080802 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511131228450 dot 4884 at t29 dot fhfr dot qr> <20151113113938 dot GM5675 at tucnak dot redhat dot com> <565058F0 dot 8040509 at mentor dot com> <alpine dot LSU dot 2 dot 11 dot 1511231241230 dot 4884 at t29 dot fhfr dot qr>
On 23/11/15 12:41, Richard Biener wrote:
On Sat, 21 Nov 2015, Tom de Vries wrote:
>On 13/11/15 12:39, Jakub Jelinek wrote:
> >On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:
> > > >thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta
> > > >issues'.
> > > >
> > > >Any feedback on the '#pragma GCC offload-alias=<none|pointer|all>' bit
> > > >above?
> > > >Is that sort of what you had in mind?
> > >
> > >Yes. Whether that makes sense is another question of course. You can
> > >annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
> > >as well if you know dependences without the users intervention.
> >
> >I really don't like even the GCC offload-alias, I just don't see anything
> >special on the offload code. Not to mention that the same issue is already
> >with other outlined functions, like OpenMP tasks or parallel regions, those
> >aren't offloaded, yet they can suffer from worse alias/points-to analysis
> >too.
>
>AFAIU there is one aspect that is different for offloaded code: the setup of
>the data on the device.
>
>Consider this example:
>...
>unsigned int a[N];
>unsigned int b[N];
>unsigned int c[N];
>
>int
>main (void)
>{
> ...
>
>#pragma acc kernels copyin (a) copyin (b) copyout (c)
> {
> for (COUNTERTYPE ii = 0; ii < N; ii++)
> c[ii] = a[ii] + b[ii];
> }
>
> ...
>...
>
>At gimple level, we have:
>...
>#pragma omp target oacc_kernels \
> map(force_from:c [len: 2097152]) \
> map(force_to:b [len: 2097152]) \
> map(force_to:a [len: 2097152])
>...
>
>[ The meaning of the force_from/force_to mappings is given in
>include/gomp-constants.h:
>...
> /* Allocate. */
> GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
> /* ..., and copy to device. */
> GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
> /* ..., and copy from device. */
> GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
> /* ..., and copy to and from device. */
> GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
>... ]
>
>So before calling the offloaded function, a separate alloc is done for a, b
>and c, and the base pointers of the newly allocated objects are passed to the
>offloaded function.
>
>This means we can mark those base pointers as restrict in the offloaded
>function.
>
>Attached proof-of-concept patch implements that.
>
> >We simply have some compiler internal interface between the caller and
> >callee of the outlined regions, each interface in between those has
> >its own structure type used to communicate the info;
> >we can attach attributes on the fields, or some flags to indicate some
> >properties interesting from aliasing POV.
> >We don't really need to perform
> >full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> >the relationship in between such callers and callees (for offloading regions
> >we already have "omp target entrypoint" attribute on the callee and a
> >singler caller), tell LTO if possible not to split those into different
> >partitions if easily possible, and then just for these pairs perform
> >aliasing/points-to analysis in the caller and the result record using
> >cliques/special attributes/whatever to the callee side, so that the callee
> >(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.
>
>As a start, is the approach of this patch OK?
Works for me but leaving to Jakub to review for correctness.
Attached patch is a complete version:
- added ChangeLog
- added missing function header comments
- moved analysis to separate function
omp_target_base_pointers_restrict_p
- added example in comment before analysis
- fixed error in omp_target_base_pointers_restrict_p where I was using
GOMP_MAP_ALLOC but should have been using GOMP_MAP_FORCE_ALLOC
- added testcases
Bootstrapped and reg-tested on x86_64.
OK for stage3 trunk?
Thanks,
- Tom
Mark pointers to allocated target vars as restricted, if possible
2015-11-26 Tom de Vries <tom@codesourcery.com>
* omp-low.c (install_var_field_1): New function, factored out of ...
(install_var_field): ... here.
(scan_sharing_clauses_1): New function, factored out of ...
(scan_sharing_clauses): ... here.
(omp_target_base_pointers_restrict_p): New function.
(scan_omp_target): Call scan_sharing_clauses_1 instead of
scan_sharing_clauses, with base_pointers_restrict arg.
* c-c++-common/goacc/kernels-alias-2.c: New test.
* c-c++-common/goacc/kernels-alias-3.c: New test.
* c-c++-common/goacc/kernels-alias-4.c: New test.
* c-c++-common/goacc/kernels-alias-5.c: New test.
* c-c++-common/goacc/kernels-alias-6.c: New test.
* c-c++-common/goacc/kernels-alias-7.c: New test.
* c-c++-common/goacc/kernels-alias-8.c: New test.
* c-c++-common/goacc/kernels-alias.c: New test.
---
gcc/omp-low.c | 109 +++++++++++++++++++--
gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c | 27 +++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c | 20 ++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c | 22 +++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c | 19 ++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c | 23 +++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c | 25 +++++
gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c | 22 +++++
gcc/testsuite/c-c++-common/goacc/kernels-alias.c | 29 ++++++
9 files changed, 289 insertions(+), 7 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0d4c6e5..6843c49 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1366,10 +1366,12 @@ build_sender_ref (tree var, omp_context *ctx)
return build_sender_ref ((splay_tree_key) var, ctx);
}
-/* Add a new field for VAR inside the structure CTX->SENDER_DECL. */
+/* Add a new field for VAR inside the structure CTX->SENDER_DECL. If
+ BASE_POINTERS_RESTRICT, declare the field with restrict. */
static void
-install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+install_var_field_1 (tree var, bool by_ref, int mask, omp_context *ctx,
+ bool base_pointers_restrict)
{
tree field, type, sfield = NULL_TREE;
splay_tree_key key = (splay_tree_key) var;
@@ -1393,7 +1395,11 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
type = build_pointer_type (build_pointer_type (type));
}
else if (by_ref)
- type = build_pointer_type (type);
+ {
+ type = build_pointer_type (type);
+ if (base_pointers_restrict)
+ type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+ }
else if ((mask & 3) == 1 && is_reference (var))
type = TREE_TYPE (type);
@@ -1457,6 +1463,14 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
splay_tree_insert (ctx->sfield_map, key, (splay_tree_value) sfield);
}
+/* As install_var_field_1, but with base_pointers_restrict == false. */
+
+static void
+install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+{
+ install_var_field_1 (var, by_ref, mask, ctx, false);
+}
+
static tree
install_var_local (tree var, omp_context *ctx)
{
@@ -1810,10 +1824,12 @@ fixup_child_record_type (omp_context *ctx)
}
/* Instantiate decls as necessary in CTX to satisfy the data sharing
- specified by CLAUSES. */
+ specified by CLAUSES. If BASE_POINTERS_RESTRICT, install var field with
+ restrict. */
static void
-scan_sharing_clauses (tree clauses, omp_context *ctx)
+scan_sharing_clauses_1 (tree clauses, omp_context *ctx,
+ bool base_pointers_restrict)
{
tree c, decl;
bool scan_array_reductions = false;
@@ -2070,7 +2086,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
&& TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
install_var_field (decl, true, 7, ctx);
else
- install_var_field (decl, true, 3, ctx);
+ install_var_field_1 (decl, true, 3, ctx,
+ base_pointers_restrict);
if (is_gimple_omp_offloaded (ctx->stmt))
install_var_local (decl, ctx);
}
@@ -2336,6 +2353,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
scan_omp (&OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c), ctx);
}
+/* As scan_sharing_clauses_1, but with base_pointers_restrict == false. */
+
+static void
+scan_sharing_clauses (tree clauses, omp_context *ctx)
+{
+ scan_sharing_clauses_1 (clauses, ctx, false);
+}
+
/* Create a new name for omp child function. Returns an identifier. If
IS_CILK_FOR is true then the suffix for the child function is
"_cilk_for_fn." */
@@ -3032,6 +3057,68 @@ scan_omp_single (gomp_single *stmt, omp_context *outer_ctx)
layout_type (ctx->record_type);
}
+/* Return true if the CLAUSES of an omp target guarantee that the base pointers
+ used in the corresponding offloaded function are restrict. */
+
+static bool
+omp_target_base_pointers_restrict_p (tree clauses)
+{
+ /* The analysis relies on the GOMP_MAP_FORCE_* mapping kinds, which are only
+ used by OpenACC. */
+ if (flag_openacc == 0)
+ return false;
+
+ /* I. Basic example:
+
+ void foo (void)
+ {
+ unsigned int a[2], b[2];
+
+ #pragma acc kernels \
+ copyout (a) \
+ copyout (b)
+ {
+ a[0] = 0;
+ b[0] = 1;
+ }
+ }
+
+ After gimplification, we have:
+
+ #pragma omp target oacc_kernels \
+ map(force_from:a [len: 8]) \
+ map(force_from:b [len: 8])
+ {
+ a[0] = 0;
+ b[0] = 1;
+ }
+
+ Because both mappings have the force prefix, we know that they will be
+ allocated when calling the corresponding offloaded function, which means we
+ can mark the base pointers for a and b in the offloaded function as
+ restrict. */
+
+ tree c;
+ for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
+ return false;
+
+ switch (OMP_CLAUSE_MAP_KIND (c))
+ {
+ case GOMP_MAP_FORCE_ALLOC:
+ case GOMP_MAP_FORCE_TO:
+ case GOMP_MAP_FORCE_FROM:
+ case GOMP_MAP_FORCE_TOFROM:
+ break;
+ default:
+ return false;
+ }
+ }
+
+ return true;
+}
+
/* Scan a GIMPLE_OMP_TARGET. */
static void
@@ -3053,13 +3140,21 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
DECL_NAMELESS (name) = 1;
TYPE_NAME (ctx->record_type) = name;
TYPE_ARTIFICIAL (ctx->record_type) = 1;
+
+ bool base_pointers_restrict = false;
if (offloaded)
{
create_omp_child_function (ctx, false);
gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
+
+ base_pointers_restrict = omp_target_base_pointers_restrict_p (clauses);
+ if (base_pointers_restrict
+ && dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Base pointers in offloaded function are restrict\n");
}
- scan_sharing_clauses (clauses, ctx);
+ scan_sharing_clauses_1 (clauses, ctx, base_pointers_restrict);
scan_omp (gimple_omp_body_ptr (stmt), ctx);
if (TYPE_FIELDS (ctx->record_type) == NULL)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
new file mode 100644
index 0000000..d437c47
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+void
+foo (void)
+{
+ unsigned int a;
+ unsigned int b;
+ unsigned int c;
+ unsigned int d;
+
+#pragma acc kernels copyin (a) create (b) copyout (c) copy (d)
+ {
+ a = 0;
+ b = 0;
+ c = 0;
+ d = 0;
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 4 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 8 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
new file mode 100644
index 0000000..0eda7e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
@@ -0,0 +1,20 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+void
+foo (void)
+{
+ unsigned int a;
+ unsigned int *p = &a;
+
+#pragma acc kernels pcopyin (a, p[0:1])
+ {
+ a = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
new file mode 100644
index 0000000..037901f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+#define N 2
+
+void
+foo (void)
+{
+ unsigned int a[N];
+ unsigned int *p = &a[0];
+
+#pragma acc kernels pcopyin (a, p[0:2])
+ {
+ a[0] = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
new file mode 100644
index 0000000..69cd3fb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+void
+foo (int *a)
+{
+ int *p = a;
+
+#pragma acc kernels pcopyin (a[0:1], p[0:1])
+ {
+ *a = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
new file mode 100644
index 0000000..6ebce15
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern void *acc_copyin (void *, size_t);
+
+void
+foo (void)
+{
+ int a = 0;
+ int *p = (int *)acc_copyin (&a, sizeof (a));
+
+#pragma acc kernels deviceptr (p) pcopy(a)
+ {
+ a = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
new file mode 100644
index 0000000..40eb235
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern void *acc_copyin (void *, size_t);
+
+#define N 2
+
+void
+foo (void)
+{
+ int a[N];
+ int *p = (int *)acc_copyin (&a[0], sizeof (a));
+
+#pragma acc kernels deviceptr (p) pcopy(a)
+ {
+ a[0] = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
new file mode 100644
index 0000000..0b93e35
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern void *acc_copyin (void *, size_t);
+
+void
+foo (int *a, size_t n)
+{
+ int *p = (int *)acc_copyin (&a, n);
+
+#pragma acc kernels deviceptr (p) pcopy(a[0:n])
+ {
+ a = 0;
+ *p = 1;
+ }
+}
+
+/* Only the omp_data_i related loads should be annotated with cliques. */
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
new file mode 100644
index 0000000..25821ab2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
@@ -0,0 +1,29 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+#define N 2
+
+void
+foo (void)
+{
+ unsigned int a[N];
+ unsigned int b[N];
+ unsigned int c[N];
+ unsigned int d[N];
+
+#pragma acc kernels copyin (a) create (b) copyout (c) copy (d)
+ {
+ a[0] = 0;
+ b[0] = 0;
+ c[0] = 0;
+ d[0] = 0;
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 4 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 8 "ealias" } } */
+
- References:
- [PATCH series, 16] Use parloops to parallelize oacc kernels regions
- [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias
- Re: [PATCH, 4/16] Implement -foffload-alias