This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, PR68953] Fix pdr accesses order
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: Sebastian Pop <sebpop at gmail dot com>, <tobias at grosser dot es>
- Cc: Richard Biener <rguenther at suse dot de>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 8 Apr 2016 09:03:10 +0200
- Subject: [PATCH, PR68953] Fix pdr accesses order
- Authentication-results: sourceware.org; auth=none
Hi,
this patch fixes wrong-code PR68953, a graphite 6 regression.
I.
Consider test.c:
...
int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } };
int
main (void)
{
int zh, ro;
for (zh = 0; zh < 2; ++zh)
for (ro = 0; ro < 3; ++ro)
yu[ro][0] = yu[zh + 1][0];
return yu[0][0];
}
...
The program is supposed to return 2, but when compiling with -O1
-floop-nest-optimize, the program returns 3 instead.
The problem is that graphite rewrites this loop nest:
...
for (zh = 0; zh < 2; ++zh)
for (ro = 0; ro < 3; ++ro)
yu[ro][0] = yu[zh + 1][0];
...
into this loop nest, which has different semantics:
...
for (ro = 0; ro < 3; ++ro)
for (zh = 0; zh < 2; ++zh)
yu[ro][0] = yu[zh + 1][0];
...
II.
Now, consider test2.c, the equivalent of test.c without the extra array
dimension on yu:
...
int yu[4] = { 1, 2, 3, 4 };
int
main (void)
{
int zh, ro;
for (zh = 0; zh < 2; ++zh)
for (ro = 0; ro < 3; ++ro)
yu[ro] = yu[zh + 1];
return yu[0];
}
...
In this case, when compiling with -O1 -floop-nest-optimize, the graphite
transformation doesn't happen, and the program returns 2.
III.
So, what is the difference in graphite analysis that causes different
conclusions?
For test2.c, we find the data references:
...
data references (
reads: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <= 1 and i2 <= 2
and i2 >= 0 }
must_writes: { S_4[i1, i2] -> [1, i2] : i2 >= 0 and i2 <= 2 and i1 >=
0 and i1 <= 1 }
may_writes: { }
)
...
For test.c, we find these data references:
...
data references (
reads: { }
must_writes: { S_4[i1, 0] -> [1, 0, 0] : i1 >= 0 and i1 <= 1 }
may_writes: { }
)
...
In other words, there are no reads in the program, which is of course
incorrect.
Focusing on the reads, we find for test2.c:
...
Adding read to depedence graph: pdr_0 (read
in gimple stmt: _9 = yu[_8];
data accesses: { S_4[i1, i2] -> [1, 1 + i1] }
subscript sizes: { [1, i1] : i1 >= 0 and i1 <= 3 }
)
Reads depedence graph: { S_4[i1, i2] -> [1, 1 + i1] : i1 >= 0 and i1 <=
1 and i2 <= 2 and i2 >= 0 }
...
And for test.c:
...
Adding read to depedence graph: pdr_0 (read
in gimple stmt: _9 = yu[_8][0];
data accesses: { S_4[i1, i2] -> [1, 0, 1 + i1] }
subscript sizes: { [1, i1, 0] : i1 >= 0 and i1 <= 3 }
)
Reads depedence graph: { }
...
IV.
AFAICT, the data accesses and subscript sizes are parsed as follows:
...
data accesses:
{
S_4[i1, i2]
/* [alias set, last subscript, first subscript] */
-> [1, 0, 1 + i1]
}
subscript sizes:
{
/* [alias set range, first subscript range, last subscript range] */
[1, i1, 0]
: i1 >= 0 and i1 <= 3
}
...
In add_pdr_constraints, we do an intersection of the data accesses and
the subscript_sizes:
...
/* Constrain pdr->accesses with pdr->subscript_sizes and pbb->domain.*/
static isl_map *
add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb)
{
isl_map *x
= isl_map_intersect_range (isl_map_copy (pdr->accesses),
isl_set_copy (pdr->subscript_sizes));
x = isl_map_coalesce (x);
return constrain_domain (x, isl_set_copy (pbb->domain));
}
...
The output of the intersection is:
...
x: { S_4[-1, i2] -> [1, 0, 0] }
...
AFAICT, we're intersecting
'[1, 0, 1 + i1]'
with
'[1, i1, 0]'
and end up with
'[1, 0, 0]'.
In other words, we're intersecting:
- the last subscript access function with the first subscript range,
and
- the first subscript access function with the last subscript range.
This is the point from where things go wrong.
V.
I'm not really sure how this is supposed to be fixed. I imagine that we
should do one of 3:
1. we change the order in the access functions
2. we change the order in the subscript_sizes
3. we keep the orders as they are, but don't intersect them directly
but do an order inversion before.
I've picked 1, since that was the easiest for me to implement (but I'm
not sure if by doing so, I've broken any hardcoded graphite assumptions).
Bootstrapped and regtested on x86_64.
OK for stage4?
Thanks,
- Tom
Fix pdr accesses order
2016-04-07 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/68953
* graphite-sese-to-poly.c (pdr_add_memory_accesses): Order accesses from
first to last subscript.
* gcc.dg/graphite/pr68953.c: New test.
---
gcc/graphite-sese-to-poly.c | 2 +-
gcc/testsuite/gcc.dg/graphite/pr68953.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index b62789f8..22a09a1 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -672,7 +672,7 @@ pdr_add_memory_accesses (isl_map *acc, dr_info &dri)
aff = extract_affine (scop, afn,
isl_space_domain (isl_map_get_space (acc)));
- acc = set_index (acc, i + 1, aff);
+ acc = set_index (acc, nb_subscripts - i , aff);
}
return isl_map_coalesce (acc);
diff --git a/gcc/testsuite/gcc.dg/graphite/pr68953.c b/gcc/testsuite/gcc.dg/graphite/pr68953.c
new file mode 100644
index 0000000..12c632d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr68953.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -floop-nest-optimize" } */
+
+extern void abort (void);
+
+int yu[4][1] = { { 1 }, { 2 }, { 3 }, { 4 } };
+
+static void __attribute__((noinline,noclone))
+foo (void)
+{
+ int zh, ro;
+
+ for (zh = 0; zh < 2; ++zh)
+ for (ro = 0; ro < 3; ++ro)
+ yu[ro][0] = yu[zh + 1][0];
+}
+
+int
+main (void)
+{
+ foo ();
+
+ if (yu[0][0] != 2
+ || yu[1][0] != 2
+ || yu[2][0] != 2
+ || yu[3][0] != 4)
+ abort ();
+
+ return 0;
+}