This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR84859
On Tue, 20 Mar 2018, Richard Biener wrote:
> On Tue, 20 Mar 2018, Bin.Cheng wrote:
>
> > On Mon, Mar 19, 2018 at 11:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote:
> > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
> > >
> > > Ok, thanks.
> > Hi Richard,
> >
> > This change causes the following test failure on aarch64/arm linux:
> > FAIL: libgomp.graphite/force-parallel-4.c scan-tree-dump-times
> > graphite "2 loops carried no dependency" 1
>
> So it looks like the conditional stores to A[i] in
>
> for (i = 0; i < N; i++)
> {
> if (i < T)
> /* loop i1: carried no dependency. */
> A[i] = A[i+T];
> else
> /* loop i2: carried dependency. */
> A[i] = A[i+T+1];
> }
>
> vs. the stores commoned in the merge block cause the merge
> block to appear in the schedule and thus
>
> [scheduler] original ast:
> {
> for (int c0 = 0; c0 <= 19999; c0 += 1)
> S_3(c0);
> - for (int c0 = 0; c0 <= 999; c0 += 1)
> - S_5(c0);
> - for (int c0 = 1000; c0 <= 9999; c0 += 1)
> - S_6(c0);
> + for (int c0 = 0; c0 <= 9999; c0 += 1) {
> + if (c0 >= 1000) {
> + S_6(c0);
> + } else {
> + S_5(c0);
> + }
> + S_7(c0);
> + }
> }
>
> [scheduler] AST generated by isl:
> {
> for (int c0 = 0; c0 <= 19999; c0 += 1)
> S_3(c0);
> - for (int c0 = 0; c0 <= 999; c0 += 1)
> + for (int c0 = 0; c0 <= 999; c0 += 1) {
> S_5(c0);
> - for (int c0 = 1000; c0 <= 9999; c0 += 1)
> + S_7(c0);
> + }
> + for (int c0 = 1000; c0 <= 9999; c0 += 1) {
> S_6(c0);
> + S_7(c0);
> + }
> }
>
> not sure why neither of the two loops ends up parallelizable then.
> Possibly because of the dependences we generate for the cross-BB
> scalar deps, which store and load to a "scalar" memory location
> (in the end those are just SSA dependences).  So it's an artificial
> limitation caused by how we represent scheduling dependences,
> I guess:
>
> +Adding scalar write: _4
> +From stmt: _4 = A[_3];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar write: _2
> +From stmt: _2 = A[_1];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar read: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
>
> I have no idea/plan to improve that on the GRAPHITE/ISL side right now
> but maybe Sebastian has? We'd need to tell ISL that we can "elide"
> the dependence or rather that it is inner-iteration only and doesn't
> leak cross-iteration - kind of clobbering it at the end?
>
> So I'm going to XFAIL the testcase.
2018-03-20  Richard Biener  <rguenther@suse.de>

	* testsuite/libgomp.graphite/force-parallel-4.c: XFAIL one
	parallelizable loop.
Index: libgomp/testsuite/libgomp.graphite/force-parallel-4.c
===================================================================
--- libgomp/testsuite/libgomp.graphite/force-parallel-4.c (revision 258678)
+++ libgomp/testsuite/libgomp.graphite/force-parallel-4.c (working copy)
@@ -46,7 +46,10 @@ int main(void)
return 0;
}
-/* Check that parallel code generation part make the right answer. */
-/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" } } */
+/* Check that the parallel code generation part makes the right answer.
+   ??? XFAILed for loop i1 because conditional store elimination wrecks
+   our dependence representation.  */
+/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 1 "graphite" } } */
/* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
/* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */