This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR84859
On Tue, 20 Mar 2018, Richard Biener wrote:
> On Tue, 20 Mar 2018, Bin.Cheng wrote:
>
> > On Mon, Mar 19, 2018 at 11:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote:
> > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
> > >
> > > Ok, thanks.
> > Hi Richard,
> >
> > This change causes the following test failure on aarch64/arm linux:
> > FAIL: libgomp.graphite/force-parallel-4.c scan-tree-dump-times
> > graphite "2 loops carried no dependency" 1
>
> So it looks like the conditional stores to A[i] in
>
> for (i = 0; i < N; i++)
> {
> if (i < T)
> /* loop i1: carried no dependency. */
> A[i] = A[i+T];
> else
> /* loop i2: carried dependency. */
> A[i] = A[i+T+1];
> }
>
> vs. the stores commoned in the merge block cause the merge
> block to appear in the schedule and thus
>
> [scheduler] original ast:
> {
> for (int c0 = 0; c0 <= 19999; c0 += 1)
> S_3(c0);
> - for (int c0 = 0; c0 <= 999; c0 += 1)
> - S_5(c0);
> - for (int c0 = 1000; c0 <= 9999; c0 += 1)
> - S_6(c0);
> + for (int c0 = 0; c0 <= 9999; c0 += 1) {
> + if (c0 >= 1000) {
> + S_6(c0);
> + } else {
> + S_5(c0);
> + }
> + S_7(c0);
> + }
> }
>
> [scheduler] AST generated by isl:
> {
> for (int c0 = 0; c0 <= 19999; c0 += 1)
> S_3(c0);
> - for (int c0 = 0; c0 <= 999; c0 += 1)
> + for (int c0 = 0; c0 <= 999; c0 += 1) {
> S_5(c0);
> - for (int c0 = 1000; c0 <= 9999; c0 += 1)
> + S_7(c0);
> + }
> + for (int c0 = 1000; c0 <= 9999; c0 += 1) {
> S_6(c0);
> + S_7(c0);
> + }
> }
>
> not sure why neither of the two loops ends up parallelizable then.
> Possibly because of the dependences we generate for the cross-BB
> scalar deps, which store and load to a "scalar" memory location
> (in the end those are just SSA dependences).  So it's an artificial
> limitation caused by how we represent scheduling dependences,
> I guess:
>
> +Adding scalar write: _4
> +From stmt: _4 = A[_3];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar write: _2
> +From stmt: _2 = A[_1];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar read: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
>
> I have no idea/plan to improve that on the GRAPHITE/ISL side right now
> but maybe Sebastian has? We'd need to tell ISL that we can "elide"
> the dependence or rather that it is inner-iteration only and doesn't
> leak cross-iteration - kind of clobbering it at the end?
>
> So I'm going to XFAIL the testcase.
2018-03-20  Richard Biener  <rguenther@suse.de>

	* testsuite/libgomp.graphite/force-parallel-4.c: XFAIL one
	parallelizable loop.
Index: libgomp/testsuite/libgomp.graphite/force-parallel-4.c
===================================================================
--- libgomp/testsuite/libgomp.graphite/force-parallel-4.c (revision 258678)
+++ libgomp/testsuite/libgomp.graphite/force-parallel-4.c (working copy)
@@ -46,7 +46,10 @@ int main(void)
return 0;
}
-/* Check that parallel code generation part make the right answer. */
-/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" } } */
+/* Check that the parallel code generation part makes the right answer.
+   ??? XFAILed for loop i1 because conditional store elimination wrecks
+   our dependence representation.  */
+/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 1 "graphite" } } */
/* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
/* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */