This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][GRAPHITE] More TLC

From: Richard Biener <rguenther at suse dot de>
To: Sebastian Pop <sebpop at gmail dot com>
Cc: Sven Verdoolaege <sven dot verdoolaege at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Wed, 27 Sep 2017 15:04:21 +0200 (CEST)
Subject: Re: [PATCH][GRAPHITE] More TLC
References: <alpine.LSU.2.20.1709221453010.26836@zhemvz.fhfr.qr> <CAFk3UF_6h1fhLfx_YzBtkhTsNr2A0ppaUvJX2H=AXAZcUmATCQ@mail.gmail.com> <alpine.LSU.2.20.1709251511510.26836@zhemvz.fhfr.qr> <CAFk3UF8bW65O0j=QDkRkQ191LqyG59nzMieN_FCcx4aJcm6QwA@mail.gmail.com> <alpine.LSU.2.20.1709271409290.26836@zhemvz.fhfr.qr>

On Wed, 27 Sep 2017, Richard Biener wrote:

> On Tue, 26 Sep 2017, Sebastian Pop wrote:
> 
> > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguenther@suse.de> wrote:
> > 
> > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > >
> > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > >
> > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > > when reworking canonicalize_loop_closed_ssa.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > > >
> > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > >
> > > > >  61 loop nests "optimized"
> > > > >  45 loop nest transforms cancelled because of code generation issues
> > > > >  21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> > > > >
> > > > > I say "optimized" because the usual transform I've seen is static tiling
> > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > There's no way to automagically figure out what kind of transform ISL did
> > > > >
> > > >
> > > > Here is how to automate (without magic) the detection
> > > > of the transform that isl did.
> > > >
> > > > The problem solved by isl is the minimization of strides
> > > > in memory.  To do this, we need to give the isl scheduler
> > > > the validity dependence graph: in graphite-optimize-isl.c,
> > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > (RAR + validity) maps.  The proximity map does include
> > > > read-after-read dependences, as the isl scheduler needs to
> > > > minimize strides between consecutive reads.
> 
> Ah, so I now see why we do not perform interchange on trivial cases like
> 
> double A[1024][1024], B[1024][1024];
> 
> void foo(void)
> {
>   for (int i = 0; i < 1024; ++i)
>     for (int j = 0; j < 1024; ++j)
>       A[j][i] = B[j][i];
> }
> 
> which is probably because
> 
>   /* FIXME: proximity should not be validity.  */
>   isl_union_map *proximity = isl_union_map_copy (validity);
> 
> falls apart when there is _no_ dependence?
> 
> I can trick GRAPHITE into performing the interchange for
> 
> double A[1024][1024], B[1024][1024];
> 
> void foo(void)
> {
>   for (int i = 1; i < 1023; ++i)
>     for (int j = 0; j < 1024; ++j)
>       A[j][i] = B[j][i-1] + A[j][i+1];
> }
> 
> because now there is a dependence.  Any idea how to rewrite
> scop_get_dependences to avoid the "simplifying"?  I suppose the
> validity constraints _do_ also specify a kind of proximity;
> we just may not prune / optimize them in the same way as
> dependences?
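To make the stride argument concrete for the first example, here is a hand-written sketch (not GRAPHITE or isl output) of the interchange one would hope for, together with helpers that measure the byte distance between the A elements touched by two consecutive inner-loop iterations.  All function names are hypothetical; N stands for the 1024 used in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 1024

double A[N][N], B[N][N];

/* The first example as written: the inner loop varies j, the first
   subscript, so consecutive inner iterations touch elements a whole
   row (N doubles) apart.  */
void foo_original (void)
{
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i];
}

/* Hypothetical hand-interchanged version: the inner loop now varies
   the last subscript, so consecutive inner iterations touch adjacent
   elements.  */
void foo_interchanged (void)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[j][i] = B[j][i];
}

/* Byte distance between the A elements accessed by two consecutive
   inner iterations of foo_original (i fixed, j goes from 0 to 1).  */
ptrdiff_t inner_stride_original (void)
{
  return (char *) &A[1][0] - (char *) &A[0][0];
}

/* The same distance for foo_interchanged (j fixed, i goes from 0 to 1).  */
ptrdiff_t inner_stride_interchanged (void)
{
  return (char *) &A[0][1] - (char *) &A[0][0];
}

/* Run both versions on the same input and report whether they produce
   identical results, which they must since there is no dependence.  */
int versions_agree (void)
{
  static double saved[N][N];   /* 8 MB, so keep it off the stack.  */

  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      B[j][i] = (double) (j * N + i);

  foo_original ();
  memcpy (saved, A, sizeof A);
  memset (A, 0, sizeof A);
  foo_interchanged ();
  return memcmp (saved, A, sizeof A) == 0;
}
```

Calling versions_agree () should return 1, and the two stride helpers should return N * sizeof (double) and sizeof (double) respectively.  Whether the isl scheduler actually picks the interchanged order depends on the proximity constraints it is given; the sketch only shows what "minimizing strides" means for this nest.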

Another thing I notice is that we don't handle the multi-dimensional
accesses the Fortran front end produces:

(gdb) p debug_data_reference (dr)
#(Data Ref: 
#  bb: 18 
#  stmt: _43 = *a_141(D)[_42];
#  ref: *a_141(D)[_42];
#  base_object: *a_141(D);
#  Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5

Ultimately, we fail here because we try to build a constraint for

{{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5

which ends up computing isl_pw_aff_mul (A, stride.88_115), with
A being the non-constant constraint generated for
{(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
a parameter.  ISL doesn't like that multiplication because the result
isn't affine (well, it is; we just have parameters in there).
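A small sketch of what that access function computes, under the assumption that i4 and i5 are the iteration counts of loops 4 and 5 and that off and stride stand in for the SSA names _38 and stride.88_115 (all function names here are hypothetical).  Expanding the nested chrec gives off + 1 + i4 + stride * (i5 + 1), i.e. a two-dimensional subscript pair with leading extent stride; the stride * (i5 + 1) term is the product of a parameter and an iterator that the multiplication above runs into.

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluate a simple chrec {base, +, step} after n iterations.  */
static long chrec1 (long base, long step, long n)
{
  return base + step * n;
}

/* Hypothetical model of
     {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5
   The inner chrec supplies the initial value of the outer one.  */
long access_index (long off, long stride, long i4, long i5)
{
  long inner = chrec1 (off + stride + 1, 1, i4);  /* {(off + stride) + 1, +, 1}_4 */
  return chrec1 (inner, stride, i5);               /* {inner, +, stride}_5 */
}

/* The same index read as a subscript pair (sub0, sub1) of an array
   whose leading dimension spans stride elements.  stride * sub1 is
   the parameter-times-iterator product.  */
long delinearized_index (long off, long stride, long i4, long i5)
{
  long sub0 = off + 1 + i4;
  long sub1 = i5 + 1;
  return sub0 + stride * sub1;
}

/* Check that both forms agree over an n4 x n5 iteration space.  */
bool forms_agree (long off, long stride, long n4, long n5)
{
  for (long i4 = 0; i4 < n4; ++i4)
    for (long i5 = 0; i5 < n5; ++i5)
      if (access_index (off, stride, i4, i5)
          != delinearized_index (off, stride, i4, i5))
        return false;
  return true;
}
```

For example, access_index (3, 10, 2, 1) is (3 + 10 + 1 + 2) + 10 = 26, the same as delinearized_index (3, 10, 2, 1) = (3 + 1 + 2) + 10 * 2.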

I suppose ISL doesn't handle this form of access because the
two "dimensions" in this scalarized form may overlap?  So we'd
really need to turn those into references with different access
functions (even if that's not a 100% valid semantic transformation,
as scalarization isn't reversible without extra information)?
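A brute-force sketch of the overlap concern, assuming a constant stride for illustration (in the real case stride.88_115 is a runtime parameter, so this could not simply be enumerated).  For the linearized form off + i4 + stride * i5, two distinct iteration points can hit the same element once the i4 range exceeds stride, and then (i4, i5) cannot be recovered from the linear index.  The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if two distinct iteration points (a4, a5) and (b4, b5),
   with 0 <= i4 < n4 and 0 <= i5 < n5, map to the same element of the
   linearized access off + i4 + stride * i5.  off cancels out of the
   comparison, so it is omitted.  */
bool dimensions_overlap (long stride, long n4, long n5)
{
  for (long a4 = 0; a4 < n4; ++a4)
    for (long a5 = 0; a5 < n5; ++a5)
      for (long b4 = 0; b4 < n4; ++b4)
        for (long b5 = 0; b5 < n5; ++b5)
          {
            if (a4 == b4 && a5 == b5)
              continue;  /* Same iteration point, not an overlap.  */
            if (a4 + stride * a5 == b4 + stride * b5)
              return true;
          }
  return false;
}
```

With stride 8, an i4 range of 8 never overlaps, while a range of 9 does: i4 = 8, i5 = 0 and i4 = 0, i5 = 1 both yield index 8.  The "extra information" needed to split the access back into two subscripts is essentially a guarantee like 0 <= i4 < stride.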

Thanks,
Richard.

Follow-Ups:
- Re: [PATCH][GRAPHITE] More TLC
  - From: Richard Biener
- Re: [PATCH][GRAPHITE] More TLC
  - From: Sebastian Pop

References:
- [PATCH][GRAPHITE] More TLC
  - From: Richard Biener
- Re: [PATCH][GRAPHITE] More TLC
  - From: Sebastian Pop
- Re: [PATCH][GRAPHITE] More TLC
  - From: Richard Biener
- Re: [PATCH][GRAPHITE] More TLC
  - From: Sebastian Pop
- Re: [PATCH][GRAPHITE] More TLC
  - From: Richard Biener
