This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [graphite] RFC: Add ISL variants of remaining PPL things

From: Tobias Grosser <tobias at grosser dot es>
To: Michael Matz <matz at suse dot de>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 29 Jun 2012 15:35:30 +0200
Subject: Re: [graphite] RFC: Add ISL variants of remaining PPL things
References: <alpine.LNX.2.00.1206261203020.25982@wotan.suse.de>

On 06/26/2012 12:21 PM, Michael Matz wrote:

Hi,

so, to make progress on the graphite front we want to get rid of the ppl
dependency.  We start from Tobis move-to-isl-and-isl-scheduler branch at
github, merged current trunk into that (see also Richis mails about
cloog/isl configury), and add ISL implementations of the essential parts
that still are ppl-only.  These are the number of iteration analysis for
potentially transformed iteration spaces (this wasn't mentioned in Tobis
mail from February) and the interchange heuristic (basically stride
calculation for transformed spaces).

So I learned PPL and ISL and rewrote that part :)  Like in the patch
below.  I'm not asking for any proper review for this, but rather if the
isl code makes sort of sense.  The testsuite works, except for pr43012.c,
which exposes the fact that the current PPL nr-iter implementation is
buggy (it computes the nr-iter of a 77<=i<=88 loop to be 65), so that
testcase runs into the nr-iter-ppl==nr-iter-isl assert.

In a private branch I have already removed all PPL code whatsoever (and
cleaned up the ISL code I added) so we make good progress there.

Hi Michael,

I am very impressed with how little time it took you to implement this patch. As it is working for several cases, you got a lot of stuff right. ;-)

I will still add some comments to the patch.

+static isl_constraint *
+build_linearized_memory_access (isl_map *map, poly_dr_p pdr)
+{
+  isl_constraint *res;
+  isl_local_space *ls = isl_local_space_from_space (isl_map_get_space (map));
+  unsigned offset, nsubs;
+  int i;
+  isl_int size, subsize;
+
+  res = isl_equality_alloc (ls);
+  isl_int_init (size);
+  isl_int_set_ui (size, 1);
+  isl_int_init (subsize);
+  isl_int_set_ui (subsize, 1);
+
+  nsubs = isl_set_dim (pdr->extent, isl_dim_set);
+  /* -1 for the already included L dimension.  */
+  offset = isl_map_dim (map, isl_dim_out) - 1 - nsubs;
+  res = isl_constraint_set_coefficient_si (res, isl_dim_out, offset + nsubs, -1);
+  /* Go through all subscripts from last to first.  First dimension
+     is the alias set, ignore it.  */
+  for (i = nsubs - 1; i>= 1; i--)
+    {
+      isl_space *dc;
+      isl_aff *aff;
+      enum isl_lp_result lpres;
+
+      res = isl_constraint_set_coefficient (res, isl_dim_out, offset + i, size);
+
+      dc = isl_set_get_space (pdr->extent);
+      aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc));
+      aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1);
+      lpres = isl_set_max (pdr->extent, aff,&subsize);
+      isl_aff_free (aff);
+      isl_int_mul (size, size, subsize);
+    }
+
+  isl_int_clear (subsize);
+  isl_int_clear (size);
+
+  return res;
+}

The function itself looks correct. However, I would personally have returned an isl_map instead of an isl_constraint. A map {A[i,j] -> [200 * i]} is probably a nicer way to represent a transformation from the higher level array type to an actual memory reference.

+static void
+pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
+{
+  poly_bb_p pbb = PDR_PBB (pdr);
+  isl_map *map;
+  isl_set *set;
+  isl_aff *aff;
+  isl_space *dc;
+  isl_constraint *lma, *c;
+  isl_int islstride;
+  enum isl_lp_result lpres;
+  graphite_dim_t time_depth;
+  unsigned offset, nt;
+  unsigned i;
+  /* pdr->accesses:    [P1..nb_param,I1..nb_domain]->[a,S1..nb_subscript]
+          ??? [P] not used for PDRs?
+     pdr->extent:      [a,S1..nb_subscript]
+     pbb->domain:      [P1..nb_param,I1..nb_domain]
+     pbb->transformed: [P1..nb_param,I1..nb_domain]->[T1..Tnb_sctr]
+          [T] includes local vars (currently unused)
+
+     First we create [P,I] ->  [T,a,S].  */
+
+  map = isl_map_flat_range_product (isl_map_copy (pbb->transformed),
+				    isl_map_copy (pdr->accesses));
+  /* Add a dimension for L: [P,I] ->  [T,a,S,L].*/
+  map = isl_map_add_dims (map, isl_dim_out, 1);
+  /* Build a constraint for "lma[S] - L == 0", effectively calculating
+     L in terms of subscripts.  */
+  lma = build_linearized_memory_access (map, pdr);
+  /* And add it to the map, so we now have:
+     [P,I] ->  [T,a,S,L] : lma([S]) == L.  */
+  map = isl_map_add_constraint (map, lma);
+
+  /* Then we create  [P,I,P',I'] ->  [T,a,S,L,T',a',S',L'].  */
+  map = isl_map_flat_product (map, isl_map_copy (map));

Your analysis is here a 1:1 mapping from PPL to isl. That works, but is probably not the nicest/conceptually cleanest way to implement this. In general, in isl we try to avoid adding/removing individual dimensions explicitly. The canonical way to express such transformations is to create a new map and to "apply" this map on the set/map you want to modify. For this algorithm, I would start with the pdr->accesses map and 'apply' pbb->transformed to map the original iteration space into the scattering space. I would than apply the isl_map calculated by build_linearized_memory_access to map the array to the actual data location accessed. Now you create a map 'scattering->scattering' that maps from one scattering time point to the next. And apply the map 'scattering->memory' on both the range and the domain. You will end up with a map memory -> memory. Here you can use the deltas function to give you the distances. ;-)

You can have a look in Polly, where I implement a similar calculation:

http://repo.or.cz/w/polly-mirror.git/blob/HEAD:/lib/Analysis/ScopInfo.cpp#l390

I wrote the stride detection code. You are allowed to use a copy of it for GCC under my copyright assignment.

  /* Sets STRIDES to the sum of all the strides of the data references
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 4520bbb..bf1854c 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -26,6 +26,8 @@ along with GCC; see the file COPYING3.  If not see
  #include<isl/map.h>
  #include<isl/union_map.h>
  #include<isl/constraint.h>
+#include<isl/ilp.h>
+#include<isl/aff.h>
  #include<cloog/cloog.h>
  #include<cloog/isl/domain.h>
  #endif
@@ -1573,6 +1575,25 @@ debug_scop_params (scop_p scop, int verbosity)
    print_scop_params (stderr, scop, verbosity);
  }

+extern isl_ctx *the_isl_ctx;
+void print_isl_set (FILE *f, isl_set *set);
+void print_isl_map (FILE *f, isl_map *map);
+void
+print_isl_set (FILE *f, isl_set *set)
+{
+  isl_printer *p = isl_printer_to_file (the_isl_ctx, f);
+  p = isl_printer_print_set (p, set);
+  isl_printer_free (p);
+}
+
+void
+print_isl_map (FILE *f, isl_map *map)
+{
+  isl_printer *p = isl_printer_to_file (the_isl_ctx, f);
+  p = isl_printer_print_map (p, map);
+  isl_printer_free (p);
+}

Are you introducing these functions for debugging purposes. If this is the case, you can either use isl_set_dump(set), isl_map_dump(map), .. or you can use the gdb islprint command that should be made available by the gdb python script, installed by isl.

Also, you can get the_isl_ctx, by calling isl_map_get_ctx(map).

@@ -1704,6 +1753,7 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb,
       it to the scattering.  */
    ppl_new_Pointset_Powerset_C_Polyhedron_from_C_Polyhedron
      (&sctr_lb, PBB_TRANSFORMED_SCATTERING (pbb));
+  lbmap = isl_map_copy (pbb->transformed);
    for (i = 0; i<  (int) dim_iter_domain; i++)
      {
        ppl_Linear_Expression_t eq;
@@ -1732,6 +1782,23 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb,
        ppl_delete_Pointset_Powerset_C_Polyhedron (pph);
        ppl_delete_Constraint (pc);
        ppl_delete_Constraint_System (cs);
+	{
+	  isl_space *dc = isl_set_get_space (pbb->domain);
+	  isl_aff *aff;
+	  isl_int isllb;
+	  enum isl_lp_result res;
+	  isl_constraint *c;
+	  isl_int_init (isllb);
+	  aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc));
+	  aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1);
+	  res = isl_set_min (pbb->domain, aff,&isllb);
+	  c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (pbb->transformed)));
+	  c = isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
+	  c = isl_constraint_set_constant (c, isllb);
+	  lbmap = isl_map_add_constraint (lbmap, c);
+	  isl_int_clear (isllb);
+	  isl_aff_free (aff);
+	}
      }

    /* Extract the number of iterations.  */
@@ -1741,6 +1808,56 @@ pbb_number_of_iterations_at_time (poly_bb_p pbb,
    ppl_max_for_le_pointset (sctr_ub, le, ub);
    mpz_sub (res, ub, lb);

+    {
+      isl_space *dc;
+      isl_aff *aff;
+      isl_int isllb, islub, islres;
+      enum isl_lp_result lpres;
+
+      isl_int_init (isllb);
+      isl_int_init (islub);
+      isl_int_init (islres);
+
+      lbset = isl_map_range (lbmap);
+      ubset = isl_map_range (ubmap);
+
+      dc = isl_set_get_space (lbset);
+      aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc));
+      aff = isl_aff_set_coefficient_si (aff, isl_dim_in, time_depth, 1);
+      lpres = isl_set_min (lbset, aff,&isllb);
+      lpres = isl_set_max (ubset, aff,&islub);
+
+      isl_int_sub (islres, islub, isllb);
+
+      isl_int_set_gmp (isllb, res);
+
+      /* This would ideally hold, but the PPL code actually is buggy
+         (returns upper bound 0 for unanalyzable domains), and incomplete,
+	 pdr_add_data_dimensions can constrain the ISL domain more than
+	 the PPL domain.  So, with ISL we get better results on pr37485.c.  */
+      /*gcc_assert (isl_int_eq (isllb, islres));*/
+
+      /* Even shorter and more correct method: map the iteration domain
+         through the current scatter, and work on the resulting set.  */
+      isl_set_free (lbset);
+      lbset = isl_set_apply (isl_set_copy (pbb->domain),
+			     isl_map_copy (pbb->transformed));
+      lpres = isl_set_min (lbset, aff,&isllb);
+      lpres = isl_set_max (lbset, aff,&islub);
+
+      isl_int_sub (islub, islub, isllb);
+      isl_int_add_ui (islub, islub, 1);
+      gcc_assert (isl_int_eq (islub, islres));
+
+      isl_int_clear (isllb);
+      isl_int_clear (islub);
+      isl_int_clear (islres);
+      isl_aff_free (aff);
+
+      isl_set_free (lbset);
+      isl_set_free (ubset);
+    }

This is again modeling the PPL code. There is nothing wrong with this, however you may be able to simplify the code by using isl_*_deltas.

+
    mpz_clear (one);
    mpz_clear (diff);
    mpz_clear (lb);
diff --git a/gcc/graphite.c b/gcc/graphite.c
index dd98f5e..821efb1 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -253,6 +253,8 @@ graphite_finalize (bool need_cfg_cleanup_p)
      print_loops (dump_file, 3);
  }

+isl_ctx *the_isl_ctx;

Why are you creating a global ctx here?

Thanks a lot Michael for working on this.

Tobi

References:
- [graphite] RFC: Add ISL variants of remaining PPL things
  - From: Michael Matz

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]