This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Proposal: changing representation of memory references

From: "Seongbae Park" <seongbae dot park at gmail dot com>
To: "Zdenek Dvorak" <rakdver at atrey dot karlin dot mff dot cuni dot cz>
Cc: gcc at gcc dot gnu dot org
Date: Wed, 4 Apr 2007 19:09:35 -0700
Subject: Re: Proposal: changing representation of memory references
References: <20070404143508.GA461@atrey.karlin.mff.cuni.cz>

On 4/4/07, Zdenek Dvorak <rakdver@atrey.karlin.mff.cuni.cz> wrote:

Hello,

at the moment, any pass that needs to process memory references are
complicated (or restricted to handling just a limited set of cases) by
the need to interpret the quite complex representation of memory
references that we have in gimple.  For example, there are about 1000 of
lines of quite convoluted code in tree-data-ref.c and about 500 lines in
tree-ssa-loop-ivopts.c dealing with parsing and analysing memory
references.

I would like to propose (and possibly also work on implementing) using a
more uniform representation, described in more details below.  The
proposal is based on the previous discussions
(http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned
about the way memory references are represented in ICC.
It also subsumes the patches of Daniel to make p[i] (where p is pointer)
use ARRAY_REFs/MEM_REFs.

I am not sure whether someone does not already work on something
similar, e.g. with respect to LTO (where the mentioned discussion
started), or gimple-tuples branch?

Proposal:

For each memory reference, we remember the following information:

-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the
      structure of the access, for aliasing purposes)
-- flags

for each index, we remeber
-- lower and upper bound
-- step
-- value of the index

The address of the reference can be computed as

base + offset + sum_{idx} offsetof(idx)

where offsetof(idx) = (value - lower) * step

For example, a.x[i][j].y  would be represented as
base = &a
offset = offsetof (x) + offsetof (y);
indices:
  lower = 0 upper = ? step = sizeof (a.x[i]) value = i
  lower = 0 upper = ? step = sizeof (a.x[j]) value = j

p->x would be represented as
base = p;
offset = offsetof (x);
indices: empty

p[i] as
base = p;
offset = 0
indices:
  lower = 0 upper = ? step = sizeof (p[i]) value = i

Remarks:
-- it would be guaranteed that the indices of each memory reference are
   independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
   if idx1 == idx1' and idx2 = idx2'; this is important for dependency
   analysis (and for this reason we also need to remember the list of
   indices, rather than trying to reconstruct them from the address).


I didn't completely think through this,
but how would this property help in the presence of array flattening ?
e.g. suppose two adjacent loops, both two-nest-deep,
and only one of them is flattened, then
one loop will have one dimensional access (hence will have only one index)
vs the other loop will have two dimensional.
In other words, &ref[idx1] == &ref[idx1'][idx2'] can happen.
So most likely we'll need to be able to compare the linearized address form,
rather than simply comparing vectors of indexes.

This is not to say I don't like your proposal (I actually do),
but I just want to understand what properties
you're expecting from the representation
(and I think this has implication on the canonicalization of the references
you mentioned below).

Another question is, how would the representation look
for more complicated address calculations
e.g. a closed hashtable access like:

table[hash_value % hash_size]

and would it help in such cases ?

-- it would be guaranteed that the computations of the address do not
   overflow.
-- possibly there could be a canonicalization pass that, given

   for (p = q; p < q + 100; p++)
     p->x = ...  {represented the way described above}

would transform it to

   for (p = q, i = 0; p < q + 100; p++, i++)
     {base = q, offset = offsetof(x), indices: lower = 0 upper = ? step = sizeof (*p) value = i}

   so that dependence analysis and friends do not have to distinguish
   between accesses through pointers and arrays

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";

Follow-Ups:
- Re: Proposal: changing representation of memory references
  - From: Zdenek Dvorak

References:
- Proposal: changing representation of memory references
  - From: Zdenek Dvorak

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]