Flattening Memory Reference Trees in the GIMPLE IL

The goal of this project is to canonicalize and flatten the representation of memory references in the GIMPLE IL. The work is done on a branch in SVN, svn://gcc.gnu.org/svn/gcc/branches/mem-ref.

Current Trunk IL State

Currently there are several different tree codes that can be nested to form a single memory reference. This includes all tree codes of the tcc_reference tree code class such as COMPONENT_REF, BIT_FIELD_REF, INDIRECT_REF, ALIGN_INDIRECT_REF, MISALIGNED_INDIRECT_REF, TARGET_MEM_REF, ARRAY_REF, ARRAY_RANGE_REF, VIEW_CONVERT_EXPR, REALPART_EXPR and IMAGPART_EXPR. These are used to for example create a reference like __real a.b[c].d[e].

Several of those reference trees are used on both memory and registers, where for operation on registers partial stores are not allowed.

Operations on Registers

In addition to operating on memory arguments, REALPART_EXPR and IMAGPART_EXPR are used to access the real or imaginary part of a complex register. Stores to a complex register require the complete complex value to be rebuilt using COMPLEX_EXPR which combines the real and imaginary part of a complex value.

BIT_FIELD_REF is used in a similar way to access parts of a vector register. CONSTRUCTOR is used as the counterpart to COMPLEX_EXPR for vectors. BIT_FIELD_REF is also used to extract arbitrary bit-parts from a register. The mem-ref proposal includes a new BIT_FIELD_EXPR which can be used to combine bit-field parts.

VIEW_CONVERT_EXPR is used to interpret the bit representation of a register as a register of a different type. It is unspecified if this is allowed to change the register size. If disallowed this case needs to go through an integer mode and an intermediate BIT_FIELD_REF.

Proposal for Register Operations

Operation of REALPART_EXPR, IMAGPART_EXPR, BIT_FIELD_REF and VIEW_CONVERT_EXPR is disallowed for memory operands. A new BIT_FIELD_EXPR tree code is introduced with the following semantics:

Given a word OP0, a value OP1 and a bitfield position OP2 and size OP3 within the word, produce the value that results if replacing the specified parts of the word with value. Thus,

computes the value

where BS(word) is the bitsize of word. The value computed has the same type as that of the first argument, word.

This allows for BIT_FIELD_REF stores to be decomposed into a read-modify-write cycle, using BIT_FIELD_EXPR for the modify part. In turn this allows for BIT_FIELD_EXPR and BIT_FIELD_REF operations on registers to be combined and simplified where applicable.

Proposal for Memory Operations

Overview

MEM_REF and INDIRECT_MEM_REF operations are introduced. Both have a base as operand zero and an offset as operand one. In addition to these, the effective alias-set and the alignment of the memory access are stored and can be accessed via MEM_REF_ALIAS_SET and MEM_REF_ALIGN respectively.

INDIRECT_MEM_REF <ptr, offset> loads the object of the type of the INDIRECT_MEM_REF reference at the address ptr + offset.

MEM_REF <object, offset> loads the object of the type of the MEM_REF reference at the address &object + offset.

The offset is specified in bytes, the target addresses ptr + offset and &object + offset are aligned at least to MEM_REF_ALIGN of the reference and the access conflicts with all accesses whose alias-set conflicts with the alias-set MEM_REF_ALIAS_SET of the reference.

Alias and Alignment Information

The specification of the alias-set involved in the memory access allows to simplify the TBAA code in the middle-end by pushing all language-level knowledge to the frontends, respective the get_alias_set langhook and its shared generic code. It allows for maintaining the alias-sets as given in the source while exchanging the types the memory accesses work on, like what happens during auto-vectorization or lowering of aggregate copies.

Likewise maintaining alignment information in the reference trees allows for analysis results to be preserved which should ease for example versioning for alignment in the autovectorizer. This also gets rid of the need for ALIGNED_INDIRECT_REF and MISALIGNED_INDIRECT_REF variants.

Both of these additional informations are also present in the RTL mem expressions and such can be maintained across expansion. This also should remove the need to keep MEM_EXPR and some of the flags.

Lowering Process

The GENERIC memory reference trees (which are not changed, so frontends are unaffected by this project) are lowered by pass_lower_mem.

ARRAY_REF Lowering, IDX_EXPR

It is important to preserve index, stride and size information of (multi-dimensional) array accesses for data-dependency analysis and loop optimizations to be effective. At present this information is encoded in the recursive memory reference tree and the intermediate array types referenced.

The ARRAY_REF and ARRAY_RANGE_REF accesses are lowered as follows. The index operand is converted into a byte index appropriate for MEM_REF by encoding the multiplication with the element size and the chaining with other offsets in a new expression, IDX_EXPR. IDX_EXPR behaves as a multiply-add operation while preserving operand position to give them semantics.

IDX_EXPR <offset, index, size> computes offset + (T)index * (T)size

The type T of IDX_EXPR shall match that of a POINTER_PLUS_EXPR offset, likewise the offset argument. The index and size arguments are appropriately converted to T before the multiplication (FIXME?).

Operands of IDX_EXPR should not be re-associated with other expressions, nor should the index and size operands be swapped. Constant-folding of IDX_EXPR should be limited to reducing the whole IDX_EXPR to a constant (FIXME?).

Examples

  float A[8][8];
  ...
  return A[i][j];

is lowered as

  j.2 = (unsigned int) j.1;
  D.1666 = IDX <0 + j.2 * 4>;
  i.3 = (unsigned int) i.0;
  D.1668 = IDX <D.1666 + i.3 * 32>;
  D.1660 = MEM <&A + D.1668 {0}>;
  return D.1660;

struct {
  int a : 3;
  int b : 4;
  int c : 9;
} m;
...
  return m.b;

is lowered as

  MEML.0 = MEM <&m {0}>;
  D.1185 = BIT_FIELD_REF <MEML.0, 4, 3>;
  D.1184 = (int) D.1185;
  return D.1184;

  m.b = e;

is lowered as

  D.1185 = (<unnamed-signed:4>) e;
  MEML.0 = MEM <&m {0}>;
  MEML.0 = BIT_FIELD_EXPR <MEML.0, D.1185, 4, 3>;
  MEM <&m {0}> = MEML.0;

Proposal for Canonicalizing Adresses and Objects

ADDR_EXPR gets an extra offset operand. This allows to retain invariant addresses we allow now, like &a.x[2] with a single tree code.

  ADDR_EXPR <object, offset>  <-->  POINTER_PLUS_EXPR <ptr, offset>
            |               \     /         |
            |               /     \         |
  MEM_REF <object, offset>   <-->  INDIRECT_MEM_REF <ptr, offset>

The symmetry allows simple propagation of addresses and base objects across offset chains, as well as taking addresses or dereferencing references and objects.

Advantages of MEM_REF

More intermediate values that are computed get SSA_NAMEs assigned and thus are exposed to scalar optimizers. In particular this is true for multi-dimensional array accesses where some of these values can be CSEd and are loop-invariant.

Querying offset, size and alias-set of an access is cheaper.

SCCVN can easily handle punning through unions by value numbering type-agnostic and inserting VIEW_CONVERT_EXPRs at replacement time where needed.

Alignment information derived from VRP or by versioning can be kept associated to the memory accesses.

bit-field lowering makes manual bit-field implementation and structure component bit-fields canonical.

Storing the effective alias-set allows for more propagation of address expressions, simplifying the IL.

TARGET_MEM_REF can be simply represented as INDIRECT_MEM_REF <BASE, IDX <OFFSET, INDEX, STEP> > or MEM_REF <SYMBOL, IDX <OFFSET, INDEX, STEP> >.

Disadvantages of MEM_REF

  struct { int a[3]; int b; } m;

With MEM_REF we cannot easily say that m.a[i] does not alias m.b. gcc.dg/tree-ssa/alias-10.c is exactly this case. The proposal is to keep persistent value range information for i to handle this case. Zdeneks earlier proposal included the array size per array ref operand (http://gcc.gnu.org/ml/gcc/2007-04/msg00096.html).

Open Questions

We preserve IDX_EXPRs once they are not fully constant. So we even get things like

  D.1189_4 = D.1188_2 + i_3(D);
  D.1193_5 = IDX <0 + D.1189_4 * 1>;
  D.1190_6 = MEM <char {0}, &regs_ever_live + D.1193_5>;

where D.1193_5 is a simple (obfuscated) copy of D.1189_4. Still it makes it easier to recognize an array access this way. And it would be consistent with the IL for a constant but non-1 stride.

The whole issue about invariant addresses with offsets is still open. At the moment &a p+ CST is considered a GIMPLE invariant just like &a[CST] is in the current GIMPLE IL. But this is a nested tree (ADDR_EXPR inside POINTER_PLUS_EXPR) which is not too nice, a ADDR_EXPR with an offset operand or a ADDR_PLUS_EXPR would solve this (minor) issue.

Branch Status

The branch bootstraps ok for C, C++ and Fortran on x86_64 with multilibs on. The C execute.exp and compile.exp tortures are clean (the builtins.exp torture shows missed optimizations). At this point no regressions in the C testsuite should be checked for any patch.

Implementation TODO

The following still needs to be done ontop of the current patch:

The following basic optimizations are missing:

The following optimization passes need rewrite or adjustments:

None: MemRef (last edited 2008-03-11 14:23:46 by RichardGuenther)