[gcc(refs/vendors/ARM/heads/morello)] Handle capabilities in affine trees and induction variables

Matthew Malcomson matmal01@gcc.gnu.org
Mon Sep 11 17:25:34 GMT 2023


https://gcc.gnu.org/g:7fc4b29813c02707bc8fef2bf22b6986847e0542

commit 7fc4b29813c02707bc8fef2bf22b6986847e0542
Author: Matthew Malcomson <matthew.malcomson@arm.com>
Date:   Mon Sep 11 16:18:54 2023 +0100

    Handle capabilities in affine trees and induction variables
    
    This enables many loop optimisations throughout the compiler, which
    means we also need to update their code.  As it happens, the majority
    of that code only needs local changes (such as representing the "step"
    of a pointer variable throughout a loop in the "offset" type rather
    than the pointer type).
    
    An affine tree represents a value as a polynomial: a constant offset
    plus up to MAX_AFF_ELTS basic variables, each with an associated
    coefficient.  It seems reasonable to limit how a capability can be
    represented in this data structure so that handling of
    capability-specific requirements can be implemented easily.
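    
    As a rough illustration, the following standalone model (not GCC's
    actual aff_tree definition; the names and the MAX_AFF_ELTS value here
    are only indicative) shows how the address &a[4*i + 3] of a 4-byte int
    array element decomposes into a constant offset plus two elements:
    
      #include <stdio.h>
      
      /* Simplified model: a constant offset plus up to MAX_AFF_ELTS
         (value, coefficient) pairs.  */
      #define MAX_AFF_ELTS 8
      
      struct model_elt { const char *val; long coef; };
      struct model_aff
      {
        long offset;                        /* constant part */
        unsigned n;                         /* number of elements in use */
        struct model_elt elts[MAX_AFF_ELTS];
      };
      
      int main (void)
      {
        /* &a[4*i + 3]  ==  12 + 1*&a + 16*i.  */
        struct model_aff comb = { 12, 2, { { "&a", 1 }, { "i", 16 } } };
        printf ("%ld", comb.offset);
        for (unsigned j = 0; j < comb.n; j++)
          printf (" + %ld*%s", comb.elts[j].coef, comb.elts[j].val);
        printf ("\n");
        return 0;
      }
    
    With capabilities, at most one of these elements (here "&a") may carry
    provenance, which motivates the restrictions discussed below.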
    
    There are various small changes throughout the rest of the compiler
    which I believe stand on their own.  Below is a discussion of the
    points which I believe need more explanation.
    
    --- Representation of capabilities in affine trees.
    Given that any variable can only take its provenance from one source,
    disallowing multiple capability-typed `elts` in any affine tree seems a
    fairly obvious restriction.
    Two other questions are less obvious.  First, should a capability-typed
    affine tree be allowed to *not* contain a capability-typed element?
    Second, some affine tree calculations have temporaries containing
    multiples of themselves -- such a multiple should clearly be
    interpreted as a single provenance with a multiple of the address, but
    should we represent it in the affine tree data structure as a
    "multiple" of the capability element, or as a single capability element
    plus extra "address value" elements?
    
    Originally, we went with disallowing any capability-typed affine tree
    which does not have a capability-typed element in it.  This invariant
    was maintained by dropping the capability type on an affine tree when a
    capability element was removed, and taking the capability type from the
    new element when a capability element was added to an affine tree that
    did not already have one.
    
    We found that this resulted in a reasonable amount of type information
    loss, since there are a good number of places which build an affine
    tree up incrementally and/or perform operations which drop capability
    information temporarily in order to add it back in.
    Each place *could* be fixed up to do the same operation differently (so
    that the original type is maintained over the entire operation), but it
    seemed cleaner to maintain the requested type throughout operations by
    allowing a capability-typed affine tree to have no provenance and
    treating that as a NULL-derived capability on regeneration.
    
    Hence the invariant we go with in this patch is:
      - A capability-typed affine tree may have up to one capability-typed
        element.
      - A non-capability-typed affine tree may not have a capability-typed
        element.
    It seems that the choice between these two invariants does not change
    the optimisations enabled (the same ~2000 optimisation tests go from
    FAIL->PASS).
    
    For the second choice (representing a multiple of a capability as
    either a multiple on the capability element or a single capability
    element plus some multiple of another "address value" element), it
    seems that the majority of code manipulating affine trees would "just
    work" with a multiple of the capability element but could "miss" any
    extra non-capability elements.  That could result in confusion,
    especially around losing capability information when we should actually
    have been removing address value addends.
    None of that is a deal-breaker -- it would just mean more code changes
    -- but it seemed to be the only functional difference between the two
    choices, so we chose to represent such values as a multiple of the
    capability element.
    
    --- Whether to change valid affine tree operations.
    While adding this patch we need to make sure that the operations one can
    perform on an affine tree do not lead to invalid operations being made
    on a capability.
    
    The primitives that affine trees currently have are limited to addition
    and multiplication.  Subtraction is implemented by multiplying an affine
    tree by a constant -1 and performing addition.
    When working with capabilities we need to ensure that one does not add
    two different capability affine trees to each other (since we must not
    have an affine tree with two sources of provenance).  Based on this, it
    seemed worth considering creating new affine tree primitives to match
    the equivalent tree primitives (POINTER_PLUS and POINTER_DIFF).
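    
    As a hypothetical source-level example of both points (names are mine,
    not from the patch): a pointer difference is modelled by scaling one
    operand by -1 and adding, and the resulting integer must not end up
    with two sources of provenance in a single affine tree:
    
      #include <stddef.h>
      
      /* q - p is modelled as the affine combination 1*q + (-1)*p; only the
         address values matter in the result, so no capability provenance
         is required.  */
      ptrdiff_t distance (int *p, int *q)
      {
        return q - p;
      }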
    
    The hope with such a change would be that we could enforce more
    discipline on callers (by ensuring that whenever we have a pointer one
    must call the new primitives), and that this would help make code
    written for non-CHERI correct by design for CHERI as well.
    
    When looking into this, we found that there were surprisingly few
    places where an affine tree was operated on as a pointer.  Most places
    used the affine tree as an unsigned integer, and the only really
    complex use-case casts any pointer type to an unsigned integral type in
    order to avoid overflow concerns on the operations that it generates.
    
    Hence we believe there would be little benefit to adding such new
    primitives.
    
    --- Provenance of different SSA_NAME's in get_computation_aff_1.
    The trickiest part of this work has been around get_computation_aff_1.
    This function determines the expression which represents a USE in terms
    of an induction variable CAND.  It should return false if the USE
    cannot be represented in terms of CAND.
    
    This function is called in two contexts.  It is sometimes called to
    query whether something is possible, and sometimes called when the
    surrounding code *knows* that representing USE in terms of CAND is
    possible.
    
    When the surrounding code knows that the representation is possible, it
    is often because the surrounding code has provided get_computation_aff_1
    with a candidate which is an SSA_NAME representing the USE expression
    (or representing a small offset from the relevant expression).  Before
    capabilities this is not a problem: we just add the difference between
    the values to the variable.  With capabilities this is a problem, since
    we cannot represent one capability in terms of another unless their
    provenances are equal.
    We could have handled this by:
    1) Passing an argument to the function telling it that, when it is
       called from a given position, the provenance of each variable is
       known to be the same.
    2) Adding some mechanism to look through the SSA_NAME definition
       statements to bottom out where the provenance comes from for both
       elements and ensure that *this* is equal.
    3) Ensuring that the representation returned always takes the
       provenance from the "USE" when there is a question around metadata.
    
    Option (1) positives:
    - Implementation is easy to understand.
    Negatives:
    - Puts quite strong limitations on what can be analyzed in this pass.
    
    Option (2) positives:
    - Puts only moderately strong limitations on what can be analyzed in
      this pass.
    Negatives:
    - Still some limitations.
    - Reasonably awkward to implement.
    
    Option (3) positives:
    - Puts the fewest limitations on what can be analyzed in this pass.
    - Pretty simple to implement.
    Negatives:
    - May end up doing too much manipulation to represent one capability in
      terms of another, though we have added some costing adjustments to
      mitigate this.
    
    This patch implements option (3).
    
    An interesting note is that there is an existing limitation on
    representations: we do not represent one pointer in terms of another.
    The documented reason is that representing one pointer in terms of
    another pointer can confuse RTL alias analysis.  The capability
    analysis we use naturally splits the "real" pointer from any offsets.
    Hence we remove this restriction for capability targets.
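    
    A hypothetical example (not taken from the patch or the testsuite):
    both pointers below derive from the same capability `buf`, so on a
    capability target the use of `q` can now be rewritten in terms of an
    induction variable candidate based on `p`:
    
      void copy_shifted (int *buf, int n)
      {
        int *p = buf;        /* candidate induction variable */
        int *q = buf + 1;    /* use with the same provenance as p */
        for (int i = 0; i + 1 < n; i++)
          p[i] = q[i];
      }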
    
    --- Temporaries and large offsets coming from get_computation_aff_1
    When get_computation_aff_1 represents a use of a variable in terms of an
    induction variable, it can sometimes end up with very large offsets.
    One example of this would be representing the use of a variable which
    iterates forwards in terms of a variable which iterates backwards.  The
    code ends up splitting things into an invariant of `2*&x` and a
    variable part of `1*&x + loop-num*constant`.  That way the variable
    part can be directly used in other areas which require a positive step
    *and* can be used via subtraction from the `2*&x` invariant when the
    use requires a negative step.
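    
    A hypothetical illustration of such a pair of uses (not from the
    patch); the `2*&x`-style invariant and the offsetting variable part
    described above can arise when the backward use x[n - 1 - i] is
    rewritten in terms of a forward-iterating candidate (or vice versa):
    
      int sum_both_ends (const int *x, int n)
      {
        int s = 0;
        for (int i = 0; i < n; i++)
          s += x[i] + x[n - 1 - i];   /* one forward use, one backward use */
        return s;
      }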
    
    As it stands these multiples could be almost anything, the only
    restriction being that we end up with the correct value in the end.
    This could be represented with an affine tree via a combination of
    different SSA_NAME's which have almost the same value.
    In order to handle this we "extract" the capability from an affine tree
    and reintroduce it after calculating the offset we need.  This should
    avoid any temporary calculation invalidating the capability.
    
    We take this care in rewrite_use_nonlinear_expr, rewrite_use_address,
    and aff_combination_to_tree.  As it happens, both aff_combination_to_tree
    and rewrite_use_nonlinear_expr already had some mechanism to calculate
    the offset to a pointer and then combine it into the pointer type, and
    we simply extend them to also account for capabilities.
    rewrite_use_address required changing how a struct mem_address and
    TARGET_MEM_REF are represented in order to maintain a separation between
    the "base" capability and the different integral offset parts.
    
    This rewrite needed to ensure that offsets and capabilities were
    combined such that all offsets that could be large were combined before
    being added to the capability.  Without this behaviour the capability
    could be taken out of its representable range by one offset before
    being brought back into representable range by another offset.  That
    round-trip would leave the capability pointing at the correct place but
    having lost its validity bit.
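    
    A hypothetical sketch of the round-trip problem (the constant is purely
    illustrative; whether a given offset is representable depends on the
    bounds of the capability):
    
      char *round_trip (char *p)
      {
        /* On a capability target the intermediate value may sit far
           outside the representable range of p, clearing its validity bit;
           adding the compensating negative offset restores the address but
           not the tag.  Compiler-generated temporaries must therefore
           combine the offsets before adding them to the capability.  */
        char *tmp = p + ((long) 1 << 40);
        return tmp - ((long) 1 << 40);
      }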
    
    In order to do this we chose to add a separate `base_cap` member to the
    `mem_address` structure.  This keeps the existing split between the
    different offsets in `index*step` and in `base`, while still
    maintaining the separation between the base capability and its offsets.
    N.b. for reference, the `offset` member of `mem_address` must be a
    constant.
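    
    For reference, the shape of the structure after this patch is roughly
    as follows (field order and the comments are inferred from the changes
    below, so treat them as approximate):
    
      struct mem_address
      {
        tree symbol;    /* address of a static/global symbol, if any */
        tree base_cap;  /* new: provenance-providing capability base */
        tree base;      /* non-capability variable part of the base */
        tree index;     /* variable part scaled by step */
        tree step;      /* constant scale factor applied to index */
        tree offset;    /* constant offset (must be a constant) */
      };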
    
    There is only one other place which takes an affine tree combination
    and uses it directly: rewrite_use_compare.  Since it deals with
    comparisons, the metadata is dropped anyway and no special handling is
    needed.
    
    The adjustment of TARGET_MEM_REF requires special attention.  There is
    an existing invariant in TARGET_MEM_REF codes which asserts that when
    BASE is not the address of a static or global variable, INDEX2 is
    NULL.  For non-capabilities this makes sense, since there is no reason
    to maintain two register values to be added together.
    In order to avoid combining any base capability with a temporary offset
    we cannot simply combine BASE and INDEX2 when they are both registers.
    We considered three options to solve this.
    - Specify a different invariant for capability targets, maintaining the
      original invariant for non-capability targets.
      Code may be written assuming the original invariant and hence be
      broken on capability targets.
      There seem to be only two cases in the existing codebase which rely
      on this invariant, so this is not a *huge* concern.
    - Introduce a new operand to TARGET_MEM_REF.
      The new operand would only be used on capability targets.
      It is still plausible that code written for non-capability targets
      would forget to handle this operand.
    - Whenever we need to express a value which could be problematic,
      combine all offset parts into one.
      I.e. whenever BASE is not an ADDR_EXPR of a static or global symbol,
      emit code to manually combine all offsets into one, then emit a
      TARGET_MEM_REF of that single BASE capability.
      So:
      - a single TARGET_MEM_REF representing "BASE + STEP*INDEX + OFFSET +
        INDEX2"
      gets replaced with
      - GIMPLE for "TOTAL_OFF = STEP*INDEX + OFFSET + INDEX2".
      - GIMPLE for "FINAL_BASE = BASE + TOTAL_OFF".
      - TARGET_MEM_REF for "access FINAL_BASE".
      Some optimisation opportunities may be lost, since we lose the
      benefit of having the whole address in a single TARGET_MEM_REF tree
      code.
    
    N.b. this did not seem like a choice which mattered a huge amount, as:
    1) There did not seem to be a large number of optimisation tests which
       newly failed when choosing the last option (which should be the
       least performant).
    2) There are not many cases which rely on the existing invariant.
    
    I chose to break the invariant, since that way we can see whether the
    optimistic outlook works and fall back to the pessimistic approach if
    it does not.  Choosing the pessimistic approach would not be able to
    tell us whether the optimistic approach would have been OK.
    
    When investigating differences in the testsuite, it seems that the tests
    `tree-ssa/{loop-2,scev-3,scev-4}.c` have a more compact TREE
    representation with this choice than without, but that slightly nicer
    representation does not translate into a difference in final codegen.
    
    --- Representing pointers in terms of offset induction variables.
    The pointer which we use to access x[i] could be represented in terms of
    a pointer induction variable (starting at x and incrementing with the
    same step as i) or in terms of an offset induction variable (where the
    induction variable is essentially i, and x is an invariant addend to
    the induction variable).
    
    These represent the same thing, but using one or the other may be more
    convenient depending on what other induction variables are in the loop.
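    
    A hypothetical illustration of the two equivalent representations (not
    from the patch):
    
      void scale (int *x, int n)
      {
        /* Pointer induction variable: p starts at x and steps by
           sizeof (int) each iteration.  */
        for (int *p = x; p != x + n; p++)
          *p *= 2;
      
        /* Offset induction variable: i is the induction variable and x is
           an invariant addend.  */
        for (int i = 0; i < n; i++)
          x[i] *= 2;
      }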
    
    After the adjustments for handling SSA_NAME's with different provenance
    and for handling temporaries with large offsets, the second
    representation now works by correctly identifying the "capability" part
    and ensuring that temporaries do not invalidate it by taking it out of
    bounds.
    
    --- Some vectorizer overread fixes.
    Due to the behaviour of capabilities, we hit a number of places where
    the vectorizer reads out of bounds of a given object.
    
    Without capabilities this is fine as long as the vector loads can never
    cause the crossing of a page boundary where the scalar loads would not
    have done so.  With capabilities this is never OK: the pointer to the
    object will not allow accesses outside of the object.
    Hence we have to adjust those places which implement this check based
    on the known alignment of the array we are loading from.
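    
    The new tests below (e.g. vect-peel-stack-object.c) exercise this.  The
    core pattern, sketched here in plain C (a simplified variant, not one
    of the actual testcases), is that only every fourth element of a stack
    array is used, so a full-width vector load covering the last group can
    read past the end of the object -- harmless on a conventional target
    when alignment rules out a page crossing, but a capability fault on
    Morello:
    
      unsigned short out[16];
      
      void foo (void)
      {
        int s[62];
        for (int k = 0; k < 62; k++)
          s[k] = k;
        for (int j = 0; j < 16; j++)
          /* A vectorised load covering s[j*4 .. j*4 + 3] for j == 15 would
             touch s[62] and s[63], beyond the 62-element array.  */
          out[j] = s[j * 4];
      }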
    
    --- INTCAP typed variables.
    As it stands we do not look for INTCAP typed induction variables.
    Since all operations on INTCAP typed variables are performed via an
    operation on their address value followed by a call to
    REPLACE_ADDRESS_VALUE, generically looking for such an induction
    variable would require keeping track of where the "base" in a
    REPLACE_ADDRESS_VALUE comes from while at the same time following what
    happens to the address value.
    
    This seems like a fair bit more work to implement, and we have observed
    that some loops still manage to have their logical iteration analyzed
    because the analysis notices the iteration of the integer address value
    itself and models that as the induction variable.
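    
    A hypothetical example of such a loop (using the __uintcap_t type that
    also appears in the testsuite; whether the iteration is analyzed this
    way for any particular loop is not guaranteed):
    
      void zero_bytes (__uintcap_t c, long n)
      {
        /* Each increment of cur is an operation on its address value
           followed by rebuilding the capability (REPLACE_ADDRESS_VALUE in
           the description above), so cur itself is not recognised as an
           induction variable; the analysis can instead latch onto the
           iteration of the integer address value.  */
        for (__uintcap_t cur = c; cur != c + n; cur++)
          *(volatile char *) cur = 0;
      }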
    
    --- Performance improvements seen.
    Running on a *modified* spec2006 benchmark (modified such that the
    benchmarks compile and run correctly for Morello), we see the following
    performance changes.
    
    Standard Deviations
                                after             before
                   Est. Base Run Time Est. Base Run Time
    Benchmark
    400.perlbench            0.000840           0.001592
    401.bzip2                0.001357           0.000604
    403.gcc                  0.001725           0.003570
    429.mcf                  0.000906           0.000320
    445.gobmk                0.000610           0.000483
    456.hmmer                0.005931           0.001125
    458.sjeng                0.000263           0.000110
    462.libquantum           0.001306           0.000840
    464.h264ref              0.000707           0.000544
    471.omnetpp              0.003110           0.011609
    473.astar                0.001239           0.008656
    483.xalancbmk            0.001580           0.014558
    Means
                                after             before
                   Est. Base Run Time Est. Base Run Time
    Benchmark
    400.perlbench                 1.0           0.998872
    401.bzip2                     1.0           1.015555
    403.gcc                       1.0           0.951290
    429.mcf                       1.0           1.005914
    445.gobmk                     1.0           0.978017
    456.hmmer                     1.0           0.671299
    458.sjeng                     1.0           1.007290
    462.libquantum                1.0           1.026067
    464.h264ref                   1.0           0.952616
    471.omnetpp                   1.0           0.952673
    473.astar                     1.0           1.019703
    483.xalancbmk                 1.0           0.949062
    
    --------------------
    
    after   Est. Base Run Time    1.000000
    before  Est. Base Run Time    0.960696
    
    I have not looked into the reason for any of the performance
    regressions.

Diff:
---
 gcc/gimple-ssa-strength-reduction.c                |   6 +-
 gcc/testsuite/gcc.dg/20020219-1.c                  |   4 +
 gcc/testsuite/gcc.dg/unroll-7.c                    |   5 +
 gcc/testsuite/gcc.dg/vect/pr103116-2.c             |   2 +-
 .../morello/strict-align-capabilities-only.c       |   4 +
 .../aarch64/morello/vect-non-slp-half-gap.c        |  23 ++
 .../morello/vect-peel-stack-object-runner.c        |   7 +
 .../aarch64/morello/vect-peel-stack-object.c       |  22 ++
 .../morello/vect-peel-twice-double-access.c        |  26 ++
 .../aarch64/morello/vect-peel-twice-gaps.c         |  17 +
 gcc/tree-affine.c                                  | 206 +++++++++--
 gcc/tree-affine.h                                  |   9 +
 gcc/tree-chrec.c                                   |  17 +-
 gcc/tree-data-ref.c                                |  12 +-
 gcc/tree-predcom.c                                 |   3 +-
 gcc/tree-scalar-evolution.c                        |  41 +--
 gcc/tree-ssa-address.c                             | 199 +++++++++--
 gcc/tree-ssa-address.h                             |  11 +-
 gcc/tree-ssa-loop-ivopts.c                         | 379 ++++++++++++++++-----
 gcc/tree-vect-data-refs.c                          |  10 +-
 gcc/tree-vect-loop-manip.c                         |  10 +-
 gcc/tree-vect-stmts.c                              |  40 ++-
 gcc/tree.def                                       |   3 +-
 23 files changed, 849 insertions(+), 207 deletions(-)

diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 828d23bf8578..87947b2d0d4b 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -2024,8 +2024,12 @@ valid_mem_ref_cand_p (slsr_cand_t c)
     return false;
 
   struct mem_address addr
-    = { NULL_TREE, c->base_expr, TREE_OPERAND (c->stride, 0),
+    = { NULL_TREE, NULL_TREE, NULL_TREE, TREE_OPERAND (c->stride, 0),
 	TREE_OPERAND (c->stride, 1), wide_int_to_tree (sizetype, c->index) };
+  if (tree_is_capability_value (c->base_expr))
+    addr.base_cap = c->base_expr;
+  else
+    addr.base = c->base_expr;
 
   return
     valid_mem_ref_p (TYPE_MODE (c->cand_type), TYPE_ADDR_SPACE (c->cand_type),
diff --git a/gcc/testsuite/gcc.dg/20020219-1.c b/gcc/testsuite/gcc.dg/20020219-1.c
index 055b5b935ea7..ceaf824a7b62 100644
--- a/gcc/testsuite/gcc.dg/20020219-1.c
+++ b/gcc/testsuite/gcc.dg/20020219-1.c
@@ -16,6 +16,10 @@
 /* { dg-skip-if "" { aarch64*-*-* && ilp32 } } */
 /* { dg-skip-if "" { "ia64-*-hpux*" } "*" "-mlp64" } */
 /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } } */
+/* Fails on capability targets because the value is taken far out of bounds
+   before being brought back into bounds.  That means we end up accessing an
+   invalidated capability.  */
+/* { dg-skip-if "capability taken out of bounds and brought back" { cheri_capability_pure } } */
 
 /* Disable the test entirely for 16-bit targets.  */
 #if __INT_MAX__ > 32767
diff --git a/gcc/testsuite/gcc.dg/unroll-7.c b/gcc/testsuite/gcc.dg/unroll-7.c
index 055369bf8b14..6f01f04da511 100644
--- a/gcc/testsuite/gcc.dg/unroll-7.c
+++ b/gcc/testsuite/gcc.dg/unroll-7.c
@@ -10,6 +10,11 @@ int t(void)
   for (i=0;i<1000000;i++)
     a[i]++;
 }
+/* TODO Number of iterations is not calculated correctly because the
+ * calculation bails out once it sees a SUBREG (see iv_number_of_iterations ->
+ * ... suitable_set_for_replacement -> simple_rhs_p).  SUBREG is there when
+ * extracting address from the pointer `a`.  
+ * This is something to fix, but not right now.  */
 /* { dg-final { scan-rtl-dump "Unrolled loop" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "number of iterations: .const_int 999999" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "upper bound: 999999" "loop2_unroll" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr103116-2.c b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
index 2f4ed0f404c7..aa9797a94074 100644
--- a/gcc/testsuite/gcc.dg/vect/pr103116-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr103116-2.c
@@ -31,7 +31,7 @@ loop (TYPE *restrict x, TYPE *restrict y)
     }
 }
 
-TYPE x[COUNT * 4];
+TYPE x[COUNT * 8];
 
 int
 main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/strict-align-capabilities-only.c b/gcc/testsuite/gcc.target/aarch64/morello/strict-align-capabilities-only.c
index 69844443ac8f..f2a5b9e9dd6d 100644
--- a/gcc/testsuite/gcc.target/aarch64/morello/strict-align-capabilities-only.c
+++ b/gcc/testsuite/gcc.target/aarch64/morello/strict-align-capabilities-only.c
@@ -1,6 +1,10 @@
 /* { dg-do run } */
 /* { dg-skip-if "" { *-*-* } { "*" } { "-O" "-O1" "-O2" "-O3" } } */
 /* { dg-additional-options "--save-temps" } */
+
+/* Noinline in order to make sure that the `memcpy` in this function does not
+   get copied into `main` and break our scan-assembler-times.  */
+__attribute__ ((noinline))
 __uintcap_t foo (void *p)
 {
   __uintcap_t n;
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/vect-non-slp-half-gap.c b/gcc/testsuite/gcc.target/aarch64/morello/vect-non-slp-half-gap.c
new file mode 100644
index 000000000000..061933d48341
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/vect-non-slp-half-gap.c
@@ -0,0 +1,23 @@
+/* This is testing we do not decide that an overrun is fine when vectorising a
+   known-alignment vector *in the non-SLP branch of get_group_load_store_type*.  */
+/* { dg-do run } */
+/* { dg-additional-sources "vect-peel-stack-object-runner.c" } */
+/* It is hard to trigger this code path, so turn off cost modelling in order to help.  */
+/* { dg-additional-options "-fvect-cost-model=unlimited" } */
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+extern unsigned short mprr_2[5][16][16];
+extern void initialise_s(int *);
+
+void foo() {
+  int s[62];
+  int j;
+  initialise_s(&s[0]);
+  for (j=0; j < MB_BLOCK_SIZE; j++)
+    {
+      mprr_2[HOR_PRED_16][j][0]=s[j*4];
+      mprr_2[VERT_PRED_16][j][0]=s[j*4 + 1];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object-runner.c b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object-runner.c
new file mode 100644
index 000000000000..8c5872b29341
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object-runner.c
@@ -0,0 +1,7 @@
+/* Not a test in itself, but a `main` file to run the foo
+   function from various other tests.  */
+extern void foo();
+unsigned short mprr_2[5][16][16];
+void initialise_s(int *s) { }
+int get_sval() { return 1; }
+int main() { foo(); return 0; }
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object.c b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object.c
new file mode 100644
index 000000000000..9c4a5c64b551
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-stack-object.c
@@ -0,0 +1,22 @@
+/* When vectorising an object on the stack the vectorizer may determine that
+   the object can have vector-size alignment.  In that case accesses outside of
+   the object are known not to trigger a page fault and hence we're fine
+   (we only read the data and do not use it).  For CHERI we need to worry about
+   such accesses, hence we add that test here.  */
+/* { dg-do run } */
+/* { dg-additional-sources "vect-peel-stack-object-runner.c" } */
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+extern unsigned short mprr_2[5][16][16];
+extern void initialise_s(int *);
+
+void foo() {
+  int s[62];
+  int i,j;
+  initialise_s(&s[0]);
+  for (j=0; j < MB_BLOCK_SIZE; j++)
+    for (i=0; i < MB_BLOCK_SIZE; i++)
+      mprr_2[HOR_PRED_16][j][i]=s[j*4];
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-double-access.c b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-double-access.c
new file mode 100644
index 000000000000..767e59910b92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-double-access.c
@@ -0,0 +1,26 @@
+/* Similar to vect-peel-stack-object.c this test only fails on Morello.
+   The over-read only happens within alignment boundaries which means it can't
+   trigger any problem by reading over a page boundary.
+   This is testing a different mechanism to vect-peel-stack-object.c, this is
+   testing reducing a vector load to a half-vector while that is testing
+   peeling for gaps.  */
+/* { dg-do run } */
+/* { dg-additional-sources "vect-peel-stack-object-runner.c" } */
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+extern unsigned short mprr_2[5][16][16];
+extern void initialise_s(int *);
+
+void foo() {
+  int s[62];
+  int i,j;
+  initialise_s(&s[0]);
+  for (j=0; j < MB_BLOCK_SIZE; j++)
+    for (i=0; i < MB_BLOCK_SIZE; i++)
+      {
+	mprr_2[VERT_PRED_16 ][j][i]=s[j*4 + 1];
+	mprr_2[HOR_PRED_16 ][j][i]=s[j*4];
+      }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-gaps.c b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-gaps.c
new file mode 100644
index 000000000000..67abb02d8241
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/vect-peel-twice-gaps.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-additional-sources "vect-peel-stack-object-runner.c" } */
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+extern unsigned short mprr_2[5][16][16];
+extern void initialise_s(int *);
+
+void foo() {
+  int s[31];
+  int i,j;
+  initialise_s(&s[0]);
+  for (j=0; j < MB_BLOCK_SIZE; j++)
+    for (i=0; i < MB_BLOCK_SIZE; i++)
+      mprr_2[HOR_PRED_16][j][i]=s[j*2];
+}
diff --git a/gcc/tree-affine.c b/gcc/tree-affine.c
index cab7804f578b..1a1d3bf2a3f9 100644
--- a/gcc/tree-affine.c
+++ b/gcc/tree-affine.c
@@ -79,6 +79,7 @@ aff_combination_elt (aff_tree *comb, tree type, tree elt)
   aff_combination_zero (comb, type);
 
   comb->n = 1;
+  gcc_assert (capability_type_p (type) || !tree_is_capability_value (elt));
   comb->elts[0].val = elt;
   comb->elts[0].coef = 1;
 }
@@ -169,6 +170,41 @@ aff_combination_add_elt (aff_tree *comb, tree elt, const widest_int &scale_in)
 	  }
 	return;
       }
+
+  if (CHECKING_P && !capability_type_p (comb->type))
+    {
+      gcc_assert (!tree_is_capability_value (elt));
+      for (i = 0; i < comb->n; i++)
+	gcc_assert (!tree_is_capability_value (comb->elts[i].val));
+    }
+  else if (CHECKING_P)
+    {
+      for (i = 1; i < comb->n; i++)
+	gcc_assert (!tree_is_capability_value (comb->elts[i].val));
+    }
+
+  /* We really don't want to be having an affine tree with multiple
+     capabilities to derive provenance from.
+     The fact that all capabilities have a single provenance in the
+     original code should mean that we never end up attempting to add
+     two capabilities into a single affine tree.
+     Whenever an affine combination has a capability element we want it stored
+     in the first `elt`.  */
+  if (capability_type_p (comb->type) && tree_is_capability_value (elt))
+    {
+      gcc_assert (!aff_cap_provenance_p (comb));
+      class aff_comb_elt temp = comb->elts[0];
+      comb->elts[0].val = elt;
+      comb->elts[0].coef = scale;
+      if (comb->n == 0)
+	{
+	  comb->n++;
+	  return;
+	}
+      elt = temp.val;
+      scale = temp.coef;
+    }
+
   if (comb->n < MAX_AFF_ELTS)
     {
       comb->elts[comb->n].coef = scale;
@@ -180,6 +216,8 @@ aff_combination_add_elt (aff_tree *comb, tree elt, const widest_int &scale_in)
   type = comb->type;
   if (POINTER_TYPE_P (type))
     type = sizetype;
+  else if (INTCAP_TYPE_P (type))
+    type = noncapability_type (type);
 
   if (scale == 1)
     elt = fold_convert (type, elt);
@@ -217,6 +255,39 @@ aff_combination_add (aff_tree *comb1, aff_tree *comb2)
     aff_combination_add_elt (comb1, comb2->rest, 1);
 }
 
+/* Drops the provenance-providing capability from an affine tree.  */
+void
+aff_combination_drop_capability (aff_tree *comb1)
+{
+  if (!capability_type_p (comb1->type))
+    return;
+  comb1->type = noncapability_type (comb1->type);
+  if (aff_cap_provenance_p (comb1))
+    comb1->elts[0].val = fold_drop_capability (comb1->elts[0].val);
+}
+
+/* Some affine trees have a logical capability provenance, while others do not.
+   This function extracts the capability from which the provided affine tree
+   takes its provenance and returns it.  If there is no capability in this
+   affine tree return NULL_TREE.  */
+tree
+aff_combination_extract_provenance_cap (aff_tree *comb)
+{
+  if (!aff_cap_provenance_p (comb))
+    return NULL_TREE;
+
+  tree ret = comb->elts[0].val;
+  if (POINTER_TYPE_P (comb->type) && INTCAP_TYPE_P (TREE_TYPE (ret)))
+    ret = fold_convert (comb->type, ret);
+
+  if (comb->elts[0].coef == 1)
+    aff_combination_remove_elt (comb, 0);
+  else
+    comb->elts[0].coef -= 1;
+  aff_combination_drop_capability (comb);
+  return ret;
+}
+
 /* Converts affine combination COMB to TYPE.  */
 
 void
@@ -225,12 +296,8 @@ aff_combination_convert (aff_tree *comb, tree type)
   unsigned i, j;
   tree comb_type = comb->type;
 
-  if (capability_type_p (type) && ! capability_type_p (comb_type))
-    {
-      tree val = aff_combination_to_tree (comb);
-      tree ret = fold_build_replace_address_value (build_int_cst (type, 0), val);
-      return tree_to_aff_combination (ret, type, comb);
-    }
+  /* Should not be converting from non-capability type to capability.  */
+  gcc_assert (!capability_type_p (type) || capability_type_p (comb_type));
 
   if  (TYPE_NONCAP_PRECISION (type) > TYPE_NONCAP_PRECISION (comb_type))
     {
@@ -240,7 +307,7 @@ aff_combination_convert (aff_tree *comb, tree type)
     }
 
   comb->type = type;
-  if (comb->rest && !POINTER_TYPE_P (type))
+  if (comb->rest && !POINTER_TYPE_P (type) && !capability_type_p (type))
     comb->rest = fold_convert (type, comb->rest);
 
   if (TYPE_NONCAP_PRECISION (type) == TYPE_NONCAP_PRECISION (comb_type)
@@ -277,14 +344,6 @@ expr_to_aff_combination (aff_tree *comb, tree_code code, tree type,
   aff_tree tmp;
   poly_int64 bitpos, bitsize, bytepos;
 
-  /* MORELLO TODO (OPTIMISED):
-     Currently we bail out of splitting capabilities because there have been a
-     few difficulties attempting to cast from sizetypes to capabilities etc.
-     This is something to implement in the future.  */
-  /* Bail out of splitting capabilities into affine trees.  */
-  if (capability_type_p (type))
-    return false;
-
   switch (code)
     {
     case POINTER_PLUS_EXPR:
@@ -407,11 +466,6 @@ tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
 
   STRIP_NOPS (expr);
 
-  /* MORELLO TODO (OPTIMISED): Would like to implement this in the future.  */
-  /* Bail out of splitting capabilities into affine trees.  */
-  if (capability_type_p (type))
-    return aff_combination_elt (comb, type, expr);
-
   code = TREE_CODE (expr);
   switch (code)
     {
@@ -467,7 +521,11 @@ tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
 	core = build_fold_addr_expr (code, core);
 
       if (ADDR_EXPR_P (core))
-	aff_combination_add_elt (comb, core, 1);
+	{
+	  if (!capability_type_p (type))
+	    core = fold_drop_capability (core);
+	  aff_combination_add_elt (comb, core, 1);
+	}
       else
 	{
 	  tree_to_aff_combination (core, type, &tmp);
@@ -475,13 +533,17 @@ tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
 	}
       if (toffset)
 	{
-	  tree_to_aff_combination (toffset, type, &tmp);
+	  tree_to_aff_combination (toffset, noncapability_type (type), &tmp);
 	  aff_combination_add (comb, &tmp);
 	}
       return;
 
     default:
       {
+	/* Whenever we know where the capability metadata comes from, we want
+	   that element included as the first element in our affine tree.  */
+	if (code == INTEGER_CST && capability_type_p (type))
+	  return aff_combination_elt (comb, type, expr);
 	if (poly_int_tree_p (expr))
 	  {
 	    aff_combination_const (comb, type, wi::to_poly_widest (expr));
@@ -491,6 +553,8 @@ tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
       }
     }
 
+  if (!capability_type_p (type) && tree_is_capability_value (expr))
+    expr = fold_drop_capability (expr);
   aff_combination_elt (comb, type, expr);
 }
 
@@ -549,15 +613,38 @@ aff_combination_to_tree (aff_tree *comb)
   gcc_assert (comb->n == MAX_AFF_ELTS || comb->rest == NULL_TREE);
 
   i = 0;
-  if (POINTER_TYPE_P (type))
+  if (capability_type_p (type) || POINTER_TYPE_P (type))
     {
-      type = sizetype;
+      tree off_type = POINTER_TYPE_P (type) ? sizetype
+	: noncapability_type (type);
       if (comb->n > 0 && comb->elts[0].coef == 1
 	  && POINTER_TYPE_P (TREE_TYPE (comb->elts[0].val)))
 	{
 	  base = comb->elts[0].val;
 	  ++i;
 	}
+      else if (aff_cap_provenance_p (comb))
+	{
+	  gcc_assert (capability_type_p (type));
+	  base = comb->elts[0].val;
+	  if (comb->elts[0].coef != 1)
+	    expr = add_elt_to_tree (expr, off_type,
+				    fold_drop_capability (comb->elts[0].val),
+				    comb->elts[0].coef-1);
+	  ++i;
+	}
+      else if (capability_type_p (type))
+	base = build_zero_cst (type);
+
+      /* Options are:
+	 - comb->elts[0].val is a capability and we want to its provenance.
+	 - comb->elts[0].val is not a capability, affine tree is a capability,
+	   want NULL provenance.
+	 - affine tree is not a capability, is a pointer, comb->elts[0].val is
+	   a pointer, want to use POINTER_PLUS_EXPR.
+	 - affine tree is not a capability, is a pointer type, do not have
+	   single pointer to use as base.  Want to use sizetype instead.  */
+      type = off_type;
     }
 
   for (; i < comb->n; i++)
@@ -581,9 +668,53 @@ aff_combination_to_tree (aff_tree *comb)
   expr = add_elt_to_tree (expr, type, wide_int_to_tree (type, off), sgn);
 
   if (base)
-    return fold_build_pointer_plus (base, expr);
-  else
-    return fold_convert (comb->type, expr);
+    {
+      /* If the base is of pointer type and the combination was of pointer
+	 type, then may as well generate our TREE expression in the same type
+	 as the base (all pointers have the same range of defined behaviour).
+
+	 If the base is not of pointer type then it must be of INTCAP_TYPE.  In
+	 that case stay with the pointer type of the affine tree.  The pointer
+	 type of the affine tree is more limited than the INTCAP type of the
+	 base, but the code that we represent it with is something that the
+	 rest of the compiler can recognise better.  The restrictions on
+	 overflow are known to not be a problem because this type was what the
+	 ivopts machinery and ivopts have found and used (and did not find any
+	 reason to need to convert the pointer typed affine tree to an intcap).
+
+	 If the affine tree type is an INTCAP_TYPE then we can't use a pointer
+	 type since we can't assume the operations on it are valid for
+	 pointers.  */
+      if (POINTER_TYPE_P (TREE_TYPE (base)))
+	return fold_build_pointer_plus (base, expr);
+      else if (POINTER_TYPE_P (comb->type))
+	{
+	  gcc_assert (INTCAP_TYPE_P (TREE_TYPE (base)));
+	  base = fold_convert (comb->type, base);
+	  expr = fold_build_pointer_plus (base, expr);
+	}
+      else if (INTCAP_TYPE_P (TREE_TYPE (base)))
+	{
+	  /* Currently all affine trees happen to be made from induction
+	     variables, and that means they never have side effects.  I believe
+	     there is no reason that affine trees *must* only be made from
+	     variables without side effects.  If that ever becomes the case
+	     then we can not duplicate the evaluation of `base` without using
+	     something like `SAVE_EXPR` to ensure we avoid duplicating the side
+	     effects.  This assert should alert us when such a problematic case
+	     arises.  */
+	  gcc_assert (!TREE_SIDE_EFFECTS (base));
+	  tree noncap_type = noncapability_type (comb->type);
+	  tree base_value = fold_convert (noncap_type, base);
+	  tree res_value = fold_build2 (PLUS_EXPR, noncap_type,
+					base_value, expr);
+	  expr = fold_build_replace_address_value (base, res_value);
+	}
+      else
+	gcc_unreachable ();
+    }
+
+  return fold_convert (comb->type, expr);
 }
 
 /* Copies the tree elements of COMB to ensure that they are not shared.  */
@@ -678,6 +809,7 @@ aff_combination_mult (aff_tree *c1, aff_tree *c2, aff_tree *r)
 {
   unsigned i;
   gcc_assert (TYPE_PRECISION (c1->type) == TYPE_PRECISION (c2->type));
+  gcc_assert (!aff_cap_provenance_p (c1) || !aff_cap_provenance_p (c2));
 
   aff_combination_zero (r, c1->type);
 
@@ -737,13 +869,14 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
 			hash_map<tree, name_expansion *> **cache)
 {
   unsigned i;
-  aff_tree to_add, current, curre;
+  aff_tree to_add, current, curre, to_subtract;
   tree e;
   gimple *def;
   widest_int scale;
   class name_expansion *exp;
 
   aff_combination_zero (&to_add, comb->type);
+  aff_combination_zero (&to_subtract, comb->type);
   for (i = 0; i < comb->n; i++)
     {
       tree type, name;
@@ -838,19 +971,26 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
 	  gcc_assert (!exp->in_progress);
 	  current = exp->expansion;
 	}
-      if (!useless_type_conversion_p (comb->type, current.type))
-	aff_combination_convert (&current, comb->type);
+      tree elt_type = (i == 0 && capability_type_p (type))
+	? comb->type : noncapability_type (comb->type);
+      if (!useless_type_conversion_p (elt_type, current.type))
+	aff_combination_convert (&current, elt_type);
 
       /* Accumulate the new terms to TO_ADD, so that we do not modify
-	 COMB while traversing it; include the term -coef * E, to remove
-         it from COMB.  */
+	 COMB while traversing it.  We could include the term -coef * E, to
+	 remove it from COMB, but that would mean that this temporary affine
+	 tree could break capability invariants (that there is a maximum of one
+	 provenance-bearing capability in any affine tree).  Hence we
+	 accumulate these negative coefficients in a separate affine tree
+	 TO_SUBTRACT.  */
       scale = comb->elts[i].coef;
       aff_combination_zero (&curre, comb->type);
       aff_combination_add_elt (&curre, e, -scale);
       aff_combination_scale (&current, scale);
       aff_combination_add (&to_add, &current);
-      aff_combination_add (&to_add, &curre);
+      aff_combination_add (&to_subtract, &curre);
     }
+  aff_combination_add (comb, &to_subtract);
   aff_combination_add (comb, &to_add);
 }
 
diff --git a/gcc/tree-affine.h b/gcc/tree-affine.h
index 4ec4924879df..f7a07b6084d8 100644
--- a/gcc/tree-affine.h
+++ b/gcc/tree-affine.h
@@ -71,6 +71,8 @@ void aff_combination_elt (aff_tree *, tree, tree);
 void aff_combination_scale (aff_tree *, const widest_int &);
 void aff_combination_mult (aff_tree *, aff_tree *, aff_tree *);
 void aff_combination_add (aff_tree *, aff_tree *);
+void aff_combination_drop_capability (aff_tree *);
+tree aff_combination_extract_provenance_cap (aff_tree *);
 void aff_combination_add_elt (aff_tree *, tree, const widest_int &);
 void aff_combination_remove_elt (aff_tree *, unsigned);
 void aff_combination_convert (aff_tree *, tree);
@@ -126,4 +128,11 @@ aff_combination_singleton_var_p (aff_tree *aff)
 	  && known_eq (aff->offset, 0)
 	  && (aff->elts[0].coef == 1 || aff->elts[0].coef == -1));
 }
+
+/* Return true iff AFF has capability metadata in it.  */
+inline bool
+aff_cap_provenance_p (aff_tree *aff)
+{
+  return aff->n != 0 && tree_is_capability_value (aff->elts[0].val);
+}
 #endif /* GCC_TREE_AFFINE_H */
diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c
index 570c19a224cb..0c868c09e493 100644
--- a/gcc/tree-chrec.c
+++ b/gcc/tree-chrec.c
@@ -683,7 +683,14 @@ chrec_replace_initial_condition (tree chrec,
   if (automatically_generated_chrec_p (chrec))
     return chrec;
 
-  gcc_assert (chrec_type (chrec) == chrec_type (init_cond));
+  /* Allow replacing a capability typed chrec into one of just the offset.
+     The only place that this function is used (dr_analyze_indices) wants to do
+     exactly that -- it finds the evolution of a pointer, then separates the
+     pointer and offsets into a base pointer and an evolution of the offset.
+     */
+  gcc_assert (chrec_type (chrec) == chrec_type (init_cond)
+	      || (noncapability_type (chrec_type (chrec))
+		  == chrec_type (init_cond)));
 
   switch (TREE_CODE (chrec))
     {
@@ -1303,7 +1310,9 @@ convert_affine_scev (class loop *loop, tree type,
   bool enforce_overflow_semantics;
   bool must_check_src_overflow, must_check_rslt_overflow;
   tree new_base, new_step;
-  tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
+  tree step_type = POINTER_TYPE_P (type)
+    ? sizetype
+    : noncapability_type (type);
 
   /* In general,
      (TYPE) (BASE + STEP * i) = (TYPE) BASE + (TYPE -- sign extend) STEP * i,
@@ -1320,7 +1329,7 @@ convert_affine_scev (class loop *loop, tree type,
 	become a wrapping variable with values 125, 126, 127, -128, -127, ...,
 	which would confuse optimizers that assume that this does not
 	happen.  */
-  must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);
+  must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_NONCAP_PRECISION (type);
 
   enforce_overflow_semantics = (use_overflow_semantics
 				&& nowrap_type_p (type));
@@ -1344,7 +1353,7 @@ convert_affine_scev (class loop *loop, tree type,
 	    must_check_rslt_overflow = false;
 	}
       else if (TYPE_UNSIGNED (ct) == TYPE_UNSIGNED (type)
-	       && TYPE_PRECISION (ct) == TYPE_PRECISION (type))
+	       && TYPE_PRECISION (ct) == TYPE_NONCAP_PRECISION (type))
 	{
 	  must_check_rslt_overflow = false;
 	  must_check_src_overflow = true;
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 54c260e1761e..b255c71b4402 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1168,7 +1168,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
 	  memoff = wide_int_to_tree (TREE_TYPE (memoff), rem);
 	  /* And finally replace the initial condition.  */
 	  access_fn = chrec_replace_initial_condition
-	      (access_fn, fold_convert (orig_type, off));
+	    (access_fn, fold_convert (noncapability_type (orig_type), off));
 	  /* ???  This is still not a suitable base object for
 	     dr_may_alias_p - the base object needs to be an
 	     access that covers the object as whole.  With
@@ -2096,7 +2096,7 @@ create_waw_or_war_checks (tree *cond_expr,
 
   /* Make sure that we can operate on sizetype without loss of precision.  */
   tree addr_type = TREE_TYPE (DR_BASE_ADDRESS (dr_a.dr));
-  if (TYPE_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
+  if (TYPE_NONCAP_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
     return false;
 
   /* All addresses involved are known to have a common alignment ALIGN.
@@ -2374,8 +2374,12 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 
   *cond_expr
     = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
-	fold_build2 (cmp_code, boolean_type_node, seg_a_max, seg_b_min),
-	fold_build2 (cmp_code, boolean_type_node, seg_b_max, seg_a_min));
+	fold_build2 (cmp_code, boolean_type_node,
+		     fold_drop_capability (seg_a_max),
+		     fold_drop_capability (seg_b_min)),
+	fold_build2 (cmp_code, boolean_type_node,
+		     fold_drop_capability (seg_b_max),
+		     fold_drop_capability (seg_a_min)));
   if (dump_enabled_p ())
     dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index d2dcfe7f42d3..f04fc7baf5f1 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1627,7 +1627,8 @@ ref_at_iteration (data_reference_p dr, int iter,
   tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
   addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
 				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
+  tree alias_ptr = fold_convert_for_mem_ref (reference_alias_ptr_type (ref),
+					     coff);
   tree type = build_aligned_type (TREE_TYPE (ref),
 				  get_object_alignment (ref));
   ref = build2 (MEM_REF, type, addr, alias_ptr);
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index d3ddb3ac78c6..004e38eb5abe 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1234,15 +1234,6 @@ tail_recurse:
     {
     CASE_CONVERT:
       {
-	/* MORELLO TODO (OPTIMISATION)
-	   If we were casting from a capability, then stop here.
-	   Trying to avoid capabilities since there seems to be a lot of work
-	   to get this analysis working for them.  */
-	if (capability_type_p (type) || tree_is_capability_value (rhs0))
-	  {
-	    *evolution_of_loop = chrec_dont_know;
-	    return t_dont_know;
-	  }
 	/* This assignment is under the form "a_1 = (cast) rhs.  */
 	t_bool res = follow_ssa_edge_expr (loop, at_stmt, rhs0, halting_phi,
 					   evolution_of_loop, limit);
@@ -1344,15 +1335,15 @@ simplify_peeled_chrec (class loop *loop, tree arg, tree init_cond)
   if (!POINTER_TYPE_P (type)
       && !INTEGRAL_TYPE_P (type))
     return chrec_dont_know;
-  /* MORELLO TODO (OPTIMISATION): Would like to re-enable later if we can.  */
-  if (capability_type_p (type))
-    return chrec_dont_know;
 
   /* Try harder to check if they are equal.  */
   tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map);
   tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map);
   free_affine_expand_cache (&peeled_chrec_map);
+
   aff_combination_scale (&aff2, -1);
+  aff_combination_drop_capability (&aff1);
+  aff_combination_drop_capability (&aff2);
   aff_combination_add (&aff1, &aff2);
 
   /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP
@@ -1566,9 +1557,6 @@ interpret_loop_phi (class loop *loop, gphi *loop_phi_node)
   tree init_cond;
 
   gcc_assert (phi_loop == loop);
-  /* MORELLO TODO (OPTIMISATION): Just disabling for now.  */
-  if (tree_is_capability_value (gimple_phi_result (loop_phi_node)))
-    return chrec_dont_know;
 
   /* Otherwise really interpret the loop phi.  */
   init_cond = analyze_initial_condition (loop_phi_node);
@@ -1639,10 +1627,6 @@ interpret_rhs_expr (class loop *loop, gimple *at_stmt,
   tree res, chrec1, chrec2, ctype;
   gimple *def;
 
-  /* MORELLO TODO (OPTIMISATION): Just disabling for now.  */
-  if (capability_type_p (type))
-    return chrec_dont_know;
-
   if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
     {
       if (is_gimple_min_invariant (rhs1))
@@ -3063,7 +3047,7 @@ iv_can_overflow_p (class loop *loop, tree type, tree base, tree step)
     return false;
 
   if (TREE_CODE (base) == INTEGER_CST)
-    base_min = base_max = wi::to_wide (base);
+    base_min = base_max = wi::to_wide (fold_drop_capability (base));
   else if (TREE_CODE (base) == SSA_NAME
 	   && INTEGRAL_TYPE_P (TREE_TYPE (base))
 	   && get_range_info (base, &base_min, &base_max) == VR_RANGE)
@@ -3083,8 +3067,8 @@ iv_can_overflow_p (class loop *loop, tree type, tree base, tree step)
   if (!get_max_loop_iterations (loop, &nit))
     return true;
 
-  type_min = wi::min_value (type);
-  type_max = wi::max_value (type);
+  type_min = wi::min_value (noncapability_type (type));
+  type_max = wi::max_value (noncapability_type (type));
 
   /* Just sanity check that we don't see values out of the range of the type.
      In this case the arithmetics bellow would overflow.  */
@@ -3099,9 +3083,9 @@ iv_can_overflow_p (class loop *loop, tree type, tree base, tree step)
 
   /* NIT is typeless and can exceed the precision of the type.  In this case
      overflow is always possible, because we know STEP is non-zero.  */
-  if (wi::min_precision (nit, UNSIGNED) > TYPE_PRECISION (type))
+  if (wi::min_precision (nit, UNSIGNED) > TYPE_NONCAP_PRECISION (type))
     return true;
-  wide_int nit2 = wide_int::from (nit, TYPE_PRECISION (type), UNSIGNED);
+  wide_int nit2 = wide_int::from (nit, TYPE_NONCAP_PRECISION (type), UNSIGNED);
 
   /* If step can be positive, check that nit*step <= type_max-base.
      This can be done by unsigned arithmetic and we only need to watch overflow
@@ -3234,13 +3218,6 @@ simple_iv_with_niters (class loop *wrto_loop, class loop *use_loop,
       && !INTEGRAL_TYPE_P (type))
     return false;
 
-  /* MORELLO TODO (OPTIMISATION)
-       Currently returning false whenever the base type is a capability.
-       This avoids the entire set of induction variable optimisations for
-       capabilities.  Eventually it would be nice to implement them.  */
-  if (capability_type_p (type))
-    return false;
-
   ev = analyze_scalar_evolution_in_loop (wrto_loop, use_loop, op,
 					 &folded_casts);
   if (chrec_contains_undetermined (ev)
@@ -3250,7 +3227,7 @@ simple_iv_with_niters (class loop *wrto_loop, class loop *use_loop,
   if (tree_does_not_contain_chrecs (ev))
     {
       iv->base = ev;
-      iv->step = build_int_cst (TREE_TYPE (ev), 0);
+      iv->step = build_int_cst (noncapability_type (TREE_TYPE (ev)), 0);
       iv->no_overflow = true;
       return true;
     }
diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index 1194370fa132..5e43c5b7742c 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -93,24 +93,26 @@ struct GTY (()) mem_addr_template {
 
 static GTY(()) vec<mem_addr_template, va_gc> *mem_addr_template_list;
 
-#define TEMPL_IDX(AS, SYMBOL, BASE, INDEX, STEP, OFFSET) \
-  (((int) (AS) << 5) \
-   | ((SYMBOL != 0) << 4) \
+#define TEMPL_IDX(AS, SYMBOL, BASE_CAP, BASE, INDEX, STEP, OFFSET) \
+  (((int) (AS) << 6) \
+   | ((SYMBOL != 0) << 5) \
+   | ((BASE_CAP != 0) << 4) \
    | ((BASE != 0) << 3) \
    | ((INDEX != 0) << 2) \
    | ((STEP != 0) << 1) \
    | (OFFSET != 0))
 
-/* Stores address for memory reference with parameters SYMBOL, BASE, INDEX,
-   STEP and OFFSET to *ADDR using address mode ADDRESS_MODE.  Stores pointers
-   to where step is placed to *STEP_P and offset to *OFFSET_P.  */
+/* Stores address for memory reference with parameters SYMBOL, BASE_CAP, BASE,
+   INDEX, STEP and OFFSET to *ADDR using address mode ADDRESS_MODE.  Stores
+   pointers to where step is placed to *STEP_P and offset to *OFFSET_P.  */
 
 static void
 gen_addr_rtx (scalar_addr_mode address_mode,
-	      rtx symbol, rtx base, rtx index, rtx step, rtx offset,
-	      rtx *addr, rtx **step_p, rtx **offset_p)
+	      rtx symbol, rtx base_cap, rtx base, rtx index, rtx step,
+	      rtx offset, rtx *addr, rtx **step_p, rtx **offset_p)
 {
   rtx act_elem;
+  gcc_assert (!symbol || !base_cap);
 
   *addr = NULL_RTX;
   if (step_p)
@@ -136,7 +138,7 @@ gen_addr_rtx (scalar_addr_mode address_mode,
   if (base && base != const0_rtx)
     {
       if (*addr)
-	*addr = gen_pointer_plus (address_mode, base, *addr);
+	*addr = gen_pointer_plus (offset_mode (address_mode), base, *addr);
       else
 	*addr = base;
     }
@@ -158,7 +160,23 @@ gen_addr_rtx (scalar_addr_mode address_mode,
 	}
 
       if (*addr)
-	*addr = gen_raw_pointer_plus (address_mode, *addr, act_elem);
+	*addr = gen_raw_pointer_plus (address_mode, act_elem, *addr);
+      else
+	*addr = act_elem;
+    }
+  else if (base_cap)
+    {
+      act_elem = base_cap;
+      if (offset)
+	{
+	  act_elem = gen_raw_pointer_plus (address_mode, act_elem, offset);
+
+	  if (offset_p)
+	    *offset_p = &XEXP (act_elem, 1);
+	}
+
+      if (*addr)
+	*addr = gen_raw_pointer_plus (address_mode, act_elem, *addr);
       else
 	*addr = act_elem;
     }
@@ -195,7 +213,8 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
   bool is_cap = addr->is_capability ();
   scalar_addr_mode address_mode = targetm.addr_space.address_mode (as, is_cap);
   scalar_addr_mode pointer_mode = targetm.addr_space.pointer_mode (as, is_cap);
-  rtx address, sym, bse, idx, st, off;
+  scalar_addr_mode poff_mode = offset_mode (pointer_mode);
+  rtx address, sym, bcp, bse, idx, st, off;
   struct mem_addr_template *templ;
 
   if (addr->step && !integer_onep (addr->step))
@@ -215,7 +234,8 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
   if (!really_expand)
     {
       unsigned int templ_index
-	= TEMPL_IDX (as, addr->symbol, addr->base, addr->index, st, off);
+	= TEMPL_IDX (as, addr->symbol, addr->base_cap, addr->base, addr->index,
+		     st, off);
 
       if (templ_index >= vec_safe_length (mem_addr_template_list))
 	vec_safe_grow_cleared (mem_addr_template_list, templ_index + 1);
@@ -227,15 +247,18 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
 	  sym = (addr->symbol ?
 		 gen_rtx_SYMBOL_REF (pointer_mode, ggc_strdup ("test_symbol"))
 		 : NULL_RTX);
-	  bse = (addr->base ?
+	  bcp = (addr->base_cap ?
 		 gen_raw_REG (pointer_mode, LAST_VIRTUAL_REGISTER + 1)
 		 : NULL_RTX);
+	  bse = (addr->base ?
+		 gen_raw_REG (poff_mode, LAST_VIRTUAL_REGISTER + 2)
+		 : NULL_RTX);
 	  idx = (addr->index ?
 		 gen_raw_REG (offset_mode (pointer_mode),
-			      LAST_VIRTUAL_REGISTER + 2)
+			      LAST_VIRTUAL_REGISTER + 3)
 		 : NULL_RTX);
 
-	  gen_addr_rtx (pointer_mode, sym, bse, idx,
+	  gen_addr_rtx (pointer_mode, sym, bcp, bse, idx,
 			st? const0_rtx : NULL_RTX,
 			off? const0_rtx : NULL_RTX,
 			&templ->ref,
@@ -255,8 +278,11 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
   sym = (addr->symbol
 	 ? expand_expr (addr->symbol, NULL_RTX, pointer_mode, EXPAND_NORMAL)
 	 : NULL_RTX);
+  bcp = (addr->base_cap
+	 ? expand_expr (addr->base_cap, NULL_RTX, pointer_mode, EXPAND_NORMAL)
+	 : NULL_RTX);
   bse = (addr->base
-	 ? expand_expr (addr->base, NULL_RTX, pointer_mode, EXPAND_NORMAL)
+	 ? expand_expr (addr->base, NULL_RTX, poff_mode, EXPAND_NORMAL)
 	 : NULL_RTX);
   idx = (addr->index
 	 ? expand_expr (addr->index, NULL_RTX,
@@ -271,13 +297,14 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
   if (bse && GET_CODE (bse) == CONST_INT)
     {
       if (off)
-	off = simplify_gen_binary (PLUS, pointer_mode, bse, off);
+	off = simplify_gen_binary (PLUS, poff_mode, bse, off);
       else
 	off = bse;
       gcc_assert (GET_CODE (off) == CONST_INT);
       bse = NULL_RTX;
     }
-  gen_addr_rtx (pointer_mode, sym, bse, idx, st, off, &address, NULL, NULL);
+  gen_addr_rtx (pointer_mode, sym, bcp, bse, idx,
+		st, off, &address, NULL, NULL);
   if (pointer_mode != address_mode)
     address = convert_memory_address (address_mode, address);
   return address;
@@ -351,6 +378,13 @@ valid_mem_ref_p (machine_mode mode, addr_space_t as,
 {
   rtx address;
 
+  if (CAPABILITY_MODE_P (Pmode))
+    {
+      tree x = addr->symbol ? addr->symbol : addr->base_cap;
+      gcc_assert (x);
+      gcc_assert (!addr->symbol || !addr->base_cap);
+      gcc_assert (tree_is_capability_value (x));
+    }
   address = addr_for_mem_ref (addr, as, false);
   if (!address)
     return false;
@@ -386,6 +420,11 @@ create_mem_ref_raw (tree type, tree alias_ptr_type, struct mem_address *addr,
       base = addr->symbol;
       index2 = addr->base;
     }
+  else if (addr->base_cap)
+    {
+      base = addr->base_cap;
+      index2 = addr->base;
+    }
   else if (addr->base
 	   && POINTER_TYPE_P (TREE_TYPE (addr->base)))
     {
@@ -553,6 +592,7 @@ add_to_parts (struct mem_address *parts, tree elt)
       return;
     }
 
+  gcc_assert (!tree_is_capability_value (elt));
   /* Add ELT to base.  */
   type = TREE_TYPE (parts->base);
   if (POINTER_TYPE_P (type))
@@ -732,6 +772,7 @@ addr_to_parts (tree type, aff_tree *addr, tree iv_cand, tree base_hint,
   unsigned i;
 
   parts->symbol = NULL_TREE;
+  parts->base_cap = NULL_TREE;
   parts->base = NULL_TREE;
   parts->index = NULL_TREE;
   parts->step = NULL_TREE;
@@ -744,21 +785,43 @@ addr_to_parts (tree type, aff_tree *addr, tree iv_cand, tree base_hint,
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
+  *var_in_base = (base_hint != NULL && parts->symbol == NULL);
+  /* If we have capability metadata, it always becomes the base_cap.  */
+  if (aff_cap_provenance_p (addr))
+    {
+      /* Any `ADDR_EXPR` in the affine tree that is a capability would have
+	 been moved to the symbol by `move_fixed_address_to_symbol`, and the
+	 affine tree capability invariant allows at most one capability
+	 element.  Hence, since capability metadata remains on this affine
+	 tree, no symbol can have been taken from it.  */
+      gcc_assert (!parts->symbol);
+      parts->base_cap = aff_combination_extract_provenance_cap (addr);
+      *var_in_base = false;
+    }
   /* Since at the moment there is no reliable way to know how to
      distinguish between pointer and its offset, we decide if var
      part is the pointer based on guess.  */
-  *var_in_base = (base_hint != NULL && parts->symbol == NULL);
-  if (*var_in_base)
+  else if (*var_in_base)
     *var_in_base = move_hint_to_base (type, parts, base_hint, addr);
   else
     move_variant_to_index (parts, addr, iv_cand);
 
+  /* Capability affine trees have more invariants than non-capability affine
+     trees.  With these extra invariants, we can guarantee that the base must
+     have already been found.  Hence we can drop the capability type on the
+     affine tree, so that other functions do not need to handle that
+     possibility.  */
+  gcc_assert (!capability_type_p (addr->type)
+	      || (!aff_cap_provenance_p (addr)
+		  && (parts->symbol || parts->base_cap)));
+  aff_combination_drop_capability (addr);
+
   /* First move the most expensive feasible multiplication to index.  */
   if (!parts->index)
     most_expensive_mult_to_index (type, parts, addr, speed);
 
   /* Move pointer into base.  */
-  if (!parts->symbol && !parts->base)
+  if (!parts->symbol && !parts->base && !parts->base_cap)
     move_pointer_to_base (parts, addr);
 
   /* Then try to process the remaining elements.  */
@@ -783,6 +846,11 @@ gimplify_mem_ref_parts (gimple_stmt_iterator *gsi, struct mem_address *parts)
     parts->base = force_gimple_operand_gsi_1 (gsi, parts->base,
 					    is_gimple_mem_ref_addr, NULL_TREE,
 					    true, GSI_SAME_STMT);
+  if (parts->base_cap)
+    parts->base_cap = force_gimple_operand_gsi_1 (gsi, parts->base_cap,
+						  is_gimple_mem_ref_addr,
+						  NULL_TREE, true,
+						  GSI_SAME_STMT);
   if (parts->index)
     parts->index = force_gimple_operand_gsi (gsi, parts->index,
 					     true, NULL_TREE,
@@ -818,6 +886,26 @@ add_offset_to_base (gimple_stmt_iterator *gsi, mem_address *parts)
   parts->offset = NULL_TREE;
 }
 
+/* Fold PARTS->base into PARTS->base_cap, so that there is no longer a
+   separate capability and offset.  Emit any new instructions before GSI.
+   This is safe because the eventual value that is created must be safe to
+   access, and this function is called last, after all offsets have been
+   combined into PARTS->base.  */
+
+static void
+add_base_to_base_cap (gimple_stmt_iterator *gsi, mem_address *parts)
+{
+  tree tmp = parts->base;
+  if (parts->base_cap)
+    {
+      tmp = fold_build_pointer_plus (parts->base_cap, tmp);
+      tmp = force_gimple_operand_gsi_1 (gsi, tmp, is_gimple_mem_ref_addr,
+					NULL_TREE, true, GSI_SAME_STMT);
+    }
+  parts->base_cap = tmp;
+  parts->base = NULL_TREE;
+}
+
 /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
    computations are emitted in front of GSI.  TYPE is the mode
    of created memory reference. IV_CAND is the selected iv candidate in ADDR,
@@ -843,15 +931,24 @@ create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
   /* Merge symbol into other parts.  */
   if (parts.symbol)
     {
+      gcc_assert (!parts.base_cap);
       tmp = parts.symbol;
       parts.symbol = NULL_TREE;
       gcc_assert (is_gimple_val (tmp));
 
-      if (parts.base)
+      if (tree_is_capability_value (tmp))
+	{
+	  /* Move symbol into base_cap, forcing it to a register.  */
+	  tmp = force_gimple_operand_gsi_1 (gsi, tmp,
+					    is_gimple_mem_ref_addr,
+					    NULL_TREE, true,
+					    GSI_SAME_STMT);
+	  parts.base_cap = tmp;
+	}
+      else if (parts.base)
 	{
 	  gcc_assert (useless_type_conversion_p (sizetype,
 						 TREE_TYPE (parts.base)));
-
 	  if (parts.index)
 	    {
 	      /* Add the symbol to base, eventually forcing it to register.  */
@@ -992,11 +1089,23 @@ create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
 	return mem_ref;
     }
 
+  /* Transform [base_cap + base] into:
+       base_cap' = base_cap + base
+       [base_cap'].  */
+  if (parts.base_cap && parts.base)
+    {
+      add_base_to_base_cap (gsi, &parts);
+      mem_ref = create_mem_ref_raw (type, alias_ptr_type, &parts, true);
+      if (mem_ref)
+	return mem_ref;
+    }
+
   /* Verify that the address is in the simplest possible shape
      (only a register).  If we cannot create such a memory reference,
      something is really wrong.  */
   gcc_assert (parts.symbol == NULL_TREE);
   gcc_assert (parts.index == NULL_TREE);
+  gcc_assert (!parts.base_cap || !parts.base);
   gcc_assert (!parts.step || integer_onep (parts.step));
   gcc_assert (!parts.offset || integer_zerop (parts.offset));
   gcc_unreachable ();
@@ -1007,6 +1116,9 @@ create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
 void
 get_address_description (tree op, struct mem_address *addr)
 {
+  /* Initialise both to NULL; the code below fills in those that apply.  */
+  addr->base = NULL_TREE;
+  addr->base_cap = NULL_TREE;
   if (ADDR_EXPR_P (TMR_BASE (op)))
     {
       addr->symbol = TMR_BASE (op);
@@ -1015,7 +1127,17 @@ get_address_description (tree op, struct mem_address *addr)
   else
     {
       addr->symbol = NULL_TREE;
-      if (TMR_INDEX2 (op))
+      if (tree_is_capability_value (TMR_BASE (op)))
+	{
+	  /* This is the only case in which TMR_BASE is not an ADDR_EXPR_P of
+	     a global or static while TMR_INDEX2 is also set.  N.b. this was
+	     an explicit invariant before capabilities; it is documented in
+	     tree.def under TARGET_MEM_REF.  */
+	  addr->base_cap = TMR_BASE (op);
+	  addr->base = TMR_INDEX2 (op);
+	}
+      else if (TMR_INDEX2 (op))
 	{
 	  gcc_assert (integer_zerop (TMR_BASE (op)));
 	  addr->base = TMR_INDEX2 (op);
@@ -1098,7 +1220,7 @@ copy_ref_info (tree new_ref, tree old_ref)
    all arguments are cast to an integer type where needed (e.g. when needed
    because the pointer type is a capability and we want to apply integer
    arithmetic to it).  */
-tree
+static tree
 fold_cap_binary_to_constant (enum tree_code code, tree type, tree op0,
 			     tree op1)
 {
@@ -1106,7 +1228,9 @@ fold_cap_binary_to_constant (enum tree_code code, tree type, tree op0,
 			  fold_drop_capability (op0),
 			  fold_drop_capability (op1));
   if (tem)
-    return fold_convert (type, tem);
+    return capability_type_p (type)
+      ? fold_convert_for_mem_ref (type, tem)
+      : tem;
   return NULL_TREE;
 }
 
@@ -1150,8 +1274,11 @@ maybe_fold_tmr (tree ref)
       addr.symbol = build_fold_addr_expr
 		      (get_addr_base_and_unit_offset
 		         (TREE_OPERAND (addr.symbol, 0), &offset));
+      tree off_type = TREE_TYPE (addr.offset);
       addr.offset = int_const_binop (PLUS_EXPR,
-				     addr.offset, size_int (offset));
+				     fold_drop_capability (addr.offset),
+				     size_int (offset));
+      addr.offset = fold_convert_for_mem_ref (off_type, addr.offset);
       changed = true;
     }
 
@@ -1203,7 +1330,10 @@ preferred_mem_scale_factor (tree base, machine_mode mem_mode,
 
   /* Addressing mode "base + index".  */
   parts.index = integer_one_node;
-  parts.base = integer_one_node;
+  if (CAPABILITY_MODE_P (ptr_mode))
+    parts.base_cap = build_int_cst (ptr_type_node, 1);
+  else
+    parts.base = integer_one_node;
   rtx addr = addr_for_mem_ref (&parts, as, false);
   unsigned cost = address_cost (addr, mem_mode, as, speed);
 
@@ -1231,6 +1361,12 @@ dump_mem_address (FILE *file, struct mem_address *parts)
       print_generic_expr (file, TREE_OPERAND (parts->symbol, 0), TDF_SLIM);
       fprintf (file, "\n");
     }
+  if (parts->base_cap)
+    {
+      fprintf (file, "base_cap: ");
+      print_generic_expr (file, parts->base_cap, TDF_SLIM);
+      fprintf (file, "\n");
+    }
   if (parts->base)
     {
       fprintf (file, "base: ");
@@ -1265,10 +1401,11 @@ mem_address::is_capability () const
   if (symbol)
     return tree_is_capability_value (symbol);
 
-  if (base && POINTER_TYPE_P (TREE_TYPE (base)))
-    return tree_is_capability_value (base);
+  if (base_cap)
+    return tree_is_capability_value (base_cap);
 
-  return CAPABILITY_MODE_P (ptr_mode);
+  gcc_assert (!CAPABILITY_MODE_P (ptr_mode));
+  return false;
 }
 
 #include "gt-tree-ssa-address.h"
diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
index a488e1c1f91a..807f754d6e8f 100644
--- a/gcc/tree-ssa-address.h
+++ b/gcc/tree-ssa-address.h
@@ -24,7 +24,16 @@ along with GCC; see the file COPYING3.  If not see
 
 struct mem_address
 {
-  tree symbol, base, index, step, offset;
+  /* When the target architecture is a capability one, one of the following is
+     true:
+       - `!base_cap && tree_is_capability_value (symbol)`
+       - `!symbol && tree_is_capability_value (base_cap)`
+     In all cases, `base`, `index`, `step`, and `offset` are *not* capabilities.
+
+     When the target architecture is not a capability architecture, then
+     base_cap is always NULL_TREE.
+   */
+  tree symbol, base_cap, base, index, step, offset;
 
   bool is_capability () const;
 };
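
[ Illustrative aside -- a minimal, self-contained C sketch of the invariant
  documented on the new base_cap field above.  The toy_* names are invented
  for illustration and are not GCC types: on a capability target exactly one
  of `symbol` or `base_cap` carries the capability, and every other part is a
  plain offset. ]

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct toy_mem_address
    {
      const void *symbol;   /* fixed (symbolic) capability address, or NULL */
      const void *base_cap; /* variable capability base, or NULL */
      ptrdiff_t base, index, step, offset; /* never capabilities */
    };

    static bool
    toy_invariant_holds (const struct toy_mem_address *a, bool capability_target)
    {
      if (!capability_target)
        return a->base_cap == NULL;
      /* Exactly one provenance source on a capability target.  */
      return (a->symbol != NULL) != (a->base_cap != NULL);
    }

    int
    main (void)
    {
      static const char arr[32];
      struct toy_mem_address a = { NULL, arr, 0, 3, 8, 4 };
      assert (toy_invariant_holds (&a, true));
      return 0;
    }
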
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 04e1cf7dece9..a1017092f819 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2333,7 +2333,12 @@ find_interesting_uses_address (struct ivopts_data *data, gimple *stmt,
 	    goto fail;
 
 	  TMR_INDEX2 (base) = civ->base;
-	  step = civ->step;
+	  /* We can only have a non-NULL TMR_INDEX2 together with a TMR_BASE
+	     that is an SSA_NAME if that TMR_BASE is a capability.  */
+	  if (tree_is_capability_value (TMR_BASE (base)))
+	    step = fold_build2 (PLUS_EXPR, type, civ->step, step);
+	  else
+	    step = civ->step;
 	}
       if (TMR_INDEX (base)
 	  && TREE_CODE (TMR_INDEX (base)) == SSA_NAME)
@@ -2918,8 +2923,22 @@ strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
       break;
 
     default:
-      if (ptrdiff_tree_p (expr, offset) && maybe_ne (*offset, 0))
-	return build_int_cst (orig_type, 0);
+      if (ptrdiff_tree_p (fold_drop_capability (expr), offset)
+	  && maybe_ne (*offset, 0))
+	{
+	  /* Ensure we maintain metadata information if this was a capability.
+	     There is no way that a constant capability could be *valid*, so
+	     we don't need to worry about taking our capability out of bounds
+	     and losing validity.  */
+	  if (capability_type_p (TREE_TYPE (expr)))
+	    {
+	      tree null = build_zero_cst (noncapability_type (orig_type));
+	      tree cap_meta = fold_build_replace_address_value (expr, null);
+	      return fold_convert (orig_type, cap_meta);
+	    }
+	  else
+	    return build_int_cst (orig_type, 0);
+	}
       return orig_expr;
     }
 
@@ -3133,8 +3152,8 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important,
 
       if (operand_equal_p (base, cand->iv->base, 0)
 	  && operand_equal_p (step, cand->iv->step, 0)
-	  && (TYPE_PRECISION (TREE_TYPE (base))
-	      == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
+	  && (TYPE_NONCAP_PRECISION (TREE_TYPE (base))
+	      == TYPE_NONCAP_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3948,7 +3967,8 @@ determine_common_wider_type (tree *a, tree *b)
       suba = TREE_OPERAND (*a, 0);
       wider_type = TREE_TYPE (suba);
       if (! fold_convertible_p (wider_type, *a)
-	  || TYPE_PRECISION (wider_type) < TYPE_PRECISION (atype))
+	  || TYPE_NONCAP_PRECISION (wider_type) < TYPE_NONCAP_PRECISION (atype)
+	  || (INTCAP_TYPE_P (wider_type) && !INTCAP_TYPE_P (atype)))
 	return atype;
     }
   else
@@ -3961,7 +3981,9 @@ determine_common_wider_type (tree *a, tree *b)
 	  || ! fold_convertible_p (wider_type, *b)
 	  || capability_type_p (TREE_TYPE (subb))
 		!= capability_type_p (wider_type)
-	  || TYPE_PRECISION (wider_type) != TYPE_PRECISION (TREE_TYPE (subb)))
+	  || TYPE_NONCAP_PRECISION (wider_type)
+		!= TYPE_NONCAP_PRECISION (TREE_TYPE (subb))
+	  || (INTCAP_TYPE_P (wider_type) && !INTCAP_TYPE_P (TREE_TYPE (subb))))
 	return atype;
     }
   else
@@ -3981,7 +4003,8 @@ determine_common_wider_type (tree *a, tree *b)
 static bool
 get_computation_aff_1 (class loop *loop, gimple *at, struct iv_use *use,
 		       struct iv_cand *cand, class aff_tree *aff_inv,
-		       class aff_tree *aff_var, widest_int *prat = NULL)
+		       class aff_tree *aff_var, widest_int *prat = NULL,
+		       comp_cost *cap_complication = NULL)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase = cand->iv->base, cstep = cand->iv->step;
@@ -3990,24 +4013,19 @@ get_computation_aff_1 (class loop *loop, gimple *at, struct iv_use *use,
   aff_tree aff_cbase;
   widest_int rat;
 
-  /* We must have a precision to express the values of use (and the conversion
-     must be valid including other things like capabilities).
-     MORELLO TODO (OPTIMISATION) There is a comment in rewrite_use_address
-     below about using unsigned integer types to avoid overflow problems.
-     This means that if an induction variable has a base of a pointer all
-     candidates that we will get will use unsigned int values, which will never
-     match.
-     Handling types other than unsigned integer (maybe using pointers all the
-     time?) would avoid the problem and enable these optimisations for
-     pointers.    */
-  if (! fold_convertible_p (utype, cbase))
+  /* We must have a precision to express the values of use.  */
+  if (TYPE_NONCAP_PRECISION (utype) > TYPE_NONCAP_PRECISION (ctype))
     return false;
 
   var = var_at_stmt (loop, cand, at);
-  uutype = unsigned_type_for (utype);
+  if (capability_type_p (utype))
+    uutype = uintcap_type_node;
+  else
+    uutype = unsigned_type_for (utype);
+
 
   /* If the conversion is not noop, perform it.  */
-  if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
+  if (TYPE_NONCAP_PRECISION (utype) < TYPE_NONCAP_PRECISION (ctype))
     {
       if (cand->orig_iv != NULL && CONVERT_EXPR_P (cbase)
 	  && (CONVERT_EXPR_P (cstep) || poly_int_tree_p (cstep)))
@@ -4025,15 +4043,19 @@ get_computation_aff_1 (class loop *loop, gimple *at, struct iv_use *use,
 	     In this case, it's safe to skip the convertion in candidate.
 	     As an example, (unsigned short)((unsigned long)A) equals to
 	     (unsigned short)A, if A has a type no larger than short.  */
-	  if (TYPE_PRECISION (inner_type) <= TYPE_PRECISION (uutype))
+	  if (TYPE_NONCAP_PRECISION (inner_type)
+	      <= TYPE_NONCAP_PRECISION (uutype))
 	    {
 	      cbase = inner_base;
 	      cstep = inner_step;
 	    }
 	}
-      cbase = fold_convert (uutype, cbase);
-      cstep = fold_convert (uutype, cstep);
-      var = fold_convert (uutype, var);
+      tree newstep_type = capability_type_p (ctype)
+	? uutype
+	: noncapability_type (uutype);
+      cbase = fold_convert (newstep_type, cbase);
+      cstep = fold_convert (newstep_type, cstep);
+      var = fold_convert (newstep_type, var);
     }
 
   /* Ratio is 1 when computing the value of biv cand by itself.
@@ -4060,32 +4082,130 @@ get_computation_aff_1 (class loop *loop, gimple *at, struct iv_use *use,
      overflows, as all the arithmetics will in the end be performed in UUTYPE
      anyway.  */
   common_type = determine_common_wider_type (&ubase, &cbase);
+  bool cbase_offset = capability_type_p (utype)
+	  && !tree_is_capability_value (cbase);
 
   /* use = ubase - ratio * cbase + ratio * var.  */
   tree_to_aff_combination (ubase, common_type, aff_inv);
-  tree_to_aff_combination (cbase, common_type, &aff_cbase);
-  tree_to_aff_combination (var, uutype, aff_var);
+  tree_to_aff_combination (cbase, cbase_offset
+			   ? noncapability_type (common_type)
+			   : common_type, &aff_cbase);
+  tree_to_aff_combination (var, cbase_offset
+		  ? noncapability_type (uutype)
+		  : uutype, aff_var);
 
   /* We need to shift the value if we are after the increment.  */
   if (stmt_after_increment (loop, cand, at))
     {
       aff_tree cstep_aff;
+      tree common_step_type = noncapability_type (common_type);
 
       if (common_type != uutype)
-	cstep_common = fold_convert (common_type, cstep);
+	cstep_common = fold_convert (common_step_type, cstep);
       else
 	cstep_common = cstep;
 
-      tree_to_aff_combination (cstep_common, common_type, &cstep_aff);
+      tree_to_aff_combination (cstep_common, common_step_type, &cstep_aff);
       aff_combination_add (&aff_cbase, &cstep_aff);
     }
 
+  /* Check for provenance of our capabilities.  */
+  if (aff_cap_provenance_p (aff_inv) && aff_cap_provenance_p (&aff_cbase))
+    {
+      widest_int tmp = aff_inv->elts[0].coef - (rat * aff_cbase.elts[0].coef);
+      bool ops_equal
+	= operand_equal_p (aff_inv->elts[0].val, aff_cbase.elts[0].val, 0);
+
+      if (tmp == 0 && ops_equal)
+	{
+	  /* The metadata in the bases cancels out; aff_var has the metadata.
+
+	     The assertions below ensure that assumptions made in
+	     rewrite_use_nonlinear_expr about when metadata needs to be kept
+	     "to the side" in a calculation are valid.
+
+	     Assertions must be true since:
+	       1) aff_cbase has a capability element, and aff_var is the value
+		  of aff_cbase + step at the point that `use` is used.
+	       2) aff_var is the affine expansion of a variable in source code,
+		  hence it can not have more than one capability element.
+	       3) aff_inv and aff_cbase are expansions of bases of induction
+		  variables, hence they must have only one capability element.
+		  This also means that if their capability elements cancel out,
+		  `rat` must be 1.  */
+	  gcc_assert (aff_cap_provenance_p (aff_var));
+	  gcc_assert (aff_var->elts[0].coef == 1);
+	  gcc_assert (rat == 1);
+	}
+      else if (ops_equal)
+	{
+	  /* Ensure that any metadata on aff_var is removed and we're just left
+	     with metadata on the invariant part.  */
+	  aff_combination_drop_capability (aff_var);
+	  if (cap_complication)
+	    *cap_complication += 5;
+	}
+      else
+	{
+	  /* We have to drop the metadata from somewhere in order to make it
+	     clear where metadata is coming from.
+
+	     The capability metadata on these bases could be the same or could
+	     be different.  If it's the same then it doesn't really matter
+	     where we take the metadata from, but having metadata on the use
+	     base seems like a nice mental model.
+
+	     If the metadata is different between the two objects, then we
+	     *must* take the metadata from the use, since that is the metadata
+	     this use requires.
+
+	     This will not always leave us with a valid capability (the
+	     possible difference in `rat` means that we could have huge
+	     constant offsets leading to invalid capabilities), so every
+	     caller of this function must be careful about how temporaries in
+	     the calculation are handled.  */
+	  aff_combination_drop_capability (&aff_cbase);
+	  aff_combination_drop_capability (aff_var);
+	  if (cap_complication)
+	    *cap_complication += 5;
+	}
+    }
+  else if (aff_cap_provenance_p (aff_inv) && !aff_cap_provenance_p (&aff_cbase))
+    {
+      /* The use has metadata; the candidate does not.  We need to maintain
+	 the metadata on the use, which happens naturally.  */
+      if (cap_complication)
+	*cap_complication += 5;
+    }
+  else if (!aff_cap_provenance_p (aff_inv) && aff_cap_provenance_p (&aff_cbase))
+    {
+      /* The use does not have metadata; the candidate does.  The metadata is
+	 the same on aff_cbase and aff_var, but may not appear the same (e.g.
+	 it may be stored in different capability SSA_NAMEs).  */
+      gcc_assert (aff_cap_provenance_p (aff_var));
+      aff_combination_drop_capability (&aff_cbase);
+      aff_combination_drop_capability (aff_var);
+    }
+
   aff_combination_scale (&aff_cbase, -rat);
   aff_combination_add (aff_inv, &aff_cbase);
+  /* The capability type of aff_inv may have been lost when adding aff_cbase
+     (the `tmp == 0 && ops_equal` case in the clause above).  In that case we
+     do not want to convert it back to a capability.  */
+  if (!capability_type_p (aff_inv->type) && capability_type_p (uutype))
+    uutype = noncapability_type (uutype);
   if (common_type != uutype)
     aff_combination_convert (aff_inv, uutype);
 
   aff_combination_scale (aff_var, rat);
+
+  /* aff_inv and aff_var cannot both have metadata (otherwise the resulting
+     variable would need metadata from two sources -- i.e.
+     get_computation_aff_1 has split the value into separate parts badly).
+     However, both may lack metadata (if the variable is not a capability).  */
+  gcc_assert (!aff_cap_provenance_p (aff_inv)
+	      || !aff_cap_provenance_p (aff_var));
   return true;
 }
 
@@ -4161,7 +4281,7 @@ get_debug_computation_at (class loop *loop, gimple *at,
   widest_int rat;
 
   /* We must have a precision to express the values of use.  */
-  if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
+  if (TYPE_NONCAP_PRECISION (utype) >= TYPE_NONCAP_PRECISION (ctype))
     return NULL_TREE;
 
   /* Try to handle the case that get_computation_at doesn't,
@@ -4194,14 +4314,14 @@ get_debug_computation_at (class loop *loop, gimple *at,
   if (bits == -1)
     bits = wi::floor_log2 (rat) + 1;
   if (!cand->iv->no_overflow
-      && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
+      && TYPE_NONCAP_PRECISION (utype) + bits > TYPE_NONCAP_PRECISION (ctype))
     return NULL_TREE;
 
   var = var_at_stmt (loop, cand, at);
-
-  if (POINTER_TYPE_P (ctype))
+  if (POINTER_TYPE_P (ctype) || capability_type_p (utype)
+      || capability_type_p (ctype))
     {
-      ctype = unsigned_type_for (ctype);
+      ctype = unsigned_type_for (noncapability_type (ctype));
       cbase = fold_convert (ctype, cbase);
       cstep = fold_convert (ctype, cstep);
       var = fold_convert (ctype, var);
@@ -4221,6 +4341,21 @@ get_debug_computation_at (class loop *loop, gimple *at,
 	var = fold_build1 (NEGATE_EXPR, sizetype, var);
       var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
     }
+  else if (capability_type_p (utype))
+    {
+      gcc_assert (INTCAP_TYPE_P (utype));
+      /* `var` cannot have side-effects (as evidenced by the two asserts
+	 below).  If it could have side-effects in the future then we would
+	 need to avoid duplicating it while building our
+	 replace_address_value.  */
+      gcc_assert (TREE_CODE (var) == SSA_NAME);
+      gcc_assert (!TREE_SIDE_EFFECTS (var));
+      tree tmp_type = noncapability_type (utype);
+      var = fold_convert (tmp_type, var);
+      var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, tmp_type,
+			 fold_drop_capability (ubase), var);
+      var = fold_build_replace_address_value (ubase, var);
+    }
   else
     {
       var = fold_convert (utype, var);
@@ -4689,8 +4824,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   bool simple_inv = true;
   tree comp_inv = NULL_TREE, type = aff_var->type;
   comp_cost var_cost = no_cost, cost = no_cost;
-  struct mem_address parts = {NULL_TREE, integer_one_node,
-			      NULL_TREE, NULL_TREE, NULL_TREE};
+  struct mem_address parts
+    = {NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE};
+  if (aff_cap_provenance_p (aff_inv) || aff_cap_provenance_p (aff_var))
+    parts.base_cap = build_int_cst (ptr_type_node, 1);
+  else
+    parts.base = integer_one_node;
   machine_mode addr_mode = TYPE_MODE (type);
   machine_mode mem_mode = TYPE_MODE (use->mem_type);
   addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
@@ -4727,18 +4866,24 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
 	  /* Base is fixed address and is moved to symbol part.  */
 	  if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
 	    parts.base = NULL_TREE;
+	  if (parts.symbol != NULL_TREE)
+	    parts.base_cap = NULL_TREE;
 
 	  /* Addressing mode "symbol + base + index [<< scale] [+ offset]".  */
 	  if (parts.symbol != NULL_TREE
 	      && !valid_mem_ref_p (mem_mode, as, &parts))
 	    {
 	      aff_combination_add_elt (aff_inv, parts.symbol, 1);
+	      bool into_base_cap = tree_is_capability_value (parts.symbol);
 	      parts.symbol = NULL_TREE;
 	      /* Reset SIMPLE_INV since symbol address needs to be computed
 		 outside of address expression in this case.  */
 	      simple_inv = false;
 	      /* Symbol part is moved back to base part, it can't be NULL.  */
-	      parts.base = integer_one_node;
+	      if (into_base_cap)
+		parts.base_cap = build_int_cst (ptr_type_node, 1);
+	      else
+		parts.base = integer_one_node;
 	    }
 	}
       else
@@ -4784,9 +4929,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   if (comp_inv != NULL_TREE)
     cost = force_var_cost (data, comp_inv, inv_vars);
   if (ratio != 1 && parts.step == NULL_TREE)
-    var_cost += mult_by_coeff_cost (ratio, addr_mode, speed);
+    var_cost += mult_by_coeff_cost (ratio, noncapability_mode (addr_mode),
+				    speed);
   if (comp_inv != NULL_TREE && parts.index == NULL_TREE)
-    var_cost += add_cost (speed, addr_mode);
+    var_cost += CAPABILITY_MODE_P (addr_mode)
+	    ? pointer_add_cost (speed, addr_mode)
+	    : add_cost (speed, addr_mode);
 
   if (comp_inv && inv_expr && !simple_inv)
     {
@@ -4816,6 +4964,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
      the only kind of index that the target allows.  */
   if (parts.step != NULL_TREE && ok_without_ratio_p)
     cost.complexity += 1;
+  if (parts.base_cap != NULL_TREE && parts.index != NULL_TREE)
+    cost.complexity += 1;
   if (parts.base != NULL_TREE && parts.index != NULL_TREE)
     cost.complexity += 1;
   if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
@@ -4854,6 +5004,15 @@ get_scaled_computation_cost_at (ivopts_data *data, gimple *at, comp_cost cost)
   return cost;
 }
 
+static bool
+type_requires_same_base (tree type)
+{
+  /* A capability use can be represented in terms of a non-capability
+     candidate by simply extracting the capability part and using the "rest"
+     as an addend.  */
+  return !capability_type_p (type) && POINTER_TYPE_P (type);
+}
+
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
    to create an address from it, otherwise we want to get it into
@@ -4872,7 +5031,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
   tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
   tree comp_inv = NULL_TREE;
   HOST_WIDE_INT ratio, aratio;
-  comp_cost cost;
+  comp_cost cost, cap_transform_cost = no_cost;
   widest_int rat;
   aff_tree aff_inv, aff_var;
   bool speed = optimize_bb_for_speed_p (gimple_bb (at));
@@ -4884,29 +5043,23 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
   if (inv_expr)
     *inv_expr = NULL;
 
-  /* MORELLO TODO
-     Was trying to get these induction variables working for capabilities.
-     This function should eventually be modified to handle them (especially how
-     we should use TYPE_PRECISION here).
-     However, skipping for now (would like to do in the future).  */
-  if (capability_type_p (ctype) || capability_type_p (utype))
-    return infinite_cost;
-
   /* Check if we have enough precision to express the values of use.  */
-  if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
+  if (TYPE_NONCAP_PRECISION (utype) > TYPE_NONCAP_PRECISION (ctype))
     return infinite_cost;
 
   if (address_p
       || (use->iv->base_object
 	  && cand->iv->base_object
-	  && POINTER_TYPE_P (TREE_TYPE (use->iv->base_object))
-	  && POINTER_TYPE_P (TREE_TYPE (cand->iv->base_object))))
+	  && type_requires_same_base (TREE_TYPE (use->iv->base_object))
+	  && type_requires_same_base (TREE_TYPE (cand->iv->base_object))))
     {
       /* Do not try to express address of an object with computation based
 	 on address of a different object.  This may cause problems in rtl
 	 level alias analysis (that does not expect this to be happening,
 	 as this is illegal in C), and would be unlikely to be useful
-	 anyway.  */
+	 anyway.
+	 Similarly for intcap typed variables, where we are concerned about
+	 the provenance of the capability.  */
       if (use->iv->base_object
 	  && cand->iv->base_object
 	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
@@ -4914,7 +5067,8 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
     }
 
   if (!get_computation_aff_1 (data->current_loop, at, use,
-			      cand, &aff_inv, &aff_var, &rat)
+			      cand, &aff_inv, &aff_var, &rat,
+			      &cap_transform_cost)
       || !wi::fits_shwi_p (rat))
     return infinite_cost;
 
@@ -4926,12 +5080,18 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
       cost = get_scaled_computation_cost_at (data, at, cost);
       /* For doloop IV cand, add on the extra cost.  */
       cost += cand->doloop_p ? targetm.doloop_cost_for_address : 0;
-      return cost;
+      return cost + cap_transform_cost;
     }
 
   bool simple_inv = (aff_combination_const_p (&aff_inv)
 		     || aff_combination_singleton_var_p (&aff_inv));
-  tree signed_type = signed_type_for (aff_combination_type (&aff_inv));
+  tree signed_type;
+  if (capability_type_p (aff_combination_type (&aff_inv)))
+    signed_type = intcap_type_node;
+  else
+    signed_type = signed_type_for (aff_combination_type (&aff_inv));
+
+
   aff_combination_convert (&aff_inv, signed_type);
   if (!aff_combination_zero_p (&aff_inv))
     comp_inv = aff_combination_to_tree (&aff_inv);
@@ -4954,7 +5114,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
     cost = no_cost;
 
   /* Need type narrowing to represent use with cand.  */
-  if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
+  if (TYPE_NONCAP_PRECISION (utype) < TYPE_NONCAP_PRECISION (ctype))
     {
       machine_mode outer_mode = TYPE_MODE (utype);
       machine_mode inner_mode = TYPE_MODE (ctype);
@@ -4968,13 +5128,16 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
     aratio = ratio;
 
   if (ratio != 1)
-    cost += mult_by_coeff_cost (aratio, TYPE_MODE (utype), speed);
+    cost += mult_by_coeff_cost (aratio,
+				noncapability_mode (TYPE_MODE (utype)), speed);
 
   /* TODO: We may also need to check if we can compute  a + i * 4 in one
      instruction.  */
   /* Need to add up the invariant and variant parts.  */
   if (comp_inv && !integer_zerop (comp_inv))
-    cost += add_cost (speed, TYPE_MODE (utype));
+    cost += capability_type_p (utype)
+      ? pointer_add_cost (speed, TYPE_MODE (utype))
+      : add_cost (speed, TYPE_MODE (utype));
 
   cost = get_scaled_computation_cost_at (data, at, cost);
 
@@ -4982,7 +5145,7 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
   if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR)
     cost += targetm.doloop_cost_for_generic;
 
-  return cost;
+  return cost + cap_transform_cost;
 }
 
 /* Determines cost of computing the use in GROUP with CAND in a generic
@@ -7217,6 +7380,46 @@ create_new_ivs (struct ivopts_data *data, class iv_ca *set)
     }
 }
 
+/* Build a tree computing COMP_OP1 + COMP_OP2 + OFFSET.  If either operand is
+   a pointer or a capability it is used as the base: pointer bases are
+   combined with fold_build_pointer_plus, while intcap bases have the sum
+   computed in the non-capability type and the metadata reattached with
+   fold_build_replace_address_value.  */
+
+static inline tree
+rewrite_combine_base_and_step (tree comp_op1, tree comp_op2,
+			       poly_widest_int offset)
+{
+  tree comp;
+  if (POINTER_TYPE_P (TREE_TYPE (comp_op2))
+      || capability_type_p (TREE_TYPE (comp_op2)))
+    std::swap (comp_op1, comp_op2);
+
+  if (POINTER_TYPE_P (TREE_TYPE (comp_op1)))
+    {
+      comp = fold_build_pointer_plus (comp_op1,
+				      fold_convert (sizetype, comp_op2));
+      comp = fold_build_pointer_plus (comp,
+				      wide_int_to_tree (sizetype, offset));
+    }
+  else if (INTCAP_TYPE_P (TREE_TYPE (comp_op1)))
+    {
+      /* If this ever did have side effects we would need to avoid its
+	 duplication in the below replace_address_value.  */
+      gcc_assert (!TREE_SIDE_EFFECTS (comp_op1));
+      tree noncap_type = noncapability_type (TREE_TYPE (comp_op1));
+      tree temp_op1 = fold_convert (noncap_type, comp_op1);
+      comp_op2 = fold_convert (noncap_type, comp_op2);
+      comp = fold_build2 (PLUS_EXPR, noncap_type, temp_op1, comp_op2);
+      comp = fold_build2 (PLUS_EXPR, noncap_type, comp,
+			  wide_int_to_tree (noncap_type, offset));
+      comp = fold_build_replace_address_value (comp_op1, comp);
+    }
+  else
+    {
+      comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp_op1,
+			  fold_convert (TREE_TYPE (comp_op1), comp_op2));
+      comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp,
+			  wide_int_to_tree (TREE_TYPE (comp_op1), offset));
+    }
+
+  return comp;
+}
+
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
 
@@ -7304,6 +7507,36 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
   aff_inv.offset = 0;
 
   gimple_seq stmt_list = NULL, seq = NULL;
+  /* We have to do some massaging in order to maintain capability validity.
+     Sometimes we have something like the invariant being 2*&array and aff_var
+     representing -1*&array.  If we were to calculate both values and sum them
+     together, the temporaries would be invalid.
+     Something similar can happen when we have different SSA_NAMEs
+     representing essentially the same metadata.  Hence we check based on
+     whether there is metadata on both the aff_inv and the candidate base
+     rather than on the affine tree directly.
+
+     Another case is when we represent a capability use in terms of a
+     non-capability candidate.  When this happens the non-capability candidate
+     could leave us with an aff_inv which is essentially "just the metadata"
+     of the use.  That would invalidate the capability due to being far out of
+     bounds.  */
+  tree saved_metadata = NULL_TREE;
+  tree metadata_final_type = NULL_TREE;
+  if (aff_cap_provenance_p (&aff_inv)
+      && (aff_inv.n != 1 || aff_inv.elts[0].coef != 1))
+    {
+      /* N.b. we maintain a separation between the element type and the affine
+	 tree type so that if our element is a pointer and we've generated an
+	 intcap typed induction variable we can avoid using INTCAP typed
+	 operations and the associated indirection via REPLACE_ADDRESS_VALUE.
+
+	 Must do this whenever there is any offset for aff_inv since we can't
+	 tell when we have a very large integral offset in our "candidate" that
+	 was subtracted from the "use" and would take us out of range.  */
+      metadata_final_type = aff_inv.type;
+      saved_metadata = aff_combination_extract_provenance_cap (&aff_inv);
+    }
   tree comp_op1 = aff_combination_to_tree (&aff_inv);
   tree comp_op2 = aff_combination_to_tree (&aff_var);
   gcc_assert (comp_op1 && comp_op2);
@@ -7313,22 +7546,11 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
   comp_op2 = force_gimple_operand (comp_op2, &seq, true, NULL);
   gimple_seq_add_seq (&stmt_list, seq);
 
-  if (POINTER_TYPE_P (TREE_TYPE (comp_op2)))
-    std::swap (comp_op1, comp_op2);
-
-  if (POINTER_TYPE_P (TREE_TYPE (comp_op1)))
-    {
-      comp = fold_build_pointer_plus (comp_op1,
-				      fold_convert (sizetype, comp_op2));
-      comp = fold_build_pointer_plus (comp,
-				      wide_int_to_tree (sizetype, offset));
-    }
-  else
+  comp = rewrite_combine_base_and_step (comp_op1, comp_op2, offset);
+  if (saved_metadata)
     {
-      comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp_op1,
-			  fold_convert (TREE_TYPE (comp_op1), comp_op2));
-      comp = fold_build2 (PLUS_EXPR, TREE_TYPE (comp_op1), comp,
-			  wide_int_to_tree (TREE_TYPE (comp_op1), offset));
+      comp = rewrite_combine_base_and_step (saved_metadata, comp, 0);
+      comp = fold_convert (metadata_final_type, comp);
     }
 
   comp = fold_convert (type, comp);
@@ -7550,15 +7772,20 @@ rewrite_use_compare (struct ivopts_data *data,
     {
       tree var = var_at_stmt (data->current_loop, cand, use->stmt);
       tree var_type = TREE_TYPE (var);
-      gimple_seq stmts;
+      gimple_seq stmts, var_stmts;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "Replacing exit test: ");
 	  print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
 	}
+      var = force_gimple_operand (fold_drop_capability (var),
+				  &var_stmts, true, NULL_TREE);
+      if (var_stmts)
+	gsi_insert_seq_before (&bsi, var_stmts, GSI_SAME_STMT);
       compare = cp->comp;
-      bound = unshare_expr (fold_convert (var_type, bound));
+      bound = unshare_expr (fold_convert (noncapability_type (var_type),
+					  bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
       if (stmts)
 	gsi_insert_seq_on_edge_immediate (
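
[ Illustrative aside -- the pattern used above in rewrite_use_nonlinear_expr
  and rewrite_combine_base_and_step (extract the provenance, do the arithmetic
  on plain address values, reattach the metadata only for the final in-bounds
  value) sketched in standalone C.  toy_cap and toy_replace_address_value are
  invented stand-ins, not the GCC or CHERI API. ]

    #include <stdint.h>
    #include <stdio.h>

    /* A capability modelled as metadata plus an address value.  */
    struct toy_cap
    {
      const char *provenance;
      uintptr_t address;
    };

    /* Keep CAP's metadata but install a freshly computed address, in the
       spirit of fold_build_replace_address_value.  */
    static struct toy_cap
    toy_replace_address_value (struct toy_cap cap, uintptr_t address)
    {
      cap.address = address;
      return cap;
    }

    int
    main (void)
    {
      static char array[64];
      struct toy_cap base = { "&array", (uintptr_t) array };

      /* The motivating case from the comment above: an invariant of 2*&array
         and a variable part of -1*&array + i*step.  Materialising 2*&array as
         a capability would be far out of bounds; as plain integers the
         intermediate values are harmless.  */
      uintptr_t i = 3, step = 8;
      uintptr_t inv = 2 * base.address;
      uintptr_t var = -base.address + i * step;

      /* Only the final, in-bounds value is rebuilt as a capability.  */
      struct toy_cap use = toy_replace_address_value (base, inv + var);
      printf ("offset from base: %ju\n",
              (uintmax_t) (use.address - base.address));
      return 0;
    }
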
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index ef35a62f2619..0296db9be84e 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4865,7 +4865,7 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
       standard_iv_increment_position (loop, &incr_gsi, &insert_after);
 
       create_iv (aggr_ptr_init,
-		 fold_convert (aggr_ptr_type, iv_step),
+		 fold_convert (noncapability_type (aggr_ptr_type), iv_step),
 		 aggr_ptr, loop, &incr_gsi, insert_after,
 		 &indx_before_incr, &indx_after_incr);
       incr = gsi_stmt (incr_gsi);
@@ -4894,9 +4894,11 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
     {
       standard_iv_increment_position (containing_loop, &incr_gsi,
 				      &insert_after);
-      create_iv (aptr, fold_convert (aggr_ptr_type, DR_STEP (dr)), aggr_ptr,
-		 containing_loop, &incr_gsi, insert_after, &indx_before_incr,
-		 &indx_after_incr);
+      create_iv (aptr,
+		 fold_convert (noncapability_type (aggr_ptr_type),
+			       DR_STEP (dr)),
+		 aggr_ptr, containing_loop, &incr_gsi, insert_after,
+		 &indx_before_incr, &indx_after_incr);
       incr = gsi_stmt (incr_gsi);
 
       /* Copy the points-to information if it exists. */
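
[ Illustrative aside -- the create_iv changes above convert the step of a
  pointer induction variable to the non-capability offset type.  A minimal
  standalone sketch of the idea (invented names, no GCC dependencies): the IV
  itself is pointer-typed, but its step lives in a plain integer type. ]

    #include <stddef.h>

    static long
    toy_sum (const int *base, size_t n)
    {
      long sum = 0;
      const int *p = base;         /* the pointer-typed induction variable */
      size_t step = sizeof (int);  /* the step is a plain byte offset */
      for (size_t i = 0; i < n; i++)
        {
          sum += *p;
          p = (const int *) ((const char *) p + step);
        }
      return sum;
    }

    int
    main (void)
    {
      int a[4] = { 1, 2, 3, 4 };
      return toy_sum (a, 4) == 10 ? 0 : 1;
    }
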
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 47cfa6f4061a..adbc85492adf 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1605,7 +1605,7 @@ get_misalign_in_elems (gimple **seq, loop_vec_info loop_vinfo)
   tree start_addr = vect_create_addr_base_for_vector_ref (loop_vinfo,
 							  stmt_info, seq,
 							  offset);
-  tree type = unsigned_type_for (TREE_TYPE (start_addr));
+  tree type = unsigned_type_for (noncapability_type (TREE_TYPE (start_addr)));
   if (target_align.is_constant (&target_align_c))
     target_align_minus_1 = build_int_cst (type, target_align_c - 1);
   else
@@ -3073,7 +3073,9 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
      all zeros followed by all ones.  */
   gcc_assert ((mask != 0) && ((mask & (mask+1)) == 0));
 
-  int_ptrsize_type = signed_type_for (ptr_type_node);
+  /* We want to calculate the alignment; this is a property solely of the
+     address value and not of any capability metadata.  */
+  int_ptrsize_type = signed_type_for (noncapability_type (ptr_type_node));
 
   /* Create expression (mask & (dr_1 || ... || dr_n)) where dr_i is the address
      of the first vector of the i'th data reference. */
@@ -3151,8 +3153,8 @@ vect_create_cond_for_unequal_addrs (loop_vec_info loop_vinfo, tree *cond_expr)
   vec_object_pair *pair;
   FOR_EACH_VEC_ELT (pairs, i, pair)
     {
-      tree addr1 = build_fold_addr_expr (pair->first);
-      tree addr2 = build_fold_addr_expr (pair->second);
+      tree addr1 = fold_drop_capability (build_fold_addr_expr (pair->first));
+      tree addr2 = fold_drop_capability (build_fold_addr_expr (pair->second));
       tree part_cond_expr = fold_build2 (NE_EXPR, boolean_type_node,
 					 addr1, addr2);
       chain_cond_expr (cond_expr, part_cond_expr);
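
[ Illustrative aside -- both hunks above rely on the fact that alignment and
  address-equality checks only need the address value, not the capability
  metadata.  A small sketch of the misalignment computation with invented
  names and no GCC dependencies, assuming the metadata has already been
  dropped. ]

    #include <assert.h>
    #include <stdint.h>

    /* Misalignment of ADDRESS with respect to TARGET_ALIGN (a power of two).
       Only the integer address participates in the mask.  */
    static uintptr_t
    toy_misalign_in_bytes (uintptr_t address, uintptr_t target_align)
    {
      assert (target_align != 0 && (target_align & (target_align - 1)) == 0);
      return address & (target_align - 1);
    }

    int
    main (void)
    {
      assert (toy_misalign_in_bytes (0x1003, 16) == 3);
      assert (toy_misalign_in_bytes (0x1000, 16) == 0);
      return 0;
    }
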
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d24187016279..424b4ce51a57 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2154,7 +2154,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	     non-gap element in the same B-sized block.  */
 	  if (overrun_p
 	      && gap < (vect_known_alignment_in_bytes (first_dr_info)
-			/ vect_get_scalar_dr_size (first_dr_info)))
+			/ vect_get_scalar_dr_size (first_dr_info))
+	      && !targetm.capabilities_in_hardware ())
 	    overrun_p = false;
 
 	  /* If the gap splits the vector in half and the target
@@ -2250,7 +2251,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
       if (would_overrun_p
 	  && !masked_p
 	  && gap < (vect_known_alignment_in_bytes (first_dr_info)
-		    / vect_get_scalar_dr_size (first_dr_info)))
+		    / vect_get_scalar_dr_size (first_dr_info))
+	  && !targetm.capabilities_in_hardware ())
 	would_overrun_p = false;
 
       if (!STMT_VINFO_STRIDED_P (first_stmt_info)
@@ -6906,7 +6908,7 @@ vectorizable_scan_store (vec_info *vinfo,
   tree vec_oprnd2 = NULL_TREE;
   tree vec_oprnd3 = NULL_TREE;
   tree dataref_ptr = DR_BASE_ADDRESS (dr_info->dr);
-  tree dataref_offset = build_int_cst (ref_type, 0);
+  tree dataref_offset = build_int_cst (noncapability_type (ref_type), 0);
   tree bump = vect_get_data_ptr_increment (vinfo, dr_info,
 					   vectype, VMAT_CONTIGUOUS);
   tree ldataref_ptr = NULL_TREE;
@@ -6934,9 +6936,10 @@ vectorizable_scan_store (vec_info *vinfo,
       if (ldataref_ptr)
 	{
 	  vec_oprnd2 = make_ssa_name (vectype);
+	  tree tmp = fold_convert_for_mem_ref (ref_type, dataref_offset);
 	  tree data_ref = fold_build2 (MEM_REF, vectype,
 				       unshare_expr (ldataref_ptr),
-				       dataref_offset);
+				       tmp);
 	  vect_copy_ref_info (data_ref, DR_REF (load1_dr_info->dr));
 	  gimple *g = gimple_build_assign (vec_oprnd2, data_ref);
 	  vect_finish_stmt_generation (vinfo, stmt_info, g, gsi);
@@ -7021,9 +7024,10 @@ vectorizable_scan_store (vec_info *vinfo,
 
       if (!inscan_var_store)
 	{
+	  tree tmp = fold_convert_for_mem_ref (ref_type, dataref_offset);
 	  tree data_ref = fold_build2 (MEM_REF, vectype,
 				       unshare_expr (dataref_ptr),
-				       dataref_offset);
+				       tmp);
 	  vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
 	  g = gimple_build_assign (data_ref, new_temp);
 	  vect_finish_stmt_generation (vinfo, stmt_info, g, gsi);
@@ -7036,10 +7040,10 @@ vectorizable_scan_store (vec_info *vinfo,
       {
 	if (j != 0)
 	  dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
-
+	tree tmp = fold_convert_for_mem_ref (ref_type, dataref_offset);
 	tree data_ref = fold_build2 (MEM_REF, vectype,
 				     unshare_expr (dataref_ptr),
-				     dataref_offset);
+				     tmp);
 	vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
 	gimple *g = gimple_build_assign (data_ref, orig);
 	vect_finish_stmt_generation (vinfo, stmt_info, g, gsi);
@@ -7924,7 +7928,7 @@ vectorizable_store (vec_info *vinfo,
 					get_alias_set (TREE_TYPE (ref_type))))
 	    {
 	      dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
-	      dataref_offset = build_int_cst (ref_type, 0);
+	      dataref_offset = build_zero_cst (noncapability_type (ref_type));
 	    }
 	  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 	    {
@@ -8159,11 +8163,12 @@ vectorizable_store (vec_info *vinfo,
 		}
 	      else
 		{
+		  tree tmp = dataref_offset
+		    ? fold_convert_for_mem_ref (ref_type, dataref_offset)
+		    : build_int_cst (ref_type, 0);
 		  data_ref = fold_build2 (MEM_REF, vectype,
 					  dataref_ptr,
-					  dataref_offset
-					  ? dataref_offset
-					  : build_int_cst (ref_type, 0));
+					  tmp);
 		  if (aligned_access_p (first_dr_info))
 		    ;
 		  else if (DR_MISALIGNMENT (first_dr_info) == -1)
@@ -9186,7 +9191,7 @@ vectorizable_load (vec_info *vinfo,
 		  || alignment_support_scheme == dr_unaligned_supported))
 	    {
 	      dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
-	      dataref_offset = build_int_cst (ref_type, 0);
+	      dataref_offset = build_int_cst (noncapability_type (ref_type), 0);
 	    }
 	  else if (diff_first_stmt_info)
 	    {
@@ -9436,7 +9441,12 @@ vectorizable_load (vec_info *vinfo,
 			    && gap != 0
 			    && known_eq (nunits, (group_size - gap) * 2)
 			    && known_eq (nunits, group_size)
-			    && gap >= (vect_align / scalar_dr_size))
+			    && (gap >= (vect_align / scalar_dr_size)
+				/* Without capabilities we're fine to over-read
+				   if we know it won't cross a page boundary.
+				   With capabilities any overrun must be
+				   avoided.  */
+				|| targetm.capabilities_in_hardware ()))
 			  {
 			    tree half_vtype;
 			    new_vtype
@@ -9445,9 +9455,10 @@ vectorizable_load (vec_info *vinfo,
 			    if (new_vtype != NULL_TREE)
 			      ltype = half_vtype;
 			  }
+			tree tmp_type = noncapability_type (ref_type);
 			tree offset
 			  = (dataref_offset ? dataref_offset
-					    : build_int_cst (ref_type, 0));
+					    : build_int_cst (tmp_type, 0));
 			if (ltype != vectype
 			    && memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 			  {
@@ -9456,6 +9467,7 @@ vectorizable_load (vec_info *vinfo,
 			    tree gapcst = build_int_cst (ref_type, gap_offset);
 			    offset = size_binop (PLUS_EXPR, offset, gapcst);
 			  }
+			offset = fold_convert_for_mem_ref (ref_type, offset);
 			data_ref
 			  = fold_build2 (MEM_REF, ltype, dataref_ptr, offset);
 			if (alignment_support_scheme == dr_aligned)
diff --git a/gcc/tree.def b/gcc/tree.def
index b0920f06d658..06158167da03 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1074,7 +1074,8 @@ DEFTREECODE (REALIGN_LOAD_EXPR, "realign_load", tcc_expression, 3)
    The type of STEP, INDEX and INDEX2 is sizetype.
 
    The type of BASE is a pointer type.  If BASE is not an address of
-   a static or global variable INDEX2 will be NULL.
+   a static or global variable, and is not of capability type, INDEX2 will be
+   NULL.
 
    The type of OFFSET is a pointer type and determines TBAA the same as
    the constant offset operand in MEM_REF.  */
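
[ Illustrative aside -- the TARGET_MEM_REF form described above, with a
  capability BASE and a non-capability INDEX2, evaluates as
  BASE + INDEX2 + INDEX * STEP + OFFSET, with only BASE contributing
  capability provenance.  A toy evaluation with invented names; this is not
  the GIMPLE representation. ]

    #include <stddef.h>
    #include <stdio.h>

    /* Flattened stand-in for a TARGET_MEM_REF with a capability base.  */
    struct toy_tmr
    {
      const char *base;  /* capability base: the only provenance source */
      ptrdiff_t index2;  /* variable, non-capability part */
      ptrdiff_t index;   /* induction variable */
      ptrdiff_t step;    /* scale applied to the index */
      ptrdiff_t offset;  /* constant displacement */
    };

    static const char *
    toy_tmr_address (const struct toy_tmr *t)
    {
      return t->base + t->index2 + t->index * t->step + t->offset;
    }

    int
    main (void)
    {
      static const char data[128];
      struct toy_tmr t = { data, 4, 3, 8, 16 };
      printf ("byte offset: %td\n", (ptrdiff_t) (toy_tmr_address (&t) - data));
      return 0;
    }
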

