More questions on points-to analysis

Fri Mar 19 10:29:19 GMT 2021

Hello Richard,

I have a related question. I have the following GCC program (which is
again an altered version of ipa-pta-16.c:

```

struct X
{
  long l1;
  struct Y
    {
      long l2;
      int *p;
    } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
   //i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

I am compiling it with gcc -O2 -ftree-pta -fdump-tree-alias
-ftree-salias ipa-pta-16.c

and I get the following constraints for function foo.

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
```

I am just ordering to make this easier:

```
ANYTHING = &ANYTHING
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL

ESCAPED = ESCAPED + UNKNOWN

ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
```

I now construct a constraint graph. It is basically a collection of
unconnected nodes, except for ESCAPED which points to itself.

I then walk over this graph and first encounter the constraint
"ESCAPED = ESCAPED + UNKNOWN". Since Sol(ESCAPED) is the empty set
nothing happens.
Similar for the constraints "ESCAPED = *ESCAPED" and "*ESCAPED = NONLOCAL".

I then walk to "*ISRA.4 = derefaddrtmp(9)". Sol(ISRA.4) = { NONLOCAL
}. Then there's possibly an edge made from NONLOCAL to derefaddrtmp.
(I am not sure of the implementation, since NONLOCAL doesn't change it
might not require an added edge.). And because NONLOCAL is a special
variable, Sol(NONLOCAL) does not change.

Now, none of the solutions have changed. We have possibly added an
edge to NONLOCAL, so let's just process the graph once more. Again
Sol(ESCAPED) is still empty so "ESCAPED = ESCAPED + UNKNOWN", "ESCAPED
= *ESCAPED", and "*ESCAPED = NONLOCAL" should have no effect ."Then
*ISRA.4 = derefaddrtmp(9)" also has no effect since nothing can
happen.

My question here is, why if we look at the points-to sets dumped by GCC we see

ESCAPED = { NULL }

Here is the full points-to sets:

computed for foo:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL }
NONLOCAL = { ESCAPED NONLOCAL }
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
foo.constprop.0.isra.0 = { }
```

I think this might have to do with "+ UNKNOWN" but I am unsure about
its semantics in the constraint set.

Thanks, any help is appreciated!

On Thu, 18 Mar 2021 at 13:36, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Perfect, thank you! This is exactly the answer that I was looking for.
>
>
> On Thu, 18 Mar 2021 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> > >
> > > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > > little bit. But now I want to focus on this:
> > >
> > > > > * the process in GCC that generates the constraints for NULL works
> > > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > > solver should yield a conservatively correct answer) but the process
> > > > > that solves the constraints relaxes the solutions for the NULL
> > > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > > solving algorithm somehow)
> > > >
> > > > No, that part should work OK.
> > > >
> > >
> > > So, let's ignore the other solver for now and instead focus on the
> > > concrete example I presented on the previous email. If GCC is solving
> > > these constraints:
> > >
> > > ```
> > > ANYTHING = &ANYTHING
> > > ESCAPED = *ESCAPED
> > > ESCAPED = ESCAPED + UNKNOWN
> > > *ESCAPED = NONLOCAL
> > > NONLOCAL = &NONLOCAL
> > > NONLOCAL = &ESCAPED
> > > INTEGER = &ANYTHING
> > > ISRA.4 = &NONLOCAL
> > > derefaddrtmp(9) = &NULL
> > > *ISRA.4 = derefaddrtmp(9)
> > > i = NONLOCAL
> > > i = &NONLOCAL
> > > ESCAPED = &NONLOCAL
> > > _2 = *ISRA.4
> > > ```
> > >
> > > What would a hand calculated solution gives us vs what was the
> > > solution given by GCC?
> > >
> > > I am following the algorithm stated on Section 3.3 of Structure
> > > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > > UNKNOWN constraint since there isn't any other field offset that needs
> > > to be calculated.
> > >
> > > First, I want to make some adjustments. I am going to be using "=" to
> > > signify the \supseteq symbol and I will be adding curly braces to
> > > specify the element in a set as opposed to the whole set. Therefore
> > > the constraints will now become (ordered slightly differently):
> > >
> > > ```
> > > ====direct contraints========
> > > ANYTHING = { ANYTHING }
> > > ESCAPED = { NONLOCAL }
> > > NONLOCAL = { NONLOCAL }
> > > NONLOCAL =  { ESCAPED }
> > > INTEGER = { ANYTHING }
> > > ISRA.4 = { NONLOCAL }
> > > derefaddrtmp(9) = { NULL }
> > > i = { NONLOCAL }
> > >
> > > ====complex constraints======
> > > ESCAPED = *ESCAPED
> > > *ESCAPED = NONLOCAL
> > > *ISRA.4 = derefaddrtmp(9)
> > > _2 = *ISRA.4
> > >
> > > ===== copy-constraints======
> > > ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> > > it will matter...
> > > i = NONLOCAL
> > > ```
> > >
> > > Solution sets are basically the direct constraints at the moment.
> > >
> > > Let's now create the graph
> > >
> > > 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has no outgoing edge
> > >
> > > Now, we can iterate over this set of nodes
> > >
> > > 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> > > edges, but it has complex-constraints. Let's modify the graph.
> > >   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> > > edge from ESCAPED to NONLOCAL.
> > >   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> > > edge from NONLOCAL to NONLOCAL
> > >
> > > The graph is now transformed to
> > >
> > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has an edge going to NONLOCAL
> > >
> > > The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> > > Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
> > >
> > > Now we continue walking
> > >
> > > 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> > > There are no edges, but it has complex-constraints. Let's modify the
> > > graph.
> > >   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> > > a copy edge from NONLOCAL to derefaddrtmp(9).
> > >
> > > The graph is now transformed to
> > >
> > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
> > >
> > > The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> > > ESCAPED, NULL}.
> > >
> > > Now I could continue, but here is already something that is not shown
> > > in the points-to sets in the dump. It shows that
> > >
> > > NONLOCAL = {NONLOCAL, ESCAPED, NULL}
> > >
> > > Looking at the data that I showed yesterday:
> > >
> > > ```
> > > NONLOCAL = { ESCAPED NONLOCAL } same as i
> > > ```
> > >
> > > we see that NULL is not in the solution set of NONLOCAL.
> > >
> > > Now, yesterday you said that NULL is not conservatively correctly
> > > represented in the constraints. You also said today the points-to
> > > analysis should be solving the constraints fine. What I now understand
> > > from this is that while NULL may be pointed to by some constraints, it
> > > doesn't mean that not being on the set means that a pointer will not
> > > point to NULL. However, it should still be shown in the dumps where
> > > the points-to sets are shown for the constraint variables since it is
> > > solved using the same analysis? Is this correct? Am I doing the points
> > > to analysis by hand wrong somehow? Why would NULL not be in
> > > Sol(NONLOCAL) if it is solving the same constraints that I am solving
> > > by hand?
> >
> > Because NONLOCAL is a special var and those do not get "solved".
> > Their points-to set is fixed.  NONLOCAL is the set of "global"
> > variables _at function entry_ (otherwise it would be the same as
> > ESCAPED).
> >
> > Richard.