Bug 65752 - Too strong optimizations int -> pointer casts
Summary: Too strong optimizations int -> pointer casts
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 82177 (view as bug list)
Depends on: 49330 61502
Blocks: 82177
  Show dependency treegraph
 
Reported: 2015-04-13 14:04 UTC by Robbert
Modified: 2018-01-08 08:31 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-10-20 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Robbert 2015-04-13 14:04:08 UTC
The following program prints "0" instead of the expected "15":

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main() {
  int x = 0, *p = 0;
  for (uintptr_t i = 0; ; i++) {
    if (i == (uintptr_t)&x) { p = (int*)i; break; }
  }
  *p = 15;
  printf("%d\n", x);
}

gcc -O2 makes too strict assumptions about non-aliasing here: it removes the loop entirely (which is perfectly fine), but then assumes that the pointers p and &x are unrelated.

NB 1: I do not think that DR #260 applies here
NB 2: When compiled with clang, it also optimizes out the loop, but it does print the expected "15"
Comment 1 joseph@codesourcery.com 2015-04-13 17:38:54 UTC
On Mon, 13 Apr 2015, gcc at robbertkrebbers dot nl wrote:

> NB 1: I do not think that DR #260 applies here

Why not?  It seems clear enough that optimizations based on pointer 
origins are intended to be allowed; that it's not allowed to produce a 
pointer based on another pointer out of thin air, because the 
optimizations are more useful in practice than code such as this.
Comment 2 Robbert 2015-04-13 23:25:05 UTC
This example may seem academic, but there is a real problem underneath. 

Of course, I do agree that optimizations based on pointer origins are useful, but it is not an "all or nothing situation". As long as representations of pointers are kept opaque (i.e. the pointer has not been cast to an integer and the bit representation has not been inspected), I cannot think of any objection against such optimizations. They cannot affect the behavior of the code in any obvervable way.

However, in case the representation of the pointer is made transparent, the programmer generally has a good reason for doing so. In such cases the compiler should not perform unexpected optimizations.

Typical examples are:
* Pointer tagging (using some bits of pointers representations to store additional information, for example for pointer hardening techniques or garbage collection).
* Using of pointer representations as keys in hash tables.
* Writing the representation of a chunk of memory containing pointers to memory.
Comment 3 Robbert 2015-04-13 23:25:35 UTC
(In reply to Robbert from comment #2)
> * Writing the representation of a chunk of memory containing pointers to
> memory.
"to memory" should be "to disk"
Comment 4 Richard Biener 2015-04-14 08:39:29 UTC
Optimization reasons as follows.  points-to analysis considers even integers as
possible pointer (parts) and thus starts with i pointing to the NULL object. Incrementing that get's you to any _global_ object (but not automatic vars
as those do not need to have a fixed location). Thus p ends up as

p_8 = { NULL NONLOCAL }

and dereferencing that aliases with any global but not x as that is an automatic
var.

I think the compiler is free to re-locate x right before

  *p = 15;

and load x from a different location.  Computing the value (uintptr_t)&x
isn't preventing that as the address is not "live" after that.
Comment 5 Robbert 2015-04-14 08:46:32 UTC
(In reply to Richard Biener from comment #4)
> as the address is not "live" after that.
Why doesn't gcc consider casting a pointer to an integer as making it "live"? The concrete representation of address has escaped, which means anything can happen to the storage it points to.
Comment 6 Richard Biener 2015-04-14 09:01:46 UTC
(In reply to Robbert from comment #5)
> (In reply to Richard Biener from comment #4)
> > as the address is not "live" after that.
> Why doesn't gcc consider casting a pointer to an integer as making it
> "live"? The concrete representation of address has escaped, which means
> anything can happen to the storage it points to.

Because it isn't stored anywhere, you only performed the implicit
no-op assignment to i via the comparison.

PTA could certainly be told to add an effective

  i = (uintptr_t)&x

constraint when seeing an equality compare.  I'm not sure how pessimizing
that would be to code (and the validity of the testcase is disputed).

Sth like

Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c  (revision 222076)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -4771,6 +4771,19 @@ find_func_aliases (struct function *fn,
              || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop)))
        make_escape_constraint (rhsop);
     }
+  else if (gimple_code (t) == GIMPLE_COND)
+    {
+      /* On one path both EQ and NE perform an equality relation
+         and thus code may consider pointer equivalence.  */
+      if (gimple_cond_code (t) == EQ_EXPR
+         || gimple_cond_code (t) == NE_EXPR)
+       {
+         get_constraint_for (gimple_cond_lhs (t), &lhsc);
+         get_constraint_for (gimple_cond_rhs (t), &rhsc);
+         process_all_all_constraints (lhsc, rhsc);
+         process_all_all_constraints (rhsc, lhsc);
+       }
+    }
   /* Handle escapes through return.  */
   else if (gimple_code (t) == GIMPLE_RETURN
           && gimple_return_retval (as_a <greturn *> (t)) != NULL_TREE)

which fixes the testcase (but is incomplete - equivalences built on
gimple assignment RHS need to be considered as well).

It's hard to "conservatively" catch all implicit equivalences (inline
asms, function calls), so I don't think that is a workable solution.
A conservative solution would basically mean to disable most PTA in
practice.
Comment 7 Richard Biener 2015-04-14 09:06:16 UTC
Other testcase:

int main()
{
  int *p;
  int x = 0;
  if (p == &x)
   *p = 1;
  return x;
}

we optimize this to return 0.  You probably wouldn't consider that invalid I guess?
Comment 8 Robbert 2015-04-14 09:23:20 UTC
(In reply to Richard Biener from comment #7)
> Other testcase:
> 
> int main()
> {
>   int *p;
>   int x = 0;
>   if (p == &x)
>    *p = 1;
>   return x;
> }
> 
> we optimize this to return 0.  You probably wouldn't consider that invalid I
> guess?
This is undoubtedly undefined behavior, an indeterminate pointer is used in a comparison ==.

But even in the more controversial

int main() {
  int x = 0, y, *p = &y + 1;
  if (p == &x) { printf("compared succeeded"); *p = 1; }
  return x;
}

I am fine with any of the following behaviors:
* Nothing is printed, 0 is returned
* "compared succeeded" is printed, 0 is returned
* "compared succeeded" is printed, 1 is returned
And anything else because it has undefined behavior. Here a pointer p whose origin/provenance does not involve its representation is used out of its bounds.

The point is that when performing a pointer-to-int cast or when fiddling with object representations of pointers, the representation of the pointer has become transparent and *abstraction is broken* (generally deliberately!). One could have gathered information to reconstruct the pointer in some other context and then the compiler should not perform unexpected formalizations.
Comment 9 Robbert 2015-04-14 09:26:17 UTC
(In reply to Richard Biener from comment #6)
> which fixes the testcase (but is incomplete - equivalences built on
> gimple assignment RHS need to be considered as well).
Can you give an example?
Comment 10 Richard Biener 2015-04-14 10:50:11 UTC
(In reply to Robbert from comment #9)
> (In reply to Richard Biener from comment #6)
> > which fixes the testcase (but is incomplete - equivalences built on
> > gimple assignment RHS need to be considered as well).
> Can you give an example?

just do

  _Bool cmp = (i == (uintptr_t)&x);
  if (cmp) { .... }

in your original testcase (ok, optimization will probably make them equivalent,
but you can't rely on that one).

Similar for function calls

  if (equal (i, (uintptr_t)&x))) { ... }

with the obvious implementation of equal.  Or

  global = (uintptr_t)&x;
  if (equal (i)) { .... }

and see how this will make PTA useless (all pointers passed to a function
whose result might be used in a way to take advantage of an equality relation
need to be considered pointing to anything).  [and then thorougly specify
"take advantage of an equality relation"]
Comment 11 Marek Polacek 2015-04-20 13:45:03 UTC
Looks more like a tree-optimization problem rather than C FE one.
Comment 12 Robbert 2015-04-20 13:53:05 UTC
(In reply to Richard Biener from comment #10)
> and see how this will make PTA useless (all pointers passed to a function
> whose result might be used in a way to take advantage of an equality relation
> need to be considered pointing to anything).  [and then thorougly specify
> "take advantage of an equality relation"]
That is undesired indeed.

Only in case a pointer has been obtained via a construction that breaks abstraction (for example, if it has been obtained via an int -> pointer casts, or by poking bytes somewhere and then reinterpreting these as a pointer) convervative PTA assumptions should be made.
Comment 13 Chung-Kil Hur 2015-05-18 17:13:37 UTC
Hi, I have the following modified code.

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main() {
  int x = 0, *p = 0;
  uintptr_t i;
  uintptr_t j = (uintptr_t) &x;
  uintptr_t k = j+j;
  uintptr_t l = 2*j - j - j;
  for (i = j+j-k+l; ; i++) {
    if (i == (uintptr_t)&x) { p = (int*)i; break; }
  }
  *p = 15;

  printf("%d\n", x);
}

This example still prints out "0" instead of "15".
In this example, it seems that the integer "j+j-k+l" has no provenance.
It is unclear to me how the provenance is calculated.
Is there any concrete rule for calculating provenance?
Comment 14 Richard Biener 2015-05-19 09:21:13 UTC
(In reply to Chung-Kil Hur from comment #13)
> Hi, I have the following modified code.
> 
> #include <stdio.h>
> #include <stdint.h>
> #include <limits.h>
> 
> int main() {
>   int x = 0, *p = 0;
>   uintptr_t i;
>   uintptr_t j = (uintptr_t) &x;
>   uintptr_t k = j+j;
>   uintptr_t l = 2*j - j - j;
>   for (i = j+j-k+l; ; i++) {
>     if (i == (uintptr_t)&x) { p = (int*)i; break; }
>   }
>   *p = 15;
> 
>   printf("%d\n", x);
> }
> 
> This example still prints out "0" instead of "15".
> In this example, it seems that the integer "j+j-k+l" has no provenance.
> It is unclear to me how the provenance is calculated.
> Is there any concrete rule for calculating provenance?

early PTA computes

p_13, points-to non-local, points-to vars: { D.2349 }

  p_13 = (intD.6 *) i_1;
  *p_13 = 15;
  x.1_15 = xD.2349;

while late PTA has an IL with just the equivalency (the rest is optimized
away)

p_6, points-to non-local, points-to NULL, points-to vars: { }

  j_4 = (uintptr_t) &x;

  <bb 3>:
  # i_1 = PHI <0(2), i_5(5)>
  if (i_1 == j_4)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  p_6 = (int *) i_1;
  *p_6 = 15;
  x.1_8 = x;

so it hits essentially the same issue (the testcase is equivalent to the original one).
Comment 15 Chung-Kil Hur 2015-05-19 12:08:22 UTC
Hi Richard,

Thanks for the explanation.
But, what I wonder was how to justify such an optimization, rather than how it works.

I have a better example. This might be a real bug of GCC.

#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t pi = (uintptr_t) &x;
  uintptr_t i, j;

  for (i = 0; i < pi; i++) { }
  j = i;
  /* Note that the following "if" statement is never executed because j == pi. */
  if (j != pi) {
    j = pi;
  }

  *(int*)((pi+i)-j) = 15;

  printf("%d\n", x);
}

This program prints out "0" instead of "15".
Here, "pi" contains the address of the variable x; and "i" and "j" contain the same integer.
So, it seems that "(pi+i)-j" should have a proper provenance of "x" and thus the variable "x" should be updated to 15.
However, GCC seems to think that "(pi+i)-j" has no provenance.

So, as a programmer, I wonder how I should calculate the provenance of an integer in order to see whether casting it to a pointer is valid or not.

Thanks.
Comment 16 Richard Biener 2015-05-19 12:33:16 UTC
(In reply to Chung-Kil Hur from comment #15)
> Hi Richard,
> 
> Thanks for the explanation.
> But, what I wonder was how to justify such an optimization, rather than how
> it works.
> 
> I have a better example. This might be a real bug of GCC.
> 
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t pi = (uintptr_t) &x;
>   uintptr_t i, j;
> 
>   for (i = 0; i < pi; i++) { }
>   j = i;
>   /* Note that the following "if" statement is never executed because j ==
> pi. */

Wrong, j == i != pi.

>   if (j != pi) {
>     j = pi;
>   }
> 
>   *(int*)((pi+i)-j) = 15;
> 
>   printf("%d\n", x);
> }
> 
> This program prints out "0" instead of "15".
> Here, "pi" contains the address of the variable x; and "i" and "j" contain
> the same integer.
> So, it seems that "(pi+i)-j" should have a proper provenance of "x" and thus
> the variable "x" should be updated to 15.
> However, GCC seems to think that "(pi+i)-j" has no provenance.
> 
> So, as a programmer, I wonder how I should calculate the provenance of an
> integer in order to see whether casting it to a pointer is valid or not.
> 
> Thanks.
Comment 17 Chung-Kil Hur 2015-05-19 13:14:43 UTC
Hi Richard,

I modified the example further.

#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i, j;

  for (i = 0; i < xp; i++) { }
  j = i;
  /* The following "if" statement is never executed because j == xp */
  if (j != xp) { 
    printf("hello\n");
    j = xp; 
  }

  *(int*)((xp+i)-j) = 15;

  printf("%d\n", x);
}

The above example does not print "hello", so i can assume that "j = xp" is not executed.
However, the program prints "0" instead of "15".
Can you explain this?
Comment 18 rguenther@suse.de 2015-05-19 14:21:43 UTC
On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> Hi Richard,
> 
> I modified the example further.
> 
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i, j;
> 
>   for (i = 0; i < xp; i++) { }
>   j = i;
>   /* The following "if" statement is never executed because j == xp */
>   if (j != xp) { 
>     printf("hello\n");
>     j = xp; 
>   }

Here j is always xp and thus ...

>   *(int*)((xp+i)-j) = 15;

... this can (and is) simplified to *(int *)i = 15; making it the same
testcase again.

>   printf("%d\n", x);
> }
> 
> The above example does not print "hello", so i can assume that "j = xp" is not
> executed.
> However, the program prints "0" instead of "15".
> Can you explain this?
Comment 19 Chung-Kil Hur 2015-05-19 14:29:15 UTC
(In reply to rguenther@suse.de from comment #18)
> On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > 
> > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > Hi Richard,
> > 
> > I modified the example further.
> > 
> > #include <stdio.h>
> > 
> > int main() {
> >   int x = 0;
> >   uintptr_t xp = (uintptr_t) &x;
> >   uintptr_t i, j;
> > 
> >   for (i = 0; i < xp; i++) { }
> >   j = i;
> >   /* The following "if" statement is never executed because j == xp */
> >   if (j != xp) { 
> >     printf("hello\n");
> >     j = xp; 
> >   }
> 
> Here j is always xp and thus ...
> 

Why is "j" always "xp"?
Since "hello" is not printed, "j = xp;" is not executed.
Is there some special semantics of C?
If so, please let me know a reference.

Thanks!
Comment 20 Marek Polacek 2015-05-19 14:32:15 UTC
(In reply to Chung-Kil Hur from comment #19)
> (In reply to rguenther@suse.de from comment #18)
> > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > 
> > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > > Hi Richard,
> > > 
> > > I modified the example further.
> > > 
> > > #include <stdio.h>
> > > 
> > > int main() {
> > >   int x = 0;
> > >   uintptr_t xp = (uintptr_t) &x;
> > >   uintptr_t i, j;
> > > 
> > >   for (i = 0; i < xp; i++) { }
> > >   j = i;
> > >   /* The following "if" statement is never executed because j == xp */
> > >   if (j != xp) { 
> > >     printf("hello\n");
> > >     j = xp; 
> > >   }
> > 
> > Here j is always xp and thus ...
> > 
> 
> Why is "j" always "xp"?
> Since "hello" is not printed, "j = xp;" is not executed.

Because that "if (j != xp)" guarantees it.
Comment 21 Chung-Kil Hur 2015-05-19 14:56:07 UTC
(In reply to Marek Polacek from comment #20)
> (In reply to Chung-Kil Hur from comment #19)
> > (In reply to rguenther@suse.de from comment #18)
> > > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > > 
> > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > > > Hi Richard,
> > > > 
> > > > I modified the example further.
> > > > 
> > > > #include <stdio.h>
> > > > 
> > > > int main() {
> > > >   int x = 0;
> > > >   uintptr_t xp = (uintptr_t) &x;
> > > >   uintptr_t i, j;
> > > > 
> > > >   for (i = 0; i < xp; i++) { }
> > > >   j = i;
> > > >   /* The following "if" statement is never executed because j == xp */
> > > >   if (j != xp) { 
> > > >     printf("hello\n");
> > > >     j = xp; 
> > > >   }
> > > 
> > > Here j is always xp and thus ...
> > > 
> > 
> > Why is "j" always "xp"?
> > Since "hello" is not printed, "j = xp;" is not executed.
> 
> Because that "if (j != xp)" guarantees it.

OK. here is another modification.

#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i, j;

  for (i = 0; i < xp; i++) { }
  j = i;

  *(int*)j = 15;

  /* The following "if" statement is never executed because j == xp */
  if (j != xp) { 
    printf("hello\n");
    j = xp; 
  }

  *(int*)((xp+i)-j) = 15;

  printf("%d\n", x);
}

This program just prints "0".

So we know that "*(int*)j = 15;" is not executed and thus "j == xp" is not true.

Then, can the following statement change "j" even if the printf is not executed?

if (j != xp) {
   printf("hello\n");
   j = xp;
}

If not, how can "j == xp" suddenly hold?
Comment 22 Chung-Kil Hur 2015-05-19 15:12:15 UTC
(In reply to Chung-Kil Hur from comment #21)
> (In reply to Marek Polacek from comment #20)
> > (In reply to Chung-Kil Hur from comment #19)
> > > (In reply to rguenther@suse.de from comment #18)
> > > > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> > > > 
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > > > 
> > > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > > > > Hi Richard,
> > > > > 
> > > > > I modified the example further.
> > > > > 
> > > > > #include <stdio.h>
> > > > > 
> > > > > int main() {
> > > > >   int x = 0;
> > > > >   uintptr_t xp = (uintptr_t) &x;
> > > > >   uintptr_t i, j;
> > > > > 
> > > > >   for (i = 0; i < xp; i++) { }
> > > > >   j = i;
> > > > >   /* The following "if" statement is never executed because j == xp */
> > > > >   if (j != xp) { 
> > > > >     printf("hello\n");
> > > > >     j = xp; 
> > > > >   }
> > > > 
> > > > Here j is always xp and thus ...
> > > > 
> > > 
> > > Why is "j" always "xp"?
> > > Since "hello" is not printed, "j = xp;" is not executed.
> > 
> > Because that "if (j != xp)" guarantees it.
> 
> OK. here is another modification.
> 
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i, j;
> 
>   for (i = 0; i < xp; i++) { }
>   j = i;
> 
>   *(int*)j = 15;
> 
>   /* The following "if" statement is never executed because j == xp */
>   if (j != xp) { 
>     printf("hello\n");
>     j = xp; 
>   }
> 
>   *(int*)((xp+i)-j) = 15;
> 
>   printf("%d\n", x);
> }
> 
> This program just prints "0".
> 
> So we know that "*(int*)j = 15;" is not executed and thus "j == xp" is not
> true.
> 
> Then, can the following statement change "j" even if the printf is not
> executed?
> 
> if (j != xp) {
>    printf("hello\n");
>    j = xp;
> }
> 
> If not, how can "j == xp" suddenly hold?

One more thing.

If you remove the if-statement, then it prints "15" with GCC -O2.

Since "hello" is not printed, I think the if-statement is the same as no-op.
Thus, removing the if-statement should not change the behavior of the program according to ISO C11.

But, they print different values.

Can you explain this?
Comment 23 schwab 2015-05-19 15:23:32 UTC
"gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes:

> Since "hello" is not printed, I think the if-statement is the same as no-op.
> Thus, removing the if-statement should not change the behavior of the program
> according to ISO C11.

Unless you are invoking undefined behaviour.

Andreas.
Comment 24 Chung-Kil Hur 2015-05-19 15:29:12 UTC
(In reply to schwab from comment #23)
> "gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes:
> 
> > Since "hello" is not printed, I think the if-statement is the same as no-op.
> > Thus, removing the if-statement should not change the behavior of the program
> > according to ISO C11.
> 
> Unless you are invoking undefined behaviour.
> 
> Andreas.

==============================
#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i, j;

  for (i = 0; i < xp; i++) { }
  j = i;

  *(int*)((xp+i)-j) = 15;

  printf("%d\n", x);
}
=============================

This prints "15".
And I do not think there is any UB.
Please correct me if I am wrong.

Then, I add the if-statement.

==============================
#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i, j;

  for (i = 0; i < xp; i++) { }
  j = i;

  /****** begin *******/
  if (j != xp) { 
    printf("hello\n");
    j = xp; 
  }
  /****** end *********/

  *(int*)((xp+i)-j) = 15;

  printf("%d\n", x);
}
=============================

This prints "0" without printing "hello".

Thus, this raises no UB unless "j != xp" raises UB.

If you think "j != xp" raises UB, please explain why and give some reference.

Otherwise, I think it is a bug of GCC.
Comment 25 Richard Biener 2015-05-20 08:01:48 UTC
(In reply to Chung-Kil Hur from comment #24)
> (In reply to schwab from comment #23)
> > "gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes:
> > 
> > > Since "hello" is not printed, I think the if-statement is the same as no-op.
> > > Thus, removing the if-statement should not change the behavior of the program
> > > according to ISO C11.
> > 
> > Unless you are invoking undefined behaviour.
> > 
> > Andreas.
> 
> ==============================
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i, j;
> 
>   for (i = 0; i < xp; i++) { }
>   j = i;
> 
>   *(int*)((xp+i)-j) = 15;
> 
>   printf("%d\n", x);
> }
> =============================
> 
> This prints "15".
> And I do not think there is any UB.
> Please correct me if I am wrong.
> 
> Then, I add the if-statement.
> 
> ==============================
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i, j;
> 
>   for (i = 0; i < xp; i++) { }
>   j = i;
> 
>   /****** begin *******/
>   if (j != xp) { 
>     printf("hello\n");
>     j = xp; 
>   }
>   /****** end *********/
> 
>   *(int*)((xp+i)-j) = 15;
> 
>   printf("%d\n", x);
> }
> =============================
> 
> This prints "0" without printing "hello".
> 
> Thus, this raises no UB unless "j != xp" raises UB.
> 
> If you think "j != xp" raises UB, please explain why and give some reference.
> 
> Otherwise, I think it is a bug of GCC.

The C standard only guarantees that you can convert a pointer to uintptr_t and back, it doesn't guarantee that you can convert a modified uintptr_t back to
a pointer that is valid.

Thus, doing (int *)((xp + i) - j) is invoking undefined behavior.

That you see an effect of this undefined behavior just with the added if
is because that if confuses early GCC optimizations which would have
cancelled i - j to zero, retaining (int *)xp.  Instead it enables later
optimization to see that xp - j cancels and thus it is left with (int *)i.

This is because you are essentially computing (xp + xp) - xp == xp but
dependent on what two pieces get cancelled the pointer is based on
either xp (ok) or on i (not ok - that is related to xp only via an
implicit equivalency).

The net result is that I can't see how to "detect" this kind of situation
in points-to analysis in a way that does not pessimize all pointer-to-integer / integer-to-pointer conversions.  In theory it would be possible to add a
flag similar to -fno-strict-aliasing to do this pessimization (but there
is already -fno-tree-pta which avoids the issue as well).

So in the end my conclusion is that either the testcase invokes undefined
behavior or the C standard has a defect.  Thus the bug is WONTFIX unless
somebody can come up with a way to handle these kind of equivalences
in the points-to algorithm in GCC in a way not pessimizing everything.
One might consider an incomplete approach like that in comment #6 but
I am not convinced about this hack (and one would need to evaluate its
effects on code generation).
Comment 26 Chung-Kil Hur 2015-05-20 10:54:51 UTC
Thanks for the detailed explanations.

> The C standard only guarantees that you can convert a pointer to uintptr_t
> and back, it doesn't guarantee that you can convert a modified uintptr_t
> back to
> a pointer that is valid.
> 
> Thus, doing (int *)((xp + i) - j) is invoking undefined behavior.
> 

I didn't know about this rule.
I thought this cast is valid because "(xp+i)-j" is even "safely-derived".

Could you give a reference for that rule in the standard?

Thanks!
Comment 27 Chung-Kil Hur 2015-05-20 11:20:05 UTC
(In reply to Chung-Kil Hur from comment #26)
> Thanks for the detailed explanations.
> 
> > The C standard only guarantees that you can convert a pointer to uintptr_t
> > and back, it doesn't guarantee that you can convert a modified uintptr_t
> > back to
> > a pointer that is valid.
> > 
> > Thus, doing (int *)((xp + i) - j) is invoking undefined behavior.
> > 
> 
> I didn't know about this rule.
> I thought this cast is valid because "(xp+i)-j" is even "safely-derived".
> 
> Could you give a reference for that rule in the standard?
> 
> Thanks!

It seems that the following rule might be the one.

=================================
7.20.1.4 Integer types capable of holding object pointers

The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
=================================

Right. This does not say that we are allowed to cast a modified integer back to a pointer.

What I remember might be from the C++ standard, where "safely derived" integers are allowed to be cast back to pointers.
Umm. This might also be implementation-defined.

Anyway, thanks very much for taking your time to respond to my questions!!

Best,
Gil
Comment 28 rguenther@suse.de 2015-05-20 11:33:35 UTC
On Wed, 20 May 2015, gil.hur at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #27 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> (In reply to Chung-Kil Hur from comment #26)
> > Thanks for the detailed explanations.
> > 
> > > The C standard only guarantees that you can convert a pointer to uintptr_t
> > > and back, it doesn't guarantee that you can convert a modified uintptr_t
> > > back to
> > > a pointer that is valid.
> > > 
> > > Thus, doing (int *)((xp + i) - j) is invoking undefined behavior.
> > > 
> > 
> > I didn't know about this rule.
> > I thought this cast is valid because "(xp+i)-j" is even "safely-derived".
> > 
> > Could you give a reference for that rule in the standard?
> > 
> > Thanks!
> 
> It seems that the following rule might be the one.
> 
> =================================
> 7.20.1.4 Integer types capable of holding object pointers
> 
> The following type designates a signed integer type with the property that any
> valid pointer to void can be converted to this type, then converted back to
> pointer to void, and the result will compare equal to the original pointer:
> intptr_t
> 
> The following type designates an unsigned integer type with the property that
> any valid pointer to void can be converted to this type, then converted back to
> pointer to void, and the result will compare equal to the original pointer:
> uintptr_t
> These types are optional.
> =================================

Yes, that's the one I remember.

> Right. This does not say that we are allowed to cast a modified integer back to
> a pointer.
> 
> What I remember might be from the C++ standard, where "safely derived" integers
> are allowed to be cast back to pointers.
> Umm. This might also be implementation-defined.

Yeah, what is "safely derived" is the question here (you might not break
the dependency chain in any (non-)obvious way).
Comment 29 Chung-Kil Hur 2015-05-22 02:12:04 UTC
Dear Richard,

This time, I think I constructed a real bug.
Please have a look and correct me if I am wrong.

=====================
#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i;

  for (i = 0; i < xp; i++) { }

  *(int*)xp = 15;

  printf("%d\n", x);
}
=====================

This program prints "15" and I do not think this raises UB.

Now I add an if-statement to the program.

=====================
#include <stdio.h>

int main() {
  int x = 0;
  uintptr_t xp = (uintptr_t) &x;
  uintptr_t i;

  for (i = 0; i < xp; i++) { }

  /*** begin ***/
  if (xp != i) {
    printf("hello\n");
    xp = i;
  }
  /*** end ***/

  *(int*)xp = 15;

  printf("%d\n", x);
}
=====================

This program just prints "0".

Since "hello" is not printed, the if-statement is not executed.
However, it prints a different result than before, which I think is a bug.
Comment 30 mail 2015-05-22 04:55:29 UTC
Hi Gil,

Nice example! I am a bit occupied lately, and thus have not read all 
comments at the bug report in detail. I will be away for the weekend, 
but will read those quickly after.

Robbert

On 05/22/2015 04:12 AM, gil.hur at sf dot snu.ac.kr wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
>
> --- Comment #29 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> Dear Richard,
>
> This time, I think I constructed a real bug.
> Please have a look and correct me if I am wrong.
>
> =====================
> #include <stdio.h>
>
> int main() {
>    int x = 0;
>    uintptr_t xp = (uintptr_t) &x;
>    uintptr_t i;
>
>    for (i = 0; i < xp; i++) { }
>
>    *(int*)xp = 15;
>
>    printf("%d\n", x);
> }
> =====================
>
> This program prints "15" and I do not think this raises UB.
>
> Now I add an if-statement to the program.
>
> =====================
> #include <stdio.h>
>
> int main() {
>    int x = 0;
>    uintptr_t xp = (uintptr_t) &x;
>    uintptr_t i;
>
>    for (i = 0; i < xp; i++) { }
>
>    /*** begin ***/
>    if (xp != i) {
>      printf("hello\n");
>      xp = i;
>    }
>    /*** end ***/
>
>    *(int*)xp = 15;
>
>    printf("%d\n", x);
> }
> =====================
>
> This program just prints "0".
>
> Since "hello" is not printed, the if-statement is not executed.
> However, it prints a different result than before, which I think is a bug.
>
Comment 31 Robbert 2015-05-22 04:57:28 UTC
[oops, that was meant to be private, please remove my last comment]
Comment 32 Richard Biener 2015-05-22 08:52:05 UTC
(In reply to Chung-Kil Hur from comment #29)
> Dear Richard,
> 
> This time, I think I constructed a real bug.
> Please have a look and correct me if I am wrong.
> 
> =====================
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i;
> 
>   for (i = 0; i < xp; i++) { }
> 
>   *(int*)xp = 15;
> 
>   printf("%d\n", x);
> }
> =====================
> 
> This program prints "15" and I do not think this raises UB.
> 
> Now I add an if-statement to the program.
> 
> =====================
> #include <stdio.h>
> 
> int main() {
>   int x = 0;
>   uintptr_t xp = (uintptr_t) &x;
>   uintptr_t i;
> 
>   for (i = 0; i < xp; i++) { }
> 
>   /*** begin ***/
>   if (xp != i) {
>     printf("hello\n");
>     xp = i;
>   }
>   /*** end ***/
> 
>   *(int*)xp = 15;
> 
>   printf("%d\n", x);
> }
> =====================
> 
> This program just prints "0".
> 
> Since "hello" is not printed, the if-statement is not executed.
> However, it prints a different result than before, which I think is a bug.

It indeed is a more unfortunate case but you are still breaking the dependency
chain in the if (xp != i) code by assigning i to xp.  The code is never
executed (which is why this is unfortunate) and I am less than 100% sure
it still invokes undefined behavior.

The unfortunate thing is that the equivalence you build on the 'else'
path (xp == i) is used by the compiler to replace xp by i on the
*(int*)xp = 15 line getting us into the very same situation as in all
other cases.  That is, we have

  if (xp != i)
...
  # xp = PHI <xp, i>
  *(int *)xp = 15;

because of the conditional and in this case our phiopt pass optimizes that
to

  # xp = PHI <i, i>

instead of the equally valid

  # xp = PHI <xp, xp>

other passes (dom) may end up doing a similar thing (at least for GCC 5 and
the particular testcase we are lucky here though), but for GCC 5
-fdisable-tree-phiopt1 -fdisable-tree-phiopt2 avoids the issue.

Generally there is no good way to determine which choice is better.

What the PTA code does is sensible btw.  For

  # xp_20 = PHI <0(2), xp_7(7)>
  xp_7 = xp_20 + 1;
  if (xp_6 > xp_7)
    goto <bb 7>;
  else
    goto <bb 4>;

  <bb 7>:
  goto <bb 3>;

the PTA constraints are

xp_6 = &x
xp_20 = &NULL
xp_20 = xp_7
xp_7 = xp_20
xp_7 = &NONLOCAL

which means PTA considers that all pointers coming from integer constants
point to global memory only (that's to support fixed address objects).
That helps to avoid false aliasing to stack objects and avoids the need
to make all locals escaped when you have code passing an integer to a
function (that integer, converted to a pointer _could_ point to a stack
slot in the caller, no?!).

So 'i' is considered to eventually point to arbitrary global memory.
But _not_ to arbitrary address-taken locals.
Comment 33 Chung-Kil Hur 2015-05-23 08:14:26 UTC
Dear Richard,

Thanks for the detailed response.

I have a suggestion for a solution of the problem, which is based on my paper to appear at PLDI 2015.

* A Formal C Memory Model Supporting Integer-Pointer Casts.
  Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, Viktor Vafeiadis.
  http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf

The suggestion is simple.
You do not need to turn off the phiopt optimizations.
We propose to slightly change the following assumption.

> PTA considers that all pointers coming from integer constants
> point to global memory only.

Here, if you change this as follows, you can solve the problem.

* All pointers coming from integer constants can point to only global memory and
  local variables whose addresses have been cast to integers.

Also, we expect that this would not decrease the optimization performance of GCC very much because those variables whose addresses have been cast to integers tend to be escaped (e.g. passed to a hash function, or stored in the memory).
Comment 34 rguenther@suse.de 2015-05-26 11:25:38 UTC
On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> Dear Richard,
> 
> Thanks for the detailed response.
> 
> I have a suggestion for a solution of the problem, which is based on my paper
> to appear at PLDI 2015.
> 
> * A Formal C Memory Model Supporting Integer-Pointer Casts.
>   Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve
> Zdancewic, Viktor Vafeiadis.
>   http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf
> 
> The suggestion is simple.
> You do not need to turn off the phiopt optimizations.
> We propose to slightly change the following assumption.
> 
> > PTA considers that all pointers coming from integer constants
> > point to global memory only.
> 
> Here, if you change this as follows, you can solve the problem.
> 
> * All pointers coming from integer constants can point to only global memory
> and
>   local variables whose addresses have been cast to integers.

Ok, so you basically add a 2nd class of "escaping".  So in GCC PTA
terms you'd add a new ESCAPE-like 'INTEGER' variable with

INTEGER = NONLOCAL

and add

INTEGER = x

constraints for each

.. = (integer-type) &x

conversion and for the reverse

 ptr = (pointer-type) i

add

ptr = INTEGER

> Also, we expect that this would not decrease the optimization performance of
> GCC very much because those variables whose addresses have been cast to
> integers tend to be escaped (e.g. passed to a hash function, or stored in the
> memory).

Well - the above basically makes _all_ pointers converted from integers
point to non-local memory, it also basically globs all pointers
converted from integers into a single equivalence class.

So I think you underestimate the effect on optimization (but I may
overestimate the effect on optimization of not simply making all
pointers converted from integers point to all globals and all
address-taken locals, aka ANYTHING in GCC PTA terms)
Comment 35 Chung-Kil Hur 2015-05-26 13:55:48 UTC
(In reply to rguenther@suse.de from comment #34)
> On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > 
> > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > Dear Richard,
> > 
> > Thanks for the detailed response.
> > 
> > I have a suggestion for a solution of the problem, which is based on my paper
> > to appear at PLDI 2015.
> > 
> > * A Formal C Memory Model Supporting Integer-Pointer Casts.
> >   Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve
> > Zdancewic, Viktor Vafeiadis.
> >   http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf
> > 
> > The suggestion is simple.
> > You do not need to turn off the phiopt optimizations.
> > We propose to slightly change the following assumption.
> > 
> > > PTA considers that all pointers coming from integer constants
> > > point to global memory only.
> > 
> > Here, if you change this as follows, you can solve the problem.
> > 
> > * All pointers coming from integer constants can point to only global memory
> > and
> >   local variables whose addresses have been cast to integers.
> 
> Ok, so you basically add a 2nd class of "escaping".  So in GCC PTA
> terms you'd add a new ESCAPE-like 'INTEGER' variable with
> 
> INTEGER = NONLOCAL
> 
> and add
> 
> INTEGER = x
> 
> constraints for each
> 
> .. = (integer-type) &x
> 
> conversion and for the reverse
> 
>  ptr = (pointer-type) i
> 
> add
> 
> ptr = INTEGER
> 
> > Also, we expect that this would not decrease the optimization performance of
> > GCC very much because those variables whose addresses have been cast to
> > integers tend to be escaped (e.g. passed to a hash function, or stored in the
> > memory).
> 
> Well - the above basically makes _all_ pointers converted from integers
> point to non-local memory, it also basically globs all pointers
> converted from integers into a single equivalence class.

Yes, this is right.

> So I think you underestimate the effect on optimization (but I may
> overestimate the effect on optimization of not simply making all
> pointers converted from integers point to all globals and all
> address-taken locals, aka ANYTHING in GCC PTA terms)

Just one minor correction:
all address-taken locals -> all address-taken-and-cast-to-integer locals

Yes, I agree. 
In order to understand the effect, we need some empirical evidence.
I am interested in this direction.
So, I wonder what benchmarks you usually use to check the effect of compiler optimizations.
More specifically, are SPEC benchmarks enough? or do you use some other benchmarks too?

Thanks!
Comment 36 rguenther@suse.de 2015-05-26 14:00:40 UTC
On Tue, 26 May 2015, gil.hur at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #35 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> (In reply to rguenther@suse.de from comment #34)
> > On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > 
> > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > > Dear Richard,
> > > 
> > > Thanks for the detailed response.
> > > 
> > > I have a suggestion for a solution of the problem, which is based on my paper
> > > to appear at PLDI 2015.
> > > 
> > > * A Formal C Memory Model Supporting Integer-Pointer Casts.
> > >   Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve
> > > Zdancewic, Viktor Vafeiadis.
> > >   http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf
> > > 
> > > The suggestion is simple.
> > > You do not need to turn off the phiopt optimizations.
> > > We propose to slightly change the following assumption.
> > > 
> > > > PTA considers that all pointers coming from integer constants
> > > > point to global memory only.
> > > 
> > > Here, if you change this as follows, you can solve the problem.
> > > 
> > > * All pointers coming from integer constants can point to only global memory
> > > and
> > >   local variables whose addresses have been cast to integers.
> > 
> > Ok, so you basically add a 2nd class of "escaping".  So in GCC PTA
> > terms you'd add a new ESCAPE-like 'INTEGER' variable with
> > 
> > INTEGER = NONLOCAL
> > 
> > and add
> > 
> > INTEGER = x
> > 
> > constraints for each
> > 
> > .. = (integer-type) &x
> > 
> > conversion and for the reverse
> > 
> >  ptr = (pointer-type) i
> > 
> > add
> > 
> > ptr = INTEGER
> > 
> > > Also, we expect that this would not decrease the optimization performance of
> > > GCC very much because those variables whose addresses have been cast to
> > > integers tend to be escaped (e.g. passed to a hash function, or stored in the
> > > memory).
> > 
> > Well - the above basically makes _all_ pointers converted from integers
> > point to non-local memory, it also basically globs all pointers
> > converted from integers into a single equivalence class.
> 
> Yes, this is right.
> 
> > So I think you underestimate the effect on optimization (but I may
> > overestimate the effect on optimization of not simply making all
> > pointers converted from integers point to all globals and all
> > address-taken locals, aka ANYTHING in GCC PTA terms)
> 
> Just one minor correction:
> all address-taken locals -> all address-taken-and-cast-to-integer locals
> 
> Yes, I agree. 
> In order to understand the effect, we need some empirical evidence.
> I am interested in this direction.
> So, I wonder what benchmarks you usually use to check the effect of compiler
> optimizations.
> More specifically, are SPEC benchmarks enough? or do you use some other
> benchmarks too?

SPEC CPU tends to capture most of this though we also periodically
check other "benchmarks" such as firefox and its few performance
tests or similar big C++ programs.
Comment 37 Chung-Kil Hur 2015-05-26 14:06:00 UTC
(In reply to rguenther@suse.de from comment #36)
> On Tue, 26 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > 
> > --- Comment #35 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > (In reply to rguenther@suse.de from comment #34)
> > > On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> > > > 
> > > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> ---
> > > > Dear Richard,
> > > > 
> > > > Thanks for the detailed response.
> > > > 
> > > > I have a suggestion for a solution of the problem, which is based on my paper
> > > > to appear at PLDI 2015.
> > > > 
> > > > * A Formal C Memory Model Supporting Integer-Pointer Casts.
> > > >   Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve
> > > > Zdancewic, Viktor Vafeiadis.
> > > >   http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf
> > > > 
> > > > The suggestion is simple.
> > > > You do not need to turn off the phiopt optimizations.
> > > > We propose to slightly change the following assumption.
> > > > 
> > > > > PTA considers that all pointers coming from integer constants
> > > > > point to global memory only.
> > > > 
> > > > Here, if you change this as follows, you can solve the problem.
> > > > 
> > > > * All pointers coming from integer constants can point to only global memory
> > > > and
> > > >   local variables whose addresses have been cast to integers.
> > > 
> > > Ok, so you basically add a 2nd class of "escaping".  So in GCC PTA
> > > terms you'd add a new ESCAPE-like 'INTEGER' variable with
> > > 
> > > INTEGER = NONLOCAL
> > > 
> > > and add
> > > 
> > > INTEGER = x
> > > 
> > > constraints for each
> > > 
> > > .. = (integer-type) &x
> > > 
> > > conversion and for the reverse
> > > 
> > >  ptr = (pointer-type) i
> > > 
> > > add
> > > 
> > > ptr = INTEGER
> > > 
> > > > Also, we expect that this would not decrease the optimization performance of
> > > > GCC very much because those variables whose addresses have been cast to
> > > > integers tend to be escaped (e.g. passed to a hash function, or stored in the
> > > > memory).
> > > 
> > > Well - the above basically makes _all_ pointers converted from integers
> > > point to non-local memory, it also basically globs all pointers
> > > converted from integers into a single equivalence class.
> > 
> > Yes, this is right.
> > 
> > > So I think you underestimate the effect on optimization (but I may
> > > overestimate the effect on optimization of not simply making all
> > > pointers converted from integers point to all globals and all
> > > address-taken locals, aka ANYTHING in GCC PTA terms)
> > 
> > Just one minor correction:
> > all address-taken locals -> all address-taken-and-cast-to-integer locals
> > 
> > Yes, I agree. 
> > In order to understand the effect, we need some empirical evidence.
> > I am interested in this direction.
> > So, I wonder what benchmarks you usually use to check the effect of compiler
> > optimizations.
> > More specifically, are SPEC benchmarks enough? or do you use some other
> > benchmarks too?
> 
> SPEC CPU tends to capture most of this though we also periodically
> check other "benchmarks" such as firefox and its few performance
> tests or similar big C++ programs.

Thanks for the information!
Comment 38 Alexander Cherepanov 2015-11-15 21:06:18 UTC
IMHO this bug is not specific to integers and boils down to this: when a check for equality ignores provenance for some reason, phiopt nevertheless will replace one variable by another with the wrong provenance.

Integers are surely compared without regard to prevenance. That's one case. Another case is a comparison of two pointers when one of the lost its provenance info. E.g. the program (somewhat based on pr61502):

  #include <stdint.h>
  #include <stdio.h>
  
  int main()
  {
    int y, x = 0;
    int *volatile v = &x;
    int *xp = v;
    int *i = &y + 1;
  
    if (xp != i) {
      printf("hello\n");
      xp = i;
    }
  
    *xp = 15;
  
    printf("%d\n", x);
  }
  
prints 0 for me with gcc 5.2.0 -O2.

The evident solution is to not apply this optimization when provenance info of the two variables differs. I guess for most integers it will be the same.

Additionally, this optimization could be applied when provenance info for the first variable is known but it's unknown for the second one. This leads to the loss of provenance info and can prevent other optimizations.

Maybe a more complex solution is possible, like tracking provenance info separately from the core value, I don't know.
Comment 39 Andrew Pinski 2015-11-15 21:24:39 UTC
(In reply to Alexander Cherepanov from comment #38)
> IMHO this bug is not specific to integers and boils down to this: when a
> check for equality ignores provenance for some reason, phiopt nevertheless
> will replace one variable by another with the wrong provenance.
> 
> Integers are surely compared without regard to prevenance. That's one case.
> Another case is a comparison of two pointers when one of the lost its
> provenance info. E.g. the program (somewhat based on pr61502):
> 
>   #include <stdint.h>
>   #include <stdio.h>
>   
>   int main()
>   {
>     int y, x = 0;
>     int *volatile v = &x;
>     int *xp = v;
>     int *i = &y + 1;
>   
>     if (xp != i) {
>       printf("hello\n");
>       xp = i;
>     }
>   
>     *xp = 15;
>   
>     printf("%d\n", x);
>   }
>   
> prints 0 for me with gcc 5.2.0 -O2.

Except the above testcase is invalid/undefined as &y + 1 is undefined if deferenced and &x and &y + 1 cannot be compared in a defined sense as both deals with two different arrays.
Comment 40 Alexander Cherepanov 2015-11-15 21:38:40 UTC
Ok, this program:

#include <stdint.h>
#include <stdio.h>

int main() {
   int y, x = 0;
   int *volatile v = &x;
   int *xp = v;
   int *i = &y + 1;

   if (xp != i) {
     printf("hello\n");
     xp = i;
   }

   printf("%d\n", xp == &x);
}

print 0 too even though it should print 1.
Comment 41 Andrew Pinski 2015-11-15 21:48:29 UTC
(In reply to Alexander Cherepanov from comment #40)
> Ok, this program:
> 
> #include <stdint.h>
> #include <stdio.h>
> 
> int main() {
>    int y, x = 0;
>    int *volatile v = &x;
>    int *xp = v;
>    int *i = &y + 1;
> 
>    if (xp != i) {
>      printf("hello\n");
>      xp = i;
>    }
> 
>    printf("%d\n", xp == &x);
> }
> 

Still undefined as &x and &y + 1 are not comparable.
Comment 42 Alexander Cherepanov 2015-11-15 21:53:41 UTC
On 2015-11-16 00:48, pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
>
> --- Comment #41 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> (In reply to Alexander Cherepanov from comment #40)
>> Ok, this program:
>>
>> #include <stdint.h>
>> #include <stdio.h>
>>
>> int main() {
>>     int y, x = 0;
>>     int *volatile v = &x;
>>     int *xp = v;
>>     int *i = &y + 1;
>>
>>     if (xp != i) {
>>       printf("hello\n");
>>       xp = i;
>>     }
>>
>>     printf("%d\n", xp == &x);
>> }

Small correction: it prints 0 and doesn't print "hello" even though it 
should print 1 without "hello" or an unspecified value with "hello".

> Still undefined as &x and &y + 1 are not comparable.

They cannot be compared with the relational operators ("<" etc.) but you 
can compare any pointers with the equality operators ("==" and "!=").
Comment 43 Jeehoon Kang 2015-11-16 10:25:48 UTC
(In reply to Alexander Cherepanov from comment #38)
> The evident solution is to not apply this optimization when provenance info
> of the two variables differs. I guess for most integers it will be the same.

IMO tracking provenance info is not a good idea, since it is really complicated.  First, since integers and pointers can be casted to each other, not only pointers but also integers should carry provenance information.  Second, tracking provenance info may work for simple examples, but it is even hard to define the provenance info itself for complex expressions.  For e.g., what is the provenance of "e1-e2"?  "2*e"?  "e1 XOR e2"?  "e1 * e2"?  (even given the provenance info for integer expressions "e*")



I would rather prefer marking pointers casted to integers as *escaped*, and forgetting about the provenance at all.  Here are several reasons why this works well:

- Standard optimizations are supported.  Say we want to support the following constant propagation example:

    char f() {
      char a = '0';
      g();      // unknown function; may guess the address of "a" and
                // try to access it (but it is always unsuccessful)

      return a; // -> return '0'
    }

  Since the address of "a" is not casted to integers, "a" is private to the function "f" (i.e., not escaped from "f"), and "g" cannot access "a".  So we know "a = 0" at the return.


- semantics is simple.  No need to track the provenance info for variables.  Once a pointer is casted to integers, it is just integers without any tracked information.

  As a result, the standard integer optimizations of our interest, as the following, are fully supported:

  if (x != y) x = y;   ->   x = y;


- Performance degradation due to "casted pointers as escaped" is insignificant.

  Morally, if a pointer is casted to an integer, the address is regarded as "global": having the integer value of the pointer means you can access the pointer.  So there will be not much optimization opportunity (or intent) for those pointers casted to integers.

  Of course, this argument should be validated by some experiment; yet I am quite convinced it is the case that the performance degradation is insignificant.



I would like to ask how you think about this suggestion.  Note that my argument here is based on my paper on this issue, where you can find the formal memory model we proposed, proofs that optimization examples are correct, and reasoning principle for proving optimizations (see the paper and the slide):

  A Formal C Memory Model Supporting Integer-Pointer Casts.
  Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, Viktor Vafeiadis.
  PLDI 2015.
  http://sf.snu.ac.kr/intptrcast/
Comment 44 rguenther@suse.de 2015-11-16 11:00:00 UTC
On Mon, 16 Nov 2015, jeehoon.kang at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jeehoon.kang at sf dot snu.ac.kr
> 
> --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> ---
> - Performance degradation due to "casted pointers as escaped" is insignificant.

I think this is not true.  For example with MatLab (might be sth else,
if I don't remember correctly) you are required to pass pointers to
arrays in two halves in double(!) values (I believe the only function
argument type they support).  GCC happily makes points-to analysis work 
through those.

The other (unfortunate) thing is that in GCC pointer subtraction
is always performed on integers, thus for the C source code

 int idx = ptr1 - ptr2;

we internally have sth like

 int idx = ((long)ptr1 - (long)ptr2) / 4;

so you can't really treat pointers as "escaped" here without loss.

Note that we've had this (kind of) in the past and it tried to go
without making pointers escaped at these points but only consider
casts from integers to pointers as pointing to anything.  But
that's wrong correctness wise (not then treating the cast to integer
as escape point).

I also don't think doing the above would solve the cases of equality
compares of pointes themselves.  (hopefully undefined)

I added the current handling of pointers vs. integers for a
missed-optimization bug that said a hand-written memcpy loop
didn't properly transfer points-to info (properly as in
optimially for optimization).  GCC can now do that ;)
Comment 45 Jeehoon Kang 2015-11-16 12:16:21 UTC
> I think this is not true.  For example with MatLab (might be sth else,
> if I don't remember correctly) you are required to pass pointers to
> arrays in two halves in double(!) values (I believe the only function
> argument type they support).  GCC happily makes points-to analysis work 
> through those.

Thank you for giving me an example.  Yet, I think it is a little bit unfortunate for MatLab (or sth else) to pass pointers by packing two into a double, at least due to the readability problem.  I think it is beyond the intended usage of the C/C++ language, but I understand that GCC is the time-honored compiler for various software and systems.



> The other (unfortunate) thing is that in GCC pointer subtraction
> is always performed on integers, thus for the C source code
> 
>  int idx = ptr1 - ptr2;
> 
> we internally have sth like
> 
>  int idx = ((long)ptr1 - (long)ptr2) / 4;
> 
> so you can't really treat pointers as "escaped" here without loss.

Thank you for giving me the information.  I don't know the GCC internals, so I would like to ask how much it would cost to introduce the syntax for pointer subtractions.  I hope it is not that huge, but I really don't have any idea.



> Note that we've had this (kind of) in the past and it tried to go
> without making pointers escaped at these points but only consider
> casts from integers to pointers as pointing to anything.  But
> that's wrong correctness wise (not then treating the cast to integer
> as escape point).

Treating the cast to integer as escape point is proven-correct by a machine-checked proof (in Coq) for various standard optimization examples, such as CP, DCE, dead allocation elimination, etc.  For more detail, please see the paper above-mentioned.



> I also don't think doing the above would solve the cases of equality
> compares of pointes themselves.  (hopefully undefined)

The formal memory model in the paper I mentioned invokes undefined behavior for the pointer equality comparison example above.  In the formal model, a pointer is represented as a pair of a memory block identifier (l) and an offset (o).  (cf. the CompCert memory model)  When a memory is malloc-ed or alloca-ed, a new memory block identifier is assigned.  A pointer equality, say of (l, o) and (l', o'), invokes undefined behavior when l != l'.

So for the following example (by Alexander Cherepanov):

    #include <stdint.h>
    #include <stdio.h>

    int main() {
       int y, x = 0;
       int *volatile v = &x;
       int *xp = v;
       int *i = &y + 1;

       if (xp != i) {
         printf("hello\n");
         xp = i;
       }

       printf("%d\n", xp == &x);
    }

Say y and x are allocated at l1 and l2, respectively.  Then xp = (l2, 0), and i = (l1, 4).  Thus comparing xp and i invokes undefined behavior, since l1 != l2.
Comment 46 rguenther@suse.de 2015-11-16 12:51:11 UTC
On Mon, 16 Nov 2015, jeehoon.kang at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #45 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> ---
> > I think this is not true.  For example with MatLab (might be sth else,
> > if I don't remember correctly) you are required to pass pointers to
> > arrays in two halves in double(!) values (I believe the only function
> > argument type they support).  GCC happily makes points-to analysis work 
> > through those.
> 
> Thank you for giving me an example.  Yet, I think it is a little bit
> unfortunate for MatLab (or sth else) to pass pointers by packing two into a
> double, at least due to the readability problem.  I think it is beyond the
> intended usage of the C/C++ language, but I understand that GCC is the
> time-honored compiler for various software and systems.
> 
> 
> 
> > The other (unfortunate) thing is that in GCC pointer subtraction
> > is always performed on integers, thus for the C source code
> > 
> >  int idx = ptr1 - ptr2;
> > 
> > we internally have sth like
> > 
> >  int idx = ((long)ptr1 - (long)ptr2) / 4;
> > 
> > so you can't really treat pointers as "escaped" here without loss.
> 
> Thank you for giving me the information.  I don't know the GCC internals, so I
> would like to ask how much it would cost to introduce the syntax for pointer
> subtractions.  I hope it is not that huge, but I really don't have any idea.

It would be quite some (mechanical) work but otherwise not too difficult.
There is the choice whether to embed the division implicitely here or 
not.

> > Note that we've had this (kind of) in the past and it tried to go
> > without making pointers escaped at these points but only consider
> > casts from integers to pointers as pointing to anything.  But
> > that's wrong correctness wise (not then treating the cast to integer
> > as escape point).
> 
> Treating the cast to integer as escape point is proven-correct by a
> machine-checked proof (in Coq) for various standard optimization examples, such
> as CP, DCE, dead allocation elimination, etc.  For more detail, please see the
> paper above-mentioned.

Yes, I agree that making this an escape point is enough.
Comment 47 Alexander Cherepanov 2015-11-16 19:49:08 UTC
On 2015-11-16 14:00, rguenther at suse dot de wrote:
>> --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> ---
>> - Performance degradation due to "casted pointers as escaped" is insignificant.
>
> I think this is not true.  For example with MatLab (might be sth else,
> if I don't remember correctly) you are required to pass pointers to
> arrays in two halves in double(!) values (I believe the only function
> argument type they support).  GCC happily makes points-to analysis work
> through those.

But this is invalid C. First, it breaks strict aliasing rules. Second, 
the representations of these doubles are free to change at any time 
given their values are kept intact (e.g. change one NaN to another). 
That is, unrelated improvements in other optimizations in gcc will break 
all of this in the future, right?

> I also don't think doing the above would solve the cases of equality
> compares of pointes themselves.  (hopefully undefined)

Hm, undefining comparisons between a pointer pointing past the end of an 
object and a pointer to another object could be a way forward. You 
propose to undefine it for any objects (similar to "<") or only when 
they are not parts (including recursively) of the same aggregate?

In any case this will fix some theoretical problems, e.g. transitivity 
of == as described in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c17 .

OTOH, again in any case, it will not solve all problems. For instance, 
if you want to track bounds of arrays in runtime you still can get equal 
pointers with different additional info -- pointers to an object and a 
subobject. Granted, this is kinda hypothetical (IIUC UBSAN works a bit 
different right now).

> I added the current handling of pointers vs. integers for a
> missed-optimization bug that said a hand-written memcpy loop
> didn't properly transfer points-to info (properly as in
> optimially for optimization).  GCC can now do that ;)

Nice! Does gcc properly transfer effective type info too, over a 
hand-written memcpy loop? Just curious.

On 2015-11-16 15:51, rguenther at suse dot de wrote:
 >> Thank you for giving me the information.  I don't know the GCC 
internals, so I
 >> would like to ask how much it would cost to introduce the syntax for 
pointer
 >> subtractions.  I hope it is not that huge, but I really don't have 
any idea.
 >
 > It would be quite some (mechanical) work but otherwise not too difficult.
 > There is the choice whether to embed the division implicitely here or
 > not.

If you choose to fix it please fix pr45779 on the way (see pr67999 for a 
context).
Comment 48 rguenther@suse.de 2015-11-17 09:10:46 UTC
On Mon, 16 Nov 2015, ch3root at openwall dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #47 from Alexander Cherepanov <ch3root at openwall dot com> ---
> On 2015-11-16 14:00, rguenther at suse dot de wrote:
> >> --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> ---
> >> - Performance degradation due to "casted pointers as escaped" is insignificant.
> >
> > I think this is not true.  For example with MatLab (might be sth else,
> > if I don't remember correctly) you are required to pass pointers to
> > arrays in two halves in double(!) values (I believe the only function
> > argument type they support).  GCC happily makes points-to analysis work
> > through those.
> 
> But this is invalid C. First, it breaks strict aliasing rules. Second, 
> the representations of these doubles are free to change at any time 
> given their values are kept intact (e.g. change one NaN to another). 
> That is, unrelated improvements in other optimizations in gcc will break 
> all of this in the future, right?

You misunderstood, they marshall a pointer 'T *p' like

  unsigned int high = ((uintptr_t)p) >> 32;
  unsigned int low = ((uintptr_t)p) & (1 << 32 - 1);
  foo ((double) high, (double) low);
 
and in foo then do

foo (double hdouble, double ldouble)
{ 
  unsigned int high = (unsigned int) hdouble;
  unsigned int low = (unsigned int) ldouble;
  T *p = (T *)(((uintptr_t)high << 32) | (uintptr_t)low);

the important part here is to recognize that frobbing in points-to
analysis so you still see what 'p' points to in foo.

> > I added the current handling of pointers vs. integers for a
> > missed-optimization bug that said a hand-written memcpy loop
> > didn't properly transfer points-to info (properly as in
> > optimially for optimization).  GCC can now do that ;)
> 
> Nice! Does gcc properly transfer effective type info too, over a 
> hand-written memcpy loop? Just curious.

No, GCC doesn't have any effective type analysis / propagation.  It
only has the traditional type-based disambiguations of accesses
using the access type.

> On 2015-11-16 15:51, rguenther at suse dot de wrote:
>  >> Thank you for giving me the information.  I don't know the GCC 
> internals, so I
>  >> would like to ask how much it would cost to introduce the syntax for 
> pointer
>  >> subtractions.  I hope it is not that huge, but I really don't have 
> any idea.
>  >
>  > It would be quite some (mechanical) work but otherwise not too difficult.
>  > There is the choice whether to embed the division implicitely here or
>  > not.
> 
> If you choose to fix it please fix pr45779 on the way (see pr67999 for a 
> context).
Comment 49 Richard Biener 2016-10-19 13:47:23 UTC
Related testcase from PR61502:

#include <stdio.h>

int main()
{
   int x, y = 1;
   int *volatile v;
   int *p;

   v = &y;
   p = v;
   if (p == &x + 1) {
     *p = 2;
     printf("y = %d\n", y);
   }
}

which shows how propagating conditional equivalences (&x+1 into *p = 2) breaks
alias analysis.
Comment 50 Jeffrey A. Law 2016-10-19 14:49:15 UTC
Richi,
I haven't followed this BZ at all, but I absolutely trust you on issues WRT alias analysis.  If we can't propagate these conditional equivalences for pointers, I'll happily tweak DOM to avoid that.
Comment 51 Richard Biener 2016-10-20 08:21:41 UTC
(In reply to Jeffrey A. Law from comment #50)
> Richi,
> I haven't followed this BZ at all, but I absolutely trust you on issues WRT
> alias analysis.  If we can't propagate these conditional equivalences for
> pointers, I'll happily tweak DOM to avoid that.

Unfortunately it isn't that easy - even propagating equivalences for integers
may cause the same issue.  And even if we fix all issues on GIMPLE we're
still left with the fundamental brokeness of RTL alias analysis (PR49330).
Comment 52 Richard Biener 2016-10-20 08:40:25 UTC
Testcase with integers involving propagation that still "works" on trunk:

#include <stdio.h>

int main()
{
  int x, y = 1;
  int *volatile v;
  int *p;

  v = &y;
  p = v;
  unsigned long k = (unsigned long)(&x + 1);
  unsigned long pi = (unsigned long)p;
  if (pi == k) {
      pi+=4;
      p = (int *)pi;
      *(p-1) = 2;
      printf("y = %d\n", y);
  }
}

it needs enough obfuscation (before the equivalency propagation which has
to happen before another PTA pass happens).  Either via IPA inlining
if we'd ever propagate such equivalences before inlining or as above via offsetting.

Here we replace pi with k in pi = pi + 4; which makes PTA consider pi
to point to x.  The propagation essentially introduces undefined behavior.

You can expose the same issue by piecewise decomposing the pointer to
chars, and having them equivalency propagated in a bogus way, then
reconstruct the pointer from the chars.  So it's not enough to disable
pointer and uintptr_t propagations either.

It's not enough to put points-to information in the dereference site
(which would fix some related issues) as this issue appears as part
of PTA analysis itself (it doesn't consider an equivalency relation
to form a dependency, see the discussion in this PR).
Comment 53 Jeffrey A. Law 2016-10-21 18:02:06 UTC
And presumably walking forward/backward from the equivalency point to determine if an object is derived from or is used to derive a pointer is insufficient because the equivalency point and casting to/from pointer types might be in different functions?

Is this the final straw and do we just stop recording these conditional equivalences when both sides are SSA_NAMEs and declare the fallout unavoidable?

And note I believe we do similar stuff in the RTL optimizers too.
Comment 54 rguenther@suse.de 2016-10-24 07:44:52 UTC
On Fri, 21 Oct 2016, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752
> 
> --- Comment #53 from Jeffrey A. Law <law at redhat dot com> ---
> And presumably walking forward/backward from the equivalency point to determine
> if an object is derived from or is used to derive a pointer is insufficient
> because the equivalency point and casting to/from pointer types might be in
> different functions?

Yes.

> Is this the final straw and do we just stop recording these conditional 
> equivalences when both sides are SSA_NAMEs and declare the fallout 
> unavoidable?

I haven't made up my mind on a possible good solution here.  But yes,
it would mean to not propagate any equivalences derived from conditionals,
not even symbolic constants such as &a btw.

> And note I believe we do similar stuff in the RTL optimizers too.

RTL has it's own share of issues, see the linked PR about its
very optimistic treating of "base value"s.
Comment 55 Richard Biener 2018-01-08 08:30:59 UTC
*** Bug 82177 has been marked as a duplicate of this bug. ***
Comment 56 Richard Biener 2018-01-08 08:31:30 UTC
Testcase from PR82177:

#include <stdio.h>
#include <stdint.h>

void f(int*, int*);

int main()
{
  int a=0, y[1], x = 0;
  uintptr_t pi = (uintptr_t) &x;
  uintptr_t yi = (uintptr_t) (y+1);
  uintptr_t n = pi != yi;

  if (n) {
    a = 100;
    pi = yi;
  }

  if (n) {
    a = 100;
    pi = (uintptr_t) y;
  }

  *(int *)pi = 15;

  printf("a=%d x=%d\n", a, x);

  f(&x,y);

  return 0;
}