The following program prints "0" instead of the expected "15": #include <stdio.h> #include <stdint.h> #include <limits.h> int main() { int x = 0, *p = 0; for (uintptr_t i = 0; ; i++) { if (i == (uintptr_t)&x) { p = (int*)i; break; } } *p = 15; printf("%d\n", x); } gcc -O2 makes too strict assumptions about non-aliasing here: it removes the loop entirely (which is perfectly fine), but then assumes that the pointers p and &x are unrelated. NB 1: I do not think that DR #260 applies here NB 2: When compiled with clang, it also optimizes out the loop, but it does print the expected "15"
On Mon, 13 Apr 2015, gcc at robbertkrebbers dot nl wrote: > NB 1: I do not think that DR #260 applies here Why not? It seems clear enough that optimizations based on pointer origins are intended to be allowed; that it's not allowed to produce a pointer based on another pointer out of thin air, because the optimizations are more useful in practice than code such as this.
This example may seem academic, but there is a real problem underneath. Of course, I do agree that optimizations based on pointer origins are useful, but it is not an "all or nothing situation". As long as representations of pointers are kept opaque (i.e. the pointer has not been cast to an integer and the bit representation has not been inspected), I cannot think of any objection against such optimizations. They cannot affect the behavior of the code in any obvervable way. However, in case the representation of the pointer is made transparent, the programmer generally has a good reason for doing so. In such cases the compiler should not perform unexpected optimizations. Typical examples are: * Pointer tagging (using some bits of pointers representations to store additional information, for example for pointer hardening techniques or garbage collection). * Using of pointer representations as keys in hash tables. * Writing the representation of a chunk of memory containing pointers to memory.
(In reply to Robbert from comment #2) > * Writing the representation of a chunk of memory containing pointers to > memory. "to memory" should be "to disk"
Optimization reasons as follows. points-to analysis considers even integers as possible pointer (parts) and thus starts with i pointing to the NULL object. Incrementing that get's you to any _global_ object (but not automatic vars as those do not need to have a fixed location). Thus p ends up as p_8 = { NULL NONLOCAL } and dereferencing that aliases with any global but not x as that is an automatic var. I think the compiler is free to re-locate x right before *p = 15; and load x from a different location. Computing the value (uintptr_t)&x isn't preventing that as the address is not "live" after that.
(In reply to Richard Biener from comment #4) > as the address is not "live" after that. Why doesn't gcc consider casting a pointer to an integer as making it "live"? The concrete representation of address has escaped, which means anything can happen to the storage it points to.
(In reply to Robbert from comment #5) > (In reply to Richard Biener from comment #4) > > as the address is not "live" after that. > Why doesn't gcc consider casting a pointer to an integer as making it > "live"? The concrete representation of address has escaped, which means > anything can happen to the storage it points to. Because it isn't stored anywhere, you only performed the implicit no-op assignment to i via the comparison. PTA could certainly be told to add an effective i = (uintptr_t)&x constraint when seeing an equality compare. I'm not sure how pessimizing that would be to code (and the validity of the testcase is disputed). Sth like Index: gcc/tree-ssa-structalias.c =================================================================== --- gcc/tree-ssa-structalias.c (revision 222076) +++ gcc/tree-ssa-structalias.c (working copy) @@ -4771,6 +4771,19 @@ find_func_aliases (struct function *fn, || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop))) make_escape_constraint (rhsop); } + else if (gimple_code (t) == GIMPLE_COND) + { + /* On one path both EQ and NE perform an equality relation + and thus code may consider pointer equivalence. */ + if (gimple_cond_code (t) == EQ_EXPR + || gimple_cond_code (t) == NE_EXPR) + { + get_constraint_for (gimple_cond_lhs (t), &lhsc); + get_constraint_for (gimple_cond_rhs (t), &rhsc); + process_all_all_constraints (lhsc, rhsc); + process_all_all_constraints (rhsc, lhsc); + } + } /* Handle escapes through return. */ else if (gimple_code (t) == GIMPLE_RETURN && gimple_return_retval (as_a <greturn *> (t)) != NULL_TREE) which fixes the testcase (but is incomplete - equivalences built on gimple assignment RHS need to be considered as well). It's hard to "conservatively" catch all implicit equivalences (inline asms, function calls), so I don't think that is a workable solution. A conservative solution would basically mean to disable most PTA in practice.
Other testcase: int main() { int *p; int x = 0; if (p == &x) *p = 1; return x; } we optimize this to return 0. You probably wouldn't consider that invalid I guess?
(In reply to Richard Biener from comment #7) > Other testcase: > > int main() > { > int *p; > int x = 0; > if (p == &x) > *p = 1; > return x; > } > > we optimize this to return 0. You probably wouldn't consider that invalid I > guess? This is undoubtedly undefined behavior, an indeterminate pointer is used in a comparison ==. But even in the more controversial int main() { int x = 0, y, *p = &y + 1; if (p == &x) { printf("compared succeeded"); *p = 1; } return x; } I am fine with any of the following behaviors: * Nothing is printed, 0 is returned * "compared succeeded" is printed, 0 is returned * "compared succeeded" is printed, 1 is returned And anything else because it has undefined behavior. Here a pointer p whose origin/provenance does not involve its representation is used out of its bounds. The point is that when performing a pointer-to-int cast or when fiddling with object representations of pointers, the representation of the pointer has become transparent and *abstraction is broken* (generally deliberately!). One could have gathered information to reconstruct the pointer in some other context and then the compiler should not perform unexpected formalizations.
(In reply to Richard Biener from comment #6) > which fixes the testcase (but is incomplete - equivalences built on > gimple assignment RHS need to be considered as well). Can you give an example?
(In reply to Robbert from comment #9) > (In reply to Richard Biener from comment #6) > > which fixes the testcase (but is incomplete - equivalences built on > > gimple assignment RHS need to be considered as well). > Can you give an example? just do _Bool cmp = (i == (uintptr_t)&x); if (cmp) { .... } in your original testcase (ok, optimization will probably make them equivalent, but you can't rely on that one). Similar for function calls if (equal (i, (uintptr_t)&x))) { ... } with the obvious implementation of equal. Or global = (uintptr_t)&x; if (equal (i)) { .... } and see how this will make PTA useless (all pointers passed to a function whose result might be used in a way to take advantage of an equality relation need to be considered pointing to anything). [and then thorougly specify "take advantage of an equality relation"]
Looks more like a tree-optimization problem rather than C FE one.
(In reply to Richard Biener from comment #10) > and see how this will make PTA useless (all pointers passed to a function > whose result might be used in a way to take advantage of an equality relation > need to be considered pointing to anything). [and then thorougly specify > "take advantage of an equality relation"] That is undesired indeed. Only in case a pointer has been obtained via a construction that breaks abstraction (for example, if it has been obtained via an int -> pointer casts, or by poking bytes somewhere and then reinterpreting these as a pointer) convervative PTA assumptions should be made.
Hi, I have the following modified code. #include <stdio.h> #include <stdint.h> #include <limits.h> int main() { int x = 0, *p = 0; uintptr_t i; uintptr_t j = (uintptr_t) &x; uintptr_t k = j+j; uintptr_t l = 2*j - j - j; for (i = j+j-k+l; ; i++) { if (i == (uintptr_t)&x) { p = (int*)i; break; } } *p = 15; printf("%d\n", x); } This example still prints out "0" instead of "15". In this example, it seems that the integer "j+j-k+l" has no provenance. It is unclear to me how the provenance is calculated. Is there any concrete rule for calculating provenance?
(In reply to Chung-Kil Hur from comment #13) > Hi, I have the following modified code. > > #include <stdio.h> > #include <stdint.h> > #include <limits.h> > > int main() { > int x = 0, *p = 0; > uintptr_t i; > uintptr_t j = (uintptr_t) &x; > uintptr_t k = j+j; > uintptr_t l = 2*j - j - j; > for (i = j+j-k+l; ; i++) { > if (i == (uintptr_t)&x) { p = (int*)i; break; } > } > *p = 15; > > printf("%d\n", x); > } > > This example still prints out "0" instead of "15". > In this example, it seems that the integer "j+j-k+l" has no provenance. > It is unclear to me how the provenance is calculated. > Is there any concrete rule for calculating provenance? early PTA computes p_13, points-to non-local, points-to vars: { D.2349 } p_13 = (intD.6 *) i_1; *p_13 = 15; x.1_15 = xD.2349; while late PTA has an IL with just the equivalency (the rest is optimized away) p_6, points-to non-local, points-to NULL, points-to vars: { } j_4 = (uintptr_t) &x; <bb 3>: # i_1 = PHI <0(2), i_5(5)> if (i_1 == j_4) goto <bb 4>; else goto <bb 5>; <bb 4>: p_6 = (int *) i_1; *p_6 = 15; x.1_8 = x; so it hits essentially the same issue (the testcase is equivalent to the original one).
Hi Richard, Thanks for the explanation. But, what I wonder was how to justify such an optimization, rather than how it works. I have a better example. This might be a real bug of GCC. #include <stdio.h> int main() { int x = 0; uintptr_t pi = (uintptr_t) &x; uintptr_t i, j; for (i = 0; i < pi; i++) { } j = i; /* Note that the following "if" statement is never executed because j == pi. */ if (j != pi) { j = pi; } *(int*)((pi+i)-j) = 15; printf("%d\n", x); } This program prints out "0" instead of "15". Here, "pi" contains the address of the variable x; and "i" and "j" contain the same integer. So, it seems that "(pi+i)-j" should have a proper provenance of "x" and thus the variable "x" should be updated to 15. However, GCC seems to think that "(pi+i)-j" has no provenance. So, as a programmer, I wonder how I should calculate the provenance of an integer in order to see whether casting it to a pointer is valid or not. Thanks.
(In reply to Chung-Kil Hur from comment #15) > Hi Richard, > > Thanks for the explanation. > But, what I wonder was how to justify such an optimization, rather than how > it works. > > I have a better example. This might be a real bug of GCC. > > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t pi = (uintptr_t) &x; > uintptr_t i, j; > > for (i = 0; i < pi; i++) { } > j = i; > /* Note that the following "if" statement is never executed because j == > pi. */ Wrong, j == i != pi. > if (j != pi) { > j = pi; > } > > *(int*)((pi+i)-j) = 15; > > printf("%d\n", x); > } > > This program prints out "0" instead of "15". > Here, "pi" contains the address of the variable x; and "i" and "j" contain > the same integer. > So, it seems that "(pi+i)-j" should have a proper provenance of "x" and thus > the variable "x" should be updated to 15. > However, GCC seems to think that "(pi+i)-j" has no provenance. > > So, as a programmer, I wonder how I should calculate the provenance of an > integer in order to see whether casting it to a pointer is valid or not. > > Thanks.
Hi Richard, I modified the example further. #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i, j; for (i = 0; i < xp; i++) { } j = i; /* The following "if" statement is never executed because j == xp */ if (j != xp) { printf("hello\n"); j = xp; } *(int*)((xp+i)-j) = 15; printf("%d\n", x); } The above example does not print "hello", so i can assume that "j = xp" is not executed. However, the program prints "0" instead of "15". Can you explain this?
On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > Hi Richard, > > I modified the example further. > > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i, j; > > for (i = 0; i < xp; i++) { } > j = i; > /* The following "if" statement is never executed because j == xp */ > if (j != xp) { > printf("hello\n"); > j = xp; > } Here j is always xp and thus ... > *(int*)((xp+i)-j) = 15; ... this can (and is) simplified to *(int *)i = 15; making it the same testcase again. > printf("%d\n", x); > } > > The above example does not print "hello", so i can assume that "j = xp" is not > executed. > However, the program prints "0" instead of "15". > Can you explain this?
(In reply to rguenther@suse.de from comment #18) > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > Hi Richard, > > > > I modified the example further. > > > > #include <stdio.h> > > > > int main() { > > int x = 0; > > uintptr_t xp = (uintptr_t) &x; > > uintptr_t i, j; > > > > for (i = 0; i < xp; i++) { } > > j = i; > > /* The following "if" statement is never executed because j == xp */ > > if (j != xp) { > > printf("hello\n"); > > j = xp; > > } > > Here j is always xp and thus ... > Why is "j" always "xp"? Since "hello" is not printed, "j = xp;" is not executed. Is there some special semantics of C? If so, please let me know a reference. Thanks!
(In reply to Chung-Kil Hur from comment #19) > (In reply to rguenther@suse.de from comment #18) > > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > > Hi Richard, > > > > > > I modified the example further. > > > > > > #include <stdio.h> > > > > > > int main() { > > > int x = 0; > > > uintptr_t xp = (uintptr_t) &x; > > > uintptr_t i, j; > > > > > > for (i = 0; i < xp; i++) { } > > > j = i; > > > /* The following "if" statement is never executed because j == xp */ > > > if (j != xp) { > > > printf("hello\n"); > > > j = xp; > > > } > > > > Here j is always xp and thus ... > > > > Why is "j" always "xp"? > Since "hello" is not printed, "j = xp;" is not executed. Because that "if (j != xp)" guarantees it.
(In reply to Marek Polacek from comment #20) > (In reply to Chung-Kil Hur from comment #19) > > (In reply to rguenther@suse.de from comment #18) > > > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > > > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > > > Hi Richard, > > > > > > > > I modified the example further. > > > > > > > > #include <stdio.h> > > > > > > > > int main() { > > > > int x = 0; > > > > uintptr_t xp = (uintptr_t) &x; > > > > uintptr_t i, j; > > > > > > > > for (i = 0; i < xp; i++) { } > > > > j = i; > > > > /* The following "if" statement is never executed because j == xp */ > > > > if (j != xp) { > > > > printf("hello\n"); > > > > j = xp; > > > > } > > > > > > Here j is always xp and thus ... > > > > > > > Why is "j" always "xp"? > > Since "hello" is not printed, "j = xp;" is not executed. > > Because that "if (j != xp)" guarantees it. OK. here is another modification. #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i, j; for (i = 0; i < xp; i++) { } j = i; *(int*)j = 15; /* The following "if" statement is never executed because j == xp */ if (j != xp) { printf("hello\n"); j = xp; } *(int*)((xp+i)-j) = 15; printf("%d\n", x); } This program just prints "0". So we know that "*(int*)j = 15;" is not executed and thus "j == xp" is not true. Then, can the following statement change "j" even if the printf is not executed? if (j != xp) { printf("hello\n"); j = xp; } If not, how can "j == xp" suddenly hold?
(In reply to Chung-Kil Hur from comment #21) > (In reply to Marek Polacek from comment #20) > > (In reply to Chung-Kil Hur from comment #19) > > > (In reply to rguenther@suse.de from comment #18) > > > > On Tue, 19 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > > > > > > > --- Comment #17 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > > > > Hi Richard, > > > > > > > > > > I modified the example further. > > > > > > > > > > #include <stdio.h> > > > > > > > > > > int main() { > > > > > int x = 0; > > > > > uintptr_t xp = (uintptr_t) &x; > > > > > uintptr_t i, j; > > > > > > > > > > for (i = 0; i < xp; i++) { } > > > > > j = i; > > > > > /* The following "if" statement is never executed because j == xp */ > > > > > if (j != xp) { > > > > > printf("hello\n"); > > > > > j = xp; > > > > > } > > > > > > > > Here j is always xp and thus ... > > > > > > > > > > Why is "j" always "xp"? > > > Since "hello" is not printed, "j = xp;" is not executed. > > > > Because that "if (j != xp)" guarantees it. > > OK. here is another modification. > > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i, j; > > for (i = 0; i < xp; i++) { } > j = i; > > *(int*)j = 15; > > /* The following "if" statement is never executed because j == xp */ > if (j != xp) { > printf("hello\n"); > j = xp; > } > > *(int*)((xp+i)-j) = 15; > > printf("%d\n", x); > } > > This program just prints "0". > > So we know that "*(int*)j = 15;" is not executed and thus "j == xp" is not > true. > > Then, can the following statement change "j" even if the printf is not > executed? > > if (j != xp) { > printf("hello\n"); > j = xp; > } > > If not, how can "j == xp" suddenly hold? One more thing. If you remove the if-statement, then it prints "15" with GCC -O2. Since "hello" is not printed, I think the if-statement is the same as no-op. Thus, removing the if-statement should not change the behavior of the program according to ISO C11. But, they print different values. Can you explain this?
"gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes: > Since "hello" is not printed, I think the if-statement is the same as no-op. > Thus, removing the if-statement should not change the behavior of the program > according to ISO C11. Unless you are invoking undefined behaviour. Andreas.
(In reply to schwab from comment #23) > "gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes: > > > Since "hello" is not printed, I think the if-statement is the same as no-op. > > Thus, removing the if-statement should not change the behavior of the program > > according to ISO C11. > > Unless you are invoking undefined behaviour. > > Andreas. ============================== #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i, j; for (i = 0; i < xp; i++) { } j = i; *(int*)((xp+i)-j) = 15; printf("%d\n", x); } ============================= This prints "15". And I do not think there is any UB. Please correct me if I am wrong. Then, I add the if-statement. ============================== #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i, j; for (i = 0; i < xp; i++) { } j = i; /****** begin *******/ if (j != xp) { printf("hello\n"); j = xp; } /****** end *********/ *(int*)((xp+i)-j) = 15; printf("%d\n", x); } ============================= This prints "0" without printing "hello". Thus, this raises no UB unless "j != xp" raises UB. If you think "j != xp" raises UB, please explain why and give some reference. Otherwise, I think it is a bug of GCC.
(In reply to Chung-Kil Hur from comment #24) > (In reply to schwab from comment #23) > > "gil.hur at sf dot snu.ac.kr" <gcc-bugzilla@gcc.gnu.org> writes: > > > > > Since "hello" is not printed, I think the if-statement is the same as no-op. > > > Thus, removing the if-statement should not change the behavior of the program > > > according to ISO C11. > > > > Unless you are invoking undefined behaviour. > > > > Andreas. > > ============================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i, j; > > for (i = 0; i < xp; i++) { } > j = i; > > *(int*)((xp+i)-j) = 15; > > printf("%d\n", x); > } > ============================= > > This prints "15". > And I do not think there is any UB. > Please correct me if I am wrong. > > Then, I add the if-statement. > > ============================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i, j; > > for (i = 0; i < xp; i++) { } > j = i; > > /****** begin *******/ > if (j != xp) { > printf("hello\n"); > j = xp; > } > /****** end *********/ > > *(int*)((xp+i)-j) = 15; > > printf("%d\n", x); > } > ============================= > > This prints "0" without printing "hello". > > Thus, this raises no UB unless "j != xp" raises UB. > > If you think "j != xp" raises UB, please explain why and give some reference. > > Otherwise, I think it is a bug of GCC. The C standard only guarantees that you can convert a pointer to uintptr_t and back, it doesn't guarantee that you can convert a modified uintptr_t back to a pointer that is valid. Thus, doing (int *)((xp + i) - j) is invoking undefined behavior. That you see an effect of this undefined behavior just with the added if is because that if confuses early GCC optimizations which would have cancelled i - j to zero, retaining (int *)xp. Instead it enables later optimization to see that xp - j cancels and thus it is left with (int *)i. This is because you are essentially computing (xp + xp) - xp == xp but dependent on what two pieces get cancelled the pointer is based on either xp (ok) or on i (not ok - that is related to xp only via an implicit equivalency). The net result is that I can't see how to "detect" this kind of situation in points-to analysis in a way that does not pessimize all pointer-to-integer / integer-to-pointer conversions. In theory it would be possible to add a flag similar to -fno-strict-aliasing to do this pessimization (but there is already -fno-tree-pta which avoids the issue as well). So in the end my conclusion is that either the testcase invokes undefined behavior or the C standard has a defect. Thus the bug is WONTFIX unless somebody can come up with a way to handle these kind of equivalences in the points-to algorithm in GCC in a way not pessimizing everything. One might consider an incomplete approach like that in comment #6 but I am not convinced about this hack (and one would need to evaluate its effects on code generation).
Thanks for the detailed explanations. > The C standard only guarantees that you can convert a pointer to uintptr_t > and back, it doesn't guarantee that you can convert a modified uintptr_t > back to > a pointer that is valid. > > Thus, doing (int *)((xp + i) - j) is invoking undefined behavior. > I didn't know about this rule. I thought this cast is valid because "(xp+i)-j" is even "safely-derived". Could you give a reference for that rule in the standard? Thanks!
(In reply to Chung-Kil Hur from comment #26) > Thanks for the detailed explanations. > > > The C standard only guarantees that you can convert a pointer to uintptr_t > > and back, it doesn't guarantee that you can convert a modified uintptr_t > > back to > > a pointer that is valid. > > > > Thus, doing (int *)((xp + i) - j) is invoking undefined behavior. > > > > I didn't know about this rule. > I thought this cast is valid because "(xp+i)-j" is even "safely-derived". > > Could you give a reference for that rule in the standard? > > Thanks! It seems that the following rule might be the one. ================================= 7.20.1.4 Integer types capable of holding object pointers The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: intptr_t The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: uintptr_t These types are optional. ================================= Right. This does not say that we are allowed to cast a modified integer back to a pointer. What I remember might be from the C++ standard, where "safely derived" integers are allowed to be cast back to pointers. Umm. This might also be implementation-defined. Anyway, thanks very much for taking your time to respond to my questions!! Best, Gil
On Wed, 20 May 2015, gil.hur at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #27 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > (In reply to Chung-Kil Hur from comment #26) > > Thanks for the detailed explanations. > > > > > The C standard only guarantees that you can convert a pointer to uintptr_t > > > and back, it doesn't guarantee that you can convert a modified uintptr_t > > > back to > > > a pointer that is valid. > > > > > > Thus, doing (int *)((xp + i) - j) is invoking undefined behavior. > > > > > > > I didn't know about this rule. > > I thought this cast is valid because "(xp+i)-j" is even "safely-derived". > > > > Could you give a reference for that rule in the standard? > > > > Thanks! > > It seems that the following rule might be the one. > > ================================= > 7.20.1.4 Integer types capable of holding object pointers > > The following type designates a signed integer type with the property that any > valid pointer to void can be converted to this type, then converted back to > pointer to void, and the result will compare equal to the original pointer: > intptr_t > > The following type designates an unsigned integer type with the property that > any valid pointer to void can be converted to this type, then converted back to > pointer to void, and the result will compare equal to the original pointer: > uintptr_t > These types are optional. > ================================= Yes, that's the one I remember. > Right. This does not say that we are allowed to cast a modified integer back to > a pointer. > > What I remember might be from the C++ standard, where "safely derived" integers > are allowed to be cast back to pointers. > Umm. This might also be implementation-defined. Yeah, what is "safely derived" is the question here (you might not break the dependency chain in any (non-)obvious way).
Dear Richard, This time, I think I constructed a real bug. Please have a look and correct me if I am wrong. ===================== #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i; for (i = 0; i < xp; i++) { } *(int*)xp = 15; printf("%d\n", x); } ===================== This program prints "15" and I do not think this raises UB. Now I add an if-statement to the program. ===================== #include <stdio.h> int main() { int x = 0; uintptr_t xp = (uintptr_t) &x; uintptr_t i; for (i = 0; i < xp; i++) { } /*** begin ***/ if (xp != i) { printf("hello\n"); xp = i; } /*** end ***/ *(int*)xp = 15; printf("%d\n", x); } ===================== This program just prints "0". Since "hello" is not printed, the if-statement is not executed. However, it prints a different result than before, which I think is a bug.
Hi Gil, Nice example! I am a bit occupied lately, and thus have not read all comments at the bug report in detail. I will be away for the weekend, but will read those quickly after. Robbert On 05/22/2015 04:12 AM, gil.hur at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #29 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > Dear Richard, > > This time, I think I constructed a real bug. > Please have a look and correct me if I am wrong. > > ===================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i; > > for (i = 0; i < xp; i++) { } > > *(int*)xp = 15; > > printf("%d\n", x); > } > ===================== > > This program prints "15" and I do not think this raises UB. > > Now I add an if-statement to the program. > > ===================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i; > > for (i = 0; i < xp; i++) { } > > /*** begin ***/ > if (xp != i) { > printf("hello\n"); > xp = i; > } > /*** end ***/ > > *(int*)xp = 15; > > printf("%d\n", x); > } > ===================== > > This program just prints "0". > > Since "hello" is not printed, the if-statement is not executed. > However, it prints a different result than before, which I think is a bug. >
[oops, that was meant to be private, please remove my last comment]
(In reply to Chung-Kil Hur from comment #29) > Dear Richard, > > This time, I think I constructed a real bug. > Please have a look and correct me if I am wrong. > > ===================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i; > > for (i = 0; i < xp; i++) { } > > *(int*)xp = 15; > > printf("%d\n", x); > } > ===================== > > This program prints "15" and I do not think this raises UB. > > Now I add an if-statement to the program. > > ===================== > #include <stdio.h> > > int main() { > int x = 0; > uintptr_t xp = (uintptr_t) &x; > uintptr_t i; > > for (i = 0; i < xp; i++) { } > > /*** begin ***/ > if (xp != i) { > printf("hello\n"); > xp = i; > } > /*** end ***/ > > *(int*)xp = 15; > > printf("%d\n", x); > } > ===================== > > This program just prints "0". > > Since "hello" is not printed, the if-statement is not executed. > However, it prints a different result than before, which I think is a bug. It indeed is a more unfortunate case but you are still breaking the dependency chain in the if (xp != i) code by assigning i to xp. The code is never executed (which is why this is unfortunate) and I am less than 100% sure it still invokes undefined behavior. The unfortunate thing is that the equivalence you build on the 'else' path (xp == i) is used by the compiler to replace xp by i on the *(int*)xp = 15 line getting us into the very same situation as in all other cases. That is, we have if (xp != i) ... # xp = PHI <xp, i> *(int *)xp = 15; because of the conditional and in this case our phiopt pass optimizes that to # xp = PHI <i, i> instead of the equally valid # xp = PHI <xp, xp> other passes (dom) may end up doing a similar thing (at least for GCC 5 and the particular testcase we are lucky here though), but for GCC 5 -fdisable-tree-phiopt1 -fdisable-tree-phiopt2 avoids the issue. Generally there is no good way to determine which choice is better. What the PTA code does is sensible btw. For # xp_20 = PHI <0(2), xp_7(7)> xp_7 = xp_20 + 1; if (xp_6 > xp_7) goto <bb 7>; else goto <bb 4>; <bb 7>: goto <bb 3>; the PTA constraints are xp_6 = &x xp_20 = &NULL xp_20 = xp_7 xp_7 = xp_20 xp_7 = &NONLOCAL which means PTA considers that all pointers coming from integer constants point to global memory only (that's to support fixed address objects). That helps to avoid false aliasing to stack objects and avoids the need to make all locals escaped when you have code passing an integer to a function (that integer, converted to a pointer _could_ point to a stack slot in the caller, no?!). So 'i' is considered to eventually point to arbitrary global memory. But _not_ to arbitrary address-taken locals.
Dear Richard, Thanks for the detailed response. I have a suggestion for a solution of the problem, which is based on my paper to appear at PLDI 2015. * A Formal C Memory Model Supporting Integer-Pointer Casts. Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, Viktor Vafeiadis. http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf The suggestion is simple. You do not need to turn off the phiopt optimizations. We propose to slightly change the following assumption. > PTA considers that all pointers coming from integer constants > point to global memory only. Here, if you change this as follows, you can solve the problem. * All pointers coming from integer constants can point to only global memory and local variables whose addresses have been cast to integers. Also, we expect that this would not decrease the optimization performance of GCC very much because those variables whose addresses have been cast to integers tend to be escaped (e.g. passed to a hash function, or stored in the memory).
On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > Dear Richard, > > Thanks for the detailed response. > > I have a suggestion for a solution of the problem, which is based on my paper > to appear at PLDI 2015. > > * A Formal C Memory Model Supporting Integer-Pointer Casts. > Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve > Zdancewic, Viktor Vafeiadis. > http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf > > The suggestion is simple. > You do not need to turn off the phiopt optimizations. > We propose to slightly change the following assumption. > > > PTA considers that all pointers coming from integer constants > > point to global memory only. > > Here, if you change this as follows, you can solve the problem. > > * All pointers coming from integer constants can point to only global memory > and > local variables whose addresses have been cast to integers. Ok, so you basically add a 2nd class of "escaping". So in GCC PTA terms you'd add a new ESCAPE-like 'INTEGER' variable with INTEGER = NONLOCAL and add INTEGER = x constraints for each .. = (integer-type) &x conversion and for the reverse ptr = (pointer-type) i add ptr = INTEGER > Also, we expect that this would not decrease the optimization performance of > GCC very much because those variables whose addresses have been cast to > integers tend to be escaped (e.g. passed to a hash function, or stored in the > memory). Well - the above basically makes _all_ pointers converted from integers point to non-local memory, it also basically globs all pointers converted from integers into a single equivalence class. So I think you underestimate the effect on optimization (but I may overestimate the effect on optimization of not simply making all pointers converted from integers point to all globals and all address-taken locals, aka ANYTHING in GCC PTA terms)
(In reply to rguenther@suse.de from comment #34) > On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > Dear Richard, > > > > Thanks for the detailed response. > > > > I have a suggestion for a solution of the problem, which is based on my paper > > to appear at PLDI 2015. > > > > * A Formal C Memory Model Supporting Integer-Pointer Casts. > > Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve > > Zdancewic, Viktor Vafeiadis. > > http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf > > > > The suggestion is simple. > > You do not need to turn off the phiopt optimizations. > > We propose to slightly change the following assumption. > > > > > PTA considers that all pointers coming from integer constants > > > point to global memory only. > > > > Here, if you change this as follows, you can solve the problem. > > > > * All pointers coming from integer constants can point to only global memory > > and > > local variables whose addresses have been cast to integers. > > Ok, so you basically add a 2nd class of "escaping". So in GCC PTA > terms you'd add a new ESCAPE-like 'INTEGER' variable with > > INTEGER = NONLOCAL > > and add > > INTEGER = x > > constraints for each > > .. = (integer-type) &x > > conversion and for the reverse > > ptr = (pointer-type) i > > add > > ptr = INTEGER > > > Also, we expect that this would not decrease the optimization performance of > > GCC very much because those variables whose addresses have been cast to > > integers tend to be escaped (e.g. passed to a hash function, or stored in the > > memory). > > Well - the above basically makes _all_ pointers converted from integers > point to non-local memory, it also basically globs all pointers > converted from integers into a single equivalence class. Yes, this is right. > So I think you underestimate the effect on optimization (but I may > overestimate the effect on optimization of not simply making all > pointers converted from integers point to all globals and all > address-taken locals, aka ANYTHING in GCC PTA terms) Just one minor correction: all address-taken locals -> all address-taken-and-cast-to-integer locals Yes, I agree. In order to understand the effect, we need some empirical evidence. I am interested in this direction. So, I wonder what benchmarks you usually use to check the effect of compiler optimizations. More specifically, are SPEC benchmarks enough? or do you use some other benchmarks too? Thanks!
On Tue, 26 May 2015, gil.hur at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #35 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > (In reply to rguenther@suse.de from comment #34) > > On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > > Dear Richard, > > > > > > Thanks for the detailed response. > > > > > > I have a suggestion for a solution of the problem, which is based on my paper > > > to appear at PLDI 2015. > > > > > > * A Formal C Memory Model Supporting Integer-Pointer Casts. > > > Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve > > > Zdancewic, Viktor Vafeiadis. > > > http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf > > > > > > The suggestion is simple. > > > You do not need to turn off the phiopt optimizations. > > > We propose to slightly change the following assumption. > > > > > > > PTA considers that all pointers coming from integer constants > > > > point to global memory only. > > > > > > Here, if you change this as follows, you can solve the problem. > > > > > > * All pointers coming from integer constants can point to only global memory > > > and > > > local variables whose addresses have been cast to integers. > > > > Ok, so you basically add a 2nd class of "escaping". So in GCC PTA > > terms you'd add a new ESCAPE-like 'INTEGER' variable with > > > > INTEGER = NONLOCAL > > > > and add > > > > INTEGER = x > > > > constraints for each > > > > .. = (integer-type) &x > > > > conversion and for the reverse > > > > ptr = (pointer-type) i > > > > add > > > > ptr = INTEGER > > > > > Also, we expect that this would not decrease the optimization performance of > > > GCC very much because those variables whose addresses have been cast to > > > integers tend to be escaped (e.g. passed to a hash function, or stored in the > > > memory). > > > > Well - the above basically makes _all_ pointers converted from integers > > point to non-local memory, it also basically globs all pointers > > converted from integers into a single equivalence class. > > Yes, this is right. > > > So I think you underestimate the effect on optimization (but I may > > overestimate the effect on optimization of not simply making all > > pointers converted from integers point to all globals and all > > address-taken locals, aka ANYTHING in GCC PTA terms) > > Just one minor correction: > all address-taken locals -> all address-taken-and-cast-to-integer locals > > Yes, I agree. > In order to understand the effect, we need some empirical evidence. > I am interested in this direction. > So, I wonder what benchmarks you usually use to check the effect of compiler > optimizations. > More specifically, are SPEC benchmarks enough? or do you use some other > benchmarks too? SPEC CPU tends to capture most of this though we also periodically check other "benchmarks" such as firefox and its few performance tests or similar big C++ programs.
(In reply to rguenther@suse.de from comment #36) > On Tue, 26 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > --- Comment #35 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > (In reply to rguenther@suse.de from comment #34) > > > On Sat, 23 May 2015, gil.hur at sf dot snu.ac.kr wrote: > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > > > > > > > --- Comment #33 from Chung-Kil Hur <gil.hur at sf dot snu.ac.kr> --- > > > > Dear Richard, > > > > > > > > Thanks for the detailed response. > > > > > > > > I have a suggestion for a solution of the problem, which is based on my paper > > > > to appear at PLDI 2015. > > > > > > > > * A Formal C Memory Model Supporting Integer-Pointer Casts. > > > > Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve > > > > Zdancewic, Viktor Vafeiadis. > > > > http://sf.snu.ac.kr/gil.hur/publications/intptrcast.pdf > > > > > > > > The suggestion is simple. > > > > You do not need to turn off the phiopt optimizations. > > > > We propose to slightly change the following assumption. > > > > > > > > > PTA considers that all pointers coming from integer constants > > > > > point to global memory only. > > > > > > > > Here, if you change this as follows, you can solve the problem. > > > > > > > > * All pointers coming from integer constants can point to only global memory > > > > and > > > > local variables whose addresses have been cast to integers. > > > > > > Ok, so you basically add a 2nd class of "escaping". So in GCC PTA > > > terms you'd add a new ESCAPE-like 'INTEGER' variable with > > > > > > INTEGER = NONLOCAL > > > > > > and add > > > > > > INTEGER = x > > > > > > constraints for each > > > > > > .. = (integer-type) &x > > > > > > conversion and for the reverse > > > > > > ptr = (pointer-type) i > > > > > > add > > > > > > ptr = INTEGER > > > > > > > Also, we expect that this would not decrease the optimization performance of > > > > GCC very much because those variables whose addresses have been cast to > > > > integers tend to be escaped (e.g. passed to a hash function, or stored in the > > > > memory). > > > > > > Well - the above basically makes _all_ pointers converted from integers > > > point to non-local memory, it also basically globs all pointers > > > converted from integers into a single equivalence class. > > > > Yes, this is right. > > > > > So I think you underestimate the effect on optimization (but I may > > > overestimate the effect on optimization of not simply making all > > > pointers converted from integers point to all globals and all > > > address-taken locals, aka ANYTHING in GCC PTA terms) > > > > Just one minor correction: > > all address-taken locals -> all address-taken-and-cast-to-integer locals > > > > Yes, I agree. > > In order to understand the effect, we need some empirical evidence. > > I am interested in this direction. > > So, I wonder what benchmarks you usually use to check the effect of compiler > > optimizations. > > More specifically, are SPEC benchmarks enough? or do you use some other > > benchmarks too? > > SPEC CPU tends to capture most of this though we also periodically > check other "benchmarks" such as firefox and its few performance > tests or similar big C++ programs. Thanks for the information!
IMHO this bug is not specific to integers and boils down to this: when a check for equality ignores provenance for some reason, phiopt nevertheless will replace one variable by another with the wrong provenance. Integers are surely compared without regard to prevenance. That's one case. Another case is a comparison of two pointers when one of the lost its provenance info. E.g. the program (somewhat based on pr61502): #include <stdint.h> #include <stdio.h> int main() { int y, x = 0; int *volatile v = &x; int *xp = v; int *i = &y + 1; if (xp != i) { printf("hello\n"); xp = i; } *xp = 15; printf("%d\n", x); } prints 0 for me with gcc 5.2.0 -O2. The evident solution is to not apply this optimization when provenance info of the two variables differs. I guess for most integers it will be the same. Additionally, this optimization could be applied when provenance info for the first variable is known but it's unknown for the second one. This leads to the loss of provenance info and can prevent other optimizations. Maybe a more complex solution is possible, like tracking provenance info separately from the core value, I don't know.
(In reply to Alexander Cherepanov from comment #38) > IMHO this bug is not specific to integers and boils down to this: when a > check for equality ignores provenance for some reason, phiopt nevertheless > will replace one variable by another with the wrong provenance. > > Integers are surely compared without regard to prevenance. That's one case. > Another case is a comparison of two pointers when one of the lost its > provenance info. E.g. the program (somewhat based on pr61502): > > #include <stdint.h> > #include <stdio.h> > > int main() > { > int y, x = 0; > int *volatile v = &x; > int *xp = v; > int *i = &y + 1; > > if (xp != i) { > printf("hello\n"); > xp = i; > } > > *xp = 15; > > printf("%d\n", x); > } > > prints 0 for me with gcc 5.2.0 -O2. Except the above testcase is invalid/undefined as &y + 1 is undefined if deferenced and &x and &y + 1 cannot be compared in a defined sense as both deals with two different arrays.
Ok, this program: #include <stdint.h> #include <stdio.h> int main() { int y, x = 0; int *volatile v = &x; int *xp = v; int *i = &y + 1; if (xp != i) { printf("hello\n"); xp = i; } printf("%d\n", xp == &x); } print 0 too even though it should print 1.
(In reply to Alexander Cherepanov from comment #40) > Ok, this program: > > #include <stdint.h> > #include <stdio.h> > > int main() { > int y, x = 0; > int *volatile v = &x; > int *xp = v; > int *i = &y + 1; > > if (xp != i) { > printf("hello\n"); > xp = i; > } > > printf("%d\n", xp == &x); > } > Still undefined as &x and &y + 1 are not comparable.
On 2015-11-16 00:48, pinskia at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #41 from Andrew Pinski <pinskia at gcc dot gnu.org> --- > (In reply to Alexander Cherepanov from comment #40) >> Ok, this program: >> >> #include <stdint.h> >> #include <stdio.h> >> >> int main() { >> int y, x = 0; >> int *volatile v = &x; >> int *xp = v; >> int *i = &y + 1; >> >> if (xp != i) { >> printf("hello\n"); >> xp = i; >> } >> >> printf("%d\n", xp == &x); >> } Small correction: it prints 0 and doesn't print "hello" even though it should print 1 without "hello" or an unspecified value with "hello". > Still undefined as &x and &y + 1 are not comparable. They cannot be compared with the relational operators ("<" etc.) but you can compare any pointers with the equality operators ("==" and "!=").
(In reply to Alexander Cherepanov from comment #38) > The evident solution is to not apply this optimization when provenance info > of the two variables differs. I guess for most integers it will be the same. IMO tracking provenance info is not a good idea, since it is really complicated. First, since integers and pointers can be casted to each other, not only pointers but also integers should carry provenance information. Second, tracking provenance info may work for simple examples, but it is even hard to define the provenance info itself for complex expressions. For e.g., what is the provenance of "e1-e2"? "2*e"? "e1 XOR e2"? "e1 * e2"? (even given the provenance info for integer expressions "e*") I would rather prefer marking pointers casted to integers as *escaped*, and forgetting about the provenance at all. Here are several reasons why this works well: - Standard optimizations are supported. Say we want to support the following constant propagation example: char f() { char a = '0'; g(); // unknown function; may guess the address of "a" and // try to access it (but it is always unsuccessful) return a; // -> return '0' } Since the address of "a" is not casted to integers, "a" is private to the function "f" (i.e., not escaped from "f"), and "g" cannot access "a". So we know "a = 0" at the return. - semantics is simple. No need to track the provenance info for variables. Once a pointer is casted to integers, it is just integers without any tracked information. As a result, the standard integer optimizations of our interest, as the following, are fully supported: if (x != y) x = y; -> x = y; - Performance degradation due to "casted pointers as escaped" is insignificant. Morally, if a pointer is casted to an integer, the address is regarded as "global": having the integer value of the pointer means you can access the pointer. So there will be not much optimization opportunity (or intent) for those pointers casted to integers. Of course, this argument should be validated by some experiment; yet I am quite convinced it is the case that the performance degradation is insignificant. I would like to ask how you think about this suggestion. Note that my argument here is based on my paper on this issue, where you can find the formal memory model we proposed, proofs that optimization examples are correct, and reasoning principle for proving optimizations (see the paper and the slide): A Formal C Memory Model Supporting Integer-Pointer Casts. Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, Viktor Vafeiadis. PLDI 2015. http://sf.snu.ac.kr/intptrcast/
On Mon, 16 Nov 2015, jeehoon.kang at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jeehoon.kang at sf dot snu.ac.kr > > --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> --- > - Performance degradation due to "casted pointers as escaped" is insignificant. I think this is not true. For example with MatLab (might be sth else, if I don't remember correctly) you are required to pass pointers to arrays in two halves in double(!) values (I believe the only function argument type they support). GCC happily makes points-to analysis work through those. The other (unfortunate) thing is that in GCC pointer subtraction is always performed on integers, thus for the C source code int idx = ptr1 - ptr2; we internally have sth like int idx = ((long)ptr1 - (long)ptr2) / 4; so you can't really treat pointers as "escaped" here without loss. Note that we've had this (kind of) in the past and it tried to go without making pointers escaped at these points but only consider casts from integers to pointers as pointing to anything. But that's wrong correctness wise (not then treating the cast to integer as escape point). I also don't think doing the above would solve the cases of equality compares of pointes themselves. (hopefully undefined) I added the current handling of pointers vs. integers for a missed-optimization bug that said a hand-written memcpy loop didn't properly transfer points-to info (properly as in optimially for optimization). GCC can now do that ;)
> I think this is not true. For example with MatLab (might be sth else, > if I don't remember correctly) you are required to pass pointers to > arrays in two halves in double(!) values (I believe the only function > argument type they support). GCC happily makes points-to analysis work > through those. Thank you for giving me an example. Yet, I think it is a little bit unfortunate for MatLab (or sth else) to pass pointers by packing two into a double, at least due to the readability problem. I think it is beyond the intended usage of the C/C++ language, but I understand that GCC is the time-honored compiler for various software and systems. > The other (unfortunate) thing is that in GCC pointer subtraction > is always performed on integers, thus for the C source code > > int idx = ptr1 - ptr2; > > we internally have sth like > > int idx = ((long)ptr1 - (long)ptr2) / 4; > > so you can't really treat pointers as "escaped" here without loss. Thank you for giving me the information. I don't know the GCC internals, so I would like to ask how much it would cost to introduce the syntax for pointer subtractions. I hope it is not that huge, but I really don't have any idea. > Note that we've had this (kind of) in the past and it tried to go > without making pointers escaped at these points but only consider > casts from integers to pointers as pointing to anything. But > that's wrong correctness wise (not then treating the cast to integer > as escape point). Treating the cast to integer as escape point is proven-correct by a machine-checked proof (in Coq) for various standard optimization examples, such as CP, DCE, dead allocation elimination, etc. For more detail, please see the paper above-mentioned. > I also don't think doing the above would solve the cases of equality > compares of pointes themselves. (hopefully undefined) The formal memory model in the paper I mentioned invokes undefined behavior for the pointer equality comparison example above. In the formal model, a pointer is represented as a pair of a memory block identifier (l) and an offset (o). (cf. the CompCert memory model) When a memory is malloc-ed or alloca-ed, a new memory block identifier is assigned. A pointer equality, say of (l, o) and (l', o'), invokes undefined behavior when l != l'. So for the following example (by Alexander Cherepanov): #include <stdint.h> #include <stdio.h> int main() { int y, x = 0; int *volatile v = &x; int *xp = v; int *i = &y + 1; if (xp != i) { printf("hello\n"); xp = i; } printf("%d\n", xp == &x); } Say y and x are allocated at l1 and l2, respectively. Then xp = (l2, 0), and i = (l1, 4). Thus comparing xp and i invokes undefined behavior, since l1 != l2.
On Mon, 16 Nov 2015, jeehoon.kang at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #45 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> --- > > I think this is not true. For example with MatLab (might be sth else, > > if I don't remember correctly) you are required to pass pointers to > > arrays in two halves in double(!) values (I believe the only function > > argument type they support). GCC happily makes points-to analysis work > > through those. > > Thank you for giving me an example. Yet, I think it is a little bit > unfortunate for MatLab (or sth else) to pass pointers by packing two into a > double, at least due to the readability problem. I think it is beyond the > intended usage of the C/C++ language, but I understand that GCC is the > time-honored compiler for various software and systems. > > > > > The other (unfortunate) thing is that in GCC pointer subtraction > > is always performed on integers, thus for the C source code > > > > int idx = ptr1 - ptr2; > > > > we internally have sth like > > > > int idx = ((long)ptr1 - (long)ptr2) / 4; > > > > so you can't really treat pointers as "escaped" here without loss. > > Thank you for giving me the information. I don't know the GCC internals, so I > would like to ask how much it would cost to introduce the syntax for pointer > subtractions. I hope it is not that huge, but I really don't have any idea. It would be quite some (mechanical) work but otherwise not too difficult. There is the choice whether to embed the division implicitely here or not. > > Note that we've had this (kind of) in the past and it tried to go > > without making pointers escaped at these points but only consider > > casts from integers to pointers as pointing to anything. But > > that's wrong correctness wise (not then treating the cast to integer > > as escape point). > > Treating the cast to integer as escape point is proven-correct by a > machine-checked proof (in Coq) for various standard optimization examples, such > as CP, DCE, dead allocation elimination, etc. For more detail, please see the > paper above-mentioned. Yes, I agree that making this an escape point is enough.
On 2015-11-16 14:00, rguenther at suse dot de wrote: >> --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> --- >> - Performance degradation due to "casted pointers as escaped" is insignificant. > > I think this is not true. For example with MatLab (might be sth else, > if I don't remember correctly) you are required to pass pointers to > arrays in two halves in double(!) values (I believe the only function > argument type they support). GCC happily makes points-to analysis work > through those. But this is invalid C. First, it breaks strict aliasing rules. Second, the representations of these doubles are free to change at any time given their values are kept intact (e.g. change one NaN to another). That is, unrelated improvements in other optimizations in gcc will break all of this in the future, right? > I also don't think doing the above would solve the cases of equality > compares of pointes themselves. (hopefully undefined) Hm, undefining comparisons between a pointer pointing past the end of an object and a pointer to another object could be a way forward. You propose to undefine it for any objects (similar to "<") or only when they are not parts (including recursively) of the same aggregate? In any case this will fix some theoretical problems, e.g. transitivity of == as described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c17 . OTOH, again in any case, it will not solve all problems. For instance, if you want to track bounds of arrays in runtime you still can get equal pointers with different additional info -- pointers to an object and a subobject. Granted, this is kinda hypothetical (IIUC UBSAN works a bit different right now). > I added the current handling of pointers vs. integers for a > missed-optimization bug that said a hand-written memcpy loop > didn't properly transfer points-to info (properly as in > optimially for optimization). GCC can now do that ;) Nice! Does gcc properly transfer effective type info too, over a hand-written memcpy loop? Just curious. On 2015-11-16 15:51, rguenther at suse dot de wrote: >> Thank you for giving me the information. I don't know the GCC internals, so I >> would like to ask how much it would cost to introduce the syntax for pointer >> subtractions. I hope it is not that huge, but I really don't have any idea. > > It would be quite some (mechanical) work but otherwise not too difficult. > There is the choice whether to embed the division implicitely here or > not. If you choose to fix it please fix pr45779 on the way (see pr67999 for a context).
On Mon, 16 Nov 2015, ch3root at openwall dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #47 from Alexander Cherepanov <ch3root at openwall dot com> --- > On 2015-11-16 14:00, rguenther at suse dot de wrote: > >> --- Comment #43 from Jeehoon Kang <jeehoon.kang at sf dot snu.ac.kr> --- > >> - Performance degradation due to "casted pointers as escaped" is insignificant. > > > > I think this is not true. For example with MatLab (might be sth else, > > if I don't remember correctly) you are required to pass pointers to > > arrays in two halves in double(!) values (I believe the only function > > argument type they support). GCC happily makes points-to analysis work > > through those. > > But this is invalid C. First, it breaks strict aliasing rules. Second, > the representations of these doubles are free to change at any time > given their values are kept intact (e.g. change one NaN to another). > That is, unrelated improvements in other optimizations in gcc will break > all of this in the future, right? You misunderstood, they marshall a pointer 'T *p' like unsigned int high = ((uintptr_t)p) >> 32; unsigned int low = ((uintptr_t)p) & (1 << 32 - 1); foo ((double) high, (double) low); and in foo then do foo (double hdouble, double ldouble) { unsigned int high = (unsigned int) hdouble; unsigned int low = (unsigned int) ldouble; T *p = (T *)(((uintptr_t)high << 32) | (uintptr_t)low); the important part here is to recognize that frobbing in points-to analysis so you still see what 'p' points to in foo. > > I added the current handling of pointers vs. integers for a > > missed-optimization bug that said a hand-written memcpy loop > > didn't properly transfer points-to info (properly as in > > optimially for optimization). GCC can now do that ;) > > Nice! Does gcc properly transfer effective type info too, over a > hand-written memcpy loop? Just curious. No, GCC doesn't have any effective type analysis / propagation. It only has the traditional type-based disambiguations of accesses using the access type. > On 2015-11-16 15:51, rguenther at suse dot de wrote: > >> Thank you for giving me the information. I don't know the GCC > internals, so I > >> would like to ask how much it would cost to introduce the syntax for > pointer > >> subtractions. I hope it is not that huge, but I really don't have > any idea. > > > > It would be quite some (mechanical) work but otherwise not too difficult. > > There is the choice whether to embed the division implicitely here or > > not. > > If you choose to fix it please fix pr45779 on the way (see pr67999 for a > context).
Related testcase from PR61502: #include <stdio.h> int main() { int x, y = 1; int *volatile v; int *p; v = &y; p = v; if (p == &x + 1) { *p = 2; printf("y = %d\n", y); } } which shows how propagating conditional equivalences (&x+1 into *p = 2) breaks alias analysis.
Richi, I haven't followed this BZ at all, but I absolutely trust you on issues WRT alias analysis. If we can't propagate these conditional equivalences for pointers, I'll happily tweak DOM to avoid that.
(In reply to Jeffrey A. Law from comment #50) > Richi, > I haven't followed this BZ at all, but I absolutely trust you on issues WRT > alias analysis. If we can't propagate these conditional equivalences for > pointers, I'll happily tweak DOM to avoid that. Unfortunately it isn't that easy - even propagating equivalences for integers may cause the same issue. And even if we fix all issues on GIMPLE we're still left with the fundamental brokeness of RTL alias analysis (PR49330).
Testcase with integers involving propagation that still "works" on trunk: #include <stdio.h> int main() { int x, y = 1; int *volatile v; int *p; v = &y; p = v; unsigned long k = (unsigned long)(&x + 1); unsigned long pi = (unsigned long)p; if (pi == k) { pi+=4; p = (int *)pi; *(p-1) = 2; printf("y = %d\n", y); } } it needs enough obfuscation (before the equivalency propagation which has to happen before another PTA pass happens). Either via IPA inlining if we'd ever propagate such equivalences before inlining or as above via offsetting. Here we replace pi with k in pi = pi + 4; which makes PTA consider pi to point to x. The propagation essentially introduces undefined behavior. You can expose the same issue by piecewise decomposing the pointer to chars, and having them equivalency propagated in a bogus way, then reconstruct the pointer from the chars. So it's not enough to disable pointer and uintptr_t propagations either. It's not enough to put points-to information in the dereference site (which would fix some related issues) as this issue appears as part of PTA analysis itself (it doesn't consider an equivalency relation to form a dependency, see the discussion in this PR).
And presumably walking forward/backward from the equivalency point to determine if an object is derived from or is used to derive a pointer is insufficient because the equivalency point and casting to/from pointer types might be in different functions? Is this the final straw and do we just stop recording these conditional equivalences when both sides are SSA_NAMEs and declare the fallout unavoidable? And note I believe we do similar stuff in the RTL optimizers too.
On Fri, 21 Oct 2016, law at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 > > --- Comment #53 from Jeffrey A. Law <law at redhat dot com> --- > And presumably walking forward/backward from the equivalency point to determine > if an object is derived from or is used to derive a pointer is insufficient > because the equivalency point and casting to/from pointer types might be in > different functions? Yes. > Is this the final straw and do we just stop recording these conditional > equivalences when both sides are SSA_NAMEs and declare the fallout > unavoidable? I haven't made up my mind on a possible good solution here. But yes, it would mean to not propagate any equivalences derived from conditionals, not even symbolic constants such as &a btw. > And note I believe we do similar stuff in the RTL optimizers too. RTL has it's own share of issues, see the linked PR about its very optimistic treating of "base value"s.
*** Bug 82177 has been marked as a duplicate of this bug. ***
Testcase from PR82177: #include <stdio.h> #include <stdint.h> void f(int*, int*); int main() { int a=0, y[1], x = 0; uintptr_t pi = (uintptr_t) &x; uintptr_t yi = (uintptr_t) (y+1); uintptr_t n = pi != yi; if (n) { a = 100; pi = yi; } if (n) { a = 100; pi = (uintptr_t) y; } *(int *)pi = 15; printf("a=%d x=%d\n", a, x); f(&x,y); return 0; }