Bug 61502 - == comparison on "one-past" pointer gives wrong result
Summary: == comparison on "one-past" pointer gives wrong result
Status: SUSPENDED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.8.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Duplicates: 63611 65679
Depends on:
Blocks: 65752 85800 82177
Reported: 2014-06-13 15:59 UTC by Peter Sewell
Modified: 2018-05-16 09:04 UTC
CC List: 10 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-10-19 00:00:00


Attachments
C code as pasted into bug report (296 bytes, text/x-csrc)
2014-06-13 15:59 UTC, Peter Sewell

Description Peter Sewell 2014-06-13 15:59:09 UTC
Created attachment 32934
C code as pasted into bug report

The following code can produce a pointer to one-past the x object.  When it does, according to the C11 standard text, the result of the pointer comparison should be true, but gcc gives false.

#include <stdio.h> 
int  y = 2, x=1; 
int main()
{
  int *p;
  p = &x +1 ;  
  printf("&x=%p  &y=%p  p=%p\n",(void*)&x, (void*)&y, (void*)p); 
  _Bool b1 = (p==&y);   
  printf("(p==&y) = %s\n", b1?"true":"false");
  return 0;
}

gcc-4.8 -std=c11 -pedantic -Wall -Wextra -O2 -o a.out pointer_representation_1e.c && ./a.out
&x=0x601020  &y=0x601024  p=0x601024
(p==&y) = false

gcc-4.8 --version
gcc-4.8 (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1

The pointer addition is licensed by 6.5.6 "Additive operators", where:

6.5.6p7 says "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.", and 

6.5.6p8 says "[...] Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...]".

The pointer comparison is licensed by 6.5.9 "Equality operators", where:

6.5.9p7 says "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.",

6.5.9p6 says "Two pointers compare equal if and only if [...] or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)", and

Footnote 109 says "Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. [...]".
Comment 1 joseph@codesourcery.com 2014-06-13 19:29:37 UTC
Just because two pointers print the same and have the same bit-pattern 
doesn't mean they need to compare equal (see the response to DR#260, which 
may be presumed to apply to C11 in the absence of relevant textual changes 
to make it not apply).
Comment 2 Harald van Dijk 2014-06-13 21:52:36 UTC
DR#260 applies when using p as if it were &y, just because it happens to compare equal to it. For example, attempting to use it to read the value of y is not permitted, even if guarded by an if (p == &y) condition. But that isn't the case here: the pointer value &x + 1 is used in a way described in the standard as the behaviour for the one-past-array values: the standard doesn't just permit them to compare equal to a pointer to an object immediately following it in memory; what it does (because of the way it is worded) is require them to compare equal to a pointer to an object immediately following it in memory. (But I cannot even hazard a guess as to whether that is intentional.)
Comment 3 joseph@codesourcery.com 2014-06-13 22:59:21 UTC
Except within a larger object, I'm not aware of any reason the cases of 
two objects following or not following each other in memory must be 
mutually exclusive.  (If the implementation can track the origins of 
bit-patterns and where copies of those bit-patterns have got to, it might 
have a compacting garbage collector that relocates objects and changes 
what's adjacent to what, for example - I think such implementations are 
within the scope of what the C standard is intended to support.  Or if 
you're concerned about how this changes bit-patterns of pointers, imagine 
that a C pointer is a (object key, offset) pair, and that comparison first 
converts the C pointer into a hardware address, where it's the mapping 
from object keys to hardware addresses that changes as a result of garbage 
collection rather than anything about the representation of the pointer.)

So the only way within the C standard you could deduce that two objects 
follow each other in memory is that the address of one compares equal to 
one past the address of the other - but that does not mean they follow 
each other in memory for any other comparison.

An object having a constant address (6.2.4#2) is described non-normatively 
in footnote 33 in terms of comparisons of pointers to that object.  I 
don't think it should be taken to mean comparisons of pointers to 
different objects need to have constant results.
Comment 4 Harald van Dijk 2014-06-14 08:00:11 UTC
That's an interesting argument. You may well be right that the original code, strictly speaking, does not prove that GCC has a bug, but I do think GCC has a bug nonetheless, and have come up with a different example.

#include <stdio.h>
#include <string.h>

int x, y;

char buf[sizeof (int *)];

int main()
{
  int *p = &y;
  memcpy (buf, &p, sizeof p);
  memcpy (&p, buf, sizeof p);
  x = 2, y = 1;
  if (p == &x + 1)
    *p = 2;
  else
    y = 2;
  printf ("x = %d, y = %d\n", x, y);
  return 0;
}

Compiling with -O2, I see "x = 2, y = 1". p has been assigned &y. Whether it compares equal to &x + 1 is unspecified, but it doesn't change its origins: p points to y. Therefore, either the assignment to *p should change y, or in the else branch, the plain assignment to y should change y. Either way, the correct result is "x = 2, y = 2".

It seems like GCC is assuming that if p == &x + 1, and &x + 1 != &y, then p != &y, so the assignment to *p cannot change y. The flaw in that logic is again the optimisation of &x + 1 != &y to a constant.

I see the behaviour I describe in versions 4.9.0 and 4.8.2. This program does print "x = 2, y = 2" in my testing on GCC 4.7.3, but that is because p == &x + 1 happens to not compare as true in that version. Slightly tweaked versions of this fail with versions 4.7.3, 4.6.4, 4.5.4 and 4.4.7, but not 4.3.6.
Comment 5 Joseph S. Myers 2014-10-21 21:42:09 UTC
*** Bug 63611 has been marked as a duplicate of this bug. ***
Comment 6 Keith Thompson 2014-10-21 22:17:24 UTC
In the test case for Bug 63611 (marked as a duplicate of this one)
we have:

    element x[1];
    element y[1];
    element *const x0 = x;
    element *const x1 = x0 + 1;
    element *const y0 = y;
    
When the test case is executed, the condition (x1 == y0) is true
when it's evaluated, but the condition (x + 1 == y) (which I argue is
equivalent) is false when it's evaluated 2 lines later.

I don't believe that DR#260 applies, since there are no indeterminate
values being used here.  Which means, I think, that N1570 6.2.4p2:

    An object exists, has a constant address, and retains its
    last-stored value throughout its lifetime.
  
does apply.

Whether x follows y or y follows x in memory, or neither, is 
unimportant.  The problem is that the "==" comparison is behaving
inconsistently for the same two pointer values.

I'm unconvinced by the argument (if I understand it correctly) that 
the objects x and y might be adjacent when the first comparison is
evaluated, but not when the second is evaluated.  I believe that would
violate the requirement that objects have constant addresses and retain
their last-stored values.  Furthermore, even if relocating objects so
they're no longer adjacent is permitted by the language, I don't believe
gcc (or the code that it generates) is actually doing so in this case.
Comment 7 joseph@codesourcery.com 2014-10-21 22:32:43 UTC
On Tue, 21 Oct 2014, Keith.S.Thompson at gmail dot com wrote:

> their last-stored values.  Furthermore, even if relocating objects so
> they're no longer adjacent is permitted by the language, I don't believe
> gcc (or the code that it generates) is actually doing so in this case.

Really, it's a bad idea to apply concepts such as "actually doing so" to 
understanding the semantics of C, specified as a high-level language.  
"happens to immediately follow the first array object in the address 
space" in the high-level language need not satisfy any particular rules 
you might expect from thinking of C as relating to particular hardware, 
only the rules that can be deduced from the C standard (which as far as I 
can tell, do not say that "follows" is constant just because the addresses 
of the two objects in question are constant - or anything else such as 
that you can't have x + 1 == y and y + 1 == x, which you might expect from 
relating things to hardware rather than to standard requirements).
Comment 8 Keith Thompson 2014-10-21 23:37:53 UTC
I'm not (deliberately) considering anything other than the requirements
of the C standard.

The standard talks about an array object immediately following another
array object in the address space. Since that phrase is used in
normative wording in the standard, I presume it's meaningful.  Since
the term is not otherwise defined, I presume that the intended meaning
is one that follows reasonably clearly from the wording.

The test program for Bug 63611, when I execute it, prints the string
"y immediately follows x", followed by the string "inconsistent behavior:".

Are you saying it's possible that y immediately follows x in the
address space when that line of output is printed, and that y *doesn't*
immediately follow x in the address space when "inconsistent behavior:"
is printed?

If so, can you describe what the word "follows" means in this context?
If it has a meaning that permits such behavior, can you cite a source
that indicates that that's how the authors of the standard meant it?
Comment 9 joseph@codesourcery.com 2014-10-22 01:23:58 UTC
On Tue, 21 Oct 2014, Keith.S.Thompson at gmail dot com wrote:

> Are you saying it's possible that y immediately follows x in the
> address space when that line of output is printed, and that y *doesn't*
> immediately follow x in the address space when "inconsistent behavior:"
> is printed?

Yes.

> If so, can you describe what the word "follows" means in this context?

"follows" is a binary relation with no constraints except when two objects 
are part of the same declared or allocated larger object.  If part of the 
same declared or allocated larger object, it means that the bytes of the 
latter object immediately follow the bytes of the former object within the 
sequence of bytes making up the representation of the larger object (but 
this does *not* mean that it is necessarily valid to derive pointers to 
one of the smaller objects from pointers to the other, unless you are very 
careful about what sequences of conversions and arithmetic are involved; 
many cases of pointer conversions and arithmetic are less fully defined 
than one might naively expect, and the question of which of multiple 
possible objects is relevant in a particular context is one of the more 
poorly defined areas of C).
Comment 10 Keith Thompson 2014-10-22 01:51:09 UTC
I strongly disagree with your interpretation.

Do you believe that the authors of the standard meant it the way you do?

I suggest that the footnote:

> Two objects may be adjacent in memory because they are adjacent elements
> of a larger array or adjacent members of a structure with no padding
> between them, or because the implementation chose to place them so,
> even though they are unrelated.

implies that the phrase "adjacent in memory" (which appears to
be synonymous with "immediately following in the address space") is
intended to have a *consistent* meaning, even for unrelated objects.
Two given objects may or may not be adjacent, and if they are adjacent
they may appear in either order, entirely at the whim of the compiler.
But I don't see a reasonable interpretation of the standard's wording
that doesn't require "==" to behave consistently. Indeed, I believe
that consistency (which gcc lacks) is the whole point of that wording.

Any two pointer values are either equal or unequal. In the test program,
the pointer values do not change, but they compare both equal and unequal
at different points in the code. In my opinion, that's a clear violation
of the required semantics.

And I don't believe you've answered my question about what is meant
by "follows", at least not fully. I agree with you about the meaning
for objects that are subobjects of some larger object, but for other cases
you've essentially said that it's meaningless. I actually would have no
problem with that, and I wouldn't complain if the standard left it
unspecified -- but it doesn't.
Comment 11 Richard Biener 2014-10-22 08:37:04 UTC
(In reply to Harald van Dijk from comment #4)
> That's an interesting argument. You may well be right that the original
> code, strictly speaking, does not prove that GCC has a bug, but I do think
> GCC has a bug nonetheless, and have come up with a different example.
> 
> #include <stdio.h>
> #include <string.h>
> 
> int x, y;
> 
> char buf[sizeof (int *)];
> 
> int main()
> {
>   int *p = &y;
>   memcpy (buf, &p, sizeof p);
>   memcpy (&p, buf, sizeof p);
>   x = 2, y = 1;
>   if (p == &x + 1)
>     *p = 2;
>   else
>     y = 2;
>   printf ("x = %d, y = %d\n", x, y);
>   return 0;
> }
> 
> Compiling with -O2, I see "x = 2, y = 1". p has been assigned &y. Whether it
> compares equal to &x + 1 is unspecified, but it doesn't change its origins:
> p points to y. Therefore, either the assignment to *p should change y, or in
> the else branch, the plain assignment to y should change y. Either way, the
> correct result is "x = 2, y = 2".
> 
> It seems like GCC is assuming that if p == &x + 1, and &x + 1 != &y, then p
> != &y, so the assignment to *p cannot change y. The flaw in that logic is
> again the optimisation of &x + 1 != &y to a constant.
> 
> I see the behaviour I describe in versions 4.9.0 and 4.8.2. This program
> does print "x = 2, y = 2" in my testing on GCC 4.7.3, but that is because p
> == &x + 1 happens to not compare as true in that version. Slightly tweaked
> versions of this fail with versions 4.7.3, 4.6.4, 4.5.4 and 4.4.7, but not
> 4.3.6.

I can't reproduce your findings with any of the specified GCC version nor
with any other I tried (I tried on x86_64-linux and x86_64-linux with -m32).
The test program always prints "x = 2, y = 2" as expected.
Comment 12 Harald van Dijk 2014-10-22 16:54:27 UTC
(In reply to Richard Biener from comment #11)
> I can't reproduce your findings with any of the specified GCC version nor
> with any other I tried (I tried on x86_64-linux and x86_64-linux with -m32).
> The test program always prints "x = 2, y = 2" as expected.

The wrong code should be visible by inspecting the generated assembly, but it only actually fails at run-time if y directly follows x in memory. It did for me back when I commented, but it no longer does. Here is a version that should fail more reliably, by having only x and y as global variables, and by covering both the case where y immediately follows x and the case where x immediately follows y:

#include <stdio.h>
#include <string.h>

int x, y;

int main()
{
  int *volatile v;
  int *p;

  v = &y;
  p = v;
  x = 2, y = 1;
  if (p == &x + 1)
    *p = 2;
  else
    y = 2;
  printf ("x = %d, y = %d\n", x, y);

  v = &x;
  p = v;
  x = 2, y = 1;
  if (p == &y + 1)
    *p = 1;
  else
    x = 1;
  printf ("x = %d, y = %d\n", x, y);

  return 0;
}

The only correct output is "x = 2, y = 2" followed by "x = 1, y = 1". On my main system, I get "x = 2, y = 1" followed by "x = 1, y = 1". On another, I get "x = 2, y = 2" followed by "x = 2, y = 1".
Comment 13 joseph@codesourcery.com 2014-10-22 16:55:36 UTC
On Wed, 22 Oct 2014, Keith.S.Thompson at gmail dot com wrote:

> Do you believe that the authors of the standard meant it the way you do?

The "authors of the standard" are an amorphous group over 30 years and I 
don't think a single intent can meaningfully be assigned to them.

In recent years, the general position has included:

* C is a high-level language supporting a wide range of implementations, 
not just ones with a conventional linear address space and otherwise 
conventional direct mappings to machine operations;

* edge cases should generally be resolved in the way that is convenient 
for optimization rather than the way that is simplest to specify.

For the latter, see for example the discussion in the Parma minutes of 
instability of uninitialized variables with automatic storage duration.  
That is, if you have

  unsigned char a; // uninitialized, inside a function
  unsigned char b = a;
  unsigned char c = b;

then even if there isn't undefined behavior, there is no requirement 
(given no further assignments to b or c) for b == c, or for the value of b 
== c to stay unchanged, or for the values of b and c to remain unchanged.

(As another example, C11 chose to make INT_MIN % -1 explicitly undefined 
for implementation convenience, even though users might think the value is 
obviously 0.)
Comment 14 Keith Thompson 2014-10-26 18:13:55 UTC
The C standard requires that, if y "happens to immediately follow"
x in the address space, then a pointer just past the end of x shall
compare equal to a pointer to the beginning of y (C99 and C11 6.5.9p6).

How could I distinguish the current behavior of gcc from the behavior
of a hypothetical C compiler that violates that requirement? In
other words, in what sense does gcc actually obey that requirement?

Or is it your position that the requirement is so vague that it
cannot meaningfully be followed? If so, have you followed up with
the standard committee to clarify or remove it?
Comment 15 joseph@codesourcery.com 2014-10-27 18:25:53 UTC
On Sun, 26 Oct 2014, Keith.S.Thompson at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
> 
> --- Comment #14 from Keith Thompson <Keith.S.Thompson at gmail dot com> ---
> The C standard requires that, if y "happens to immediately follow"
> x in the address space, then a pointer just past the end of x shall
> compare equal to a pointer to the beginning of y (C99 and C11 6.5.9p6).
> 
> How could I distinguish the current behavior of gcc from the behavior
> of a hypothetical C compiler that violates that requirement? In
> other words, in what sense does gcc actually obey that requirement?

They are not distinguishable (unless by implementation documentation 
defining what "happens to immediately follow" means for the given 
implementation - but the meaning of that phrase is unspecified, not 
implementation-defined, so there is no requirement for implementations to 
document anything in that regard).  "happens to immediately follow" is an 
intuitive description that explains *why* such pointers are allowed to 
compare equal at all (to avoid a requirement for otherwise unnecessary 
padding in common classes of implementations), but can only be observed by 
the result of a comparison (an observation that is then only valid for 
that particular comparison).

The natural state would be for such pointers to compare unequal.  The 
standard gives latitude for them to compare equal, but there is never an 
observable requirement that they do, even if some other comparison had 
that result.
Comment 16 Richard Biener 2015-04-07 08:45:10 UTC
*** Bug 65679 has been marked as a duplicate of this bug. ***
Comment 17 Alexander Cherepanov 2015-11-15 22:30:38 UTC
(In reply to joseph@codesourcery.com from comment #1)
> Just because two pointers print the same and have the same bit-pattern 
> doesn't mean they need to compare equal

The standard seems to disagree. C11, 6.2.6.1p4: "Two values (other than NaNs) with the same object representation compare equal".

;-)

(In reply to joseph@codesourcery.com from comment #3)
> Except within a larger object, I'm not aware of any reason the cases of 
> two objects following or not following each other in memory must be 
> mutually exclusive.

I guess it depends on the transitivity of the == operator. After this bug is fixed it will be possible to construct a third pointer r from two pointers p and q such that r == p and r == q but p != q. For p and q take &x + 1 and &y as above, obtain r by stripping provenance info from p or q (e.g. by printf/scanf with %p).

My impression is that the text of the standard implies interchangeability of equal pointers (and hence transitivity of == ) but this area is underspecified and probably could be fixed in a way that doesn't imply transitivity of == . But is gcc ok with this? This bug and pr65752 show some complexities. OTOH == is not reflexive for double and it's ok.
Comment 18 Alexander Cherepanov 2015-11-19 19:11:13 UTC
A bit simplified variant:

#include <stdio.h>

int main()
{
   int x, y = 1;
   int *volatile v;
   int *p;

   v = &y;
   p = v;
   if (p == &x + 1) {
     *p = 2;
     printf("y = %d\n", y);
   }
}

The 077t.alias dump shows these "Points-to sets" (among others):

v = { y }
p_5 = { y } same as v

and then the code:

   <bb 3>:
   *p_5 = 2;
   y.0_7 = y;
   printf ("y = %d\n", y.0_7);

Seems right.

The 081t.vrp1 dump shows these "Value ranges after VRP":

p_11: [&MEM[(void *)&x + 4B], &MEM[(void *)&x + 4B]]  EQUIVALENCES: { p_5 } (1 elements)

and the code:

   <bb 3>:
   MEM[(int *)&x + 4B] = 2;
   y.0_7 = y;
   printf ("y = %d\n", y.0_7);

Seems wrong.

gcc 5.2.0

On 2015-11-16 01:30, ch3root at openwall dot com wrote:
> I guess it depends on the transitivity of the == operator. After this bug is
> fixed it will be possible to construct a third pointer r from two pointers p and
> q such that r == p and r == q but p != q. For p and q take &x + 1 and &y as
> above, obtain r by stripping provenance info from p or q (e.g. by printf/scanf
> with %p).

This bug turned out to be not that tricky after all. The program:

#include <stdio.h>

int main()
{
   int x, y;
   void *p = &x + 1, *q = &y, *r;

   /* Strip p of provenance info */
   /* To simplify testing: */
   char s[100]; sprintf(s, "%p", p); sscanf(s, "%p", &r);
   /* Instead, imagine this:
   printf("%p or %p? ", p, q); scanf("%p", &r);
   */

   char *eq[] = {"!=", "=="};
   printf("r %s p, r %s q, p %s q\n", eq[r == p], eq[r == q], eq[p == q]);
}

prints "r == p, r == q, p != q" and the first two equalities are 
essentially mandated by C11 (unless you patch it by making one of them UB).
Comment 19 Alexander Cherepanov 2016-06-27 19:14:14 UTC
(In reply to joseph@codesourcery.com from comment #3)
> Except within a larger object, I'm not aware of any reason the cases of 
> two objects following or not following each other in memory must be 
> mutually exclusive.

Apparently some folks use linker scripts to get a specific arrangement of objects.

A fresh example is a problem in Linux -- https://lkml.org/lkml/2016/6/25/77 . A simplified example from http://pastebin.com/4Qc6pUAA :

extern int __start[];
extern int __end[];
 
extern void bar(int *);
 
void foo()
{
    for (int *x = __start; x != __end; ++x)
        bar(x);
}

This is optimized into an infinite loop by gcc 7 at -O.
Comment 20 Andrew Pinski 2016-07-03 18:34:34 UTC
(In reply to Alexander Cherepanov from comment #19)
> (In reply to joseph@codesourcery.com from comment #3)
> > Except within a larger object, I'm not aware of any reason the cases of 
> > two objects following or not following each other in memory must be 
> > mutually exclusive.
> 
> Apparently some folks use linker scripts to get a specific arrangement of
> objects.
> 
> A fresh example is a problem in Linux -- https://lkml.org/lkml/2016/6/25/77
> . A simplified example from http://pastebin.com/4Qc6pUAA :
> 
> extern int __start[];
> extern int __end[];
>  
> extern void bar(int *);
>  
> void foo()
> {
>     for (int *x = __start; x != __end; ++x)
>         bar(x);
> }


To get around the above example:
extern int __start[];
extern int __end[];
 
extern void bar(int *);
 
void foo()
{
    int *x = __start;
    int *y = __end;
    asm("":"+r"(x));
    asm("":"+r"(y));
    for (; x != y; ++x)
        bar(x);
}

> 
> This is optimized into an infinite loop by gcc 7 at -O.
Comment 21 Andrew Pinski 2016-10-19 07:40:08 UTC
Invalid as mentioned a few times already but never actually closed until now.
Comment 22 Harald van Dijk 2016-10-19 12:00:47 UTC
(In reply to Andrew Pinski from comment #21)
> Invalid as mentioned a few times already but never actually closed until now.

I posted a strictly conforming program that with GCC does not behave as required by the standard. The issue is valid, even if the original test case is not.
Comment 23 Richard Biener 2016-10-19 13:38:04 UTC
.
Comment 24 James Y Knight 2018-05-03 23:08:08 UTC
FWIW, clang did consider this a bug and fixed it in https://bugs.llvm.org/show_bug.cgi?id=21327.
Comment 25 Richard Biener 2018-05-08 07:22:22 UTC
(In reply to Harald van Dijk from comment #22)
> (In reply to Andrew Pinski from comment #21)
> > Invalid as mentioned a few times already but never actually closed until now.
> 
> I posted a strictly conforming program that with GCC does not behave as
> required by the standard. The issue is valid, even if the original test case
> is not.

If you are talking about the one in comment#12 then this is the same issue
as present in a few other "similar" bugs where GCC propagates conditional
equivalences (for example the linked PR65752):

  v = &y;
  p = v;
  x = 2, y = 1;
  if (p == &x + 1)
    *p = 2;

is turned into

  v = &y;
  p = v;
  x = 2, y = 1;
  if (p == &x + 1)
    *(&x + 1) = 2;

by GCC and the store is then no longer possibly aliasing y.

Conditional equivalences are a difficult thing to exploit for optimization
and there's some work in progress for the standard regarding pointer
provenance which IIRC says that the comparison result of &y == &x + 1
returns an unspecified value.  Not sure if that helps us but then the
only way out for GCC for this particular issue would be to never actually
propagate conditional equivalences.

Something that might be worth investigating, but within the current structure of
the optimization passes that apply this transform it's impossible to decide
whether a value resulted from conditional equivalences or not...  I'm also
not sure to what extent simplification results using a conditional predicate
like p == &x + 1 are affected as well.

IMHO it's a defect in the language if

  p = &y;
  if (p == &x + 1)
    *p = 2;

is valid but

  p = &y;
  if (p == &x + 1)
    *(&x + 1) = 2;

is invoking undefined behavior.  Or at least a very uncomfortable situation
for a compiler writer.  IMHO the pointer provenance work making the
comparison having unspecified result doesn't really help since that doesn't
make it invoke undefined behavior.
Comment 26 Richard Biener 2018-05-08 07:25:45 UTC
(In reply to James Y Knight from comment #24)
> FWIW, clang did consider this a bug and fixed it in
> https://bugs.llvm.org/show_bug.cgi?id=21327.

Unfortunately it isn't visible _what_ change fixed this and thus if just
some more massaging of the testcase is necessary to make the bug resurface
or if LLVM found a clever way to attack the underlying issue (whatever
underlying issue LLVM had - I'm only guessing it may be the same conditional
propagation).
Comment 27 Harald van Dijk 2018-05-08 09:46:43 UTC
(In reply to Richard Biener from comment #25)
> (In reply to Harald van Dijk from comment #22)
> > (In reply to Andrew Pinski from comment #21)
> > > Invalid as mentioned a few times already but never actually closed until now.
> > 
> > I posted a strictly conforming program that with GCC does not behave as
> > required by the standard. The issue is valid, even if the original test case
> > is not.
> 
> If you are talking about the one in comment#12 then this is the same issue
> as present in a few other "similar" bugs where GCC propagates conditional
> equivalences (for example the linked PR65752):

Right, there are a lot of ways this can come up.

> Conditional equivalences are a difficult thing to exploit for optimization
> and there's some work in progress for the standard regarding pointer
> provenance which IIRC says that the comparison result of &y == &x + 1
> returns an unspecified value.

For C++ it's http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1652

> Not sure if that helps us

I don't think it does. Although the change would allow p == &x + 1 to evaluate as false even though they have the same address, there is no sane way for GCC to actually let it evaluate it as false when p comes from a volatile variable.

> but then the
> only way out for GCC for this particular issue would be to never actually
> propagate conditional equivalences.

Well, there are two incompatible optimisations. This one could be disabled or restricted, or see below.

> IMHO it's a defect in the language if
> 
>   p = &y;
>   if (p == &x + 1)
>     *p = 2;
> 
> is valid but
> 
>   p = &y;
>   if (p == &x + 1)
>     *(&x + 1) = 2;
> 
> is invoking undefined behavior.

A legitimate reading of C90 and C99 says the second is valid as well, but it's not the reading the committee went with. Allowing this, as an extension to what the standards allow, would be a way to keep the p -> &x + 1 transformation working. It would naturally break some of the current optimisations that GCC performs, but so would the alternative.

(In reply to Richard Biener from comment #26)
> Unfortunately it isn't visible _what_ change fixed this

The revision number is listed in Richard Smith's second comment. The changes can be seen with

  svn diff -c 220343 https://llvm.org/svn/llvm-project/

That's also where I got the C++ issue number from.

> and thus if just
> some more massaging of the testcase is necessary to make the bug resurface
> or if LLVM found a clever way to attack the underlying issue (whatever
> underlying issue LLVM had - I'm only guessing it may be the same conditional
> propagation).

When they turn a comparison between a pointer to an object and a pointer to one past an object into a non-constant expression, that's apparently enough for them to force the comparison to be performed at run-time.
Comment 28 rguenther@suse.de 2018-05-08 10:09:00 UTC
On Tue, 8 May 2018, harald at gigawatt dot nl wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
> 
> --- Comment #27 from Harald van Dijk <harald at gigawatt dot nl> ---
> (In reply to Richard Biener from comment #26)
> > Unfortunately it isn't visible _what_ change fixed this
> 
> The revision number is listed in Richard Smith's second comment. The changes
> can be seen with
> 
>   svn diff -c 220343 https://llvm.org/svn/llvm-project/
> 
> That's also where I got the C++ issue number from.

OK, so that's a frontend only change (constexpr).

> > and thus if just
> > some more massaging of the testcase is necessary to make the bug resurface
> > or if LLVM found a clever way to attack the underlying issue (whatever
> > underlying issue LLVM had - I'm only guessing it may be the same conditional
> > propagation).
> 
> When they turn a comparison between a pointer to an object and a pointer to one
> past an object into a non-constant expression, that's apparently enough for
> them to force the comparison to be performed at run-time.

I'm quite sure they manage to "optimize"

int a, b;
bool foo()
{
  return &a == &b;
}

as well as

int a, b;
bool foo(int i)
{
  if (i == 1)
    return &a == &b + i;
  return true;
}

hmm, clang 3.8 does not.  It even fails to optimize &a == &b + 2
which would be a valid optimization (to false) as this bug is only
about one-past, not about arbitrary compares.

As I said, the real issue in GCC is the propagation of the
address constant triggered by the conditional equality.
In the other PR, the problem resurfaces even for integer comparisons.
Comment 29 Peter Sewell 2018-05-08 13:01:41 UTC
On 8 May 2018 at 08:22, rguenth at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
>
> --- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Harald van Dijk from comment #22)
> > (In reply to Andrew Pinski from comment #21)
> > > Invalid as mentioned a few times already but never actually closed
> > > until now.
> >
> > I posted a strictly conforming program that with GCC does not behave as
> > required by the standard. The issue is valid, even if the original test
> > case is not.
>
> If you are talking about the one in comment#12 then this is the same issue
> as present in a few other "similar" bugs where GCC propagates conditional
> equivalences (for example the linked PR65752):
>
>   v = &y;
>   p = v;
>   x = 2, y = 1;
>   if (p == &x + 1)
>     *p = 2;
>
> is turned into
>
>   v = &y;
>   p = v;
>   x = 2, y = 1;
>   if (p == &x + 1)
>     *(&x + 1) = 2;
>
> by GCC and the store is then no longer possibly aliasing y.
>
> Conditional equivalences are a difficult thing to exploit for optimization
> and there's some work in progress for the standard regarding pointer
> provenance which IIRC says that the comparison result of &y == &x + 1
> returns an unspecified value.  Not sure if that helps us


FYI, the current state of that work in progress is here:

https://cdn.rawgit.com/C-memory-object-model-study-group/c-mom-sg/master/notes/cmom-0001-2018-05-04-sewell-clarifying-provenance-v4.html

and comments from a GCC perspective would be much appreciated.
It's been informed by useful discussion at the recent WG14 and EuroLLVM
meetings.

Our current proposal indeed makes that comparison an unspecified value;
more generally, it allows any pointer equality comparison to either take
provenance into account or not, exactly because we see GCC do so in some
cases. If that isn't important for optimisation, returning to a fully
concrete semantics for == would be a simpler choice.

> but then the only way out for GCC for this particular issue would be to
> never actually propagate conditional equivalences.
>

(Conceivably it could be allowed where the compiler can see that the two
have the same provenance.  We've no idea how useful that would be.)


>
> Something that might be worth investigating, but within the current structure of
> the optimization passes that apply this transform it's impossible to decide
> whether a value resulted from conditional equivalences or not...  I'm also
> not sure to what extent simplification results using a conditional
> predicate like p == &x + 1 are affected as well.
>
> IMHO it's a defect in the language if
>
>   p = &y;
>   if (p == &x + 1)
>     *p = 2;
>
> is valid but
>
>   p = &y;
>   if (p == &x + 1)
>     *(&x + 1) = 2;
>
> is invoking undefined behavior.  Or at least a very uncomfortable situation
> for a compiler writer.  IMHO the pointer provenance work making the
> comparison have an unspecified result doesn't really help, since that doesn't
> make it invoke undefined behavior.
>

It's not clear how this could be resolved. For the source-language semantics,
if one wants to be able to do provenance-based alias analysis, we don't see
any clean way in which the second could be allowed. And forbidding the first
would need one to make == of pointers with different provenances UB, which
we imagine would break a lot of C code.

That said, in general the intermediate-language semantics might be quite
different from the C-source-language semantics (as we have been discovering
in discussion with Nuno Lopes and his colleagues about their LLVM
semantics), so long as it implements the source semantics.

Peter, Kayvan, Victor


Comment 30 Richard Biener 2018-05-16 09:04:19 UTC
Another interesting example is in PR85800, where the offending "bad" transformation
is the following, for char a, b:

  if (a == b)
    a[i] = a;
  else
    a[i] = b;

which gets if-converted to

  a[i] = b;

This is bad because a and b have different pointer provenance: they are the
runtime-equal pointers &x and &y+1 (one past the end) again.  In the
if-converted code, a[i] has the same provenance as b rather than "both"
(GCC happily tracks provenance unions).

Avoiding this kind of transform in isolation is bad (consider this code being
split out into a separate function and later inlined).