103343 – Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer

Bug 103343 - Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer

Summary: Invalid codegen when comparing pointer to one past the end and then dereferen...

Status:	RESOLVED DUPLICATE of bug 93051

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c (show other bugs)
Version:	12.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2021-11-21 03:01 UTC by Gabriel Ravier
Modified:	2023-09-21 07:44 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Gabriel Ravier 2021-11-21 03:01:29 UTC

extern int x[1], y;

int f(int *p, int *q) {
    *q = y;
    if (p == (x + 1)) {
        *p = 2;
        return y;
    }
    return 0;
}

GCC trunk currently outputs the following code with -O3:

f:
        mov     eax, DWORD PTR y[rip]
        mov     DWORD PTR [rsi], eax
        cmp     rdi, OFFSET FLAT:x+4
        je      .L5
        xor     eax, eax
        ret
.L5:
        mov     DWORD PTR x[rip+4], 2
        ret

Which is incorrect because `p` could point to `y`, for example if `f` was called as such:

int whatever;
f(&y, &whatever);

and `y` could happen to be located in memory right after `x`.

Also, although the comparison invokes unspecified behavior, this still means only two results are possible according to the standard:
- if `p == (x + 1)` results in `false`, then the result of `f` is 0
- if `p == (x + 1)` results in `true`, then the result of `f` is 2 since we do `*p = 2` and `p` points to `y`.

GCC's optimization makes it so the result can also be the previous value of `y`, which could be something else than 0 or 2.

It seems that GCC assumes that because `p == (x + 1)` it can replace all occurrences of `p` with `x + 1` without any regard to provenance, and doing that change manually would indeed mean the `return y;` could be optimized to use the previous store (and the store to `x + 1` would be UB, too...), but this isn't the case here: `p` could simultaneously validly point to `y` and be equal to `x + 1`.

PS: This also results in plenty of invalid warnings when compiling with -Wall:

<source>: In function 'f':
<source>:6:9: warning: array subscript 1 is outside array bounds of 'int[1]' [-Warray-bounds]
    6 |         *p = 2;
      |         ^~
<source>:1:12: note: at offset 4 into object 'x' of size 4
    1 | extern int x[1], y;
      |            ^

Comment 1 Andrew Pinski 2021-11-21 04:46:24 UTC

Dup of bug 61502. (or a dup of bug 93052 or many others).

*** This bug has been marked as a duplicate of bug 61502 ***

Comment 2 Martin Sebor 2021-11-22 16:28:23 UTC

(In reply to Gabriel Ravier from comment #0)
> PS: This also results in plenty of invalid warnings when compiling with
> -Wall:
> 
> <source>: In function 'f':
> <source>:6:9: warning: array subscript 1 is outside array bounds of 'int[1]'
> [-Warray-bounds]
>     6 |         *p = 2;
>       |         ^~
> <source>:1:12: note: at offset 4 into object 'x' of size 4
>     1 | extern int x[1], y;
>       |            ^

The warning in this case is valid and helpful: it's undefined to attempt to access an object using a pointer derived from a pointer to an unrelated object (the equality between pointers to adjacent objects notwithstanding).

Comment 3 Gabriel Ravier 2021-11-22 20:54:06 UTC

Well the code does not invoke undefined behavior here, it just so happens that `p == (x + 1)` because `y` happens to be laid out in memory after `x` (note: this isn't a guarantee, of course, but GCC can't prove this isn't the case as it's defined in another TU and it's quite easy to make this happen). The comparison doesn't imply the pointers have the same provenance, and the standard has a specific provision for this exact comparison:

"If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object,72 the result of the comparison is unspecified."
- [expr.eq] (https://eel.is/c++draft/expr.eq#3.1)

Also, `y` isn't accessed through a pointer to `x`: I've already said the case where the function is incorrect is when `f` is called with `&y` as the first argument. If doing `p == (x + 1)` implied they derived from the same object, then that would imply after doing `&y == (x + 1)` doing `*&y` would invoke undefined behavior which is obviously ridiculous.

Although there is a case to be made that this code is stupid and deserves a warning, though... I won't argue with that, this code is just something I wrote to test things after a 3 hour long conversation about DR 260 (<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm>) and a lot of standardese lawyering, so it's not intended to be real life code. I'd say, though, that the warning is quite inaccurate in the details of what it's saying, as `p` isn't actually equivalent to `(x + 1)` just because `p == (x + 1)`.

Comment 4 Lénárd Szolnoki 2021-11-22 23:07:21 UTC

A complete program example:

f.h:
```
#pragma once
extern int x[1];
extern int y;

int f(int* p, int* q);
```

f.cpp:
```
#include "f.h"

int f(int* p, int* q) {
    *q = y;
    if (p == (x + 1)) {
        *p = 2;
        return y;
    }
    return 0;
}
```

x_y.cpp:
```
#include "f.h"

int y;
int x[1];
```

main.cpp:
```
#include "f.h"

int main() {
    y=1;
    int i;
    return f(&y, &i);
}
```

Compile with `g++ -o main main.cpp f.cpp x_y.cpp`. https://godbolt.org/z/G4KTKc7hE

The well-formed program above has two possible evaluations, due to the unspecified comparison. In one evaluation `main` returns 0, in the other it returns 2. Compiled with g++ the program returns 1.

Within the single invocation of `f` `p` is pointer to an object, namely `y`. Even after the unspecified comparison evaluates to true, `p` remains a pointer to `y`. Therefore dereferencing `p` is still valid in that branch.

I don't think that it is a duplicate of bug 61502. The program does not rely on the object representation of the pointer objects, their printed value or their value converted to uintptr_t. The only thing that is questionable is the comparison with pointer past the end of an object, which is merely unspecified.

Comment 5 Andrew Pinski 2021-11-22 23:13:53 UTC

> The only thing that is questionable is the comparison with pointer past the end of an object, which is merely unspecified.

Ok, it is a dup of bug 93051.

*** This bug has been marked as a duplicate of bug 93051 ***