Bug 85800 - A miscompilation bug with unsigned char
Summary: A miscompilation bug with unsigned char
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 7.3.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 61502
Blocks:
  Show dependency treegraph
 
Reported: 2018-05-16 03:10 UTC by Juneyoung Lee
Modified: 2021-05-14 14:59 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-05-16 00:00:00


Attachments
A source file that raises the bug. (574 bytes, text/plain)
2018-05-16 03:10 UTC, Juneyoung Lee
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Juneyoung Lee 2018-05-16 03:10:00 UTC
Created attachment 44136 [details]
A source file that raises the bug.

```
$ cat test-main.c
#include <string.h>
#include <stdio.h>
#include <stdint.h>

// If f(A, B + 4) is given, and integer representation of A and B + 4
// are the same, c1 == c2 in the loop becomes true,
// so arr3 = arr1. Hence r = A, and *A should be 10.
// However, if compiled with -O3, *A is still 1.
void store_10_to_p(int *p, int *q) {
  unsigned char arr1[8];
  unsigned char arr2[8];
  unsigned char arr3[8];
  // Type punning with char* is allowed.
  memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p));
  memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q));
  // Now arr1 is p, arr2 is q.

  for (int i = 0; i < sizeof(q); i++) {
    int c1 = (int)arr1[i], c2 = (int)arr2[i];
    // Note that c1 == c2 is a comparison between _integers_ (not pointers).
    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];
  }
  // Now arr3 is equivalent to arr1, which is p.
  int *r;
  memcpy(&r, (unsigned char *)arr3, sizeof(r));
  // Now r is p.
  *p = 1;
  *r = 10;
}

int main() {
  int B[4], A[4];
  printf("%p %p\n", A, &B[4]);
  store_10_to_p(A, &B[4]);
  printf("%d\n", A[0]);
  return 0;
}
$ gcc -O3 -o test-main test-main.c
$ ./test-main
0x7fffffffe5a0 0x7fffffffe5a0
1
$ gcc -O0 -o test-main test-main.c
$ ./test-main
0x7fffffffe5b0 0x7fffffffe5b0
10
```

The output should be 10 because the integral representation of A and B[4] are the same, but compiling with -O3 gives 1.
Comment 1 Richard Biener 2018-05-16 07:39:21 UTC
Hmm, I can't reproduce it because the arrays happen to be not adjacent for me.
But if I change

    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];

to

    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr2[i];
    else
      arr3[i] = arr1[i];

I get 10 consistently.  This is because then arr3 will indeed be A.

Changing main to

int main() {
  struct { int B[4]; int A[4]; } a;
  printf("%p %p\n", a.A, &a.B[4]);
  store_10_to_p(a.A, &a.B[4]);
  printf("%d\n", a.A[0]);
  return 0;
}

also makes it work reliably for me.

I guess at -O3 you get store_10_to_p inlined.  You can try
changing the function to

void __attribute__((noinline,noclone)) store_10_to_p(int *p, int *q) {

to see if that makes any difference.

As said, I don't see any bug here.
Comment 2 Juneyoung Lee 2018-05-16 08:34:07 UTC
If address is not adjacent - you can reorder the definition of A and B and try again.
```
  int A[4], B[4];
```

If changing

    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];

to

    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr2[i];
    else
      arr3[i] = arr1[i];

removed miscompilation, I think this implies that the former one is folded into `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but the latter one is folded into `arr[3] = arr1[i]`.

This explains the generated assembly as well: Compiling store_10_to_p with -O3 generates

```
store_10_to_p(int*, int*):
  mov DWORD PTR [rdi], 1
  mov DWORD PTR [rsi], 10
  ret
```

meaning that it is actually trying to store 10 to *q, which is UB if the function is inlined. (B[4] is out-of-bounds)
Comment 3 rguenther@suse.de 2018-05-16 09:00:28 UTC
On Wed, 16 May 2018, juneyoung.lee at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800
> 
> --- Comment #2 from Juneyoung Lee <juneyoung.lee at sf dot snu.ac.kr> ---
> If address is not adjacent - you can reorder the definition of A and B and try
> again.
> ```
>   int A[4], B[4];
> ```
> 
> If changing
> 
>     if (c1 == c2)
>       // Always true if p and q have same integer representation.
>       arr3[i] = arr1[i];
>     else
>       arr3[i] = arr2[i];
> 
> to
> 
>     if (c1 == c2)
>       // Always true if p and q have same integer representation.
>       arr3[i] = arr2[i];
>     else
>       arr3[i] = arr1[i];
> 
> removed miscompilation, I think this implies that the former one is folded into
> `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but
> the latter one is folded into `arr[3] = arr1[i]`.
> 
> This explains the generated assembly as well: Compiling store_10_to_p with -O3
> generates
> 
> ```
> store_10_to_p(int*, int*):
>   mov DWORD PTR [rdi], 1
>   mov DWORD PTR [rsi], 10
>   ret
> ```
> 
> meaning that it is actually trying to store 10 to *q, which is UB if the
> function is inlined. (B[4] is out-of-bounds)

Ok, so the sequence of optimization is as follows.  First the
comparison is narrowed so we compare arr1[i] and arr2[i] directly.
Then we sink the store and get

  _10 = arr1[i_9];
  _11 = arr2[i_9];
  if (_10 == _11)
    goto <bb 5>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 4> [58.67%]:

  <bb 5> [88.89%]:
  # cstore_29 = PHI <_10(3), _11(4)>
  arr3[i_9] = cstore_29;

then this is pattern matched to just

  _1 = arr1[i_4];
  _2 = arr2[i_4];
  arr3[i_4] = _2;

which is correct since if _10 == _11 we can as well store _11
instead of _10 to arr3[i_9].  This is where things go "wrong"
and a "bug" similar to PR61502 arises.  We propagate conditional
equivalences.

Then alias analysis comes along and while it originally
correctly computed arr3 to "point" to both A and B it now
only sees the provenance on B and both stores can be
re-ordered by scheduling.  We expand to RTL from
(-fdump-tree-optimized-alias-uid):

  <bb 2> [11.11%]:
  # USE = nonlocal null { D.2675 D.2676 } (escaped)
  # CLB = nonlocal null { D.2675 D.2676 } (escaped)
  printf ("%p %p\n", &AD.2675, &BD.2676[4]);
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551600
  _8 = (long unsigned intD.10) &BD.2676[4];
  MEM[(charD.7 * {ref-all})&arr2D.2693] = _8;
  _27 = MEM[(charD.7 * {ref-all})&arr2D.2693];
  MEM[(charD.7 * {ref-all})&arr3D.2694] = _27;
  _14 = MEM[(charD.7 * {ref-all})&arr3D.2694];
  # PT = null { D.2676 } (escaped)
  r_15 = (intD.6 *) _14;
  MEM[(intD.6 *)&AD.2675] = 1;
  *r_15 = 10;
  arr2D.2693 ={v} {CLOBBER};
  arr3D.2694 ={v} {CLOBBER};
  # USE = nonlocal null { D.2675 D.2676 } (escaped)
  # CLB = nonlocal null { D.2675 D.2676 } (escaped)
  printf ("%d\n", 1);
  AD.2675 ={v} {CLOBBER};
  BD.2676 ={v} {CLOBBER};
  return 0;

as you can see r_15 is computed to point to BD.2676 only.  The
"bug" reproduces with -O2 as well if you declare store_10_to_p
static inline.