Created attachment 44136 [details] A source file that raises the bug. ``` $ cat test-main.c #include <string.h> #include <stdio.h> #include <stdint.h> // If f(A, B + 4) is given, and integer representation of A and B + 4 // are the same, c1 == c2 in the loop becomes true, // so arr3 = arr1. Hence r = A, and *A should be 10. // However, if compiled with -O3, *A is still 1. void store_10_to_p(int *p, int *q) { unsigned char arr1[8]; unsigned char arr2[8]; unsigned char arr3[8]; // Type punning with char* is allowed. memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p)); memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q)); // Now arr1 is p, arr2 is q. for (int i = 0; i < sizeof(q); i++) { int c1 = (int)arr1[i], c2 = (int)arr2[i]; // Note that c1 == c2 is a comparison between _integers_ (not pointers). if (c1 == c2) // Always true if p and q have same integer representation. arr3[i] = arr1[i]; else arr3[i] = arr2[i]; } // Now arr3 is equivalent to arr1, which is p. int *r; memcpy(&r, (unsigned char *)arr3, sizeof(r)); // Now r is p. *p = 1; *r = 10; } int main() { int B[4], A[4]; printf("%p %p\n", A, &B[4]); store_10_to_p(A, &B[4]); printf("%d\n", A[0]); return 0; } $ gcc -O3 -o test-main test-main.c $ ./test-main 0x7fffffffe5a0 0x7fffffffe5a0 1 $ gcc -O0 -o test-main test-main.c $ ./test-main 0x7fffffffe5b0 0x7fffffffe5b0 10 ``` The output should be 10 because the integral representation of A and B[4] are the same, but compiling with -O3 gives 1.
Hmm, I can't reproduce it because the arrays happen to be not adjacent for me. But if I change if (c1 == c2) // Always true if p and q have same integer representation. arr3[i] = arr1[i]; else arr3[i] = arr2[i]; to if (c1 == c2) // Always true if p and q have same integer representation. arr3[i] = arr2[i]; else arr3[i] = arr1[i]; I get 10 consistently. This is because then arr3 will indeed be A. Changing main to int main() { struct { int B[4]; int A[4]; } a; printf("%p %p\n", a.A, &a.B[4]); store_10_to_p(a.A, &a.B[4]); printf("%d\n", a.A[0]); return 0; } also makes it work reliably for me. I guess at -O3 you get store_10_to_p inlined. You can try changing the function to void __attribute__((noinline,noclone)) store_10_to_p(int *p, int *q) { to see if that makes any difference. As said, I don't see any bug here.
If address is not adjacent - you can reorder the definition of A and B and try again. ``` int A[4], B[4]; ``` If changing if (c1 == c2) // Always true if p and q have same integer representation. arr3[i] = arr1[i]; else arr3[i] = arr2[i]; to if (c1 == c2) // Always true if p and q have same integer representation. arr3[i] = arr2[i]; else arr3[i] = arr1[i]; removed miscompilation, I think this implies that the former one is folded into `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but the latter one is folded into `arr[3] = arr1[i]`. This explains the generated assembly as well: Compiling store_10_to_p with -O3 generates ``` store_10_to_p(int*, int*): mov DWORD PTR [rdi], 1 mov DWORD PTR [rsi], 10 ret ``` meaning that it is actually trying to store 10 to *q, which is UB if the function is inlined. (B[4] is out-of-bounds)
On Wed, 16 May 2018, juneyoung.lee at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800 > > --- Comment #2 from Juneyoung Lee <juneyoung.lee at sf dot snu.ac.kr> --- > If address is not adjacent - you can reorder the definition of A and B and try > again. > ``` > int A[4], B[4]; > ``` > > If changing > > if (c1 == c2) > // Always true if p and q have same integer representation. > arr3[i] = arr1[i]; > else > arr3[i] = arr2[i]; > > to > > if (c1 == c2) > // Always true if p and q have same integer representation. > arr3[i] = arr2[i]; > else > arr3[i] = arr1[i]; > > removed miscompilation, I think this implies that the former one is folded into > `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but > the latter one is folded into `arr[3] = arr1[i]`. > > This explains the generated assembly as well: Compiling store_10_to_p with -O3 > generates > > ``` > store_10_to_p(int*, int*): > mov DWORD PTR [rdi], 1 > mov DWORD PTR [rsi], 10 > ret > ``` > > meaning that it is actually trying to store 10 to *q, which is UB if the > function is inlined. (B[4] is out-of-bounds) Ok, so the sequence of optimization is as follows. First the comparison is narrowed so we compare arr1[i] and arr2[i] directly. Then we sink the store and get _10 = arr1[i_9]; _11 = arr2[i_9]; if (_10 == _11) goto <bb 5>; [34.00%] else goto <bb 4>; [66.00%] <bb 4> [58.67%]: <bb 5> [88.89%]: # cstore_29 = PHI <_10(3), _11(4)> arr3[i_9] = cstore_29; then this is pattern matched to just _1 = arr1[i_4]; _2 = arr2[i_4]; arr3[i_4] = _2; which is correct since if _10 == _11 we can as well store _11 instead of _10 to arr3[i_9]. This is where things go "wrong" and a "bug" similar to PR61502 arises. We propagate conditional equivalences. Then alias analysis comes along and while it originally correctly computed arr3 to "point" to both A and B it now only sees the provenance on B and both stores can be re-ordered by scheduling. We expand to RTL from (-fdump-tree-optimized-alias-uid): <bb 2> [11.11%]: # USE = nonlocal null { D.2675 D.2676 } (escaped) # CLB = nonlocal null { D.2675 D.2676 } (escaped) printf ("%p %p\n", &AD.2675, &BD.2676[4]); # RANGE [0, 18446744073709551615] NONZERO 18446744073709551600 _8 = (long unsigned intD.10) &BD.2676[4]; MEM[(charD.7 * {ref-all})&arr2D.2693] = _8; _27 = MEM[(charD.7 * {ref-all})&arr2D.2693]; MEM[(charD.7 * {ref-all})&arr3D.2694] = _27; _14 = MEM[(charD.7 * {ref-all})&arr3D.2694]; # PT = null { D.2676 } (escaped) r_15 = (intD.6 *) _14; MEM[(intD.6 *)&AD.2675] = 1; *r_15 = 10; arr2D.2693 ={v} {CLOBBER}; arr3D.2694 ={v} {CLOBBER}; # USE = nonlocal null { D.2675 D.2676 } (escaped) # CLB = nonlocal null { D.2675 D.2676 } (escaped) printf ("%d\n", 1); AD.2675 ={v} {CLOBBER}; BD.2676 ={v} {CLOBBER}; return 0; as you can see r_15 is computed to point to BD.2676 only. The "bug" reproduces with -O2 as well if you declare store_10_to_p static inline.