The following function (gcc.dg/tree-ssa/ivopts-lt.c) is miscompiled when compiled with -O1 for X86_64:

#include "stdint.h"

void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

The IR after cunroll looks like:

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  <bb 2>:
  p_6 = p_4(D) + i_5(D);

  <bb 3>:
  # p_1 = PHI <p_6(2), p_9(5)>
  # i_2 = PHI <i_5(D)(2), i_10(5)>
  *p_1 = 0;
  p_9 = p_1 + 1;
  i_10 = i_2 + 1;
  if (i_10 < n_11(D))
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 5>:
  goto <bb 3>;

  <bb 4>:
  return;
}

This is then changed by ivopts to

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  sizetype _13;
  char * _14;

  <bb 2>:
  p_6 = p_4(D) + i_5(D);
  _13 = n_11(D) - i_5(D);
  _14 = p_6 + _13;

  <bb 3>:
  # p_1 = PHI <p_6(2), p_9(5)>
  MEM[(char *)p_1] = 0;
  p_9 = p_1 + 1;
  if (p_9 < _14)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 5>:
  goto <bb 3>;

  <bb 4>:
  return;
}

Suppose the function gets called with the values:

  p = 0x0002ffffffffffff
  i = 0xffff000000000001
  n = 0xdffd7fffffffffff

The original function writes 0 to address 0x0002000000000000, and then exits.

The optimized function overflows when calculating _14, and the function does the equivalent of

  memset(0x0002000000000000, 0, 0xdffe7ffffffffffe);
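The arithmetic for these values can be checked with a small model. This is only a sketch: it redoes the computations of the ivopts output in 64-bit unsigned arithmetic and does not touch memory. The struct and function names (ivopts_model, model_f1) are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the two exit tests; all arithmetic is modulo 2^64. */
struct ivopts_model
{
  uint64_t first_store;  /* p_6 = p + i, the address of the first store */
  uint64_t bound;        /* _14 = p_6 + (n - i) */
  int orig_continues;    /* original test after the first store: i + 1 < n */
  int opt_continues;     /* rewritten test after the first store: p_9 < _14 */
};

static struct ivopts_model
model_f1 (uint64_t p, uint64_t i, uint64_t n)
{
  struct ivopts_model m;
  uint64_t diff = n - i;                  /* _13, may wrap */

  m.first_store = p + i;                  /* p_6, may wrap */
  m.bound = m.first_store + diff;         /* _14 */
  m.orig_continues = (i + 1) < n;
  m.opt_continues = (m.first_store + 1) < m.bound;
  return m;
}
```

With the values above, the original exit test stops after one store, while the rewritten test keeps going until p reaches _14, which is 0xdffe7ffffffffffe bytes past the first store.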
I think the point is we fail to represent

Analyzing # of iterations of loop 1
  exit condition [i_5(D) + 1, + , 1] < n_11(D)
  bounds on difference of bases: -18446744073709551615 ... 18446744073709551615
  result:
    zero if i_5(D) + 1 > n_11(D)
    # of iterations (n_11(D) - i_5(D)) + 18446744073709551615, bounded by 18446744073709551615
  number of iterations (n_11(D) - i_5(D)) + 18446744073709551615; zero if i_5(D) + 1 > n_11(D)

specifically the 'zero if i_5(D) + 1 > n_11(D)' part.

I think may_eliminate_iv is wrong here, maybe not considering overflow of the niter expression?

I wonder if it is possible to write a runtime testcase that FAILs with reasonable memory requirement/layout.
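To make the role of the 'zero if' condition concrete, here is a sketch of the niter result from the dump as plain C. The function names are hypothetical and this is a model of the dump output, not GCC code; the count is the number of times the loop latch is executed.

```c
#include <stdint.h>

/* Iteration count as printed in the dump, including the 'zero if' guard.  */
static uint64_t
niter_guarded (uint64_t i, uint64_t n)
{
  if (i + 1 > n)                 /* zero if i_5(D) + 1 > n_11(D) */
    return 0;
  return (n - i) + UINT64_MAX;   /* (n - i) + 18446744073709551615 */
}

/* The same expression with the guard dropped (assumption: this is what an
   exit test derived from the bare expression effectively relies on).  */
static uint64_t
niter_unguarded (uint64_t i, uint64_t n)
{
  return (n - i) + UINT64_MAX;
}
```

For i = 0xffff000000000001 and n = 0xdffd7fffffffffff the guarded form gives 0, while the bare expression gives 0xdffe7ffffffffffd, one less than the memset length quoted in the first comment.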
Here is a runtime testcase:

#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

__attribute__((noipa)) void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

int
main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -3 - (uintptr_t)p;
  f1 (p+2, -2, n);
  return 0;
}
Oops. I messed up the test case... It "works", but the actual values do not make sense... The following is better:

int
main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -2 - (uintptr_t)(p+pgsz);
  f1 (p+pgsz, -2, n);
  return 0;
}
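Why this version faults can be seen by counting stores in both loop forms without touching memory. The sketch below is an illustration only: the function names and the iteration cap are made up, and the page-boundary address used in the usage note is an arbitrary example value, not something mmap is guaranteed to return.

```c
#include <assert.h>
#include <stdint.h>

/* Number of stores done by the original loop, stopping at CAP stores.  */
static uint64_t
stores_original (uint64_t i, uint64_t n, uint64_t cap)
{
  uint64_t count = 0;

  assert (cap > 0);
  do
    {
      count++;   /* stands in for *p = '\0' */
      i++;
    }
  while (i < n && count < cap);
  return count;
}

/* Number of stores done by the ivopts form, stopping at CAP stores.  */
static uint64_t
stores_optimized (uint64_t p, uint64_t i, uint64_t n, uint64_t cap)
{
  uint64_t q = p + i;              /* p_6 */
  uint64_t bound = q + (n - i);    /* _14 */
  uint64_t count = 0;

  assert (cap > 0);
  do
    {
      count++;   /* stands in for MEM[(char *)p_1] = 0 */
      q++;
    }
  while (q < bound && count < cap);
  return count;
}
```

Taking P as the address of the PROT_NONE page, i = -2 and n = -2 - P, the original loop does one store at P - 2 and exits. The rewritten loop computes _14 = -2, so it keeps storing at P - 1 and then at P, which is the protected page.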
Confirmed.
It's going wrong in iv_elimination_compare_lt, which tries to handle exactly this kind of loop:

   We aim to handle the following situation:

   sometype *base, *p;
   int a, b, i;

   i = a;
   p = p_0 = base + a;

   do
     {
       bla (*p);
       p++;
       i++;
     }
   while (i < b);

   Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
   We aim to optimize this to

   p = p_0 = base + a;
   do
     {
       bla (*p);
       p++;
     }
   while (p < p_0 - a + b);

   This preserves the correctness, since the pointer arithmetics does not
   overflow.  More precisely:

   1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
      overflow in computing it or the values of p.
   2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
      overflow.  To prove this, we use the fact that p_0 = base + a.

There's either a hole in that logic or the implementation is off.

  /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
     overflow.  */
  offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
                        cand->iv->step,
                        fold_convert (TREE_TYPE (cand->iv->step), a));
  if (!difference_cannot_overflow_p (data, cand->iv->base, offset))
    return false;

where 'A' is 'i', CAND->IV->BASE is 'p + i' and CAND->IV->STEP is 1 as 'sizetype'. That just checks that (p + i) - i doesn't overflow. Somehow it fails to prove that p + b doesn't overflow, since we end up with p' < (p + i) + (n - i), aka p' < p + n.
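For reference, the two loop forms from that comment can be written out as compilable C. This is a sketch under the assumption that all values stay inside one array, so no pointer or index computation wraps; the function names are hypothetical and a counter stands in for bla (*p).

```c
#include <assert.h>
#include <stddef.h>

/* Index-controlled form: exit test is i < b.  */
static size_t
count_index_form (const char *base, int a, int b)
{
  const char *p = base + a;
  int i = a;
  size_t calls = 0;

  assert (a >= 0);   /* sketch only covers in-bounds offsets */
  do
    {
      calls++;       /* stands in for bla (*p) */
      p++;
      i++;
    }
  while (i < b);
  return calls;
}

/* Pointer-controlled form: exit test is p < p_0 - a + b.  */
static size_t
count_pointer_form (const char *base, int a, int b)
{
  const char *p_0 = base + a;
  const char *p = p_0;
  size_t calls = 0;

  assert (a >= 0);   /* sketch only covers in-bounds offsets */
  do
    {
      calls++;       /* stands in for bla (*p) */
      p++;
    }
  while (p < p_0 - a + b);
  return calls;
}
```

On an 8-byte buffer both forms agree for (a, b) = (2, 5), a case with a + 1 <= b, and for (3, 1), a case with a + 1 > b. The failing inputs in this report are the ones where the unsigned computations wrap, which this sketch deliberately does not exercise.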