Take:
```
void f(unsigned char * __restrict src, unsigned stride, int h, int row)
{
  unsigned char *src1 = src + row*stride;
  for (int col = 0; col < h; col++) {
    src1[col] = src1[col] + 1;
  }
}

void f1(unsigned char * __restrict src, unsigned rs, unsigned cs,
        unsigned stride, int h, int row)
{
  for (int col = 0; col < h; col++) {
    src[row*stride + col] = src[row*stride + col] + 1;
  }
}
```
Both functions should be vectorized, but f1 is not. With -m32 (or -mabi=ilp32 on aarch64) f1 does get vectorized. Note that LLVM is able to vectorize f1 on both aarch64 and x86_64.
The issue is that, as far as SCEV is concerned, row*stride + col does not evolve as an affine function:
```
t.c:6:32: missed: failed: evolution of base is not affine.
```
The "issue" is that we are dealing with pointer arithmetic here, and row*stride + col can overflow, but SCEV doesn't have "assumptions" we could verify at runtime. Use a signed 'stride'; otherwise all the arithmetic is promoted to unsigned. Duplicate of other similar bugs.
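A source-level sketch of that workaround, assuming the caller guarantees the index computation fits in int (the function name f1_signed is illustrative, not from the report):
```
/* Sketch: making 'stride' signed keeps row*stride + col in signed
   arithmetic, which cannot wrap, so SCEV can treat the address as an
   affine evolution of 'col'.  Assumes row*stride + col fits in int.  */
void f1_signed(unsigned char * __restrict src, int stride, int h, int row)
{
  for (int col = 0; col < h; col++) {
    src[row*stride + col] = src[row*stride + col] + 1;
  }
}
```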
Ah, row*stride is loop invariant, and that saves us with -m32. With -m64 we end up with
```
<bb 3> [local count: 105119324]:
  row.0_1 = (unsigned int) row_13(D);
  _2 = row.0_1 * stride_14(D);

<bb 4> [local count: 955630224]:
  # col_20 = PHI <col_17(6), 0(3)>
  col.1_3 = (unsigned int) col_20;
  _4 = _2 + col.1_3;
  _5 = (sizetype) _4;
  _6 = src_15(D) + _5;
  _7 = *_6;
```
which is an "unfortunate" association of the (sizetype) conversion. Because the addition of col is unsigned it might overflow, so we cannot re-associate the (sizetype) conversion, and that makes the result non-affine. Runtime versioning would be necessary, guaranteeing that _2 + col.1_3 never overflows.
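In source terms, the condition such versioning would have to establish looks roughly like the following sketch (the check and the f1_versioned name are illustrative only, not GCC output or an actual patch):
```
#include <limits.h>

/* Sketch: version the loop on whether row*stride + col can wrap in
   unsigned arithmetic for any col in [0, h).  If it cannot, the
   invariant part can be hoisted as in f and the loop vectorized;
   otherwise fall back to the original scalar form.  */
void f1_versioned(unsigned char * __restrict src, unsigned stride,
                  int h, int row)
{
  unsigned base = (unsigned) row * stride;
  if (h > 0 && base <= UINT_MAX - (unsigned) (h - 1))
    {
      /* base + col never overflows, so the address is affine in col.  */
      unsigned char *src1 = src + base;
      for (int col = 0; col < h; col++)
        src1[col] = src1[col] + 1;
    }
  else
    {
      for (int col = 0; col < h; col++)
        src[(unsigned) row * stride + col]
          = src[(unsigned) row * stride + col] + 1;
    }
}
```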
I am going to take a stab at implementing this ...