For
---
#define N 16

float b[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
float c[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
float a[N];

void
test (void)
{
  int i;

  for (i = 0; i < N/2; i++)
    a[i] = b[2*i+1] * c[2*i+1];
}
---
vectorizer generates:

test ()
{
  unsigned int ivtmp.30;
  vector(4) float * vect_pa.29;
  vector(4) float * vect_pa.26;
  vector(4) float vect_var_.25;
  vector(4) float vect_perm_odd.24;
  vector(4) float vect_perm_even.23;
  vector(4) float vect_var_.22;
  vector(4) float vect_var_.21;
  vector(4) float * c.20;
  vector(4) float * vect_pc.19;
  vector(4) float * vect_pc.16;
  vector(4) float vect_perm_odd.15;
  vector(4) float vect_perm_even.14;
  vector(4) float vect_var_.13;
  vector(4) float vect_var_.12;
  vector(4) float * b.11;
  vector(4) float * vect_pb.10;
  vector(4) float * vect_pb.7;
  unsigned int ivtmp.6;
  int i;
  float D.2731;
  float D.2730;
  float D.2729;
  int D.2728;
  int D.2727;

<bb 2>:
  b.11_18 = (vector(4) float *) &b;
  vect_pb.10_21 = b.11_18 + 4;
  vect_pb.7_22 = vect_pb.10_21;
  c.20_30 = (vector(4) float *) &c;
  vect_pc.19_31 = c.20_30 + 4;
  vect_pc.16_32 = vect_pc.19_31;
  vect_pa.29_41 = (vector(4) float *) &a;
  vect_pa.26_42 = vect_pa.29_41;

<bb 3>:
  # i_14 = PHI <i_10(4), 0(2)>
  # ivtmp.6_20 = PHI <ivtmp.6_19(4), 8(2)>
  # vect_pb.7_23 = PHI <vect_pb.7_24(4), vect_pb.7_22(2)>
  # vect_pc.16_33 = PHI <vect_pc.16_34(4), vect_pc.16_32(2)>
  # vect_pa.26_43 = PHI <vect_pa.26_44(4), vect_pa.26_42(2)>
  # ivtmp.30_45 = PHI <ivtmp.30_46(4), 0(2)>
  D.2727_3 = i_14 * 2;
  D.2728_4 = D.2727_3 + 1;
  vect_var_.12_25 = M*vect_pb.7_23{misalignment: 32};
  vect_pb.7_26 = vect_pb.7_23 + 16;
  vect_var_.13_27 = M*vect_pb.7_26{misalignment: 32};
  vect_perm_even.14_28 = VEC_EXTRACTEVEN_EXPR <vect_var_.12_25, vect_var_.13_27>;
  vect_perm_odd.15_29 = VEC_EXTRACTODD_EXPR <vect_var_.12_25, vect_var_.13_27>;
  D.2729_5 = b[D.2728_4];
  vect_var_.21_35 = M*vect_pc.16_33{misalignment: 32};
  vect_pc.16_36 = vect_pc.16_33 + 16;
  vect_var_.22_37 = M*vect_pc.16_36{misalignment: 32};
  vect_perm_even.23_38 = VEC_EXTRACTEVEN_EXPR <vect_var_.21_35, vect_var_.22_37>;
  vect_perm_odd.24_39 = VEC_EXTRACTODD_EXPR <vect_var_.21_35, vect_var_.22_37>;
  D.2730_8 = c[D.2728_4];
  vect_var_.25_40 = vect_perm_even.14_28 * vect_perm_even.23_38;
  D.2731_9 = D.2729_5 * D.2730_8;
  *vect_pa.26_43 = vect_var_.25_40;
  i_10 = i_14 + 1;
  ivtmp.6_19 = ivtmp.6_20 - 1;
  vect_pb.7_24 = vect_pb.7_26 + 16;
  vect_pc.16_34 = vect_pc.16_36 + 16;
  vect_pa.26_44 = vect_pa.26_43 + 16;
  ivtmp.30_46 = ivtmp.30_45 + 1;
  if (ivtmp.30_46 < 2)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

}

The problem is

  D.2727_3 = i_14 * 2;
  D.2728_4 = D.2727_3 + 1;
  vect_var_.12_25 = M*vect_pb.7_23{misalignment: 32};
  vect_pb.7_26 = vect_pb.7_23 + 16;
  vect_var_.13_27 = M*vect_pb.7_26{misalignment: 32};
  vect_perm_even.14_28 = VEC_EXTRACTEVEN_EXPR <vect_var_.12_25, vect_var_.13_27>;
  vect_perm_odd.15_29 = VEC_EXTRACTODD_EXPR <vect_var_.12_25, vect_var_.13_27>;

may access memory beyond the array boundary, depending on how VEC_EXTRACTEVEN_EXPR and VEC_EXTRACTODD_EXPR are implemented in the backend. The misaligned access:

  vect_var_.12_25 = M*vect_pb.7_23{misalignment: 32};
  vect_var_.13_27 = M*vect_pb.7_26{misalignment: 32};

may read one element outside of the array if the backend needs to read in the whole misaligned memory.
Do you mean that extract_even implementation does something illegal with this last element? Misaligned load also accesses elements outside the array, but the problem is in extract_even? Apart from doing something in the backend, we could reduce the number of vector iterations, for specific targets, in cases that may access elements outside the array bounds...
(In reply to comment #1)
> Do you mean that extract_even implementation does something illegal with this
> last element? Misaligned load also accesses elements outside the array, but the
> problem is in extract_even?

Vectorizer generates

  vect_var_.12_25 = M*vect_pb.7_23{misalignment: 32};
  vect_var_.13_27 = M*vect_pb.7_26{misalignment: 32};

Those may read beyond the end of array. Vectorizer should check that vect_pb.7_23/vect_pb.7_26 + vector size < end of array.
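To make the proposed check concrete, here is a minimal sketch in C. It is only an illustration, not GCC code: `vector_load_in_bounds` is a hypothetical helper, the offsets are taken from the dump (the pointer starts 4 bytes into `b` and advances 16 bytes per load, giving loads at byte offsets 4, 20, 36 and 52), and `sizeof(float) == 4` is assumed, so `b` occupies 64 bytes. The comparison is written as "load end <= array end" so that a load ending exactly at the array end is accepted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns true if a load of
   vec_bytes bytes starting `offset` bytes into an array of
   array_bytes bytes stays entirely inside the array. */
static bool vector_load_in_bounds(size_t offset, size_t vec_bytes,
                                  size_t array_bytes)
{
    /* Written as a subtraction to avoid overflow in offset + vec_bytes. */
    return offset <= array_bytes && vec_bytes <= array_bytes - offset;
}
```

With these assumptions, `vector_load_in_bounds(52, 16, 64)` is false: if the last load is performed literally at offset 52, it covers bytes 52..67 and so touches one float past the end of the 64-byte array, while the loads at offsets 4, 20 and 36 stay in bounds.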
I am curious what is the problem with that? These elements are not used, they are just loaded...
(In reply to comment #3)
> I am curious what is the problem with that? These elements are not used, they
> are just loaded...

An out-of-bounds read can result in a SEGV if the memory is unmapped. Worse things can happen if the memory is "special" (think kernels and MMIO).
Even if we are talking about less than vector size from array boundary? And that boundary is not (vector) aligned.
It depends on the specific values of (a) array end alignment and (b) the number of bytes read. As long as the array end + number of bytes read can cross a page boundary, you're potentially causing SEGV or other errors.
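The condition described here can be sketched as a small C predicate. This is an illustration only: `crosses_page` is a hypothetical name, and the page size is a parameter assumed to be a power of two (4096 bytes is a common value, but nothing in this report fixes it).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the byte range
   [addr, addr + len) touches more than one page.
   page_size is assumed to be a power of two. */
static bool crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return false;

    uintptr_t mask = ~(uintptr_t)(page_size - 1);
    uintptr_t first_page = addr & mask;             /* page of first byte */
    uintptr_t last_page = (addr + len - 1) & mask;  /* page of last byte  */
    return first_page != last_page;
}
```

For example, with 4096-byte pages a 16-byte read starting at address 4088 spans two pages, whereas a 16-byte read starting at 4080 ends at byte 4095 and stays within one.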
(In reply to comment #6)
> It depends on the specific values of (a) array end alignment and (b) the number
> of bytes read. As long as the array end + number of bytes read can cross a page
> boundary, you're potentially causing SEGV or other errors.

I don't think this can happen. The access to the out-of-bounds area only happens if there are pieces included in the last (aligned) vector move. That vector move will be aligned so it can't cross a page boundary. As it contains at least one allocated element the access may not trap.
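The alignment argument can be checked with a short C sketch. It assumes 16-byte vectors and 4096-byte pages (both are illustrative constants, and the helper names are hypothetical), and it assumes, as the comment does, that the hardware access is an aligned vector move. Because the page size is a multiple of the vector size, the aligned vector containing any in-bounds element lies entirely in that element's page.

```c
#include <stdbool.h>
#include <stdint.h>

#define VEC_BYTES  16u    /* assumed vector size in bytes */
#define PAGE_BYTES 4096u  /* assumed page size in bytes   */

/* Start address of the aligned vector that contains elem_addr. */
static uintptr_t vector_base(uintptr_t elem_addr)
{
    return elem_addr & ~(uintptr_t)(VEC_BYTES - 1);
}

/* Start address of the page that contains addr. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr & ~(uintptr_t)(PAGE_BYTES - 1);
}

/* True if the whole aligned vector containing elem_addr lies in the
   same page as elem_addr itself. */
static bool aligned_vector_in_same_page(uintptr_t elem_addr)
{
    uintptr_t base = vector_base(elem_addr);
    return page_of(base) == page_of(elem_addr)
        && page_of(base + VEC_BYTES - 1) == page_of(elem_addr);
}
```

Under these assumptions the predicate holds for every address, including ones right at a page edge such as 4095 or 4096, so an aligned load that covers at least one mapped element never reaches into a neighbouring page.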
This is a non-bug. The transformation is ok and will never cause a pagefault.