Compiled with gcc-4.9 and gcc-5.0 at -O3, the following code causes a "Segmentation fault: 11" on all my Intel machines with SSE:

-----------------------------------
double Q[4*64];
double P[5*64];

int main()
{
    int i, j;
    double *p = P;
    double *q = Q;

    for (j=0; j<32; ++j) {
        for (i=0; i<4; ++i) {
            q[i] = p[i];
        }
        q += 4;
        p += 5;
    }
    return 0;
}
-----------------------------------

Looking at the assembly code, the problem is in

-----------------------------------
L2:
        movapd  16(%rax), %xmm0
        addq    $40, %rax
        addq    $32, %rdx
        movapd  -40(%rax), %xmm1
        movaps  %xmm0, -16(%rdx)
        movaps  %xmm1, -32(%rdx)
        cmpq    %rcx, %rax
        jne     L2
-----------------------------------

So %rax contains the address of p. But even if p = P is initially aligned to a 16-byte boundary, P+5 is not, so movapd must not be used. Changing the assembly code manually to

-----------------------------------
L2:
        movupd  16(%rax), %xmm0
        addq    $40, %rax
        addq    $32, %rdx
        movupd  -40(%rax), %xmm1
        movaps  %xmm0, -16(%rdx)
        movaps  %xmm1, -32(%rdx)
        cmpq    %rcx, %rax
        jne     L2
-----------------------------------

fixed the problem.

Cheers,
Michael
I can reproduce the SEGV with current gcc-6, 5, and 4.9, but not with 4.8 or 4.7. Dropping from -O3 to -O2 also prevents the SEGV.
Started with r197189.
I believe I fixed this on trunk for PR66510. Confirmed with GCC 5, though. GCC 6 now generates

.L2:
        movupd  16(%rax), %xmm0
        addq    $40, %rax
        addq    $32, %rdx
        movupd  -40(%rax), %xmm1
        movaps  %xmm0, -16(%rdx)
        movaps  %xmm1, -32(%rdx)
        cmpq    $P+1240, %rax
        jne     .L2
        movupd  P+1256(%rip), %xmm0
        xorl    %eax, %eax
        movupd  P+1240(%rip), %xmm1
        movaps  %xmm0, Q+1008(%rip)
        movaps  %xmm1, Q+992(%rip)
        ret

It might be a bit hard to backport the changes, though. Mine nevertheless.
The relevant hunk is the following, but as it requires us to know the final vectorization factor it requires us to re-order (and split) the pipeline quite a bit. r223670 has bits of the prerequisites but not all, and r224598 has the fix below (and some unrelated bits).

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 230924)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -701,21 +701,22 @@ vect_compute_data_ref_alignment (struct
         }
     }
 
-  /* Similarly, if we're doing basic-block vectorization, we can only use
-     base and misalignment information relative to an innermost loop if the
-     misalignment stays the same throughout the execution of the loop.
-     As above, this is the case if the stride of the dataref evenly divides
-     by the vector size.  */
-  if (!loop)
+  /* Similarly we can only use base and misalignment information relative to
+     an innermost loop if the misalignment stays the same throughout the
+     execution of the loop.  As above, this is the case if the stride of
+     the dataref evenly divides by the vector size.  */
+  else
     {
       tree step = DR_STEP (dr);
-      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+      unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
 
-      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+      if (tree_fits_shwi_p (step)
+          && ((tree_to_shwi (step) * vf)
+              % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "SLP: step doesn't divide the vector-size.\n");
+                             "step doesn't divide the vector-size.\n");
           misalign = NULL_TREE;
         }
     }
Oh, and it needs r223798.
GCC 4.9 branch is being closed
I will be away until September 30, 2016, with limited internet access. Your email will not be read or forwarded. Cheers, Michael Lehn
GCC 5 branch has been closed, should be fixed in GCC 6 and later.