[forwarded from http://bugs.debian.org/663654]

The following versions of gcc:
  Debian gcc-4.6.3-1, Debian gcc-4.4.6-14, Debian gcc-4.6.2-14, Debian gcc-4.4.6-15, Ubuntu 4.4.3-4ubuntu5
generate *wrong* code: aligned vector loads instead of unaligned vector loads for the x86_64 arch. This causes the compiled code to crash with SIGSEGV (General Protection Fault).

The bug is *not* present on trunk and gcc-4.5.3-12.

Consider the following program:

void foo(int* __restrict ia, int n){
  int i;
  for(i=0;i<n;i++){
    ia[i]=ia[i]*ia[i];
  }
}

int main(){
  int a[9];
  int sum=0,i;
  for(i=0;i<9;i++){
    a[i]=(i*i)%128;
  }
  foo((int*)((char*)a+2), 8);
  for(i=0;i<9;i++){
    sum+=a[i];
  }
  return sum;
}

On x86 and x86_64, unaligned word accesses are valid: *((int*)<unaligned memory address>)

But x86_64 SSE has two kinds of vector move instructions:
 - aligned vector move (movdqa)
 - unaligned vector move (movdqu)
Using an aligned vector move with an unaligned address makes the application crash.

When compiled with any of the following command lines:
  gcc -O3 foo.c
  g++ -O3 foo.c
  gcc -m64 -O2 -ftree-vectorize gcc_bug.c
  g++ -m64 -O2 -ftree-vectorize gcc_bug.c
gcc generates an aligned vector load
  movdqa -54(%rsp,%rax), %xmm0
instead of an unaligned vector load (movdqu). This results in the above application crashing with SIGSEGV (General Protection Fault).

gcc-4.7 correctly generates
  movdqu -54(%rsp), %xmm0
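To make the movdqa/movdqu distinction concrete at source level, here is a minimal sketch using the SSE2 intrinsics from <emmintrin.h>. The helper name load4_unaligned is hypothetical and not part of the report. _mm_loadu_si128 is the intrinsic that normally maps to movdqu, while _mm_load_si128 (not used here) would require a 16-byte aligned address. On targets without SSE2 the sketch falls back to memcpy.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copies four ints from a possibly misaligned
 * address into out[] without requiring any particular alignment. */
void load4_unaligned(const void *src, int out[4])
{
    assert(src != NULL && out != NULL);
#if defined(__SSE2__)
    /* Unaligned 128-bit load and store (the movdqu family). Swapping in
     * _mm_load_si128 here would demand a 16-byte aligned src. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Portable fallback for targets without SSE2. */
    memcpy(out, src, 4 * sizeof(int));
#endif
}
```

Called as load4_unaligned((char*)a+2, out) with the array a from the testcase, this reads the same bytes a plain memcpy would, whatever the alignment of the address.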
The testcase is invalid C. While x86_64/i?86 will do the expected thing and perform unaligned loads/stores silently, it won't do that in vectorized code or for atomic accesses. You need to tell the compiler that ia isn't aligned, through the aligned attribute, e.g.
  typedef int T __attribute__((aligned (2)));
and use T *__restrict ia instead of int *__restrict ia.
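A minimal sketch of that suggestion applied to the loop from the testcase follows. It relies on the GCC aligned type attribute, which may lower alignment when used in a typedef. The names unaligned_int and square_in_place are hypothetical stand-ins for T and foo.

```c
#include <assert.h>
#include <string.h>

/* GCC extension: an int type whose declared alignment is only 2 bytes,
 * so the compiler may not assume 4-byte (or 16-byte) alignment. */
typedef int unaligned_int __attribute__((aligned(2)));

/* Same loop as foo() in the report, but through the 2-byte aligned type. */
void square_in_place(unaligned_int *__restrict ia, int n)
{
    int i;
    assert(ia != NULL && n >= 0);
    for (i = 0; i < n; i++) {
        ia[i] = ia[i] * ia[i];
    }
}
```

The call site becomes square_in_place((unaligned_int*)((char*)a+2), 8). The values stored at the misaligned positions should be small enough that squaring them does not overflow int.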
(In reply to comment #1)
> The testcase is invalid C, while x86_64/i?86 will do the expected thing of
> doing unaligned loads/stores silently, it won't do that in vectorized code or
> for atomic accesses.

Shouldn't the compiler vectorize the code _conservatively_, either by generating code to check whether the address is aligned or by generating unaligned vector load instructions? Otherwise any code written for x86_64 that relies on unaligned accesses will break with -O3 on newer gcc.

Also note that this bug is triggered only when __restrict is used. If you remove __restrict, gcc generates proper code. It also works properly with gcc 4.7 (even with __restrict).
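For comparison, the conservative behaviour asked about here can be written by hand at source level. The sketch below is only an illustration of the idea, not what gcc emits. The function name square_any_alignment is hypothetical, and it assumes that converting a pointer to uintptr_t yields its numeric address, as it does on x86_64. It checks the alignment at run time and goes through memcpy when the address is misaligned, which is valid C for any byte address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: squares n ints stored at any byte address. */
void square_any_alignment(void *p, int n)
{
    int i;
    assert(p != NULL && n >= 0);
    if ((uintptr_t)p % _Alignof(int) == 0) {
        /* Aligned case: ordinary int accesses are valid, and the
         * compiler is free to vectorize this loop. */
        int *ia = (int *)p;
        for (i = 0; i < n; i++) {
            ia[i] = ia[i] * ia[i];
        }
    } else {
        /* Misaligned case: copy each value in and out with memcpy so
         * no int object is ever accessed at a misaligned address. */
        unsigned char *bytes = (unsigned char *)p;
        for (i = 0; i < n; i++) {
            int v;
            memcpy(&v, bytes + (size_t)i * sizeof v, sizeof v);
            v = v * v;
            memcpy(bytes + (size_t)i * sizeof v, &v, sizeof v);
        }
    }
}
```

With the array from the testcase, square_any_alignment((char*)a+2, 8) takes the memcpy path and square_any_alignment(a, 9) takes the aligned path.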