bash-3.2$ cat x.c
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

__m128 __attribute__((noinline))
iszero (__m128 x)
{
  return x;
}

typedef __m128 __attribute__((aligned(1))) unaligned;

__m128 __attribute__((noinline))
foo (__m128 a1, __m128 a2, __m128 a3, __m128 a4,
     __m128 a5, __m128 a6, __m128 a7, __m128 a8,
     int b1, int b2, int b3, int b4,
     int b5, int b6, int b7, unaligned y)
{
  return iszero (y);
}

int
main (void)
{
  unaligned x;
  __m128 y, x0 = { 0 };
  x = x0;
  y = foo (x0, x0, x0, x0, x0, x0, x0, x0,
           1, 2, 3, 4, 5, 6, 7, x);
  return !__builtin_memcmp (&y, &x0, sizeof (y));
}
bash-3.2$ /export/build/gnu/gcc/build-x86_64-linux/stage1-gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/stage1-gcc/ -O x.c -o x
bash-3.2$ ./x
Segmentation fault
bash-3.2$

The issue here is that V4SFmode may not always be properly aligned. This is very similar to PR 32000; the difference is that there TDmode is passed as TImode on the stack, whereas here V4SFmode is used. The same problem exists for all other SSE modes.
*** Bug 35771 has been marked as a duplicate of this bug. ***
The middle end uses the canonical type when passing parameters in function calls. ix86_function_arg_boundary should do the same; otherwise there will be a mismatch between the alignment the caller assumes and the alignment the callee expects.
Subject: Bug 35767

Author: hjl
Date: Tue May 27 20:18:33 2008
New Revision: 136054

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136054
Log:
gcc/

2008-05-27  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/35767
	PR target/35771
	* config/i386/i386.c (ix86_function_arg_boundary): Use
	alignment of canonical type.
	(ix86_expand_vector_move): Check unaligned memory access for
	all SSE modes.

gcc/testsuite/

2008-05-27  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/35767
	PR target/35771
	* gcc.target/i386/pr35767-1.c: New.
	* gcc.target/i386/pr35767-1d.c: Likewise.
	* gcc.target/i386/pr35767-1i.c: Likewise.
	* gcc.target/i386/pr35767-2.c: Likewise.
	* gcc.target/i386/pr35767-2d.c: Likewise.
	* gcc.target/i386/pr35767-2i.c: Likewise.
	* gcc.target/i386/pr35767-3.c: Likewise.
	* gcc.target/i386/pr35767-4.c: Likewise.
	* gcc.target/i386/pr35767-5.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr35767-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-1d.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-1i.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-2d.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-2i.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-4.c
    trunk/gcc/testsuite/gcc.target/i386/pr35767-5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
Fixed.
gcc.target/i386/pr35767-5.c is failing for me in both -m32 and -m64 mode on trunk:

xgcc (GCC) 4.9.0 20140204 (experimental)

The assembly produced:

test:
	subq	$24, %rsp
	movaps	.LC0(%rip), %xmm0
	movups	%xmm0, (%rsp)
	movaps	%xmm0, %xmm7
	movaps	%xmm0, %xmm6
	movaps	%xmm0, %xmm5
	movaps	%xmm0, %xmm4
	movaps	%xmm0, %xmm3
	movaps	%xmm0, %xmm2
	movaps	%xmm0, %xmm1
	call	foo
	movl	$0, %eax
	addq	$24, %rsp
	ret

The movups appears to be especially bogus, since it is moving to 0(%rsp), which is guaranteed to be 16-byte aligned by the ABI.