GCC doesn't allocate __m128 locals on a 16-byte boundary, but still uses movaps to access them, causing general protection faults at run time.
Created attachment 11685
The file that gcc fails to compile correctly.

Use gcc -S -msse and look at the assembly. GCC allocates __m128 locals directly on the stack without adjusting ESP, which might not be 16-byte aligned. But GCC uses movaps, which requires its memory operand to be 16-byte aligned, to access those locals. ICC solves this problem by adding

    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp

to the function prolog.
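For readers less familiar with the prolog above: `andl $-16, %esp` clears the low four bits of ESP, rounding the stack pointer down to the nearest multiple of 16. A minimal C sketch of the same arithmetic follows; the function name `align_down_16` is hypothetical and only illustrates the masking, not what any compiler actually emits.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors the effect of `andl $-16, %esp`.
 * In two's complement, -16 has every bit set except the low four,
 * so the AND rounds the value down to a multiple of 16. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}
```

Because the stack grows downward on x86, rounding down only wastes up to 15 bytes and never overlaps the caller's frame.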
(In reply to comment #1)
> Use gcc -S -msse and look at the assembly. GCC allocates __m128 locals directly
> on the stack without adjusting ESP, which might not be 16-byte aligned. But GCC
> uses movaps, which requires its operand to be 16-byte aligned, to access those
> locals.

In a way this is a dup of bug 27537. There is an attribute to realign the stack in 4.2.0, though, so using that might just fix this issue instead.
Not specific to mingw32.
(In reply to comment #2)
> In a way this is a dup of bug 27537. Though there is an attribute to realign
> the stack in 4.2.0 so using that might just fix this issue instead.

Indeed,

5c5
< void dct64_sse(float *a,float *b,float *c)
---
> void __attribute__ ((force_align_arg_pointer)) dct64_sse(float *a,float *b,float *c)

fixes it on 4.2.

BTW, this issue has particular importance for mingw32 multithreaded programs, since the Win32 API CreateThread and the corresponding CRT _beginthreadex functions do not guarantee that the stack will be 16-byte aligned on entry to the thread start-function callback. Marking the thread start function with the force_align_arg_pointer attribute fixes that.

Hmm, should that go in gcc.info?

Danny
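The workaround described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the attachment: the macro `FORCE_ALIGN`, the type name `v4sf`, and the function `local_vector_is_aligned` are hypothetical. The `v4sf` typedef uses GCC's vector extension as a stand-in for `__m128` so the sketch does not depend on `xmmintrin.h`.

```c
#include <stdint.h>
#include <string.h>

/* Apply the attribute only on x86 targets, where GCC documents it;
 * elsewhere the (hypothetical) macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define FORCE_ALIGN __attribute__((force_align_arg_pointer))
#else
#define FORCE_ALIGN
#endif

/* Stand-in for __m128: four floats in a 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical entry point that may be called with a stack that is not
 * 16-byte aligned (for example a thread start callback). The attribute
 * asks GCC to realign the stack in the prolog. Newer GCC releases may
 * realign on their own when a local needs it. */
FORCE_ALIGN
int local_vector_is_aligned(const float *in, float *out)
{
    v4sf v;

    memcpy(&v, in, sizeof v);   /* load without assuming `in` is aligned */
    v = v + v;                  /* element-wise add on the vector local */
    memcpy(out, &v, sizeof v);  /* store without assuming `out` is aligned */

    /* Report whether the local really sits on a 16-byte boundary. */
    return ((uintptr_t)&v & 15u) == 0;
}
```

Calling `local_vector_is_aligned` with two arrays of four floats doubles each element and returns 1 when the local vector is 16-byte aligned. Only the entry point needs the attribute; functions it calls inherit the realigned stack as long as they preserve alignment themselves.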
*** This bug has been marked as a duplicate of 27537 ***