target/6624: SSE misalignment with -O0

>Number:         6624
>Category:       target
>Synopsis:       SSE mis-alignment with -O0
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          wrong-code
>Submitter-Id:   net
>Arrival-Date:   Fri May 10 09:16:01 PDT 2002
>Originator:     Jesse Hall
>Release:        3.1 20020426 (prerelease)
>Environment:
System: Linux espresso 2.4.18-xfs #1 SMP Tue Mar 12 20:03:55 CST 2002 i686 unknown
Architecture: i686 (more precisely, dual AthlonMP)

host: i386-pc-linux-gnu
build: i386-pc-linux-gnu
target: i386-pc-linux-gnu
configured with: ../src/configure -v --enable-languages=c,c++,java,f77,proto,objc --prefix=/usr/lib/gcc-snapshot --infodir=/share/info --mandir=/share/man --enable-shared --with-gnu-as --with-gnu-ld --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-checking --enable-threads=posix --enable-java-gc=boehm --with-cpp-install-dir=bin --enable-objc-gc i386-linux
>Description:
    When compiling with -O0, code using the SSE intrinsic _mm_loadu_ps can
    end up doing an aligned SSE access (movaps) on an unaligned address, which
    causes a segfault when the program is run. The problem seems to go away
    with -O1 and higher. The generated assembly looks like:

        call   804834a <_mm_loadu_ps>
        add    $0x10,%esp
        movaps %xmm0,0xffffffe8(%ebp)

    On my machine, the address that the movaps writes to (0xffffffe8(%ebp),
    i.e. -24(%ebp)) is consistently not on a 16-byte boundary, as movaps
    requires.
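
    To make the alignment arithmetic concrete, here is a minimal sketch in
    plain C. The names is_aligned16, frame_slot and alignment_examples are
    hypothetical, and the %ebp values are made up for illustration; they
    are not taken from the failing run. It only shows that -24(%ebp) lands
    on a 16-byte boundary when %ebp is 8 modulo 16, and not otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of 16, which a movaps memory operand
   has to be. */
static int is_aligned16(uint32_t addr)
{
    return (addr & 15u) == 0;
}

/* Address of the stack slot at disp(%ebp), using 32-bit wraparound as on
   i386. The displacement 0xffffffe8 in the listing is -24 when read as a
   signed 32-bit value. */
static uint32_t frame_slot(uint32_t ebp, int32_t disp)
{
    return ebp + (uint32_t)disp;
}

/* Worked examples with made-up frame pointer values. */
static void alignment_examples(void)
{
    /* -24 and 0xffffffe8 are the same 32-bit displacement. */
    assert((uint32_t)-24 == 0xffffffe8u);

    /* %ebp = 8 (mod 16): the slot is 16-byte aligned, movaps is fine. */
    assert(is_aligned16(frame_slot(0xbffff8a8u, -24)));

    /* %ebp = 12 (mod 16): the slot is misaligned, movaps would fault. */
    assert(!is_aligned16(frame_slot(0xbffff8acu, -24)));
}
```

    Comparing the real %ebp from a debugger against this rule should show
    whether the frame itself, or only the slot offset, is off.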

>How-To-Repeat:
    This simple program exhibits the problem on my machine:

    #include <xmmintrin.h>
    int main(int argc, char** argv) {
        __m128 x;
        float a[4] = {1.0f, 1.0f, 1.0f, 1.0f};
        x = _mm_loadu_ps(a);
        return 0;
    }

    Compiled with "gcc -g -O0 -msse foo.c -o foo".
    Compiling with -O1 or higher makes the problem go away, at least in the
    simple example above.
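
    For anyone trying to reproduce this without a debugger, the following
    is a rough sketch of a diagnostic helper that could be dropped into the
    test program. misalignment16 and report_alignment are hypothetical
    names, not part of any header. Note that taking the address of a local
    may itself change where the compiler places it, so treat the output as
    a hint only.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of bytes by which p lies past the previous 16-byte boundary;
   0 means p is 16-byte aligned. */
static unsigned misalignment16(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}

/* Hypothetical helper: print where an object ended up and how far it is
   from a 16-byte boundary. */
static void report_alignment(const char *name, const void *p)
{
    printf("%s at %p, %u bytes past a 16-byte boundary\n",
           name, (void *)p, misalignment16(p));
}
```

    In the program above, a call such as report_alignment("x", &x) placed
    before the _mm_loadu_ps line would show whether the slot for x is
    aligned at -O0.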

