Bug 37340 - -foptimize-register-move => wrong code for loading an sse2 register
Status: RESOLVED DUPLICATE of bug 37101
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2008-09-02 19:52 UTC by Emmanuel Thomé
Modified: 2008-09-02 20:53 UTC
CC List: 4 users

See Also:
Host: x86_64-redhat-linux
Target: x86_64-redhat-linux
Build: x86_64-redhat-linux
Known to work:
Known to fail:
Last reconfirmed:


Attachments
testcase (1.45 KB, application/x-compressed-tar)
2008-09-02 19:53 UTC, Emmanuel Thomé

Description Emmanuel Thomé 2008-09-02 19:52:44 UTC
The following code is miscompiled at -O3 with gcc 4.3.0: g[1] is filled with movq+movlps instead of movq+movhps.


#include <emmintrin.h>  /* GCC's SSE2 header; declares __v2di */

void frob(long long *t, const long long *s1,
               const long long *s2)
{
    long long w_s2[2];
    long long z;

    z = s2[0];
    w_s2[0] = z & 0x1fUL;
    z >>= 5;
    w_s2[1] = z;

    typedef union {
        __v2di s;
        long long x[2];
    } __v2di_proxy;

    __v2di g[4];

    g[0] = (__v2di) { 0,};
    g[1] = (__v2di) { w_s2[0], w_s2[1],};
    // it's unused in my testcase, and makes the assembly diff more
    // readable.
    // g[2] = (__v2di) { 0,};
    g[3] = g[1];

    __v2di_proxy r;
    r.s =  g[s1[0]];
    t[0] = r.x[0];
    t[1] = r.x[1];
}
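
For checking a build against the intended behaviour, the result frob should produce can be written without any vector types. This is only a sketch: the name frob_reference and its return code are hypothetical and not part of the attached testcase, and it assumes that >> on a negative long long is an arithmetic shift, which is what GCC does on x86_64 (sarq in the asm).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for frob(); not taken from the testcase.
 * Returns 0 when the result is well defined, -1 otherwise. */
static int frob_reference(long long *t, const long long *s1,
                          const long long *s2)
{
    long long lo, hi;

    assert(t != NULL && s1 != NULL && s2 != NULL);

    lo = s2[0] & 0x1f;   /* w_s2[0]: low five bits of s2[0] */
    hi = s2[0] >> 5;     /* w_s2[1]: assumes an arithmetic shift */

    switch (s1[0]) {
    case 0:              /* g[0] is all zeroes */
        t[0] = 0;
        t[1] = 0;
        return 0;
    case 1:              /* g[1] = { w_s2[0], w_s2[1] } */
    case 3:              /* g[3] is a copy of g[1] */
        t[0] = lo;
        t[1] = hi;
        return 0;
    default:             /* g[2] is uninitialized, other indices are out of range */
        return -1;
    }
}
```

Comparing the output of frob against frob_reference for s1[0] equal to 1 or 3 is enough to expose the problem, since those are the indices that read g[1].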

Here is the diff of the generated asm. A full testcase follows in the form of a tar file.

frob:
        movq    (%rdx), %rax
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, -72(%rsp)
        movq    %rax, %rdx
        andl    $31, %edx
        movq    %rdx, -96(%rsp)
        sarq    $5, %rax
-       movq    %rax, -104(%rsp)
-       movq    -96(%rsp), %xmm1
-       movhps  -104(%rsp), %xmm1
+       movq    %rax, -112(%rsp)
+       movq    -112(%rsp), %xmm1
+       movlps  -96(%rsp), %xmm1

Notice how the shifted rax, stored at -112(%rsp), is loaded into the low quadword of xmm1 by movq, whereas it should end up in the high quadword.
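
To make the difference between the two instruction sequences concrete, here is a small scalar model of the three loads involved. It is a sketch, not real SSE code: xmm_model and the model_* and load_* names are made up for illustration, and each function only mimics the memory-to-register form of the instruction (movq zeroes the high lane, movlps and movhps each replace one lane and keep the other).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for an SSE register: two 64-bit lanes. */
typedef struct { int64_t lo, hi; } xmm_model;

/* movq mem, %xmm: load the low lane and zero the high lane. */
static xmm_model model_movq(int64_t mem)
{
    xmm_model r = { mem, 0 };
    return r;
}

/* movlps mem, %xmm: overwrite the low lane, keep the high lane. */
static xmm_model model_movlps(xmm_model r, int64_t mem)
{
    r.lo = mem;
    return r;
}

/* movhps mem, %xmm: overwrite the high lane, keep the low lane. */
static xmm_model model_movhps(xmm_model r, int64_t mem)
{
    r.hi = mem;
    return r;
}

/* "-" side of the diff: movq of w_s2[0], then movhps of w_s2[1]. */
static xmm_model load_expected(int64_t w0, int64_t w1)
{
    return model_movhps(model_movq(w0), w1);
}

/* "+" side of the diff: movq of w_s2[1], then movlps of w_s2[0]. */
static xmm_model load_miscompiled(int64_t w0, int64_t w1)
{
    return model_movlps(model_movq(w1), w0);
}

/* Both sequences get the low lane right; the second leaves the high lane zero. */
static void check_sequences(int64_t w0, int64_t w1)
{
    xmm_model good = load_expected(w0, w1);
    xmm_model bad = load_miscompiled(w0, w1);

    assert(good.lo == w0 && good.hi == w1);
    assert(bad.lo == w0 && bad.hi == 0);
}
```

In this model the second sequence first puts w_s2[1] in the low lane and then overwrites it with w_s2[0], so the shifted value is lost and the high lane stays zero.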


Have I done anything wrong?

E.
Comment 1 Emmanuel Thomé 2008-09-02 19:53:53 UTC
Created attachment 16198
testcase
Comment 2 Emmanuel Thomé 2008-09-02 19:56:35 UTC
> Here is the diff of the generated asm.

One more note. The flags are, respectively:
-O
-O -foptimize-register-move
Comment 3 Richard Biener 2008-09-02 20:06:34 UTC
Sounds like a dup of PR37101 which is fixed for GCC 4.3.2.
Comment 4 Emmanuel Thomé 2008-09-02 20:53:16 UTC
(In reply to comment #3)
> Sounds like a dup of PR37101 which is fixed for GCC 4.3.2.

Indeed. Thanks.


*** This bug has been marked as a duplicate of 37101 ***