Bug 24074 - (vector float){0, 0, b, a} code gen as not good as it should be
Summary: (vector float){0, 0, b, a} code gen as not good as it should be
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ssemmx
Depends on:
Blocks:
 
Reported: 2005-09-27 04:22 UTC by Andrew Pinski
Modified: 2007-07-01 00:36 UTC
CC List: 1 user

See Also:
Host:
Target: i786-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-07-01 00:36:35


Description Andrew Pinski 2005-09-27 04:22:05 UTC
Take the following code:
#define vector __attribute__((vector_size(16)))

float a; float b;
vector float fb(void) { return (vector float){ 0,0,b,a};}
--------
Currently we produce:
        movss   _a, %xmm1
        movss   _b, %xmm0
        unpcklps        %xmm1, %xmm0
        movaps  %xmm0, %xmm1
        xorps   %xmm0, %xmm0
        movlhps %xmm1, %xmm0
        ret
-----

But from what I hear, the xorps and movlhps are useless instructions, because
those bits are already zero.
Comment 1 Andrew Pinski 2005-09-27 05:08:28 UTC
The issue is in ix86_expand_vector_init.
Comment 2 Serge Belyshev 2005-09-27 05:49:26 UTC
What is happening here is:
                                                xmm0            xmm1
        movss   _a, %xmm1                       {?, ?, ?, ?}    {a, 0, 0, 0}
        movss   _b, %xmm0                       {b, 0, 0, 0}    {a, 0, 0, 0}
        unpcklps        %xmm1, %xmm0            {b, a, 0, 0}    {a, 0, 0, 0}
        movaps  %xmm0, %xmm1                    {b, a, 0, 0}    {b, a, 0, 0}
        xorps   %xmm0, %xmm0                    {0, 0, 0, 0}    {b, a, 0, 0}
        movlhps %xmm1, %xmm0                    {0, 0, b, a}    {b, a, 0, 0}
        ret

Note that we cannot substitute

        movaps  %xmm0, %xmm1
        xorps   %xmm0, %xmm0
        movlhps %xmm1, %xmm0

with just
    
        movlhps %xmm0, %xmm0
        
because movlhps does not modify the low-order 64 bits of the destination
register. Instead we could do:
                                                xmm0            xmm1
        movss   _a, %xmm1                       {?, ?, ?, ?}    {a, 0, 0, 0}
        movss   _b, %xmm0                       {b, 0, 0, 0}    {a, 0, 0, 0}
        unpcklps %xmm1, %xmm0                   {b, a, 0, 0}    {a, 0, 0, 0}
        shufps  $78 /*[1,0,3,2]*/, %xmm0, %xmm0 {0, 0, b, a}    {a, 0, 0, 0}
        ret