This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/58497] SLP vectorizes identical operations

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 23 Sep 2013 08:33:42 +0000
Subject: [Bug tree-optimization/58497] SLP vectorizes identical operations
Auto-submitted: auto-generated
References: <bug-58497-4 at http dot gcc dot gnu dot org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-09-23
         Depends on|                            |53947
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP; 4.8 didn't
vectorize this at all.

Note that with for example

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two identical operations may be profitable.  But yes, if all
scalars are the same there is no point in doing it.  The cost model
should have rejected it as well (though the four "stores" likely
made it look profitable in the end).

I will have a look at some point.

OTOH, the generated code is

g:
.LFB0:
        .cfi_startproc
        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        cvtdq2ps        %xmm0, %xmm0
        ret

vs. -fno-tree-vectorize:

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm1, %xmm1
        addl    $1, %edi
        xorps   %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $36, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $196, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        unpcklps        %xmm0, %xmm0
        movss   %xmm1, %xmm0
        shufps  $225, %xmm2, %xmm0
        movss   %xmm1, %xmm0
        ret

so it is clearly a win, but it could be improved to something like

        addl    $1, %edi
        cvtsi2ss        %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0

The above also shows that vector initialization via BIT_FIELD_REF is not
expanded very well (something for a generalized vector shuffle recognition
in the bswap pass).

References:
- [Bug tree-optimization/58497] New: SLP vectorizes identical operations
  - From: glisse at gcc dot gnu.org
