[Bug middle-end/65796] New: unnecessary stack spills during complex numbers function calls

Fri Apr 17 17:06:00 GMT 2015

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796

            Bug ID: 65796
           Summary: unnecessary stack spills during complex numbers
                    function calls
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jtaylor.debian at googlemail dot com

The following function, which calls cabsf, performs poorly when compiled with
gcc:

#include <complex>
using namespace std;
void __attribute__((noinline)) v(int nCor, complex<float> * inp, complex<float> * out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        }
        else {
            out[icorr] = 0.;
        }
    }
}

With gcc 4.9 and 5 (20150208) on x86_64, this produces:
g++- test.cc -O2  -c -S

.L15:
    movss    4(%rsp), %xmm2
    addq    $8, %rbx
    addq    $8, %rbp
    movss    (%rsp), %xmm1
    mulss    %xmm0, %xmm2
    mulss    %xmm0, %xmm1
    movss    %xmm2, -8(%rbx)
    movss    %xmm1, -4(%rbx)
    cmpq    %r12, %rbx
    je    .L14
.L7:
    movss    0(%rbp), %xmm2
    movss    4(%rbp), %xmm1
    movss    %xmm2, 8(%rsp)
    movss    %xmm1, 12(%rsp)
    movq    8(%rsp), %xmm0
    movss    %xmm2, 4(%rsp)
    movss    %xmm1, (%rsp)
    call    cabsf
    pxor    %xmm3, %xmm3
    ucomiss    %xmm3, %xmm0
    ja    .L15

Note that xmm1 and xmm2 are spilled onto the stack (8(%rsp) and 12(%rsp)) and
then reloaded from there into xmm0 to form the cabsf argument. Instead of
going through the stack, one could use unpcklps to prepare xmm0.
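
To illustrate the unpcklps idea, here is a minimal sketch using SSE
intrinsics. It is not the code gcc would emit and not a proposed patch; the
helper names pack_complex and lanes are hypothetical and exist only for this
example. It assumes an x86 target with SSE, where _mm_unpacklo_ps normally
compiles to a single unpcklps.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_unpacklo_ps, _mm_storeu_ps

// Hypothetical helper (not from the report): interleaves two scalar floats
// into the low two lanes of one SSE register. This is the layout the loop
// above builds in xmm0 (real part in lane 0, imaginary part in lane 1)
// before calling cabsf.
static inline __m128 pack_complex(float re, float im)
{
    __m128 lo = _mm_set_ss(re);      // lanes: [re, 0, 0, 0]
    __m128 hi = _mm_set_ss(im);      // lanes: [im, 0, 0, 0]
    return _mm_unpacklo_ps(lo, hi);  // lanes: [re, im, 0, 0], no stack traffic
}

// Hypothetical helper: copies the four lanes to memory for inspection.
static inline void lanes(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);
}
```

With both parts already in registers, as xmm2 and xmm1 are at .L7, the
register-to-register interleave would replace the two stores to 8(%rsp) and
12(%rsp) and the movq reload.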

In a simple benchmark on 5000 floats, this would speed up the function by
about 30% on an Intel Core 2 and an i5, which is quite significant given that
the function also makes the expensive cabsf call.
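
The benchmark itself is not included in this report. A timing harness along
the following lines could be used to reproduce the measurement. This is only
a sketch: the function time_v, the input pattern, and the repetition count
are assumptions, not the reporter's actual setup.

```cpp
#include <chrono>
#include <complex>
#include <vector>

// Same loop as in the report (closing brace of the function restored).
void __attribute__((noinline)) v(int nCor, std::complex<float> *inp,
                                 std::complex<float> *out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = std::abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        } else {
            out[icorr] = 0.;
        }
    }
}

// Hypothetical harness: average wall-clock seconds per call of v() on
// n elements, measured over reps calls.
double time_v(int n, int reps)
{
    std::vector<std::complex<float>> inp(n), out(n);
    for (int i = 0; i < n; i++) {
        // Arbitrary input pattern (an assumption); a few elements are zero
        // so that both branches of the loop are exercised.
        inp[i] = std::complex<float>(float(i % 7), float(i % 5));
    }
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        v(n, inp.data(), out.data());
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}
```

For example, time_v(5000, 10000) could be compared between the stock
compiler output and a hand-edited assembly version of the loop.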