[Bug middle-end/65796] New: unnecessary stack spills during complex numbers function calls
jtaylor.debian at googlemail dot com
gcc-bugzilla@gcc.gnu.org
Fri Apr 17 17:06:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796
Bug ID: 65796
Summary: unnecessary stack spills during complex numbers function calls
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jtaylor.debian at googlemail dot com
The following function, which calls cabsf (through std::abs), performs poorly
when compiled with gcc:
#include <complex>
using namespace std;
void __attribute__((noinline)) v(int nCor, complex<float> * inp,
                                 complex<float> * out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        }
        else {
            out[icorr] = 0.;
        }
    }
}
Compiling it with gcc 4.9 and 5 (20150208) on x86_64:
g++- test.cc -O2 -c -S
produces:
.L15:
        movss 4(%rsp), %xmm2
        addq $8, %rbx
        addq $8, %rbp
        movss (%rsp), %xmm1
        mulss %xmm0, %xmm2
        mulss %xmm0, %xmm1
        movss %xmm2, -8(%rbx)
        movss %xmm1, -4(%rbx)
        cmpq %r12, %rbx
        je .L14
.L7:
        movss 0(%rbp), %xmm2
        movss 4(%rbp), %xmm1
        movss %xmm2, 8(%rsp)
        movss %xmm1, 12(%rsp)
        movq 8(%rsp), %xmm0
        movss %xmm2, 4(%rsp)
        movss %xmm1, (%rsp)
        call cabsf
        pxor %xmm3, %xmm3
        ucomiss %xmm3, %xmm0
        ja .L15
Note that xmm1 and xmm2 are spilled to the stack and then reloaded into xmm0.
Instead of spilling to the stack, one could use unpcklps to prepare xmm0.
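To illustrate the register-only alternative, here is a minimal sketch using the SSE intrinsic _mm_unpacklo_ps, which corresponds to unpcklps. The helper name pack_complex is hypothetical and not part of the report, and the non-SSE fallback exists only so the sketch builds elsewhere. It shows that the two scalars can be interleaved into the low 64 bits of one register, which is where the x86_64 ABI expects a complex float argument; whether gcc emits exactly this sequence is not claimed here.

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper (not from the report): combine a real and an imaginary
// part into the {re, im} layout that cabsf receives in xmm0 on x86_64.
void pack_complex(float re, float im, float out[2])
{
#if defined(__SSE__)
    __m128 lo = _mm_set_ss(re);               // {re, 0, 0, 0}
    __m128 hi = _mm_set_ss(im);               // {im, 0, 0, 0}
    __m128 packed = _mm_unpacklo_ps(lo, hi);  // {re, im, 0, 0}, i.e. unpcklps
    float lanes[4];
    _mm_storeu_ps(lanes, packed);             // store only to inspect the lanes
    out[0] = lanes[0];
    out[1] = lanes[1];
#else
    // Portable fallback so the sketch compiles on non-SSE targets.
    out[0] = re;
    out[1] = im;
#endif
}
```

The store to lanes is only there to make the result observable; in the loop above the packed value would stay in xmm0 for the call.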
With a simple benchmark on 5000 floats, this would speed up the function by
about 30% on an Intel Core 2 and an i5, which is quite significant given the
expensive cabs call that is also made in it.
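The benchmark itself is not included in the report. A rough sketch of how such a measurement could be set up follows; the helper seconds_per_call, the input pattern and the repetition count are assumptions made for illustration, and v is repeated only so the sketch is self-contained.

```cpp
#include <chrono>
#include <complex>
#include <vector>

// Same loop as v() in the report, repeated so this sketch stands alone.
void __attribute__((noinline)) v(int nCor, std::complex<float> * inp,
                                 std::complex<float> * out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = std::abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        }
        else {
            out[icorr] = 0.f;
        }
    }
}

// Hypothetical helper: average wall-clock seconds per call of v() over
// `reps` runs on `n` elements. The input values are arbitrary.
double seconds_per_call(int n, int reps)
{
    std::vector<std::complex<float>> in(n), out(n);
    for (int i = 0; i < n; ++i) {
        in[i] = std::complex<float>(float(i % 7) + 1.f, float(i % 5));
    }
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        v(n, in.data(), out.data());
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

Calling seconds_per_call(5000, 1000) before and after a codegen change would give a comparable pair of timings; the absolute numbers depend on the machine and the libm in use.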