[Bug rtl-optimization/31704] New: x86_64 poor floating point register allocation across function call
ian at airs dot com
gcc-bugzilla@gcc.gnu.org
Wed Apr 25 15:09:00 GMT 2007
When I compile this test case with -O2 for x86_64:
extern void g (void);
float
f (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * mult;
    }
  return sum;
}
I get this result:
f:
.LFB2:
	pushq	%rbp
.LCFI0:
	movaps	%xmm0, %xmm2
	xorl	%ebp, %ebp
	pushq	%rbx
.LCFI1:
	movq	%rdi, %rbx
	subq	$40, %rsp
.LCFI2:
	movss	%xmm1, 28(%rsp)
.L2:
	movss	%xmm2, (%rsp)
	call	g
	cvtsi2ss	(%rbx), %xmm0
	leaq	4(%rbx), %rax
	movl	$1, %edx
	movss	(%rsp), %xmm2
	mulss	28(%rsp), %xmm0
	addss	%xmm0, %xmm2
	.p2align 4,,7
.L3:
	cvtsi2ss	(%rax), %xmm1
	addl	$1, %edx
	addq	$4, %rax
	cmpl	$1000, %edx
	mulss	28(%rsp), %xmm1
	addss	%xmm1, %xmm2
	jne	.L3
	addl	$1, %ebp
	addq	$4000, %rbx
	cmpl	$10, %ebp
	jne	.L2
	addq	$40, %rsp
	movaps	%xmm2, %xmm0
	popq	%rbx
	popq	%rbp
	ret
In the original code, the inner loop is performance critical. Note that the
inner loop (.L3) compiles into a mulss that loads mult from memory, at
28(%rsp). It would be more efficient to keep the value in a register during
the inner loop. In fact the value arrived in a register (%xmm1), but we stored
it on the stack because it is live across the function call and the x86_64
ABI has no call-saved SSE registers. We then load it from the stack once for
each inner loop iteration rather than once for each outer loop iteration.
I don't see a simple approach to fixing this. Some sort of live range
splitting might work.
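
To make the idea concrete, here is a rough source-level sketch of what
splitting the live range of mult would amount to. It is only an illustration,
not a proposed patch: the names f_split, m, g_calls and split_matches_original
are hypothetical, and g is replaced by a counting stub so the example is
self-contained (in the report g is an opaque external function). The copy
"m = mult" after the call stands for the single reload per outer loop
iteration. The compiler is free to coalesce m with mult again, so writing the
source this way does not guarantee better code.

```c
#include <assert.h>

/* Stand-in for the external g from the report (assumption: the real g is
   opaque to the compiler; this stub only counts calls).  */
static int g_calls;
void g (void) { ++g_calls; }

/* Test case from the report, unchanged.  */
float
f (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * mult;
    }
  return sum;
}

/* Hand-split variant (hypothetical).  mult stays live across the call, but
   the inner loop uses m, whose live range starts after the call and ends
   with the inner loop.  */
float
f_split (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      float m;
      g ();
      m = mult;   /* one reload per outer loop iteration */
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * m;
    }
  return sum;
}

/* Hypothetical helper: returns nonzero if both versions agree.  The inputs
   are chosen so that every intermediate sum is exactly representable in
   float.  */
int
split_matches_original (void)
{
  static int data[10 * 1000];
  int k;
  float a, b;
  for (k = 0; k < 10 * 1000; ++k)
    data[k] = k % 7;
  g_calls = 0;
  a = f (1.0f, 0.5f, data);
  b = f_split (1.0f, 0.5f, data);
  assert (g_calls == 20);   /* 10 calls from each version */
  return a == b;
}
```

With the split in place, the desired allocation would keep m in an SSE
register for the whole inner loop and touch the stack slot for mult only once
per call to g, that is 10 loads instead of 10000 for this test case.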
--
Summary: x86_64 poor floating point register allocation across
function call
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ian at airs dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31704