This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
PR 15492: floating-point arguments are loaded too early to x87 stack
- From: Uros Bizjak <uros at kss-loka dot si>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 19 Aug 2004 10:34:42 +0200
- Subject: PR 15492: floating-point arguments are loaded too early to x87 stack
Hello!
I would like to bring this PR (
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15492 ) to the attention of
gcc developers. The problem described in this PR has a big impact on FP
calculations, because the fp-stack is wasted on register copies and a lot
of unnecessary fxch instructions are generated.
A simple testcase:
double test (double a, double b) {
  return a*a + b*b;
}
Current (Aug. 19) mainline CVS gcc generates the following
with "gcc -O2 -fomit-frame-pointer":
test:
        fldl    4(%esp)
        fldl    12(%esp)
        fxch    %st(1)
        fmul    %st(0), %st
        fxch    %st(1)
        fmul    %st(0), %st
        faddp   %st, %st(1)
        ret
and without optimization, "gcc -fomit-frame-pointer":
test:
        fldl    4(%esp)
        fmull   4(%esp)
        fldl    12(%esp)
        fmull   12(%esp)
        faddp   %st, %st(1)
        ret
According to "How to optimize for the Pentium family of microprocessors"
by Agner Fog, "fld r/m32/m64" consumes one clock cycle on P1, PMMX,
PPRO, P2, P3 and P4 in all its forms. As shown above, gcc actually
de-optimizes the code with "-O2".
In PR 15492, a couple of other examples are shown.
The following counts show how serious the problem can be:
gcc -ffast-math -S -O2 almabench.c
grep fxch almabench.s | wc -l
114
gcc -ffast-math -S almabench.c
grep fxch almabench.s | wc -l
5
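The comparison above can be repeated for other flag combinations with a
small helper. This is only a sketch: the function names count_fxch and
report_fxch are made up for illustration, and it assumes that "gcc" on the
PATH targets 32-bit x86 with x87 math (on an x86-64 host this would
probably need something like -m32, which is not part of the original test).

```shell
#!/bin/sh
# Hypothetical helpers, not part of gcc or of the PR.

# Print the number of lines in an assembly file that contain "fxch".
# grep -c prints 0 but exits with status 1 when nothing matches,
# so the status is ignored here.
count_fxch() {
    grep -c 'fxch' "$1" || true
}

# Compile one source file to assembly with the given flags and print
# "<fxch count><TAB><flags>".
# Usage: report_fxch source.c [gcc flags...]
report_fxch() {
    src=$1
    shift
    out=$(mktemp) || return 1
    if gcc "$@" -S "$src" -o "$out"; then
        printf '%s\t%s\n' "$(count_fxch "$out")" "$*"
    fi
    rm -f "$out"
}
```

For example, "report_fxch almabench.c -ffast-math -O2" followed by
"report_fxch almabench.c -ffast-math" should reproduce the two counts
above, and further flags can be appended to see whether they change the
result.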
I believe that this problem also affects PR 13712: "Executable runs 25%
slower than when compiled with INTEL compiler" (
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13712 ).
I tried to look into this problem, and I found that
-fno-schedule-insns produces slightly better code (but not even close
to the code without -O). It looks like the problem is inside the RTL generator.
Could somebody with more knowledge of gcc help to solve this problem?
Uros.