This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: PR 15492: floating-point arguments are loaded too early to x87stack
- From: Uros Bizjak <uros at kss-loka dot si>
- To: Florian Weimer <fw at deneb dot enyo dot de>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 19 Aug 2004 13:59:15 +0200
- References: <41246622.5090401@kss-loka.si> <87y8kbeau5.fsf@deneb.enyo.de>
Florian Weimer wrote:
> On modern x86 CPUs, fxch is executed at instruction decoding time by
> renaming floating point registers. It only costs execution time if
> the instruction decoder cannot keep up with the remaining pipeline (or
> if the working set exceeds the size of the processor's trace cache, if
> there is one).
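fxch is only part of the problem. Another problem is that more fp-stack
registers are used in the -O2 case (up to five at once with -O2 versus
three without it, according to the stack annotations in the listings).
Consider another example from PR 15492: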
#include <math.h>

double test1 (double a, int x, double b, int y, double c)
{
return sin (c) + tan (b) / sqrt (a) + x * fabs (b) + y;
}
test1 (with -O2):
fldl 4(%esp) #st(0)
fsqrt
fldl 16(%esp) #st(0) st(1)
fld %st(0) #st(0) st(1) st(2)
fldl 28(%esp) #st(0) st(1) st(2) st(3)
fxch %st(2)
fabs
fxch %st(3)
fdivrl .LC1 #st(0) st(1) st(2) st(3)
fxch %st(1)
fptan #st(0) st(1) st(2) st(3) st(4)
fstp %st(0) #st(0) st(1) st(2) st(3)
fxch %st(2)
fsin #st(0) st(1) st(2) st(3)
fxch %st(2)
fmulp %st, %st(1) #st(0) st(1) st(2)
faddp %st, %st(1) #st(0) st(1)
fildl 12(%esp) #st(0) st(1) st(2)
fmulp %st, %st(2) #st(0) st(1)
faddp %st, %st(1) #st(0)
fildl 24(%esp) #st(0) st(1)
faddp %st, %st(1) #st(0)
ret
test1 (without -O2):
fldl 28(%esp) #st(0)
fsin
fldl 16(%esp) #st(0) st(1)
fptan #st(0) st(1) st(2)
fstp %st(0) #st(0) st(1)
fldl 4(%esp) #st(0) st(1) st(2)
fsqrt
fdivrp %st, %st(1) #st(0) st(1)
faddp %st, %st(1) #st(0)
fildl 12(%esp) #st(0) st(1)
fldl 16(%esp) #st(0) st(1) st(2)
fabs
fmulp %st, %st(1) #st(0) st(1)
faddp %st, %st(1) #st(0)
fildl 24(%esp) #st(0) st(1)
faddp %st, %st(1) #st(0)
ret
In more complex functions, even more stack slots are wasted. The fadd,
fsub, fmul and fdiv insns do not have this problem, because they can take
one of their operands directly from memory. By contrast, the arguments to
fsin, fcos, etc. are always loaded at the beginning of the function. In the
optimized (-O2) case, the argument to the fsin insn occupies an fp-stack
slot for about half of the function's computation, although it could be
loaded just before the fsin insn.
Perhaps this problem could be solved by introducing an expander for fsin
& co. in i386.md that expands the "fsin (memory)" pattern into an "arg load;
sin" sequence? The combiner could then combine the "arg load; sin" sequence
into an "*fsin_m" pattern when appropriate, and this new expander would
expand that pseudo-insn back into "fldl; fsin". IMHO, this would move the
load right in front of fsin when appropriate.
Uros.