This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/17622] Non-optimal code sequence for floating point "x=0; x+=a*b;"
- From: "uros at kss-loka dot si" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 23 Sep 2004 11:28:58 -0000
- Subject: [Bug target/17622] Non-optimal code sequence for floating point "x=0; x+=a*b;"
- References: <20040922213444.17622.bangerth@dealii.org>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From uros at kss-loka dot si 2004-09-23 11:28 -------
(In reply to comment #0)
> (A point I'm not quite sure about, because I don't know the least
> about cycle counts etc: wouldn't it be faster to use the
> sequence "fldz ; faddp st,st(1)" rather than "faddl .LC1"? gcc2.95
> used to create that sequence.)
On the classic Pentium, fldz takes 2 clock cycles and faddp st,st(1) takes 3
cycles. As the fadd can overlap 2 of its cycles with fldz, the combined cost is
3 cycles. fadd st,(mem) also takes 3 cycles, so the two sequences cost the same.
On i686, fldz goes to p0, and faddp st,st(1) also goes to p0. The fadd st,(mem)
insn goes to p0 and p2, so it puts only one uop on p0 instead of two and is
faster. As it is the last fp insn in your code, latency does not matter.
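
The i686 port argument above can be sketched as a small counting model. This
is only an illustration, not a measurement: it assumes each port accepts one
uop per cycle and uses just the port assignments quoted above. The names
(struct insn, seq_fldz_faddp, seq_fadd_mem, port_bound) are hypothetical and
come from neither GCC nor any Intel document.

```c
#include <stddef.h>

/* Simplified issue-port model for the P6 ("i686") core.
   Assumption: one uop per port per cycle; port numbers as quoted above. */
enum { P0 = 0, P2 = 2, NPORTS = 5 };

struct insn {
    const char *text;   /* assembly text, kept for reference only */
    unsigned ports;     /* bitmask: one uop on every port whose bit is set */
};

/* Sequence A: fldz ; faddp st,st(1) -- both uops go to p0. */
static const struct insn seq_fldz_faddp[] = {
    { "fldz",           1u << P0 },
    { "faddp st,st(1)", 1u << P0 },
};

/* Sequence B: faddl .LC1 -- add uop on p0, load uop on p2. */
static const struct insn seq_fadd_mem[] = {
    { "faddl .LC1", (1u << P0) | (1u << P2) },
};

/* Return the largest number of uops that land on any single port.
   Under the one-uop-per-port-per-cycle assumption this is a lower
   bound, in cycles, on how long the sequence occupies the ports. */
static unsigned port_bound(const struct insn *seq, size_t n)
{
    unsigned load[NPORTS] = { 0 };
    unsigned worst = 0;

    for (size_t i = 0; i < n; i++)
        for (int p = 0; p < NPORTS; p++)
            if (seq[i].ports & (1u << p))
                load[p]++;

    for (int p = 0; p < NPORTS; p++)
        if (load[p] > worst)
            worst = load[p];

    return worst;
}
```

In this model port_bound(seq_fldz_faddp, 2) is 2, because both uops compete for
p0, while port_bound(seq_fadd_mem, 1) is 1, because the load is carried by p2.
The model ignores latency, which is acceptable here only because the fadd is
the last fp insn in the code.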
Uros.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17622