This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/17622] Non-optimal code sequence for floating point "x=0; x+=a*b;"

From: "uros at kss-loka dot si" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 23 Sep 2004 11:28:58 -0000
Subject: [Bug target/17622] Non-optimal code sequence for floating point "x=0; x+=a*b;"
References: <20040922213444.17622.bangerth@dealii.org>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

------- Additional Comments From uros at kss-loka dot si  2004-09-23 11:28 -------
(In reply to comment #0)

> (A point I'm not quite sure about, because I don't know the least 
> about cycle counts etc: wouldn't it be faster to use the  
> sequence "fldz ; faddp st,st(1)" rather than "faddl .LC1"? gcc2.95 
> used to create that sequence.) 

On the classic Pentium, fldz takes 2 clock cycles and faddp st,st(1) takes 3
cycles. As faddp can overlap 2 of its cycles with fldz, the combination costs
3 cycles in total. fadd st,(mem) also takes 3 cycles, so it doesn't matter.

On i686, fldz goes to port p0 and fadd st,st(1) also goes to p0, so the two
instructions compete for the same port. The fadd st,(mem) insn goes to p0 and
p2, so it is faster. As it is the last FP insn in your code, its latency does
not matter.

Uros.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17622

References:
- [Bug target/17622] New: Non-optimal code sequence for floating point "x=0; x+=a*b;"
  - From: bangerth at dealii dot org
