This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/17622] Non-optimal code sequence for floating point "x=0; x+=a*b;"


------- Additional Comments From uros at kss-loka dot si  2004-09-23 11:28 -------
(In reply to comment #0)

> (A point I'm not quite sure about, because I don't know the least 
> about cycle counts etc: wouldn't it be faster to use the  
> sequence "fldz ; faddp st,st(1)" rather than "faddl .LC1"? gcc2.95 
> used to create that sequence.) 

On the classic Pentium, fldz takes 2 clock cycles and faddp st,st(1) takes 3 cycles.
Because fadd can overlap 2 of its cycles with fldz, the pair costs 3 cycles in total.
fadd st,(mem) also takes 3 cycles, so the two sequences cost the same.
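
The Pentium arithmetic above can be written out as a small sketch. The cycle
counts are the ones quoted in this message; the helper names and the
"first + second - overlap" formula are only an illustrative simplification,
not a model of the real pipeline.

```c
/* Cycle counts as quoted in the message (classic Pentium). */
enum {
    FLDZ_CYCLES     = 2, /* fldz */
    FADDP_CYCLES    = 3, /* faddp st,st(1) */
    FADD_MEM_CYCLES = 3, /* fadd st,(mem) */
    OVERLAP_CYCLES  = 2  /* cycles of fadd that can overlap with fldz */
};

/* Hypothetical helper: cost of two instructions when part of the
   second one overlaps with the first. */
int overlapped_cost(int first, int second, int overlap)
{
    return first + second - overlap;
}

/* "fldz ; faddp st,st(1)": 2 + 3 - 2 cycles. */
int cost_fldz_faddp(void)
{
    return overlapped_cost(FLDZ_CYCLES, FADDP_CYCLES, OVERLAP_CYCLES);
}

/* "fadd st,(mem)": a single instruction, no overlap to account for. */
int cost_fadd_mem(void)
{
    return FADD_MEM_CYCLES;
}
```

Under these assumptions both functions return the same value, which is the
"it doesn't matter" conclusion for the Pentium case.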

On i686, fldz goes to port 0 (p0) and fadd st,st(1) also goes to p0, so the two
compete for the same port. The fadd st,(mem) insn goes to p0 and p2, so it is
faster. Since it is the last FP insn in your code, its latency does not matter.
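
The i686 port argument can be sketched the same way. The port assignments are
the ones stated in this message; treating the busiest port as the cost, the
five-port array and all names below are assumptions made for illustration,
not a real scheduler model.

```c
/* Assumption: ports numbered 0..4. */
enum { NUM_PORTS = 5 };

/* Hypothetical record: how many operations a sequence sends to each port. */
struct port_usage {
    int ops[NUM_PORTS];
};

/* Crude proxy for throughput: the load on the busiest port. */
int max_port_pressure(const struct port_usage *u)
{
    int max = 0;
    for (int p = 0; p < NUM_PORTS; p++) {
        if (u->ops[p] > max)
            max = u->ops[p];
    }
    return max;
}

/* "fldz ; fadd st,st(1)": both go to p0. */
const struct port_usage fldz_fadd = { { 2, 0, 0, 0, 0 } };

/* "fadd st,(mem)": one operation on p0, one on p2. */
const struct port_usage fadd_mem = { { 1, 0, 1, 0, 0 } };
```

With these inputs max_port_pressure() is higher for fldz_fadd than for
fadd_mem, which matches the claim that the memory form is faster on i686.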

Uros.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17622

