This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: GCC and Floating-Point
- From: "Menezes, Evandro" <evandro dot menezes at AMD dot com>
- To: "Uros Bizjak" <uros at kss-loka dot si>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 25 May 2005 17:39:00 -0500
- Subject: RE: GCC and Floating-Point
Uros,
> > Actually, in many cases, SSE did help x86 performance as well. That
> > happens in FP-intensive applications which spend a lot of time in
> > loops, where the XMM register set can be used more efficiently than
> > the x87 stack.
>
> This code could be a perfect example of how the XMM register file
> beats the x87 register stack. However, contrary to all expectations,
> the x87 code is 20% faster(!) on a P4; it would be interesting to see
> this comparison on x86_64, or perhaps on 32-bit AMD. The code
> structure produced with -mfpmath=sse is the same as that produced
> with -mfpmath=387, so IMO there are no register-allocator effects in
> play.
I'll look into it and share what I see.
> I was trying to look into this problem, but at first sight the
> code seems optimal to me...
FWIW, here's some old data I got almost 2 years ago (run times, and geometric means of the ratios against SPEC's base times):
CPU2000          A      B
164.gzip       205s   203s
175.vpr        185s   188s
176.gcc        117s   116s
181.mcf        313s   314s
186.crafty     112s   112s
197.parser     268s   268s
252.eon        147s   167s
253.perlbmk    175s   180s
254.gap        148s   148s
255.vortex     178s   178s
256.bzip2      211s   202s
300.twolf      313s   328s
Int Geomean     812    801
177.mesa       173s   187s
179.art        346s   690s
183.equake     163s   162s
188.ammp       325s   336s
FP Geomean      757    620
Using GCC 3.3.3 from the 3_3-hammer branch, the options for the runs in column B were "-m32 -O3 -march=k8 -ffast-math -fomit-frame-pointer -malign-double" plus FDO; for column A, the same ones plus "-mfpmath=sse". The system was a 1.4GHz Athlon 64 with PC2100 RAM.
Because things were so much better with SSE, I haven't run with x87 lately...
--
Evandro