This is the mail archive of the mailing list for the GCC project.
fancy x87 ops, SSE and -mfpmath=sse,387 performance
- From: tbp <tbptbp at gmail dot com>
- To: "GCC Mailing List" <gcc at gcc dot gnu dot org>
- Date: Sun, 6 Aug 2006 08:52:56 +0200
- Subject: fancy x87 ops, SSE and -mfpmath=sse,387 performance
Basically, I'd like to have my cake and eat it too.
With g++-4.2-20060805 on Cygwin, on a K8 box, on a code path with
lots of single-precision float ops but no transcendentals or library calls:
-mfpmath=sse,387: 5.2 Mray/s
-mfpmath=sse: 6 Mray/s
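Working the ratio out from just the two figures above (no other measurements assumed):

$$\frac{6.0}{5.2} \approx 1.154, \qquad 1 - \frac{5.2}{6.0} \approx 0.133$$

So -mfpmath=sse gives roughly 15% more rays per second. Put the other way, -mfpmath=sse,387 is about 13% slower.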
That ~15% difference is no surprise when you see code like this, where values bounce between SSE registers and the x87 stack through memory at 0x4(%esp):
4037c8: flds 0x4(%esp)
4037cc: mulss %xmm5,%xmm2
4037d0: fsubrp %st,%st(1)
4037d2: movss %xmm1,0x4(%esp)
4037d8: addss 0x278(%esp,%ecx,4),%xmm0
4037e1: flds 0x4(%esp)
4037e5: fsubrp %st,%st(1)
4037e7: addss %xmm2,%xmm0
4037eb: movss %xmm0,0x4(%esp)
4037f1: flds 0x4(%esp)
4037f5: fdivrp %st,%st(1)
4037f7: fcomi %st(1),%st
4037fb: setae %dl
4037fe: fcomip %st(1),%st
403800: seta %al
403803: or %al,%dl
403805: je 4036ca
So -mfpmath=sse is the way to go, and it is in fact on par with or
better than what I get out of icc 9.1 for the same code.
Where it gets ugly is when, for example, you throw a few cosf() calls into
the same compilation unit: with -mfpmath=sse you pay for some really
slow library function calls (at least on Cygwin).
Wishful thinking had me try -march=k8 -mfpmath=sse
-mfancy-math-387, to no avail :(
Is there a way to get that kind of exotic codegen (SSE for the arithmetic, fancy x87 ops for the transcendentals) in 32-bit environments?