This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
RE: SSE vs. x87 povray deathmatch [was: Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP ...]
- From: "Menezes, Evandro" <evandro dot menezes at amd dot com>
- To: "Uros Bizjak" <ubizjak at gmail dot com>
- Cc: "Roger Sayle" <roger at eyesopen dot com>, "Michael Matz" <matz at suse dot de>, "Jan Hubicka" <hubicka at ucw dot cz>, "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Richard Guenther" <rguenther at suse dot de>
- Date: Tue, 10 Oct 2006 16:58:01 -0500
- Subject: RE: SSE vs. x87 povray deathmatch [was: Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP ...]
Uros,
> I have re-run official povray-3.6.1 benchmark on
>
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 47
> model name : AMD Athlon(tm) 64 Processor 3000+
> stepping : 2
> cpu MHz : 1809.276
> cache size : 512 KB
>
> On Fedora Core 4 (2.6.11-1.1369_FC4 #1 Thu Jun 2 22:56:33 EDT 2005
> x86_64 x86_64 x86_64 GNU/Linux) using out of the box glibc:
>
> GNU C Library development release version 2.3.5, by Roland McGrath et al.
> ...
> Compiled by GNU CC version 4.0.0 20050525 (Red Hat 4.0.0-9).
> Compiled on a Linux 2.4.20 system on 2005-05-30.
As I said, the stock GLIBC doesn't have fast math routines for x86-64. SUSE and others ship them, but neither the FSF nor the RH build does.
I'll run the SPEC CPU2006 Povray, which I have handy, on SUSE 10.0 and post the results later. Then I'll run 3.6.1 as well.
> I have speculated that the slowdown was due to costly SSE->mem->x87
> moves. These moves should be avoided as much as possible, and this fact
> was already proved some time ago (this is actually the reason why x87
> intrinsics are disabled for SSE math). To prove this speculation, -msse3
> was added to compile flags to enable generation of fisttp instruction.
Could -msse3 have removed the changes to the rounding mode as well?
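
For reference, the rounding-mode question concerns truncating float-to-int conversions. A minimal sketch, assuming a hypothetical helper that is not taken from povray: with x87 math and no SSE3, GCC typically has to save the x87 control word, switch it to truncation, store the integer, and restore the control word (fnstcw/fldcw around fistp). With -msse3 it can emit a single fisttp, which truncates regardless of the current rounding mode.

```c
/* Hypothetical helper, not taken from povray: a plain double -> int
   conversion, which C defines as truncation toward zero. This is the
   operation that fisttp implements directly. */
int trunc_to_int(double x)
{
    /* The result must be truncated no matter which rounding mode the
       FPU is currently in, so without fisttp the compiler has to
       change the x87 control word around the store. */
    return (int)x;
}
```

One way to check this on a given setup (an assumption about the build, not the flags used in the runs above) is to compare the assembly from `gcc -m32 -O2 -S` with and without `-msse3` and look for fldcw versus fisttp.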
> So, at this point x87 code of a real world application (which is BTW the
> part of a SPEC suite) beats x86_64 SSE, despite the fact that SSE has
> two times as many non-stacked FP registers and implements register passing
> convention (thus avoiding memory moves).
That's not the correct conclusion. As I said, you haven't isolated the effect of the x87 microcoded functions from that of the GLIBC math functions...
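
One way to separate the two effects would be to time the math functions on their own, outside povray. The following is only a sketch under stated assumptions: `time_calls` and `baseline` are hypothetical names, the argument range is arbitrary, and `clock()` is a coarse timer.

```c
#include <time.h>

/* Hypothetical baseline: returns its argument unchanged, so timing it
   measures only the loop and call overhead. */
double baseline(double x)
{
    return x;
}

/* Hypothetical micro-benchmark, not part of povray or SPEC: calls f
   n times on slowly increasing arguments and returns the elapsed CPU
   time in seconds. The running sum is stored through *sink so the
   compiler cannot discard the calls. */
double time_calls(double (*f)(double), long n, double *sink)
{
    double sum = 0.0;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        sum += f((double)i * 1e-3);
    clock_t stop = clock();
    *sink = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Passing `sin` or `pow`-style wrappers from `<math.h>` (linking with `-lm`) and subtracting the `baseline` time gives a rough per-call cost of the library routine. Running the same binary against different GLIBC builds, or building it once with SSE math and once with x87 math, would show how much of the povray difference comes from the math library rather than from the generated FP code.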
Thanks,
--
_______________________________________________________
Evandro Menezes AMD Austin, TX