This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
[BENCHMARK] 3.4.4-pre Povray official benchmark scores for 387 and SSE insn sets
- From: Uros Bizjak <uros at kss-loka dot si>
- To: Giovanni Bajo <rasky at develer dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 25 Jan 2005 16:05:14 +0100
- Subject: [BENCHMARK] 3.4.4-pre Povray official benchmark scores for 387 and SSE insn sets
- References: <41F5260D.4070606@kss-loka.si> <015b01c502d5$bdd8f890$bf03030a@trilan>
Giovanni Bajo wrote:
>> Following are the scores for a Povray official benchmark on P4-3.2GHz,
>> 800MHz FSB.
>> gcc version 4.0.0 20050124 (experimental)
> Is it possible to have a comparison with 3.4.3?
Here are the results for gcc version 3.4.4 20050125 (prerelease):
-mfpmath=sse
Time For Parse: 0 hours 0 minutes 1.0 seconds (1 seconds)
Time For Photon: 0 hours 0 minutes 38.0 seconds (38 seconds)
Time For Trace: 0 hours 27 minutes 16.0 seconds (1636 seconds)
Total Time: 0 hours 27 minutes 55.0 seconds (1675 seconds)
-mfpmath=387
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 40.0 seconds (40 seconds)
Time For Trace: 0 hours 29 minutes 40.0 seconds (1780 seconds)
Total Time: 0 hours 30 minutes 22.0 seconds (1822 seconds)
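To put the two runs side by side, here is a minimal sketch that checks the phase times against the reported totals and computes how much time the SSE build saves relative to the 387 build. The dictionary layout and the helper names (phases_sum, saving_percent) are made up for illustration; only the numbers come from the results above.

```python
# Timings in seconds, copied from the 3.4.4 results above.
sse = {"parse": 1, "photon": 38, "trace": 1636, "total": 1675}
x87 = {"parse": 2, "photon": 40, "trace": 1780, "total": 1822}


def phases_sum(timings):
    """Add up the three phases so they can be checked against the total."""
    return timings["parse"] + timings["photon"] + timings["trace"]


def saving_percent(fast, slow):
    """Time saved by the faster run, as a percentage of the slower run."""
    return 100.0 * (slow - fast) / slow


total_saving = saving_percent(sse["total"], x87["total"])
trace_saving = saving_percent(sse["trace"], x87["trace"])

if __name__ == "__main__":
    print("phase sums match totals:",
          phases_sum(sse) == sse["total"], phases_sum(x87) == x87["total"])
    print("total time saved by -mfpmath=sse: %.1f%%" % total_saving)
    print("trace time saved by -mfpmath=sse: %.1f%%" % trace_saving)
```

Both the total and the trace phase come out roughly 8% faster with -mfpmath=sse on this machine.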
However, it should be noted that in the -mfpmath=sse case we are comparing
apples to oranges. In 3.4.4, the x87 math builtins (sin, cos, atan, log, exp)
are _enabled_, but in 4.0 these builtins were disabled for
-mfpmath=sse, because it was shown that on *x86_64* the optimized SSE math
libraries are faster and more accurate, and that the builtins interfere
with SSE math in an unwanted way (shuffling values between x87 and SSE
registers). We are talking about a significant number of math
instructions here:
grep sin povray_asm_34.sse | wc -l
98
grep cos povray_asm_34.sse | wc -l
76
grep fscale povray_asm_34.sse | wc -l
65
grep fpatan povray_asm_34.sse | wc -l
38
... etc ...
So in this case, the 3.4 build got the best of both "worlds": SSE
arithmetic plus the x87 math builtins.
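One caveat about the counts above: grep matches any line that contains the string, so "grep sin" also counts fsincos and any symbol name containing "sin". The sketch below reproduces the "grep | wc -l" count and adds a stricter count that only accepts an exact mnemonic at the start of a line. The sample assembly is hypothetical, not taken from povray_asm_34.sse, and both function names are made up for illustration.

```python
import re


def count_matching_lines(asm_text, pattern):
    """Count lines containing pattern, like `grep pattern file | wc -l`."""
    return sum(1 for line in asm_text.splitlines() if pattern in line)


def count_mnemonic(asm_text, mnemonic):
    """Count lines whose leading instruction mnemonic is exactly `mnemonic`."""
    rx = re.compile(r"^\s*" + re.escape(mnemonic) + r"\b")
    return sum(1 for line in asm_text.splitlines() if rx.match(line))


# Hypothetical assembly fragment, for illustration only.
sample = """\
\tfsin
\tfsincos
\tcall\tsingle_step
\tfcos
\tfpatan
"""

if __name__ == "__main__":
    print("lines containing 'sin':", count_matching_lines(sample, "sin"))
    print("exact fsin instructions:", count_mnemonic(sample, "fsin"))
```

On real compiler output the substring counts should be read as upper bounds on the number of x87 math instructions.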
Unfortunately, on 32-bit targets we are stuck with an old ABI, where FP
parameters have to be passed through memory (instead of being
passed in SSE registers) and where the return value comes back in an x87
register. So on 32-bit targets, every function call in fact _forces_ the
register shuffling that we are trying to avoid by disabling the x87 math
builtins.
Uros.