This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



[BENCHMARK] Povray official benchmark scores for 387 and SSE insnsets


Hello!

The following are the scores for the official Povray benchmark on a P4-3.2GHz with an 800MHz FSB.

gcc version 4.0.0 20050124 (experimental)

PovRay version 3.50c, stripped.

Compile flags (-mfpmath=??? stands for sse or 387, depending on the run):
-O3 -march=pentium4 -mfpmath=??? -ffast-math -D__NO_MATH_INLINES -finline-functions -fomit-frame-pointer -funroll-loops -fexpensive-optimizations -malign-double -foptimize-sibling-calls -minline-all-stringops -Wno-multichar


-mfpmath=sse:
Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  35.0 seconds (35 seconds)
Time For Trace:    0 hours 29 minutes  29.0 seconds (1769 seconds)
   Total Time:    0 hours 30 minutes   6.0 seconds (1806 seconds)

-mfpmath=387:
Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  35.0 seconds (35 seconds)
Time For Trace:    0 hours 28 minutes  27.0 seconds (1707 seconds)
   Total Time:    0 hours 29 minutes   4.0 seconds (1744 seconds)

It should be noted that:

- The 387 version can take a lot of shortcuts by optimizing math built-in functions, for example:
grep fsincos povray.asm.387 | wc -l
56


- Math builtins are effectively disabled for the sse version.

- Another handicap for the sse version is that every return value from a math function travels from the st(0) x87 register to memory and back into an SSE register:
804b91d: f2 0f 11 04 24 movsd %xmm0,(%esp,1)
804b922: e8 7d f5 ff ff call 0x804aea4
804b927: dd 5c 24 28 fstpl 0x28(%esp,1)
804b92b: f2 0f 10 4c 24 28 movsd 0x28(%esp,1),%xmm1
804b931: f2 0f 5c 4b 10 subsd 0x10(%ebx),%xmm1


- In both cases the register allocator still can't figure out which register set it _really_ wants, so these values travel the same way as in the example above:
grep pxor povray.asm.387 | wc -l
21
grep fldz povray.asm.sse | wc -l
122
grep fld1 povray.asm.sse | wc -l
171


With all this in mind, the -mfpmath=sse results are very good and show the effect of rth's TARGET_SSE_MATH cleanups.

As a last thought - is there a reason not to create an ABI similar to the one x86_64 has for its FP math? If SFmode and DFmode floats could be passed to and from functions in SSE registers instead of via memory, I'm sure some benefit could be shown for FP performance. Perhaps a multi-lib approach could be introduced, where an appropriate sse-only (sub-)library [with parameters passed in SSE registers] would be linked in for -mfpmath=sse, instead of the current x87 one [with inline asm troubles :)].

Uros.


