This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: SSE vs. x87 povray deathmatch [was: Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP ...]


> Menezes, Evandro wrote:
> Povray was compiled using "-pipe -Wno-multichar -O3 -march=k8 -mtune=k8
> -ffast-math -minline-all-stringops" for SSE.
> The result of benchmark run was:
> user    27m43.635s
> 
> 387 benchmark was compiled with -mfpmath=387 added to compile flags.
> The result of benchmark was:
> user    28m40.049s
> 
> and this way many x87->mem->SSE moves were removed. The result of 
> benchmark run is now:
> user    27m27.141s

Hmm, fun indeed ;)
If you manage to get any instruction-level oprofiles of routines that
execute faster on x87 than on SSE, I would definitely be interested to
see them.  I will try to get this done for both benchmarks sometime
later this week or next week myself if time allows.

In addition to the mentioned math functions, comparing SSE to x87
performance is tricky, especially for code working on floats, as C
introduces many "implicit" float to double conversions that are no-ops on
x87 but rather expensive on SSE.  I did some work on eliminating these
by adding folders around common offenders (such as fabs), but perhaps we
need more, especially for -ffast-math.  Sadly, many programs are written
in a manner that does those conversions in nontrivial cases for no good
reason, and I guess it is more or less a matter of re-optimizing those
applications for new hardware (I guess povray is a good example of an
application that got extensive tuning for x87 hardware; dhrystone is not,
however).
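
A minimal sketch of the kind of implicit conversion meant here, assuming
plain C with <math.h>.  The function names are hypothetical and not taken
from povray or GCC.  In the "via_double" variants the float argument is
widened to double and the result narrowed back to float, which costs
nothing extra on x87 but typically requires explicit conversion
instructions on SSE unless the compiler folds them away:

```c
#include <math.h>

/* fabs() takes and returns double: x is widened to double, and the
   result is narrowed back to float on return. */
float abs_via_double(float x)
{
    return (float)fabs(x);
}

/* fabsf() works on float directly, so no conversion is involved. */
float abs_single(float x)
{
    return fabsf(x);
}

/* 0.5 is a double constant: x is widened, multiplied as a double,
   and the product is narrowed back to float. */
float half_via_double(float x)
{
    return (float)(x * 0.5);
}

/* 0.5f keeps the whole expression in single precision. */
float half_single(float x)
{
    return x * 0.5f;
}
```

Both variants of each pair return the same value for ordinary inputs; the
difference is only in how much conversion work the generated code may have
to do.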

Other common causes of slowdowns with SSE are the lack of reversed-order
instructions (you can do reg=reg-reg2, but not reg=reg2-reg, while x87
allows both) and sometimes also increased instruction length causing
decoder stalls.  None of those, however, should show up significantly
enough to outweigh all the x87 fxch braindamage...  So let's give it a
try, identifying and hopefully fixing the SSE codegen issues.
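
As an illustration of the reversed-order case, here is a small sketch in C
(the function name is hypothetical).  The running value acc is the
subtrahend, yet the result has to end up back in acc, which is the
reg=reg2-reg pattern.  On x87 this maps onto a reversed subtract; with
two-operand SSE subtraction a compiler will typically need an extra
register copy, though the exact code depends on the compiler and flags:

```c
#include <stddef.h>

/* Computes x[n-1] - (x[n-2] - (... - (x[0] - 0))).
   Each step has the form acc = x[i] - acc, so the value that must be
   overwritten with the result is the second operand of the subtraction. */
float alternating_sum(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = x[i] - acc;
    return acc;
}
```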

Honza

