This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

RE: "SSE instruction set disabled?"

From: Ian Ollmann <iano at cco dot caltech dot edu>
To: gcc at gcc dot gnu dot org
Date: Wed, 4 Dec 2002 15:44:03 -0800
Subject: RE: "SSE instruction set disabled?"

> Why wouldn't it be a good idea to always "tune" for the chosen
> architecture?

It is a bad idea when you are shipping a binary that needs to have high
performance but must also run on a wide variety of processors. In this
case, there is no "chosen architecture". For some apps, least-common-
denominator performance is not good enough.

To achieve good performance across a broad range of processors, a common
approach is to build an application with multiple parallel versions of a
function, one for each architecture. For example, in the PowerPC world, we
might have one piece of code that just uses the scalar units for a G3
processor, and another piece of code that does the same thing using
AltiVec for newer processors. At run time, you make the decision about
what hardware is available and call the appropriate function or load the
appropriate library, etc.
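A minimal C sketch of that run-time selection, assuming an x86 target and a GCC (or Clang) much newer than the one under discussion here: `__builtin_cpu_supports` and `__attribute__((target("sse")))` were added to GCC well after this message was written. The names `add_scalar`, `add_sse`, `select_add` and `add_fn` are hypothetical. On non-x86 targets the preprocessor guard drops the SSE path and only the scalar version is built.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define SKETCH_X86 1
#include <xmmintrin.h>
#endif

/* Hypothetical signature shared by every implementation. */
typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Baseline path: plain C, compiled with the default flags for the file. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#ifdef SKETCH_X86
/* SSE path: the target attribute lets this one function use SSE
   instructions without compiling the rest of the file with -msse. */
__attribute__((target("sse")))
static void add_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)   /* four floats per SSE add */
        _mm_storeu_ps(dst + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)           /* leftover elements */
        dst[i] = a[i] + b[i];
}
#endif

/* Check the hardware once at run time and return the matching version. */
add_fn select_add(void)
{
#ifdef SKETCH_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        return add_sse;
#endif
    return add_scalar;
}
```

A caller stores the result of `select_add()` in a function pointer once at start-up and calls through it afterwards, so the feature check is paid only once and a processor without SSE never reaches `add_sse`.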

This all falls flat on its face when a single flag, such as -msse, does
multiple things. It (1) turns on the sse builtins, (2) replaces x87 scalar
code with xmm based scalar SSE code, and (3) may use the xmm register file
for other things like caching integer values that spill off the integer
register file. Number (1) we need in order to write SSE *vector* code.
You need that for performance. However, numbers (2) and (3) are a poison
pill if you are trying to have the same executable also run on a PPro:
it would die with an illegal-instruction fault as soon as the PPro hits
the automagically generated SSE code.
Also, the narrower precision and exponent range of SSE/SSE2 arithmetic,
compared with the x87's 80-bit extended registers, may cause problems for
some apps, because certain calculations that used to work now overflow
and return Inf.
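A small C sketch of the Inf problem, assuming IEEE single precision for `float`. One function rounds its intermediate product to `float`, as SSE scalar code does; the other keeps it in `long double`, which on x86 GCC is the x87 80-bit extended format (on other targets it may be only as wide as `double`, which is still enough for this input). Both function names are hypothetical.

```c
#include <math.h>

/* (a * b) / c with the intermediate product rounded to float,
   as scalar SSE code would do. */
float scaled_product_float(float a, float b, float c)
{
    volatile float prod = a * b;   /* volatile forces the store, and so the rounding, to float */
    return prod / c;
}

/* The same computation with the intermediate held in long double,
   mimicking an x87 register with its much wider exponent range. */
float scaled_product_wide(float a, float b, float c)
{
    long double prod = (long double)a * b;
    return (float)(prod / c);
}
```

With `a = b = c = 1e30f`, the intermediate 1e60 exceeds the largest `float` (about 3.4e38), so the first version returns Inf while the second returns `1e30f`.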

Of course, we could move the vector code off to its own compilation unit.
However, this is undesirable for many reasons. We can get into lengthy
religious discussions about exactly how undesirable it actually is.
However, I believe it is more productive to simply point out that there is
no apparent reason to require the -msse flag in order to use the
__builtins or vector types like V4SI. There does not seem to me to be any
potential for namespace collision.
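For comparison, later GCC releases accept generic vector types declared with `__attribute__((vector_size(16)))` on any target, and lower operations the hardware cannot perform into scalar code. A minimal sketch, in which the type name `v4si` (chosen to echo GCC's internal V4SI mode) and the function `add_ints` are hypothetical:

```c
#include <string.h>

/* Four 32-bit ints in one 16-byte generic vector (the V4SI shape). */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise dst = a + b over n ints, four lanes at a time. */
void add_ints(int *dst, const int *a, const int *b, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* one SIMD add, or four scalar adds */
        memcpy(dst + i, &vr, sizeof vr);
    }
    for (; i < n; i++)                   /* leftover elements */
        dst[i] = a[i] + b[i];
}
```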

I personally would advocate having the __builtins and vector types
available all the time. It would solve an awful lot of problems like the
one that spawned this thread.

Ian

---------------------------------------------------
   Ian Ollmann, Ph.D.       iano@cco.caltech.edu
---------------------------------------------------