This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: SSE and x86 FP ABI (again)
- To: Tim Prince <tprince at computer dot org>
- Subject: Re: SSE and x86 FP ABI (again)
- From: Drew Hess <dhess at ilm dot com>
- Date: Fri, 2 Mar 2001 17:08:45 -0800 (PST)
- cc: <gcc at gcc dot gnu dot org>
On Fri, 2 Mar 2001, Tim Prince wrote:
> I thought that linux people had long since accepted the idea of
> supporting alignments; 16-byte alignment support has been needed since
> the Pentium Pro, and 32-byte alignments are needed for performance on
> the current generation architectures. The compilers I've used didn't
> pass arguments in SSE registers, so there is no problem inter-mixing SSE
> and non-SSE objects. Are you talking about changing that? I doubt that
> people running gcc/g77 on Windows, as I do, will be standing in the way
> of SSE, although the alignment problems in gcc-2.95 g++ seem severe.
The only discussion I've seen on the list about breaking the ABI concerned
the loss of precision with SSE compared to double-extended x87.
> Now that the major bugs in -ffast-math seem to be out, I'm much happier
> with it. AFAIK the major implication is the unpredictable behavior of
> comparisons involving NaN. That hasn't been such a problem for me as it
> was on notorious NaN-generating architectures like R8000.
Even without -ffast-math, with x87 FP and gcc we've had problems with
very small numbers that should underflow to zero in single precision but
are representable in double-extended. Depending on the optimization
level, these values may or may not get spilled to memory (and therefore
may or may not get rounded to single precision), so the same code behaves
differently from one build to the next. Before the SSE modes, we were
forced to use -ffloat-store to get predictable behavior across
optimization levels in these cases.
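
A minimal sketch of the effect, not our actual code. The function names
(product_stored, product_wide, demo) are made up for illustration, and a
double stands in for the wider x87 register format so the difference shows
up on any target. The volatile store plays the role of a spill: it forces
the product through a 32-bit memory slot.

```c
#include <assert.h>
#include <stdio.h>

/* Product forced through a 32-bit memory slot. The volatile store makes
   the compiler round to single precision at any optimization level,
   which is what a register spill (or -ffloat-store) does. */
float product_stored(float a, float b)
{
    volatile float slot = a * b;
    return slot;
}

/* Same product kept in a wider format. This stands in for a value that
   stays in an x87 register and is never spilled (an assumption made
   for portability; the real register format is double-extended). */
double product_wide(float a, float b)
{
    return (double)a * (double)b;
}

/* The product is about 1e-60: far below the smallest single-precision
   subnormal (about 1.4e-45), but well within the range of double. */
void demo(void)
{
    float a = 1e-30f, b = 1e-30f;

    assert(product_stored(a, b) == 0.0f);  /* underflows once stored  */
    assert(product_wide(a, b) != 0.0);     /* survives in wide format */

    printf("stored: %g  wide: %g\n",
           (double)product_stored(a, b), product_wide(a, b));
}
```

Calling demo() from main() shows the two views side by side. A test such
as "if (x == 0)" can go either way depending on which view the compiler
happened to keep.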
> The major Windows compilers are happy with setting 53-bit precision
> mode; then, the behavior must resemble some rs6k implementations; and
> the performance of sqrt() and divide increases. I myself prefer to see
> it left to default, only to change when the programmer requests it.
Yeah, I think that's fine.
> > The SSE modes give us
> > the best of both worlds.
> >
> It's got a long way to go; the loss of out-of-order with SSE can kill
> the performance when it is applied with conditional branching.
Am I misunderstanding you, or are you saying that SSE instructions are not
executed out-of-order or speculatively (across branches) on Pentium 3
and/or Pentium 4? That's hard to believe. At the very least, I can't
find any mention of that behavior in the Pentium 4 optimization guide.
> > I'm just
> > hoping to maintain the status quo.
>
> In which way? Do you mean to support high performance without
> necessarily using -ffast-math, or continuing not to support more than
> 4-byte alignments in Windows?
I mean that I would like to see the -msse and -msse2 flags remain
separate options rather than become part of -ffast-math. They're
currently implemented as separate options, and I'm hoping that doesn't
change.
-dwh-