


Re: floating-point consistency, -ffloat-store, and x86


>The Intel and 68k floating-point behavior has existed some fifteen
>years and has never stopped anyone I know from writing good-quality,
>portable programs.

That info doesn't help anyone AFAICT.  For example, you didn't specify
whether those programs made heavy use of floating-point, or whether they
met the price/performance expectations compared to other machines
that *don't* offer the 80-bit-FP behavior, or whether they were
compiled with a compiler that doesn't have the problems addressed
by my proposal.  (Not all compilers spill 80-bit FP values to 64 bits
like gcc does, apparently.)
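
To make this concrete, here's a minimal sketch (my own illustration,
not code from any particular bug report) of the kind of inconsistency
at issue.  Whether the comparison below succeeds on x86 can vary with
optimization level and surrounding code, because it depends on whether
gcc happens to spill d to a 64-bit temporary before the comparison:

    #include <stdio.h>

    int
    main (void)
    {
      volatile double x = 1.0;  /* volatile so the division isn't folded */
      double d = x / 3.0;

      /* x / 3.0 is computed to 80 bits in an x87 register.  Whether d
         still holds all 80 bits at the comparison depends on whether
         gcc spilled it to a 64-bit slot in the meantime.  */
      if (d == x / 3.0)
        printf ("consistent this time\n");
      else
        printf ("d was chopped to 64 bits somewhere\n");
      return 0;
    }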

>We have some real problems that ought to have more priority.  An example
>is that float complex does not work on alphas.  There is no workaround
>for that bug.  It is hard to fix properly, and it is something that
>needs fixing.

Yes, but that is also a longstanding bug, across more machines than
just Alpha, and AFAIK it has never stopped anyone from writing
good-quality, portable programs -- because, to do that, they either write
the operations out longhand (necessary for C if you say "portable"
anyway), thus avoiding the bug entirely, or just use Fortran (which has
had -femulate-complex as the default for some time now to work around the
bug).
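
(By "longhand" I mean something like the following sketch; the type
and function names here are made up for illustration.  Since the
arithmetic is spelled out in ordinary float operations, the back end's
built-in complex support -- and its bugs -- never come into play:)

    typedef struct { float re, im; } fcomplex;  /* hypothetical type */

    /* Complex multiply written out longhand, avoiding the back end's
       built-in complex support entirely.  */
    fcomplex
    fcmul (fcomplex a, fcomplex b)
    {
      fcomplex c;
      c.re = a.re * b.re - a.im * b.im;
      c.im = a.re * b.im + a.im * b.re;
      return c;
    }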

In this case, -fno-emulate-complex increases performance at some
cost of possibly encountering the code-generation bug(s).  g77
might be the only gcc front end that supports a built-in complex
type and lets you switch between the gcc back end's built-in
complex support and emulation; AFAIK, Ada (GNAT) does the former
but always uses emulation (the equivalent of g77's default,
-femulate-complex).  I don't know about C++ (g++) or Pascal (gpc),
or any of the others, offhand.  I think the C front end (gcc
itself) might be the only one that provides a native complex type
but doesn't offer emulation (though using gcc's native complex
type does not exactly make for portable code, if by "portable" one
means "compiled by the compiler of your choice...").

If you'd take a moment to review the reason I originally submitted
my proposal here, you might notice that what triggered it was my
awareness that people were seriously talking about rewriting the x86
machine description from scratch, or nearly so.  (Maybe I wasn't
too clear about this motivation in my email, but it *was* why I
proposed it then, instead of waiting to study the issues further.
I was trying to make sure my proposal didn't miss a crucial window
of opportunity, a window in the early phase of a redesign.)

If we're going to rewrite substantial areas of the ix86 compiler,
such as the machine descriptions, we might as well make sure we
tell it the truth: that x86 FP registers are 80-bit, and that they should
therefore normally be spilled to 80-bit-containing temporaries,
not 64-bit ones.  That seems like a no-brainer to me, and it is,
essentially, all I'm proposing, or at least the bulk of it.  (For
example, I'd like to see an option that restores the old behavior, and
maybe there
are a few "unexpected" places that'd still chop while spilling unless
specifically fixed, outside of the machine-description and obviously
related areas.)
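
In C terms, the distinction amounts to something like this sketch
(hypothetical names; real spills happen in gcc-generated code, not in
source):

    /* Illustration only: what today's spill does vs. what I propose.  */
    int
    spill_demo (void)
    {
      long double reg = 1.0L / 3.0L;  /* an x87 register holds 80 bits */

      double      spill64 = reg;  /* today's spill slot: chopped to 64 bits */
      long double spill80 = reg;  /* proposed spill slot: all 80 bits kept  */

      /* Only the 80-bit slot gives back exactly what the register held.  */
      return spill80 == reg && (long double) spill64 != reg;
    }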

Calling 80-bit FP registers a "feature" is perhaps appropriate when
referring to the hardware, and even then it's debatable, given that
strict 64-bit behavior is impossible to get without a substantial
drop in performance (as is the case with the x86 architecture and
all implementations to date, AFAICT)...but let's put that aside.

Calling 80-bit FP registers a "feature" in terms of *compiler* support
of an 80-bit-FP machine, when that compiler unpredictably
decides whether to make use of that feature, is IMO completely
ludicrous.

For example, I wouldn't mind a feature on my system whereby it automatically
backed up my files as I edited and wrote them.  However, if it randomly
decided to back up only 80% of a file depending on the precise timing
of my keyboard hits and mouse movements within the previous 5 minutes,
then it wouldn't be a feature -- it'd be a bug, and I'd want that behavior fixed
to be more consistent, or the feature removed entirely.  Sure, you could
just yell at me about how I should be making my *own* backups, but
if you give me a feature that, when poked and prodded at, *pretends*
to reliably back up my files but, on occasion, decides on its own to
not bother doing so quite properly, then *you* are at fault, not me,
when I get burnt by relying on this behavior.  (That is: if you call
such automatic back-up a "feature", then that means it should be
relied upon.  If it's *not* a feature, then, as a behavior that
affects performance and/or consistency of results, it's a bug.  The
proponents of the x86's 80-bit FP call it a "feature".  Therefore,
it must be made as *predictable* in normal use as possible, or it
is, in fact, a bug.)

The fact is, gcc has never properly supported its
implicit use of extended precision (as mandated by performance
concerns on the underlying hardware), and this is becoming more
and more noticeable as more and more people use gcc, g77, g++, and
so on on x86 machines.  That few people in the past have noticed
this is irrelevant: more and more people notice this every day,
in trivial examples, and we don't have *any* clue as to how many
people *should* have noticed this in more complicated code that's
in production, with latent bugs, thanks to gcc's misbehavior, but
haven't managed to notice the bug (or didn't bother to report it).

Therefore, on the whole, 80-bit FP support in gcc on machines like the
x86 is not a feature, it's a bug.  We can either fix the bug, or,
as some would recommend, eliminate the feature by not using the
extra precision (by storing/reloading every single computation to a
64-bit value).  The former hardly affects performance and preserves
the underlying hardware feature; the latter hurts performance badly.

I think most gcc/g77/g++ users would prefer the former.  Personally,
I don't have a strong opinion: I tend to prefer consistent behavior
across *all* GNU-supported machines (e.g. I'd like -mieee to be the
default on Alphas), but it seems that most of the industry prefers
the underlying machinery, and its performance capabilities, be *more*
exposed for FP, as compared to other things that GNU (and UNIX in general)
rightly hides behind a consistent, portable interface.  When it comes
to FP behavior, I tend to discount my own opinions in favor of what
others, with more experience, think, and the FP experts seem to have
already come to terms with different FP behavior across systems -- but
not (yet) with different FP behavior across function calls (which is
what gcc gives them today).

And, setting FP modes to 32 or 64 bit on the x86 is simply not a
feasible solution for gcc to offer as a default at this point in time.
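
(For concreteness, "setting the FP mode to 64 bits" means something
like the following.  fpu_control.h and the _FPU_* macros are
glibc/x86-specific, so take this as one system's spelling, not a
portable interface:)

    #include <fpu_control.h>

    /* Switch the x87 precision-control field from its default,
       extended (64-bit-significand) setting to double (53-bit)
       precision.  Note this does NOT shrink the extended exponent
       range, which is one reason mode-setting alone still isn't
       strict IEEE double behavior.  */
    static void
    set_fpu_double_precision (void)
    {
      fpu_control_t cw;

      _FPU_GETCW (cw);
      cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
      _FPU_SETCW (cw);
    }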

Someday, when it can dynamically recognize the expected default
mode of any assembler code (including object files created before
this support, meaning default to extended, 80-bit mode), tag every
function, even every assembly-code snippet, with the mode requirements
(including "don't care" or "use caller" or whatever), emit those tags
for each snippet emitted by gcc, collect all the info on those tags at
link time, and optimize away re-settings of the FPU to the values
it will already have...*then* we can consider that as being a
worthwhile default, because it might actually not have miserable
performance.  (I'd like to see such linker optimizations include not
just FPU settings, but better decisions regarding where to allocate
temporaries -- on the stack vs. on the heap, for example.)

But, for now, setting the FP mode is something we can really recommend
only to specific users, for specific programs, in specific cases,
with, probably, some combination of a lot of work and finger-crossing,
since there's lots of underlying library code (libg2c, libm,
who-knows-what) that might or might not assume that the FP mode is in
the default, 80-bit state.  Any easy way out seems to be a *slow* way
out, and people don't generally want gcc to produce slow code.  Even
studying all the existing library code doesn't help, because there's
currently, AFAIK, no way to mark up the code (whether in assembler,
C, etc.) as "studied", "assumes 80-bit", "works in prevailing mode",
and so on, in a way a linker, and other tools, would recognize.

That's also why the store/reload-every-computation approach isn't
feasible.  Though it might make for more consistent IEEE behavior
across gcc targets, it still won't complete the job for at least
two reasons:

  -  Store/reload, in addition to being slow, doesn't really produce
     consistent IEEE behavior on x86 (same with setting the FP mode
     to 64 bits, apparently).  For example, it rounds twice, instead
     of once -- see the worked example after this list.  (That
     particular problem doesn't apply to setting the FP mode, I think.)

  -  Other gcc targets don't provide full IEEE behavior by default
     anyway (even those supporting the format, like Alphas, don't
     default to supporting the full range of the format, unless
     options like -mieee are used).
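
Here's a worked example of the double rounding (my own construction;
the hex-float constants are a C9x-style gcc extension).  The exact sum
1 + 2^-53 + 2^-78 should round up to 1 + 2^-52 as a double; but
rounding it first to 80-bit extended gives 1 + 2^-53 (the 2^-78 part
is below half an extended ulp), and then storing that to 64 bits hits
a round-to-even tie and yields plain 1.0:

    #include <stdio.h>

    int
    main (void)
    {
      volatile double a = 1.0;
      volatile double b = 0x1p-53 + 0x1p-78;  /* exactly representable */
      volatile double sum = a + b;            /* volatile forces the store */

      /* Correct single rounding:         sum == 1.0 + 0x1p-52
         x87 add, then store to 64 bits:  sum == 1.0  (rounded twice)  */
      printf ("sum %s 1.0\n", sum == 1.0 ? "==" : "!=");
      return 0;
    }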

As Tim Hollebeek pointed out to me in private email, if my proposal
is adopted, it would reduce the number of cases where a change in
optimization level makes a substantial difference in the behavior
of FP code.  That is one of several reasonable conclusions resulting
from my general point that the current behavior of this so-called
"feature" is, for all intends and purposes, random: sometimes you get
it, sometimes you don't, and those time differentials can even
occur (in theory, at least) across different invocations of the
same function by the exact same code while running a single executable.

And, these reductions would take place in the most unpredictable "space"
of where gcc currently chops 80-bit results into 64-bit ones.  There'd
still be potential for such chopping, but the remaining cases would
generally involve constructs more visible to the programmer in the
source code (though
I have some reservations about just how acceptable even *this* is going
to be in the long run, which is why I haven't painted my proposal
as a rosy cure-all for peoples' FP problems).

I don't really mind if my proposal isn't adopted ASAP.  If I get to
where I think it has to be done ASAP, I'll try to do it myself (something
I considered long ago re the float-complex bugs, and essentially did
within g77 by implementing -femulate-complex).

But it would be really sad if the consensus on my proposal *now*
becomes sufficiently negative that the pending rewrite of the ix86
machine description is done without any regard for my proposal, so
that the new description itself requires *another* rewrite down the
road if my proposal is ever to be adopted.

And that is really all I want to do *now* -- prevent the rewrite of
the x86 machine description from assuming the current behavior is,
and always will be, acceptable, because it almost certainly isn't,
and, at least, it is entirely reasonable to someday offer an option
to get the behavior I propose, just as we're already offering
-ffloat-store (which, on the x86, is almost useless except for lucky
people or people willing to accept slow performance, given what we
now know about gcc's handling of FP code).
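
(The reason -ffloat-store is almost useless: it only forces
assignments to *variables* through 64-bit memory, and does nothing for
intermediate results, so only code contorted to name every
subexpression gets consistent behavior out of it -- and pays the
performance price.  A sketch:)

    double
    dot3 (double a[3], double b[3])
    {
      double t0 = a[0] * b[0];  /* -ffloat-store: rounded to 64 bits */
      double t1 = a[1] * b[1];  /* likewise */
      double t2 = a[2] * b[2];

      return t0 + t1 + t2;      /* but these adds may still run in 80 bits */
    }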

(I don't think we'd be arguing about any of this if it were discovered
that gcc sometimes spilled 32-bit integers to 29-bit temporaries,
discarding ones in bits 28-30, even though it could be claimed that
programmers shouldn't *assume* they always get the full 32-bit range
from `int', since the language standard doesn't define `int' as 32
bits.  I bet lots of code would continue running just fine if we
made this change in gcc.  I bet *some* code wouldn't run just fine,
and would be very painful to fix -- especially code written and
debugged using other compilers -- and it'd be pretty silly to claim
there's no reason to make gcc accommodate that code because chopping
32-bit integers to 29-bit ones on a random basis is just another
thing with which good programmers are able to contend.  So I don't
understand why this reasoning is applied to FP results.  To make this
example more pertinent, assume gcc on some machines did 32-bit-int
computations in 64 bits, thus not overflowing/underflowing in cases
where it might normally, but that it randomly chopped the 64-bit
results back down to 32 bits.  That'd be even more nightmarish, even
if it affected less code overall.  Would we refuse to even provide
an *option* to get no chopping down of 64-bit intermediate results to
32 bits?)

        tq vm, (burley)

