This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
- To: Joe Buck <jbuck at synopsys dot com>
- Subject: Re: FLOATING-POINT CONSISTENCY, -FFLOAT-STORE, AND X86
- From: Edward Jason Riedy <ejr at CS dot Berkeley dot EDU>
- Date: Mon, 14 Dec 1998 14:30:29 -0800
- cc: egcs at cygnus dot com
Oh well. And Joe Buck writes:
- What would the performance cost be if we spilled ix86 FP registers
- as 80 bits?
FYI, this is Dr. Kahan's suggested fix whenever any of us (us == grad
students at Berkeley who ask him about it) mention the truncation
problem, i.e. 80-bit register values losing precision when they are
spilled to 64-bit memory slots.
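For anyone who hasn't been bitten by it, here is a minimal C sketch of the truncation problem, assuming a target where `long double` is wider than `double` (x87 extended precision under gcc on x86). The helper names are hypothetical; the `volatile double` slot stands in for a compiler's 64-bit spill slot.

```c
#include <float.h>

/* Hypothetical helper: model a register spill by storing an
 * extended-precision value into a 64-bit double slot and reloading it.
 * The volatile qualifier forces the value through memory, so it really
 * is rounded to double, as a 64-bit spill slot would round it. */
static long double spill_through_double(long double reg)
{
    volatile double slot = (double)reg;  /* narrowing store */
    return (long double)slot;            /* widening reload (exact) */
}

/* Hypothetical helper: nonzero if a 64-bit spill changes the value. */
static int spill_loses_precision(long double reg)
{
    return spill_through_double(reg) != reg;
}

/* 1 + LDBL_EPSILON uses the last significand bit of long double. With
 * x87 extended precision (64-bit significand) that bit lies beyond a
 * double's 53, so the spill rounds the value back to 1.0. */
static long double uses_extended_bits(void)
{
    return 1.0L + LDBL_EPSILON;
}
```

With gcc on x86 Linux, `spill_loses_precision(uses_extended_bits())` is nonzero; on a target where `long double` is the same as `double` it is zero, since there is nothing to truncate. Spilling as 80 bits would make the first case behave like the second.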
Dr. Kahan's original intent was to have the on-chip stack be just the
top few cells of the total FP stack. It's outlined in a paper from 1989
(with a 1990 preface, modified in 1994, and a 1998 addendum) titled
``How Intel 80x87 Stack Over/Underflow Should Have Been Handled.''
It's in the FP98 notes, on the off chance any of y'all have them.
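To make the idea concrete, here is a toy model in C. It is not code from the paper or from any OS; `fp_stack`, `fp_push`, and `fp_pop` are hypothetical names. The eight on-chip cells match the x87's eight physical registers, and the 128-word memory extension is the paper's estimate. On overflow the deepest on-chip cell moves to memory without narrowing; on underflow the most recently spilled cell is reloaded. On real hardware both steps would run in a trap handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ONCHIP_REGS 8    /* the x87 has eight physical stack registers */
#define EXT_WORDS   128  /* extension size from the paper's estimate */

/* Hypothetical model of a stack whose on-chip part is only the top. */
typedef struct {
    long double regs[ONCHIP_REGS]; /* regs[0] is the deepest on-chip cell */
    size_t nregs;                  /* cells currently on chip */
    long double ext[EXT_WORDS];    /* spilled cells, kept at full precision */
    size_t nmem;                   /* cells currently in memory */
    size_t spills;                 /* times the overflow path ran */
} fp_stack;

static void fp_stack_init(fp_stack *s)
{
    s->nregs = 0;
    s->nmem = 0;
    s->spills = 0;
}

static bool fp_push(fp_stack *s, long double v)
{
    if (s->nregs == ONCHIP_REGS) {
        if (s->nmem == EXT_WORDS)
            return false;  /* genuine overflow: memory area is full too */
        /* Overflow "trap": move the deepest on-chip cell to memory as a
         * long double (no truncation), then slide the rest down a slot.
         * Real hardware rotates a TOP pointer instead of copying. */
        s->ext[s->nmem++] = s->regs[0];
        memmove(&s->regs[0], &s->regs[1],
                (ONCHIP_REGS - 1) * sizeof s->regs[0]);
        s->nregs--;
        s->spills++;
    }
    s->regs[s->nregs++] = v;
    return true;
}

static bool fp_pop(fp_stack *s, long double *out)
{
    if (s->nregs == 0) {
        if (s->nmem == 0)
            return false;  /* genuine underflow: the stack is empty */
        /* Underflow "trap": reload the most recently spilled cell. */
        s->regs[0] = s->ext[--s->nmem];
        s->nregs = 1;
    }
    *out = s->regs[--s->nregs];
    return true;
}
```

Pushing 20 values spills 12 of them, and popping returns all 20 bit-for-bit in LIFO order. The point of the design is that spilled cells are stored as `long double`, so a spill costs memory traffic but never precision.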
Quick summary of points from that paper (which I've seen mentioned
here, I think, but not with details):
* He forecasts that only 1280 bytes (128 80-bit words) of memory
for stack extension would be ``almost always ample.''
* Differences in the 80x87 family make engineering the intended
behavior nasty. It also requires OS support for the trap handlers.
* The 80287 has two major variants.
* Not all opcodes are recorded the same way across the
80x87 family.
* The 80387 has undocumented anomalies.
* Pre-80387 FPUs don't support many IEEE 754 operations, and
these would need to be emulated.
* Other co-processors (namely the Weiteks) don't have 80-bit
precision. (Like I said, 1989. Not so much an issue now.)
* Some IEEE functions aren't in pre-80387 FPUs, making drivers
more difficult to implement.
In this paper, part of the problem he mentions is that programs would
need to determine which FPU exists at run time. When he wrote the paper,
that wasn't a common operation. The diversification of the 80x86 family
(MMX, 3DNow!, etc.) has made it quite common.
It would be _really, really, really_ nice if someone could make the
whole thing work as intended. It's quite possible with the free Unices
and gcc, especially since Linux / *BSD don't bother too much about pre-
80386 chips. I want to look at it, but I won't be able to start for
a few months due to other commitments (and lack of understanding of the
relevant gcc / Linux / glibc code, but I'm working on it).
If there's interest, I'll try to convince Dr. Kahan to post this paper
on-line. It's a nice outline of the issues involved.
Jason