This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Floating point trouble with x86's extended precision
- From: Jim Wilson <wilson at tuliptree dot org>
- To: Volker Reichelt <reichelt at igpm dot rwth-aachen dot de>
- Cc: lucier at math dot purdue dot edu, gcc at gcc dot gnu dot org
- Date: Thu, 21 Aug 2003 11:50:59 -0700
- Subject: Re: Floating point trouble with x86's extended precision
- References: <200308211350.h7LDoVaX014090@relay.rwth-aachen.de>
Volker Reichelt wrote:
> Just to make sure I get this right: The register is spilled with
> the last bits truncated (cut off) instead of rounded, right?
It is just an fst/fstp instruction. According to the manual I have, it
does a rounding operation before the store.
> And one question out of curiosity: What happens to values in the FPU
> that finally get written into memory *not* because the floating point
> stack is full, but for other reasons (like the variable y in my
> example)? Do they also suffer from being truncated instead of rounded?
> I'd think so, right?
It is the same fst/fstp instruction, so it is the same rounding.
> But I think the difference between rounding and truncating is not the
> problem for the users.
The problem isn't that the value is rounded. The problem is that the
value is rounded at unpredictable locations, making it impossible to
compensate for the rounding.
> You'll get much better results with the extended precision than without
> (since I get 1 rounding to 64 bits instead of 10000).
No one here is saying that extended precision is bad. It is very
important for correct results for some algorithms. That is why the IEEE
FP standards require its presence.
However, the problem with the x86 fp reg-stack is that you can't control
when the extended precision is used. It is always used, whether you
want it or not. This makes it very hard for a compiler to get correct
results without sacrificing performance, accuracy, or access to long
double. There is really no good solution to this problem that makes
everyone happy.
This is a design from the 1980's. It seemed like a good idea at the
time, but experience has shown it was a mistake, and no one designs FP
hardware like that anymore. People who did design FP hardware like that
have since fixed the mistake. The original m68k FPU (68881) had this
mistake, and Motorola fixed it in the 68040/68060. As I mentioned
before, Intel and AMD have both fixed the x86 mistake with Itanium and
AMD64 respectively.
> i) You could say that accurate results are obtained if you do the
> rounding to 64 bits after each computation.
> (That has the consequence that the result does not change with
> optimization, as long as computations aren't rearranged using
> commutativity or associativity etc.)
This is the ideal situation. All gcc targets work this way except x86
(when using the reg-stack rather than the SSE registers) and m68k (pre-68040).
> ii) You could say that accurate results are obtained if the final
> result is close to the exact arithmetic result. This means that you
> should postpone rounding to 64 bits as long as possible.
> (In my understanding this is what GCC currently does - if one
> ignores the fact that it doesn't round but truncates in several
> cases, right?)
Modulo the gcc bugs, yes, this is what we get. This isn't very useful
unless we fix the bugs though.
> But you said it yourself: "In this case, I think we have to admit that both
> viewpoints are valid, and then agree to disagree."
Yes, I think that is a good way to state my position.
> Your proposed fixes all try to enforce definition i) of accuracy, but
> I think definition ii) is also a valid position.
That is a valid criticism.
> The only thing GCC can really be blamed for is that there's no option to turn
> on the workaround in c). Therefore, I'd rather call this a missing feature.
There are two things missing: the ability to turn on the workaround in
c) (i.e. emit an FP rounding instruction after every operation), and a bug
fix for the register spills.
Even with those things, I think we are still in trouble. In the first
case, having explicit rounding instructions eliminates the excess
precision problem, but it introduces a double-rounding problem. So we
have lost again there. This is probably not fixable without changing
the hardware. In the second case, fixing the problem with reload spills
eliminates one source of unpredictable rounding, however, we still have
the problem that local variables get rounded if they are allocated to
the stack and not rounded if they are allocated to an FP reg-stack
register. Thus we still get different results at different optimization
levels, so we lose there as well.
This might be fixable by promoting all stack locals to long double to
avoid unexpected rounding, but that will probably cause other problems
in turn. It will break all programs that expect that assignments to doubles
round to double, for instance. If we don't promote stack locals,
then we need explicit rounding for them, and then we have
double-rounding again.
I really see no way to fix this problem other than by fixing the
hardware. The hardware has to have explicit float and double operations.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com