This is the mail archive of the
mailing list for the GCC project.
Re: Floating point trouble with x86's extended precision
- From: Jim Wilson <wilson at tuliptree dot org>
- To: Volker Reichelt <reichelt at igpm dot rwth-aachen dot de>
- Cc: lucier at math dot purdue dot edu, gcc at gcc dot gnu dot org
- Date: Wed, 20 Aug 2003 13:56:27 -0700
- Subject: Re: Floating point trouble with x86's extended precision
- References: <200308201627.h7KGRHaX017733@relay.rwth-aachen.de>
Volker Reichelt wrote:
In the process of revamping the non-bugs section of the bug reporting
instructions I came across a problem with the excess precision of the [...]
This is a complicated issue.
There is a bug in the x86 port that causes it to emit buggy FP code.
This is partly a flaw in the x86 hardware; it lacks SFmode/DFmode
operations on the floating point register stack. This is partly a flaw
in the x86 backend. It lies, and claims that SFmode/DFmode operations
are available. Thus the gcc optimizer thinks it is emitting
SFmode/DFmode instructions when it is actually emitting XFmode
instructions, and this causes unexpected rounding problems. The easiest
way to see this is to write an expression that needs more than 8
registers to evaluate. Reload will spill registers in the middle of the
expression, and they will be truncated to 64 bits when spilled because
the optimizer thinks we have 64-bit values. This results in rounding
error. The same expression can give different results at different
optimization levels because different pseudos get spilled. This is
clearly a gcc bug. This problem has been known for over a decade, and
has not yet been fixed, and probably never will be. Getting correct
results will require emitting a lot of explicit rounding operations,
which will reduce performance noticeably, and may cause more complaints
than the rounding bug.
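
A minimal C sketch of the spill effect described above. It does not provoke a real register spill; it imitates one by rounding an intermediate sum to double, and compares that with keeping the intermediate in long double. The function names are made up for illustration, and the sketch assumes long double is wider than double, as it is for the 80-bit x87 format.

```c
#include <float.h>

/* Sum a + b + c with the intermediate rounded to double, as happens when
   a value is spilled to a 64-bit stack slot. The volatile stores force
   the rounding even on targets that carry excess precision. */
double sum_rounded_each_step(double a, double b, double c)
{
    volatile double partial = a + b;      /* rounded to double here */
    volatile double total = partial + c;  /* and again here */
    return total;
}

/* Same sum with the intermediate kept in long double, standing in for a
   value that stays in an 80-bit x87 register until the final store. */
double sum_kept_wide(double a, double b, double c)
{
    long double partial = (long double)a + b;
    long double total = partial + c;
    volatile double result = (double)total;  /* single rounding at the end */
    return result;
}

/* Difference between the two for a = 1 and b = c = half an ulp of 1.0. */
double spill_demo_difference(void)
{
    const double half_ulp = DBL_EPSILON / 2;
    return sum_kept_wide(1.0, half_ulp, half_ulp)
         - sum_rounded_each_step(1.0, half_ulp, half_ulp);
}
```

With a = 1.0 and b = c = DBL_EPSILON / 2, the first function returns 1.0 and the second returns 1.0 + DBL_EPSILON when long double has a wider significand than double. Which of the two an x87 build produces for the plain expression a + b + c depends on whether the intermediate happens to be spilled.
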
The easiest way to fix this problem is to fix the hardware. Both Intel
and AMD have done so, but in different ways. Intel has the IPF (aka
IA-64) architecture which has explicit SFmode/DFmode operations and thus
no problem. AMD has the AMD64 architecture which has an ABI that
requires use of the SSE registers instead of the floating point register
stack, and the SSE registers have explicit SFmode/DFmode operations and
hence do not have this problem.
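
One way to check which situation a given build is in is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below only reports what the compiler claims about expression evaluation; the helper names are hypothetical, and the value a particular GCC configuration reports should be checked rather than assumed.

```c
#include <float.h>

/* Translate a FLT_EVAL_METHOD value into the meaning C99 assigns to it. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "float and double are evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Nonzero when expressions of type double may carry excess precision
   according to the compiler's own description of itself. */
int double_may_have_excess_precision(void)
{
    return FLT_EVAL_METHOD != 0 && FLT_EVAL_METHOD != 1;
}
```

Calling describe_eval_method(FLT_EVAL_METHOD) gives a one-line summary for the current build. A value of 0 is what one would expect from a target that does its arithmetic in SSE registers, and 2 from one that evaluates on the x87 stack.
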
There is also another problem here: excess precision can cause
trouble even when it doesn't result in rounding errors. This is the
immediate case you are discussing with Brad Lucier. Ideally, we should
have no excess precision, and the testcase should work. However, due to
the design of the x86, eliminating the excess precision is a burden on
the compiler, and hurts performance, hence it is easier to ask users to
program around it. Excess precision has even been accepted by the IEEE
FP standard in some cases. For instance, the PowerPC has a
multiply-accumulate instruction that doesn't round the intermediate
result. This means you get a different answer with separate multiply and
add instructions than you do if you use the multiply-accumulate
instruction. This was officially blessed by the IEEE FP committee as being
OK, because the multiply-accumulate result was more accurate even though it
is different. In most numerical calculations, you have to expect some
rounding error, and one could argue that this case is no different.
In this case, I think we have to admit that both viewpoints are valid,
and then agree to disagree.
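
The multiply-accumulate difference mentioned above can be imitated in C. This sketch does not use a real fused instruction (C99's fma() from <math.h> exposes that operation); it holds the product in long double so that it is not rounded to double before the add, which is exact for the inputs shown but is not a general substitute for a fused operation. The function names are hypothetical, and the sketch assumes long double is wider than double.

```c
#include <float.h>

/* Nonzero if long double has more significand bits than double, which
   the imitation below depends on. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Separate multiply and add: the product is rounded to double first.
   The volatile store also stops the compiler from fusing the two. */
double multiply_then_add(double a, double b, double c)
{
    volatile double product = a * b;
    volatile double result = product + c;
    return result;
}

/* Imitation of a multiply-accumulate: the product is held in long
   double, so only the final result is rounded to double. */
double multiply_accumulate_wide(double a, double b, double c)
{
    long double product = (long double)a * b;
    volatile double result = (double)(product + c);
    return result;
}
```

For a = 1.0 + 0x1p-27, b = 1.0 - 0x1p-27 and c = -1.0 the exact product is 1 - 2^-54, which rounds to 1.0 in double. The separate version therefore returns 0.0, while the wide version returns -0x1p-54 when long_double_is_wider() is true. Both answers come from correctly rounded operations; they differ only in where the rounding happens.
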
As for solutions to the problem...
1) If you care about numerical accuracy, don't use x86. Seriously.
AMD64 and IPF (aka IA-64) are OK, but IA-32 is not. I realize this is
impractical in most cases, but it is something that should be mentioned.
If you must use x86 FP hardware then...
2) Do FP arithmetic in the SSE registers via the -mfpmath=sse option. I
haven't tried this myself, so I don't know how practical it is.
3) Set the FP register rounding precision to double precision (64-bit
values). This has the flaw that you can no longer perform XFmode
operations. This is only a partial fix, in that we still have excess
precision problems for SFmode operations.
4) Try using -ffloat-store. This works for some but not all programs.
-ffloat-store forces user-declared variables to be allocated on the
stack, and hence avoids the in-register excess precision problems.
However, temporaries are still allocated to registers, and can still
cause rounding errors due to excess precision, so this is not a complete
fix.
5) Fix the x86 backend to stop lying about availability of SFmode/DFmode
operations, probably via an option, since this will reduce performance
so much as to cause other problems. This would at least give people the
option of getting slow but correct code instead of the current fast but
incorrect code.
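
Option 3 above can be sketched on Linux with the glibc header <fpu_control.h>. This is an assumption about the environment: the header and its _FPU_GETCW, _FPU_SETCW, _FPU_EXTENDED and _FPU_DOUBLE macros are glibc-specific, and the precision bits exist only on x86. The function name is hypothetical. The sketch also shows the stated drawback, since long double arithmetic loses its extra bits while the setting is in force.

```c
#include <stdio.h>   /* any libc header; makes __GLIBC__ visible */

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_PRECISION_CONTROL 1
#else
#define HAVE_X87_PRECISION_CONTROL 0
#endif

/* Returns 1 if 1 + 2^-53 computed in long double differs from 1, 0 if
   it does not, and -1 where x87 precision control is not available.
   When limit_to_double is nonzero, the x87 precision field is set to
   double precision for the duration of the addition. */
int extended_bits_survive(int limit_to_double)
{
#if HAVE_X87_PRECISION_CONTROL
    fpu_control_t saved, cw;
    volatile long double one = 1.0L, tiny = 0x1p-53L, sum;

    _FPU_GETCW(saved);
    if (limit_to_double) {
        /* Clear the precision field, then select double precision. */
        cw = (saved & ~_FPU_EXTENDED) | _FPU_DOUBLE;
        _FPU_SETCW(cw);
    }
    sum = one + tiny;      /* performed on the x87 stack */
    _FPU_SETCW(saved);     /* restore the caller's setting */
    return sum != one;
#else
    (void)limit_to_double;
    return -1;
#endif
}
```

On an x86 glibc system with the default control word, extended_bits_survive(0) should return 1 and extended_bits_survive(1) should return 0. The precision field only narrows the significand, which is why single precision operations still see excess precision under this setting.
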
I've found FP bugs in all of the x86 compilers that I've ever used. I
am not sure if there are any that get it right, so I am skeptical that
gcc will ever get it right. I haven't tried any of the compilers from
companies that specialize in FP, though; maybe some of them get it right.
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com