This is the mail archive of the mailing list for the GCC project.

Re: Floating point trouble with x86's extended precision

Volker Reichelt wrote:
In the process of revamping the non-bugs section of the bug reporting
instructions I came across a problem with the excess precision of the
x86 FPU:

This is a complicated issue.

There is a bug in the x86 port that causes it to emit buggy FP code. This is partly a flaw in the x86 hardware: it lacks SFmode/DFmode operations on the floating point register stack. It is also partly a flaw in the x86 backend, which lies and claims that SFmode/DFmode operations are available. Thus the gcc optimizer thinks it is emitting SFmode/DFmode instructions when it is actually emitting XFmode instructions, and this causes unexpected rounding problems.

The easiest way to see this is to write an expression that needs more than 8 registers to evaluate. Reload will spill registers in the middle of the expression, and the spilled values will be truncated to 64 bits because the optimizer thinks we have 64-bit values. This results in rounding error. The same expression can give different results at different optimization levels because different pseudos get spilled.

This is clearly a gcc bug. This problem has been known for over a decade, has not yet been fixed, and probably never will be. Getting correct results would require emitting a lot of explicit rounding operations, which would reduce performance noticeably and might cause more complaints than the rounding bug.
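To make the spill effect concrete, here is a minimal sketch in portable C. It does not reproduce a real reload spill; it only imitates one. The function names are mine, the volatile double stands in for a spill slot that the optimizer believes is DFmode, and long double stands in for a value that stays on the x87 register stack in XFmode.

```c
#include <float.h>

/* (a + b) - a, with the intermediate sum forced through a 64-bit double.
   The volatile store imitates a register spill to a DFmode stack slot. */
double diff_with_spill(double a, double b)
{
    volatile double sum = a + b;   /* rounded to double on the store */
    return sum - a;
}

/* The same expression, with the intermediate sum kept in long double.
   This imitates a value that is never spilled and so keeps its extra
   mantissa bits (assuming long double is wider than double). */
double diff_in_register(double a, double b)
{
    long double sum = (long double)a + (long double)b;
    return (double)(sum - (long double)a);
}
```

With a = 1.0 and b = DBL_EPSILON / 2, the first function returns 0.0, because the sum rounds back to 1.0 when it is stored as a double. On targets where long double has more mantissa bits than double, the second function returns b. Same source expression, two answers, depending only on whether the intermediate was written to memory.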

The easiest way to fix this problem is to fix the hardware. Both Intel and AMD have done so, but in different ways. Intel has the IPF (aka IA-64) architecture which has explicit SFmode/DFmode operations and thus no problem. AMD has the AMD64 architecture which has an ABI that requires use of the SSE registers instead of the floating point register stack, and the SSE registers have explicit SFmode/DFmode operations and hence do not have this problem.

There is a second problem here: excess precision can cause trouble even when it doesn't result in rounding errors. This is the immediate case you are discussing with Brad Lucier. Ideally, we should have no excess precision, and the testcase should work. However, due to the design of the x86, eliminating the excess precision is a burden on the compiler and hurts performance, hence it is easier to ask users to program around it.

Excess precision has even been accepted by the IEEE FP standard in some cases. For instance, the PowerPC has a multiply accumulate instruction that doesn't round the intermediate result. This means you get a different answer with separate multiply and add instructions than you do with the multiply accumulate instruction. This was officially blessed by the IEEE FP committee as being OK, because the multiply accumulate result is more accurate even though it is different. In most numerical calculations you have to expect some rounding error, and one could argue that this case is no different. Here, I think we have to admit that both viewpoints are valid, and then agree to disagree.

As for solutions to the problem...
1) If you care about numerical accuracy, don't use x86. Seriously. AMD64 and IPF (aka IA-64) are OK, but IA-32 is not. I realize this is impractical in most cases, but it is something that should be mentioned. If you must use x86 FP hardware then...
2) Do FP arithmetic in the SSE registers via the -mfpmath=sse option. I haven't tried this myself, so I don't know how practical it is.
3) Set the FP register rounding precision to 64 bits, i.e. set the x87 precision control to double precision. This has the flaw that you can no longer perform XFmode operations. It is also only a partial fix, in that we still have excess precision problems for SFmode operations.
4) Try using -ffloat-store. This works for some but not all programs. -ffloat-store forces user-declared variables to be allocated on the stack, and hence avoids the in-register excess precision problems for them. However, temporaries are still allocated to registers and can still cause rounding errors due to excess precision, so this is not a complete solution.
5) Fix the x86 backend to stop lying about availability of SFmode/DFmode operations, probably via an option, since this will reduce performance so much as to cause other problems. This would at least give people the option of getting slow but correct code instead of the current fast but incorrect code.
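As an illustration of option 3, here is a hedged sketch of switching the x87 precision control field with GNU-style inline assembly. The helper names and the HAVE_X87_CW macro are mine, not part of gcc or any library. It assumes a GNU-compatible compiler on an x86 target where long double arithmetic is done on the x87 stack; on any other target the helpers do nothing.

```c
#include <float.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X87_CW 1
#else
#define HAVE_X87_CW 0   /* not an x87 target: helpers below are no-ops */
#endif

/* Precision control (PC) field of the x87 control word, bits 8-9. */
#define X87_PC_MASK     0x0300u
#define X87_PC_DOUBLE   0x0200u  /* 53-bit mantissa */
#define X87_PC_EXTENDED 0x0300u  /* 64-bit mantissa */

/* Read the current x87 control word (0 if unsupported). */
unsigned short x87_get_cw(void)
{
#if HAVE_X87_CW
    unsigned short cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw) : : "memory");
    return cw;
#else
    return 0;
#endif
}

/* Load a new x87 control word (no-op if unsupported). */
void x87_set_cw(unsigned short cw)
{
#if HAVE_X87_CW
    __asm__ __volatile__("fldcw %0" : : "m"(cw) : "memory");
#else
    (void)cw;
#endif
}

/* Add x and y with the PC field temporarily set to pc, then restore
   the caller's control word. The volatile temporaries keep the
   compiler from folding the addition at compile time. */
long double add_with_precision(long double x, long double y,
                               unsigned short pc)
{
    volatile long double vx = x, vy = y, r;
    unsigned short saved = x87_get_cw();

    x87_set_cw((unsigned short)((saved & ~X87_PC_MASK) | pc));
    r = vx + vy;
    x87_set_cw(saved);
    return r;
}
```

On such a target, add_with_precision(1.0L, DBL_EPSILON / 2, X87_PC_DOUBLE) gives exactly 1.0L, while the same call with X87_PC_EXTENDED keeps the low-order bit. This also shows the flaw mentioned in option 3: with the precision control set to double, XFmode arithmetic no longer delivers extended precision.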

I've found FP bugs in all of the x86 compilers that I've ever used. I am not sure whether any of them get it right, so I am skeptical that gcc ever will. I haven't tried any of the compilers from companies that specialize in FP, though; maybe some of them get it right.
Jim Wilson, GNU Tools Support,
