Node: Floating-point Errors, Previous: Strange Behavior at Run Time, Up: But-bugs
Some programs appear to produce inconsistent floating-point
results compiled by
g77 versus by other compilers.
Often the reason for this behavior is the fact that floating-point
values are represented on almost all Fortran systems by
approximations, and these approximations are inexact
even for apparently simple values like 0.1, 0.2, 0.3, 0.4, 0.6,
0.7, 0.8, 0.9, 1.1, and so on.
Most Fortran systems, including all current ports of
use binary arithmetic to represent these approximations.
Therefore, the exact value of any floating-point approximation
as manipulated by
g77-compiled code is representable by
adding some combination of the values 1.0, 0.5, 0.25, 0.125, and
so on (just keep dividing by two) through the precision of the
fraction (typically around 23 bits for
REAL(KIND=1), 52 for
REAL(KIND=2)), then multiplying the sum by a integral
power of two (in Fortran, by
2**N) that typically is between
-127 and +128 for
REAL(KIND=1) and -1023 and +1024 for
REAL(KIND=2), then multiplying by -1 if the number
So, a value like 0.2 is exactly represented in decimal--since
it is a fraction,
2/10, with a denominator that is compatible
with the base of the number system (base 10).
2/10 cannot be represented by any finite number
of sums of any of 1.0, 0.5, 0.25, and so on, so 0.2 cannot
be exactly represented in binary notation.
(On the other hand, decimal notation can represent any binary
number in a finite number of digits.
Decimal notation cannot do so with ternary, or base-3,
notation, which would represent floating-point numbers as
sums of any of
1/9, and so on.
After all, no finite number of decimal digits can exactly
Fortunately, few systems use ternary notation.)
Moreover, differences in the way run-time I/O libraries convert between these approximations and the decimal representation often used by programmers and the programs they write can result in apparent differences between results that do not actually exist, or exist to such a small degree that they usually are not worth worrying about.
For example, consider the following program:
PRINT *, 0.2 END
When compiled by
g77, the above program might output
0.20000003, while another compiler might produce a
executable that outputs
This particular difference is due to the fact that, currently,
conversion of floating-point values by the
g77, handles only double-precision values.
0.2 in the program is a single-precision value, it
is converted to double precision (still in binary notation)
before being converted back to decimal.
The conversion to binary appends binary zero digits to the
original value--which, again, is an inexact approximation of
0.2--resulting in an approximation that is much less exact
than is connoted by the use of double precision.
(The appending of binary zero digits has essentially the same
effect as taking a particular decimal approximation of
1/3, such as
0.3333333, and appending decimal
zeros to it, producing
Treating the resulting decimal approximation as if it really
had 18 or so digits of valid precision would make it seem
a very poor approximation of
As a result of converting the single-precision approximation
to double precision by appending binary zeros, the conversion
of the resulting double-precision
value to decimal produces what looks like an incorrect
result, when in fact the result is inexact, and
is probably no less inaccurate or imprecise an approximation
of 0.2 than is produced by other compilers that happen to output
the converted value as "exactly"
(Some compilers behave in a way that can make them appear
to retain more accuracy across a conversion of a single-precision
constant to double precision.
See Context-Sensitive Constants, to see why
this practice is illusory and even dangerous.)
Note that a more exact approximation of the constant is computed when the program is changed to specify a double-precision constant:
PRINT *, 0.2D0 END
Future versions of
libg2c might convert
single-precision values directly to decimal,
instead of converting them to double precision first.
This would tend to result in output that is more consistent
with that produced by some other Fortran implementations.
A useful source of information on floating-point computation is David Goldberg, `What Every Computer Scientist Should Know About Floating-Point Arithmetic', Computing Surveys, 23, March 1991, pp. 5-48. An online version is available at http://docs.sun.com/.
Information related to the IEEE 754 floating-point standard by a leading light can be found at http://http.cs.berkeley.edu/%7Ewkahan/ieee754status/; see also slides from the short course referenced from http://http.cs.berkeley.edu/%7Efateman/. http://www.linuxsupportline.com/%7Ebillm/ has a brief guide to IEEE 754, a somewhat x86-GNU/Linux-specific FAQ, and library code for GNU/Linux x86 systems.
The supplement to the PostScript-formatted Goldberg document, referenced above, is available in HTML format. See `Differences Among IEEE 754 Implementations' by Doug Priest. This document explores some of the issues surrounding computing of extended (80-bit) results on processors such as the x86, especially when those results are arbitrarily truncated to 32-bit or 64-bit values by the compiler as "spills".
g77 specifically, and
does arbitrarily truncate 80-bit results during spills
as of this writing.
It is not yet clear whether a future version of
the GNU compiler suite will offer 80-bit spills
as an option, or perhaps even as the default behavior.)
The GNU C library provides routines for controlling the FPU, and other documentation about this.
See Floating-point precision, regarding IEEE 754 conformance.