


Re: Serious bug


H.J. Lu writes ...
> 
> > Not to me. I was taught NEVER to use == or != with floating point, because
> > they simply don't make sense.
> 
> That is one thing I learned in my numerical analysis class.

They certainly should not be used naively; however, "they don't make
sense" is much too strong.

A sane floating point model has a well-specified floating point
representation, and well-specified rules for how results are
calculated.  Sane floating point is *approximate* and *deterministic*.

Hence a program like the following makes sense:

---
#include <stdio.h>

int main(void) {
    double x = 1;

    /* Halve x until adding it to 1 no longer changes the result. */
    while (1 + x != 1) {
        x /= 2;
    }

    printf("%g\n", x);
    return 0;
}
---

It makes some assumptions about the floating point model, but it
makes sense.  Given a floating point model, the program has a
well-defined result, and a compiler that breaks this (for example, by
"optimizing" the loop condition to "x != 0" using laws that hold for
real numbers but not for floating point) is broken.
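
To make "well defined" concrete, here is a sketch of mine (not part
of the original program), assuming strict IEEE 754 double arithmetic
with round-to-nearest-even and no excess precision: the loop must
exit with x equal to 2^-53, which is DBL_EPSILON / 2.

---
#include <assert.h>
#include <float.h>
#include <stdio.h>

int main(void) {
    double x = 1;

    while (1 + x != 1) {
        x /= 2;
    }

    /* 1 + 2^-53 lies exactly halfway between 1 and the next double,
       1 + 2^-52 (2^-52 == DBL_EPSILON, the gap between 1 and its
       successor); round-to-nearest-even resolves the tie to 1, so the
       loop exits with x == 2^-53 == DBL_EPSILON / 2, about 1.1102e-16.
       On the Intel FPU discussed below, the no-excess-precision
       assumption fails and this assert may fire. */
    assert(x == DBL_EPSILON / 2);
    printf("%g\n", x);
    return 0;
}
---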

Now change the printf to:

    printf("%.20f %.20f %.20f %.20f\n", x, 1+x, 1 + 2*x, 1 + 4*x);

On a machine with a sane floating point model, the results make sense.
For example, my SGI workstation prints:

0.00000000000000011102 1.00000000000000000000 1.00000000000000020000 1.00000000000000040000

However, a machine with the Intel FPU from hell gives:

0.00000000000000000005 1.00000000000000000000 1.00000000000000000000 1.00000000000000000000

because whether 1 + x and 1 compare equal depends on whether the
values are sitting in the FPU's 80-bit registers or have been rounded
to 64 bits by a store to memory.

If you put printf's in the loop to watch what x is, you'll get:

1.000000000000007105427357601002
1.000000000000003552713678800501
1.000000000000001776356839400250
1.000000000000000888178419700125
1.000000000000000444089209850063
1.000000000000000222044604925031
1.000000000000000000000000000000
[...] (repeats quite a number of times)
1.000000000000000000000000000000

whereas, on a decent floating point implementation, the loop
terminates as soon as 1 + x and 1 become indistinguishable, and this
happens at a well-determined point.
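
A portable workaround (my sketch, not something the compiler does for
you) is to force every intermediate through a 64-bit memory slot with
volatile, so the 80-bit registers never get to keep the extra bits:

---
#include <stdio.h>

int main(void) {
    double x = 1;
    volatile double sum;    /* each store to sum rounds to 64 bits */

    for (;;) {
        sum = 1 + x;        /* forced out of the 80-bit register */
        if (sum == 1)       /* compares the rounded, in-memory value */
            break;
        x /= 2;
    }

    printf("%g\n", x);      /* 1.11022e-16 even on the Intel FPU */
    return 0;
}
---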

On an architecture that has *two* floating point models, where the
one you get depends intimately on details of the generated code, the
optimization level, what gets spilled to memory, and so on, floating
point calculations appear nondeterministic.

People who understand floating point and develop numerical programs
on a regular basis refer to such architectures using highly technical
terms.  The behavior is referred to as "stupid", and the
architectures are considered to be "broken".

Now, is egcs required to fix this behavior?  Given how inefficient
the fix is on the Intel abominations, probably not.  It would be nice
if there were a -ffloat-store-all option to deal with the problem,
though, and perhaps a FAQ entry explaining that this is a problem
with Intel FPUs, not with egcs.
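
For what it's worth, gcc already has -ffloat-store, which keeps
floating point *variables* out of registers; it does not affect
intermediates inside an expression, which is presumably why an -all
variant would be needed.  The idiom that makes the existing flag
effective (my sketch, in the spirit of the gcc manual's advice to
store pertinent intermediates into variables) is to give every
relevant intermediate a name:

---
/* Compile with: gcc -ffloat-store test.c
   With -ffloat-store, a value is rounded to its declared type when
   it is stored into a variable, so name the intermediate instead of
   comparing the expression directly. */
#include <stdio.h>

int main(void) {
    double x = 1;
    double sum = 1 + x;     /* the store rounds sum to 64 bits */

    while (sum != 1) {
        x /= 2;
        sum = 1 + x;
    }

    printf("%g\n", x);
    return 0;
}
---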

---------------------------------------------------------------------------
Tim Hollebeek                           | "Everything above is a true
email: tim@wfn-shop.princeton.edu       |  statement, for sufficiently
URL: http://wfn-shop.princeton.edu/~tim |  false values of true."

