Reorganize -ffast-math code.

Thu Mar 8 10:39:00 GMT 2001

> So you seem to have a few choices:
>  - don't use x86
>  - use XMM on new hardware, which gets rid of the extended precision.
>  - have the compiler always set the rounding mode before each operation
>    (with obvious optimizations for consecutive operations etc where the
>     compiler can know the previous rounding mode)
>  - have the programmer set the rounding mode explicitly.
> 
> Right now you effectively have choice 4: set the rounding mode explicitly.
> 
> At that point results are stable on x86, no?

If by "results are stable on x86" you mean the results of a C program compiled
by gcc and run on x86 are reproducible and predictable by reading the program
text, unfortunately not. I don't have the time now to give a precise
example, but here is how you go about constructing the example code.

First, it doesn't matter what the rounding precision is, but you can set
it to double if you like.  (The term "mode" refers to whether you "round to
nearest", "truncate", "round up" or "round down".)  Consider

#include <math.h>
#include <stdio.h>

double y, z;
double x[] = { <all entries of x are about 2^{1000}> };

void nonsense(void);

int main()
{
  z = sqrt (x[0] * x[0] + x[1] * x[1]);
  printf("%f\n", z);

  nonsense();
  printf("%f %f\n", y, z);
  return 0;
}

void nonsense(void)
{
  y = <large expression that requires more than 8 floating-point
       temporaries with (z = sqrt (x[0] * x[0] + x[1] * x[1]))
       embedded in the expression>;
}

Compile this with "gcc -o test test.c -lm" on x86.

Now, unless gcc does something really evil that I'm not aware
of, the first printf will always print a finite number around
2^{1000}; let's assume that this is the value that the programmer
wants.  If gcc happens to spill the temporary containing x[0] * x[0]
(or x[1] * x[1]) to a double stack slot and reload the spilled
temporary for the addition, then the second printf will print +inf
for z, because a product of about 2^{2000} fits in an extended
register but overflows a double.  And if gcc doesn't happen to spill
either x[0] * x[0] or x[1] * x[1] before adding them, then the value
of z printed by the two printfs will be the same.
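
The effect of the spill can be imitated by hand, without relying on
what the register allocator happens to do.  What follows is only a
sketch under stated assumptions: the helper names sum_sq_kept and
sum_sq_spilled are made up for illustration, long double is assumed
to have a wider exponent range than double (as the x87 extended
format does), and the volatile stores stand in for a spill to a
double stack slot.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Assumption: long double has a wider exponent range than double,
   as the x87 80-bit extended format does. */
static_assert(LDBL_MAX_EXP > DBL_MAX_EXP,
              "this sketch needs a long double wider than double");

/* Hypothetical helper: both products stay in the wider format,
   like x87 temporaries that are never spilled. */
static long double sum_sq_kept(double a, double b)
{
    long double aa = (long double)a * a;
    long double bb = (long double)b * b;
    return aa + bb;
}

/* Hypothetical helper: each product is stored to a 64-bit double
   before the addition, like a temporary spilled to a double stack
   slot and reloaded. */
static long double sum_sq_spilled(double a, double b)
{
    volatile double aa = a * a;
    volatile double bb = b * b;
    return (long double)aa + bb;
}
```

With both arguments equal to 0x1p1000 (that is, 2^{1000}),
sum_sq_kept returns a finite value of 2^{2001}, whose square root is
the finite number around 2^{1000} that the programmer wants, while
sum_sq_spilled returns +inf because each product overflows when it is
rounded to double.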

So gcc can generate assembly code that computes the same expression
in two different ways, producing different results.  For some applications
predictability and reproducibility are important enough that gcc cannot
be used to compile them on x86.

Brad