The x86's historical floating point unit (FPU) is the x87 unit. The x87 unit uses IEEE 80 bit arithmetic with 8 80 bit registers organized as a stack. Thus, by default x87 arithmetic is not true 64/32 bit IEEE, but gets extended precision from the x87 unit. However, anytime a value is moved from the registers to an IEEE 64 or 32 bit storage location, this 80 bit value must be rounded down to the appropriate number of bits. Thus computations performed by the x87 unit provide at least the same accuracy as strict 32/64 bit IEEE, with the upper limit being 80 bit IEEE accuracy.

So, the extra accuracy is almost always beneficial, but it can have some strange consequences. For instance, in the following code segment, depending on the compilation flags and numbers and calculations used to find tmp, the following code may print out that the values are different:

   double tmp, X[2];
   tmp = ....
   tmp += ....
   ...
   X[0] = tmp;
   if (X[0] == tmp)
      printf("Values are the same, as expected!\n");
   else
      printf("Values are different!\n");

This is because tmp will typically be moved to a register during register assignment, which means tmp may hold a full 80 bits of accuracy, some of which are lost in the store to X[0], and thus the numbers are no longer equal. You may workaround this problem by always explicitly storing to memory to force the round-down. Copy propagation could get ignore the intermediate store in some cases, so you may want to ensure any memory used for forcing rounding is declared volatile in order to avoid this happening.

There is a further complication. So far we have seen that values in registers have full 80 bit precision, and those explicitly pushed through memory will be rounded down to 64 or 32 bit accuracy. However, sometimes a value that is in a register must be spilled. Gcc has been designed so that such register spilling only spills the same number of bits as the underlying declared type (i.e. 64 for a double or 32 for a float). Thus, gcc may invisibly add a number of round-downs to any series of floating point calculations (in the worst case, a round-down after each calculation). This can confound certain expectations, such as that adding the same series of numbers in the same order will always provide the same answer (it won't be the case if routine A has a register spill that routine B does not). Gcc has so far considered these gcc-generated rounds as unimportant and/or considered the fix (spilling all 80 bits whenever an x87 register is spilled) too expensive to be worthwhile, as can been here. You can workaround this problem by declaring scalar temporaries (likely register assignment targets) as long double, though this will typically kill performance if the code is compiled with SSE or on another architecture.

An important point to be made, however, is that the amount of worst-case error that could possibly happen using the x87 (with any amount of intermediate rounding) is at worst the same as true 64 or 32 bit arithmetic, and in practice is almost always better.

Most modern x86 architectures also have the ability to perform flops (floating point operations) using their SSE units. SSE instruction may be used in vector or scalar mode, and by default they provide true 64 or 32 bit IEEE compliant arithmetic (though there is a status bit that can be toggled that provides faster execution at the cost of removing IEEE compliance). Therefore, codes sensitive to changes in error such as the above example may wish to use SSE. Note, however, that this greater repeatability comes at the cost of lost precision (i.e. SSE always gets the same precision because it always takes the equivalent of the x87's worst case: a forced round down at each step).

None: x87note (last edited 2008-01-10 19:38:37 by localhost)