Software floating point in GCC
Software floating point in GCC currently uses config/fp-bit.c, with some special case support code in mklibgcc.in, plus target-specific assembly implementations for some targets. In addition, libgcc2.c contains generic implementations of conversions of floating point to and from signed and unsigned DImode (or TImode on 64-bit targets), and of conversions from floating point to unsigned SImode, used on all targets unless they expand such conversions inline or use system library functions via set_conv_libfunc.
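As an illustration of the kind of generic conversion mentioned above, here is a minimal sketch of converting floating point to an unsigned 32-bit integer using only a signed conversion. It is not the libgcc2.c source: the function name sketch_fixunssfsi is hypothetical, and float is assumed to be IEEE single precision.

```c
#include <stdint.h>

/* Hypothetical name; a sketch of building an unsigned conversion on top
   of a signed one, not the actual libgcc2.c implementation. */
uint32_t sketch_fixunssfsi(float a)
{
    /* 2^31 as a float: values at or above it do not fit in int32_t. */
    const float two31 = 2147483648.0f;

    if (a >= two31)
        /* The subtraction is exact for inputs in [2^31, 2^32), so convert
           the reduced value as signed and add 2^31 back as unsigned. */
        return (uint32_t)(int32_t)(a - two31) + UINT32_C(0x80000000);

    /* Smaller values go straight through the signed conversion. */
    return (uint32_t)(int32_t)a;
}
```

For example, sketch_fixunssfsi(3000000000.0f) takes the first branch, while sketch_fixunssfsi(42.9f) takes the second and truncates toward zero.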
fp-bit.c is a rather inefficient implementation of software floating point. Some years ago Torbjorn Granlund posted an alternative implementation, ieeelib; see http://gcc.gnu.org/ml/gcc/2005-11/msg01373.html for discussion and references.
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (after being sped up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit, and ieeelib about 1% faster than soft-fp, when tested on IBM PowerPC 405 and 440. These figures are geometric means across EEMBC; some tests that make heavy use of floating point are several times faster with soft-fp than with fp-bit, while others make no significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
Each implementation differs in what features it supports:
- soft-fp is the only one which supports floating-point exceptions and rounding modes; this support can be compiled out, and is compiled out by default in the integration into GCC. It has support for target-specific selection of exactly what NaN is returned by some operations, possibly because of the use for hardware math emulation. It makes heavy use of preprocessor macros to reduce code duplication; this does mean it takes a while to make sense of the code. Floating-point numbers can be represented using 1, 2 or 4 words, so IEEE quad can be supported using only 32-bit operations (plus longlong.h). It does not support use of 16-bit words, though this would not be hard to add; however, code size may mean 16-bit targets would better use assembly implementations.
- ieeelib has two implementations of every function, to provide versions using either one or two words for floating-point values. Because it doesn't use macros like soft-fp, there is substantial code duplication between these versions. The code is very heavily tuned for speed (and also has small code size), at the expense of maintainability and generality.
- fp-bit represents floating-point values using single words, so IEEE quad arithmetic requires TImode arithmetic. The code is quite large and slow. It does have compile-time conditionals for various target peculiarities (e.g. disabling infinities and NaNs, or disabling denormals), not all used by any current target. It is the only one of the three with any support for IBM long double (although this support is unlikely to be as accurate as possible).
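To make the multi-word representation concrete, here is a minimal sketch of two operations on a fraction held in two 32-bit words: addition with carry propagation, and a right shift that folds the lost bits into a sticky bit. The type and function names (frac2, frac2_add, frac2_shift_right_sticky) are hypothetical and are not taken from soft-fp or ieeelib.

```c
#include <stdint.h>

/* Hypothetical two-word fraction: lo holds the least significant 32 bits. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} frac2;

/* Add two two-word fractions, propagating the carry out of the low word. */
frac2 frac2_add(frac2 a, frac2 b)
{
    frac2 r;
    r.lo = a.lo + b.lo;
    /* Unsigned wrap-around in the low word signals a carry. */
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* Shift right by n bits (0 < n < 32 in this sketch). Any nonzero bits
   shifted out are recorded in the lowest bit so later rounding can tell
   the value was inexact. */
frac2 frac2_shift_right_sticky(frac2 a, unsigned n)
{
    frac2 r;
    uint32_t lost = a.lo & ((UINT32_C(1) << n) - 1u);
    r.lo = (a.lo >> n) | (a.hi << (32u - n));
    r.hi = a.hi >> n;
    if (lost != 0)
        r.lo |= 1u;
    return r;
}
```

A four-word variant would repeat the same carry and shift steps across more words, which is the kind of repetition the macro structure of soft-fp is meant to factor out.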
In view of the greater maintainability associated with the macro structure of soft-fp, and the optional exception and rounding mode support, it seems best to use soft-fp but with ideas and algorithms from ieeelib put in the macro structure of soft-fp to speed up soft-fp. RMS has approved using soft-fp under the GPL+exception licence used for libgcc (instead of LGPL as used for glibc).
The differences between soft-fp and ieeelib essentially are:
- ieeelib duplicates _1 and _2 function definitions, whereas soft-fp makes heavy use of macros to avoid such duplication; what's wanted is the macro structure of soft-fp and the logic of ieeelib. (In fact perhaps there should be even heavier use of macros than soft-fp does at present; there's still quite a bit of duplication.)
- soft-fp fully unpacks and classifies floating-point values (in the manner of fp-bit), whereas ieeelib only unpacks the bits without classification or inserting the implicit MSB.
- soft-fp includes exception and rounding mode support. With the default macro definitions, exception raising is just dead code and the rounding mode is a constant (round to nearest), but there are some places where the algorithms in ieeelib are only valid without exceptions and rounding modes, so soft-fp needs to be more complicated, and some of its code may not get optimized down to the ieeelib algorithms with the default macro definitions in effect.
- ieeelib shifts the mantissa as far to the left as possible in unpacking (i.e. as many guard bits as possible), whereas soft-fp always shifts by exactly 3 to the left. (The effect is that conversions from floating point to integer only need to shift right in ieeelib, never left; in soft-fp they may need to do either.)
- ieeelib unpacks exponent and sign into a single variable (with the sign bit replicated in all spare bits, e.g. ssseeeee from an original number with sign s and exponent eeeee), whereas soft-fp unpacks them separately.
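The unpacking differences listed above can be illustrated for IEEE single precision. This is a simplified sketch, not code from either library: all names are hypothetical, float is assumed to be IEEE single, denormals get their own class rather than being normalized, and the left shift of 8 in the raw variant is an assumption chosen to leave room for the implicit bit.

```c
#include <stdint.h>
#include <string.h>

/* All names are hypothetical; this is not code from soft-fp or ieeelib. */
enum sketch_class { SK_ZERO, SK_DENORMAL, SK_NORMAL, SK_INF, SK_NAN };

struct full_unpacked {       /* fully unpacked and classified */
    enum sketch_class cls;
    uint32_t sign;           /* 0 or 1 */
    int32_t exp;             /* biased exponent */
    uint32_t frac;           /* fraction with three guard bits */
};

struct raw_unpacked {        /* bits only, no classification */
    uint32_t sign_exp;       /* sign replicated above the 8 exponent bits */
    uint32_t frac;           /* fraction shifted left, no implicit MSB */
};

static uint32_t float_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);   /* assumes a 32-bit IEEE float */
    return b;
}

struct full_unpacked full_unpack(float f)
{
    uint32_t bits = float_bits(f);
    struct full_unpacked u;

    u.sign = bits >> 31;
    u.exp = (int32_t)((bits >> 23) & 0xFFu);
    u.frac = bits & 0x7FFFFFu;
    if (u.exp == 0xFF) {
        u.cls = u.frac ? SK_NAN : SK_INF;
    } else if (u.exp == 0) {
        u.cls = u.frac ? SK_DENORMAL : SK_ZERO;
    } else {
        u.cls = SK_NORMAL;
        u.frac |= UINT32_C(1) << 23;   /* insert the implicit MSB */
    }
    u.frac <<= 3;                      /* exactly three guard bits */
    return u;
}

struct raw_unpacked raw_unpack(float f)
{
    uint32_t bits = float_bits(f);
    struct raw_unpacked r;

    r.sign_exp = (bits >> 23) & 0xFFu;
    if (bits >> 31)
        r.sign_exp |= 0xFFFFFF00u;     /* replicate sign into spare bits */
    r.frac = (bits & 0x7FFFFFu) << 8;  /* shift count is an assumption */
    return r;
}
```

For -2.5f, full_unpack yields a separate sign of 1 and biased exponent of 128, while raw_unpack yields the single value 0xFFFFFF80 carrying both.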
To speed up soft-fp, a "semi-raw" unpacking mode has been added which is closer to that used by ieeelib, and this mode is used for various operations. These improvements have been committed to glibc CVS. The improved soft-fp has been committed to csl-ppc4xx-branch. Several bugs have been found and fixed in soft-fp in the process.
Future projects
It is intended to merge soft-fp from csl-ppc4xx-branch to mainline as an alternative software floating point implementation to fp-bit, and to make it the one used on powerpc-linux. It is hoped that it can be used on other targets as well to replace fp-bit.c (possibly adding some features to soft-fp required for particular targets). For each such target, soft-fp needs benchmarking and testing for correctness. IEEE floating-point test software is discussed at http://www.math.utah.edu/%7ebeebe/software/ieee/ ; paranoia and ucbtest are of particular use.
Some more specific projects and issues are:
- soft-fp in libc (on PowerPC --without-fp) is somewhat broken. Most of the functions in libc are not exported from the shared libc, but some are, while linking with -static yields problems with functions being defined in both libc and libgcc. Using the libc functions means exception handling and rounding mode support interact correctly with the fenv.h functions. It might be useful in future to be able to configure GCC to disable those functions from libgcc known to be present in libc.
- It would be useful to have a configure option to enable exception and rounding mode support in libgcc. This should interact properly with the system fenv.h functions, and with hardware settings of exceptions and rounding modes in the cases where some floating-point modes are hard and some soft.
- Further profiling could identify additional speedups achievable using ideas from ieeelib.
- In the csl-ppc4xx-branch version, soft-fp is used to replace the libgcc2.c soft-float functions as well as the fp-bit.c ones. This is fine for purely soft-float targets, but on targets with both soft-float and hard-float multilibs (such as powerpc-linux) it causes the functions to be replaced in both multilibs; the libgcc2.c versions potentially handle exceptions and rounding modes correctly, while by default the soft-fp versions don't. For mainline it is proposed to disable replacing the libgcc2.c functions for such targets until we have toplevel libgcc, after which the makefile fragments in toplevel libgcc can be evaluated once per multilib and varying the files and LIB2FUNCS_EXCLUDE settings for each multilib will be easier. It's a pre-existing condition that the soft-float functions from fp-bit.c appear in both versions of libgcc although only needed in the soft-float version.
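As a rough illustration of how a compile-time switch for rounding mode support might look, the following sketch rounds a fraction carrying three guard bits. With the default definition the rounding mode is a constant, so the compiler can discard the other cases as dead code. Every name here (SKETCH_DYNAMIC_ROUNDING, SKETCH_ROUNDMODE, sketch_round_frac and the enum) is hypothetical and is not an existing GCC configure option or soft-fp macro; handling of a carry out of the fraction is omitted.

```c
#include <stdint.h>

/* Hypothetical rounding mode names for this sketch only. */
enum sketch_round { SK_NEAREST, SK_TOWARD_ZERO, SK_UPWARD, SK_DOWNWARD };

#ifdef SKETCH_DYNAMIC_ROUNDING
/* Runtime mode; a real implementation would have to stay in sync with
   the system fenv.h functions. */
static enum sketch_round sketch_current_mode = SK_NEAREST;
#define SKETCH_ROUNDMODE sketch_current_mode
#else
/* Default: the mode is a constant, so other cases are dead code. */
#define SKETCH_ROUNDMODE SK_NEAREST
#endif

/* Round away the three guard bits in the low end of frac.
   negative is nonzero when the value being rounded is negative. */
uint32_t sketch_round_frac(uint32_t frac, int negative)
{
    uint32_t guard = frac & 7u;
    uint32_t result = frac >> 3;

    if (guard == 0)
        return result;              /* exact, nothing to round */

    switch (SKETCH_ROUNDMODE) {
    case SK_NEAREST:
        /* Round to nearest, ties to even. */
        if (guard > 4u || (guard == 4u && (result & 1u)))
            result++;
        break;
    case SK_TOWARD_ZERO:
        break;                      /* truncate */
    case SK_UPWARD:
        if (!negative)
            result++;
        break;
    case SK_DOWNWARD:
        if (negative)
            result++;
        break;
    }
    return result;
}
```

With the default definition, a tie such as (11 << 3) | 4 rounds up to the even value 12, while (10 << 3) | 4 stays at 10.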