This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.


*To*: gcc at gcc dot gnu dot org
*Subject*: Fourth Draft "Unsafe fp optimizations" project description.
*From*: Toon Moene <toon at moene dot indiv dot nluug dot nl>
*Date*: Sun, 12 Aug 2001 13:31:02 +0200
*Organization*: Moene Computational Physics, Maartensdijk, The Netherlands

OK, I'd hoped that a fourth draft of this open project's document wouldn't be needed, but I ran afoul of the following (pointed out by several): when compiling with `-funsafe-math-optimizations', GCC not only rearranges floating point expressions, it also allows the inline generation of `sqrt', `sin' and `cos' instructions on targets that support them, instead of calls to library routines.

At first I thought this was a glibc issue (doing something clever with header files and macros that were only defined when compiling with `-funsafe-math-optimizations'), but this decision *is* part of GCC (see the functions expand_builtin and expand_builtin_mathfn in builtins.c).

Now, for `sqrt' this isn't much of a problem - it isn't hard to make a sqrt instruction that's as precise as the divide instruction, so if the target supports a sqrt instruction, it's OK to use it. However, for `sin' and `cos' the situation is different; the instructions might not be as accurate for all inputs as their library counterparts. I mention this issue below under "Open issues", because I currently have no idea how to deal with this.

--
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)

Transformations that change the meaning of floating point expressions

Introduction

The debate on the extent of rearrangement of floating point expressions allowed to the compiler when optimizing is a recurring theme on GCC's mailing lists. On this page we try to provide some structure to this discussion. It is understood that all of the rearrangements described here are only performed with the express permission of the user (i.e., via an explicitly specified command line option).

Rationale

Why would someone forego numerical accuracy for speed? Isn't a fast but wrong answer useless? In numerical problems, there are roughly two kinds of computations:

* Those that need full precision in order to guarantee acceptable results.
* Those that are less sensitive to occasional loss of accuracy.

Computational problems of the second kind have generally been beaten on, simplified and approximated in a drastic attempt to fit them into the limitations of present day computers. The loss of accuracy due to these approximations can easily overwhelm the loss that results from slightly changing the floating point arithmetic. The most obvious example of this is the first-person shooter game: while the physics of reflection, refraction and scattering of electromagnetic radiation with wavelengths between 400 and 800 nm has been significantly approximated, what would make the game absolutely useless is the frequency of updating the image dropping below 20 per second. Rearranging the floating point arithmetic, with an associated loss of a few Units in the Last Place (ULPs), could compare favourably to further approximation of the physics involved. Caveat: the loss of accuracy will not be the only effect - see below.

Aim of the project

The project will provide the GCC community with a classification of rearrangements of floating point expressions.
Based on the classification, recommendations will be made on how to offer users the possibility to instruct the compiler to perform rearrangements from a particular class. The classification will be based on the following criteria (courtesy of Robert Dewar):

* The transformation is well-understood.
* It is definitely an optimization.
* All of its numerical effects are well-documented (with an emphasis on the "special effects").

(Actually, Robert wrote "does not introduce surprises" as the last criterion, but it's more useful to actually list the "special effects", i.e., anything that's not simply a loss of accuracy.)

Once the recommendations are agreed upon, a set of patches can be devised to change the compiler to implement them; parallel to this effort, the documentation should be updated to reflect the new reality (i.e., the name and function of new flags, changes w.r.t. previous behaviour, etc.). As soon as this is done, we can close the project, indicate that status on this web page and keep the page for future reference on our consensus.

Preliminaries

Obviously, it is useless to talk about the ill effects of rearranging floating point expressions without having a solid reference. To simplify the analysis below, the discussion is in terms of the floating point model supported by the ISO/IEC 9899:1999 Standard (commonly known as C99), which refers to the IEC 60559 Standard (the successor to the IEEE-754 Standard). Another limitation we allow ourselves is to treat only rearrangements of expressions using +, -, * and / (however, see the "Open issues" chapter below); all other changes do not belong to the domain of the compiler proper. Unfortunately, at present GCC doesn't guarantee IEC 60559 conformance by default on all targets that can support it.
A well-known exception is the ix86; the following summary of the defects is courtesy of Brad Lucier:

* All temporaries generated for a single expression [should be] maintained in extended precision, even when spilled to the stack.
* Each assignment to a variable [should be] stored to memory. (And, if the value of that variable is used later by dereferencing its lvalue, the value is loaded from memory and the temporary that was stored to memory is not re-used.)

The [should be]'s indicate how it isn't, at present.

Note: this document discusses transformations that potentially affect the behaviour of the program from a formal semantic point of view. The code generator may always make transformations that have no formal semantic effect. For example, if we write:

    x := x / 2.0;

then the code generator might generate a division instruction that divides by two, a halve instruction if one is present, or a multiplication by 0.5. All of these generate exactly the same result in x, so the code generator is free to pick whichever sequence it deems most efficient.

Language requirements

GCC presently supports five languages: C, C++, Objective C, Java and Fortran. Of these, Fortran has the "loosest" requirements on floating point operations (basically, one could say that floating point accuracy in Fortran is a "quality of implementation" issue), while Java has the most restrictive, because it requires implementations to supply the same answers on all targets (this is definitely not a goal of the IEC 60559 Standard). It is understood that users who apply the outcome of this project know the extent to which they are violating the respective language standard. We might consider issuing appropriate warning messages.
Classification

The classification below should enable us to offer users a choice of allowable rearrangements based on the user's knowledge of the floating point values present in and generated by the computations in his/her program; hence the classification using subsets of the set of all floating point values (in a particular set: single precision or double precision).

1. Rearrangements whose only effect is for a small subset of all inputs.

   Rationale: Users might know the computational effects for those inputs.

   Example: Force underflow to zero. Savings may be large when denormal computation has to be emulated in the kernel. Special effects: do not divide by underflowed numbers.

2. Rearrangements whose only effect is a loss of accuracy.

   Rationale: Users might be able to bound the effect of this rearrangement.

   Example: A*A*...*A -> different order of evaluation (compare a*a*a*a with t=a*a; t=t*t). Savings: potentially many multiplies, at the cost of some temporaries.

3. Rearrangements whose effect is a loss of accuracy on a large subset of the inputs and a complete loss on a small subset of the inputs.

   Rationale: Users might know that their computations always fall in the "loss of accuracy" subset and be able to bound the effect of this rearrangement.

   Example: A*B + A*C -> A*(B+C). Will overflow for a small number of choices of B and C for which the original didn't overflow. Savings: one multiply and one temporary (register pressure).

   Example: B/A + C/A -> (B+C)/A. Will overflow for a small number of choices of B and C for which the original didn't overflow. Savings: one divide and one temporary (register pressure).

   Example: A/B -> A*(1/B). Will overflow if B is a denormal, whereas the original might not. Savings: one divide changed to a multiply - might be large in case B is a loop invariant.

   Remark: On some targets, the intermediate calculations will be done in extended precision (with its extended range).
In that case the problem indicated above does not exist; see, however, the restriction for ix86 floating point arithmetic in the chapter "Preliminaries".

4. Rearrangements whose effect is a loss of accuracy on half of the inputs and a complete loss on the other half of the inputs.

   Rationale: Users might know that their computations always fall in the "loss of accuracy" subset and be able to bound the effect of this rearrangement.

   Example: A/B/C -> A/(B*C). B*C will overflow for a quarter of the floating point numbers, whereas it will return zero for slightly fewer than a quarter of the numbers B, C.

   Example: Evaluation of Z1/Z2 (Z1, Z2 complex; Z1 = A + B i, Z2 = C + D i) as ((A*C + B*D) + i (B*C - A*D)) / (C^2 + D^2). The denominator will overflow or underflow for slightly more than half of the values of C and D.

   Remark: On some targets, the intermediate calculations will be done in extended precision (with its extended range). In that case the problem indicated above does not exist; see, however, the restriction for ix86 floating point arithmetic in the chapter "Preliminaries".

Open issues

Currently, GCC outputs `sin' and `cos' instructions instead of calls to library functions on targets that support them (even though this fact is not documented, at least not in md.texi). Those instructions are only used when the -funsafe-math-optimizations flag is specified. This is not a (C) library issue, as can easily be seen by compiling the following Fortran code on an ix86 machine:

          READ*,X
          PRINT*,SIN(X)
          END

with and without -funsafe-math-optimizations. Obviously, if we want to continue to support this, we have to come up with a classification of inputs to `sin' and `cos' that makes this change "safe". We should also come up with a further classification for complex arithmetic.

Recommendations

None yet.

**Follow-Ups**:

* **Re: Fourth Draft "Unsafe fp optimizations" project description.** *From:* Gabriel Dos Reis
* **Re: Fourth Draft "Unsafe fp optimizations" project description.** *From:* Linus Torvalds
* **Re: Fourth Draft "Unsafe fp optimizations" project description.** *From:* Tim Prince
* **Re: Fourth Draft "Unsafe fp optimizations" project description.** *From:* Jason Riedy
