This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.


*To*: gcc at gcc dot gnu dot org
*Subject*: Re: Fourth Draft "Unsafe fp optimizations" project description.
*From*: Jason Riedy <ejr at CS dot Berkeley dot EDU>
*Date*: Sun, 12 Aug 2001 22:50:37 -0700
*cc*: Toon Moene <toon at moene dot indiv dot nluug dot nl>

And Toon Moene writes:

- OK, I'd hoped that a fourth draft of this open project's document
- wouldn't be needed, but I ran afoul of the following (pointed out by
- several):

Unfortunately, I've been way too busy with 754 revision activities to
keep up with lists, and I've missed this whole discussion... I've
quickly skimmed the threads (the MARC archives are wonderful), so I may
well have missed relevant discussion. I apologize if I'm out of line.

Just so everyone knows, I'm a complete 754-head. Thoroughly
brain-washed. I'm a grad student at Berkeley. Go fig.

- Now, for `sqrt' this isn't much of a problem - it isn't hard to make a
- sqrt instruction that's as precise as the divide instruction, so if the
- target supports a sqrt instruction, it's OK to use it.

Note that 754 also requires that a properly rounded sqrt operation be
present in "the system", and most hardware supports this well for the
hardware precisions.

On the draft:

- Transformations that change the meaning of floating point expressions
-
- The debate on the extent of rearrangement of floating point expressions
- allowed to the compiler when optimizing is a recurring theme on GCC's
- mailing lists. On this page we try to provide some structure to this
- discussion. It is understood that all of the rearrangements described
- here are only performed with the express permission of the user (i.e.,
- an explicitly specified command line option).

If it's ok, I'd like to put a link to this document from the 754 pages
(http://grouper.ieee.org/groups/754/) when it's available...

- In numerical problems, there are roughly two kinds of computations:
-
- * Those that need full precision in order to guarantee acceptable
-   results.
- * Those that are less sensitive to occasional loss of accuracy.

There is a third:

* Those for which the numerical analysis is too costly or too painful
  for the developer.

Unfortunately, this is by far the largest category.
Even at bastions of numerical analysis, it's moving out of core course
work. These users, the vast majority, need to be considered.

- * All of its numerical effects are well-documented (with an emphasis
-   on the "special effects").

This way madness lies. There is no end to the list of desired numerical
effects for complex and elementary functions. We've been going rounds
on this in 754r meetings. You may wish to back off on the "all" part.
Perhaps simply "Its numerical effects [...]"?

- Obviously, it is useless to talk about the ill effects of rearranging
- floating point expressions without having a solid reference. To
- simplify the analysis below, the discussion is in terms of the floating
- point model supported by the ISO/IEC 9899 Standard (commonly known as
- C99), which refers to the IEC 60559 Standard (the successor to the
- IEEE-754 Standard).

The successor to IEEE 754-1985 and possibly 854-1987 will be IEEE
754R-200x. It's in progress. If people have any time to spare
(limitless, right?), please consider joining the mailing list. I'd
appreciate your views on some topics. Many of the more knowledgeable
people seem allergic to email, but perhaps... The web site will have a
better outline of outstanding issues soon. Honest.

- Another limitation we allow ourselves is to only treat rearrangements
- of expressions using +, -, * and / (however, see the "Open issues"
- chapter below). All other changes do not belong to the domain of the
- compiler proper.

Don't forget that SQRT and REM are required operators for 754. We are
definitely adding FMA and possibly adding MIN/MAX variants. These may
be implemented in software, but it's reasonable to expose them as
operations to a user.

Also, conversions to/from decimal, especially literal constants, need
to be hammered down. We're trying to come up with appropriate wording
for 754r, but it may be beyond our reach. The C99 version is ok...
- Unfortunately, at present GCC doesn't guarantee IEC 60559 conformance
- by default on all targets that can support it.

No one does, except under very liberal readings. ;)

- * Each assignment to a variable [should be] stored to memory. (And, if
-   the value of that variable is used later by dereferencing its lvalue,
-   the value is loaded from memory and the temporary that was stored to
-   memory is not re-used.)

Unless the compiler is allowed to widen types throughout a routine.
This may prove to be a valid option for improving speed. (See Farnum's
paper, and I'm working on a more expansive version.) Perhaps:

* Each variable's value [should be] representable within its type.

- restrictive, because it requires implementations to supply the same
- answers on all targets (this is definitely not a goal of the IEC 60559
- Standard).

I believe it is a goal for 754r. There's consensus that it'd be a good
thing to make it possible, but I don't know if everyone wants to
absolutely require it. Keep in mind that this is only for the 754
_operations_. There are currently a few implementation choices allowed
that cause side effects to differ, and there is also one poorly defined
case (subnormal REM inf)... We want to use the 15+ yrs of relevant
experience to remove the choices.

While we may suggest expression evaluation disciplines in 754r, we will
never require one and only one. The only way the disciplines will show
up in 754r is as an informative annex, if we choose that route.

- Classification

s/accuracy/precision/ throughout. The compiler has no way of knowing
what is really being computed, only how. Hence the compiler cannot
judge accuracy, only precision. Pedantic, but it will prove to be a
sanity-saving difference.

- 1. Rearrangements whose only effect is for a small subset of all inputs.
-
- Rationale: Users might know the computational effects for those inputs.
- Example: Force underflow to zero.
Be sure to note that this destroys the implication between (x-y) == 0
and x == y. The effects may not be seen for quite some time in the
code. Code can quickly become impossible to debug.

- Savings may be large when denormal computation has to be emulated in
- the kernel. Special effects: Do not divide by underflowed numbers.

Not just in the OS kernel, but also inside the processor. The Pentium
IV takes a huge hit from the internal trap hosing the pipeline. It
really sucks.

(FYI, we're replacing 754's term, denormal, with 854's term, subnormal.
A denormal can also be a denormalized number that's a redundant
representation (in an extended precision) for a normal number. We're
using subnormal to remove any appearance of ambiguity.)

- 4. Rearrangements whose effect is a loss of accuracy on half of the
-    inputs and a complete loss on the other half of the inputs.
- Remark: On some targets, the intermediate calculations will be done in
- extended precision (with its extended range). In that case the problem
- indicated above does not exist; see however the restriction for ix86
- floating point arithmetic in chapter "Preliminaries".

It most certainly does exist. Double-precision numbers can easily
overflow in Intel's extended precision. It does reduce the region of
error, however. Quad precision (being added to 754r) may be more help,
but you know how many people implement it quickly.

Additional:

5. Rearrangements that _increase_ precision.

Rationale: Users might be iterating to convergence using expressions
that lose precision. More precise expressions will reduce the number of
loops and greatly speed the program.

Example: [to be filled in... an example in Kahan's notes on root
finders should work. Similarly, various arrangements of
sin(x)**2 + cos(x)**2 can produce various results. Also, iterative
refinement and contracting into fma operations.]

Remark: Rearranging expressions to keep more, higher precision
intermediates can drastically reduce the region for error.
Almost no one analyzes their numeric code, even those who know how.
These rearrangements can both speed code and make it safer. Also, the
extra precision can enable other optimizations.

6. Rearrangements that change when, where, and how decimal literals are
converted into binary floating-point numbers.

Rationale: The conversions are normally inexact and may even cause
exceptional situations. Literal constants in a program tend to be
related in ways not obvious to a compiler, so care needs to be taken in
converting them into binary.

Example: The ratio between 25.4 and 2.54 is exactly 10 in single and
Intel's extended, but it differs slightly in double. This may impact
unit conversion if the literals are rearranged such that they are
converted into different types.

I'm sure I can come up with more. ;)

- with and without -funsafe-math-optimizations. Obviously, if we want to
- continue to support this, we have to come up with a classification of
- inputs to `sin' and `cos' that makes this change "safe".

Wheee... This is not easy. Some people would say this is nigh
impossible. However, there may be good compromises. I'll try to find
the write-up on the costs of correctly rounded elementary functions if
you'd like. Another student examined these last semester. Claims from
IBM are missing a few qualifications.

- We should come up with a further classification for complex arithmetic.

It's the exact same problem. Dr. Kahan may have a write-up by
Wednesday's 754r meeting on precisely this topic. I'll send along a
pointer if anyone's interested.

Other points picked up along the threads:

*) Do not rely on probabilistic arguments. In practice, they almost
   never work.

*) For a literature survey on a budget, see Bindel's annotated
   bibliography linked off the 754 page,
   http://grouper.ieee.org/groups/754/.

*) Expect streamlined trap handling in 754r. We're trying to ensure
   they can be fully asynchronous in all interesting cases. That should
   make more optimizations reasonable.
*) Additional exponent range can help flush-to-zero, but only on an
   individual operation's scale. Once you combine multiple operations
   in an expression, it still kills you. There should be a short note
   from David Bindel on using a few extra bits to implement proper
   underflow shortly.

*) Subnormals are truly important. See Demmel's paper for the full
   numerical analyst's view. For a simpler view, does anyone really
   want to debug programs where you cannot rely on equality? Consider
   templates and generic programming... Flush to zero should not be a
   default mode simply for sanity's sake. The support cost will be
   comparable to the Intel extended fiasco.

*) Note that -0.0 == 0.0, so -A+B == B-A is always true for numerical A
   and B. Obviously, NaNs will make it always unordered.

*) Jim Thomas gave a presentation on C99 support for IEEE 754 at the
   last meeting. I'll have his slides up on the 754 site tomorrow
   unless a bus hits me. They're quite nice for outlining important
   issues, including many of the complex ones.

*) If anyone wants to see specific, 754-related references linked from
   the official 754 page, send them to me. Links or just references to
   appropriate (non-Sun) platform documentation would be hugely
   appreciated.

Jason

**Follow-Ups**: **Re: Fourth Draft "Unsafe fp optimizations" project description.** *From:* Kai Henningsen

**References**: **Fourth Draft "Unsafe fp optimizations" project description.** *From:* Toon Moene
