Re: Fourth Draft "Unsafe fp optimizations" project description.


And Toon Moene writes:
 - 
 - OK, I'd hoped that a fourth draft of this open project's document
 - wouldn't be needed, but I ran afoul of the following (pointed out by
 - several):

Unfortunately, I've been way too busy with 754 revision activities
to keep up with lists, and I've missed this whole discussion...
I've quickly skimmed the threads (the MARC archives are wonderful), 
so I may well have missed relevant discussion.  I apologize if I'm
out of line.

Just so everyone knows, I'm a complete 754-head.  Thoroughly
brain-washed.  I'm a grad student at Berkeley.  Go fig.

 - Now, for `sqrt' this isn't much of a problem - it isn't hard to make a
 - sqrt instruction that's as precise as the divide instruction, so if the
 - target supports a sqrt instruction, it's OK to use it.

Note that 754 also requires that a properly rounded sqrt operation 
be present in "the system", and most hardware supports this well
for the hardware precisions.

On the draft:
 - Transformations that change the meaning of floating point expressions

 - The debate on the extent of rearrangement of floating point expressions
 - allowed to the compiler when optimizing is a recurring theme on GCC's
 - mailing lists. On this page we try to provide some structure to this
 - discussion. It is understood that all of the rearrangements described here
 - are only performed with the express permission of the user (i.e., an
 - explicitly specified command line option).

If it's ok, I'd like to put a link to this document from the
754 pages (http://grouper.ieee.org/groups/754/) when it's
available...

 - In numerical problems, there are roughly two kinds of computations:
 - 
 -    * Those that need full precision in order to guarantee acceptable
 -      results.
 -    * Those that are less sensitive to occasional loss of accuracy.

There is a third.

	* Those for which the numerical analysis is too costly or
	too painful for the developer.

Unfortunately, this is by far the largest category.  Even at
bastions of numerical analysis, it's moving out of core course
work.  These users, the vast majority, need to be considered.

 -    * All of its numerical effects are well-documented (with an emphasis on
 -      the "special effects").

This way madness lies.  There is no end to the list of desired
numerical effects for complex and elementary functions.  We've
been going rounds on this in 754r meetings.  You may wish to
back off on the "all" part.  Perhaps simply "Its numerical
effects [...]"?

 - Obviously, it is useless to talk about the ill effects of rearranging
 - floating point expressions without having a solid reference. To simplify the
 - analysis below, the discussion is in terms of the floating point model
 - supported by the ISO/IEC 9899 Standard (commonly known as C99), which refers
 - to the IEC 60559 Standard (the successor to the IEEE-754 Standard).

The successor to IEEE 754-1985 and possibly 854-1987 will be IEEE 
754R-200x.  It's in progress.  If people have any time to spare 
(limitless, right?), please consider joining the mailing list.  
I'd appreciate your views on some topics.  Many of the more 
knowledgeable people seem allergic to email, but perhaps...

The web site will have a better outline of outstanding issues soon.
Honest.

 - Another limitation we allow ourselves is to only treat rearrangements of
 - expressions using +, -, * and / (however, see the "Open issues" chapter
 - below). All other changes do not belong to the domain of the compiler
 - proper.

Don't forget that SQRT and REM are required operators for 754.  We
are definitely adding FMA and possibly adding MIN/MAX variants.
These may be implemented in software, but it's reasonable to
expose them as operations to a user.
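
For what it's worth, C99's <math.h> already exposes all of these
as library operations, so there is precedent for treating them as
operators.  A minimal sketch (compile with -std=c99 -lm; fma() may
well be emulated in software on targets without a hardware FMA):

    #include <stdio.h>
    #include <math.h>   /* C99: sqrt, remainder, fma, fmin, fmax */

    int main(void)
    {
        double x = 2.0, y = 3.0, z = 4.0;
        printf("sqrt:      %g\n", sqrt(x));          /* correctly rounded per 754 */
        printf("remainder: %g\n", remainder(x, y));  /* the 754 REM operation */
        printf("fma:       %g\n", fma(x, y, z));     /* x*y + z with one rounding */
        printf("fmin/fmax: %g %g\n", fmin(x, y), fmax(x, y));
        return 0;
    }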

Also, conversions to/from decimal, especially literal constants,
need to be hammered down.  We're trying to come up with appropriate
wording for 754r, but it may be beyond our reach.  The C99 version
is ok...

 - Unfortunately, at present GCC doesn't guarantee IEC 60559 conformance by
 - default on all targets that can support it.

No one does, except under very liberal readings.  ;)

 -    * Each assignment to a variable [should be] stored to memory. (And, if
 -      the value of that variable is used later by dereferencing its lvalue,
 -      the value is loaded from memory and the temporary that was stored to
 -      memory is not re-used.)

Unless the compiler is allowed to widen types throughout a routine.  
This may prove to be a valid option for improving speed.  (See 
Farnum's paper, and I'm working on a more expansive version.)

Perhaps:
	* Each variable's value [should be] representable within 
	its type.
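
A sketch of why this matters, assuming an x87-style target that
evaluates double expressions in an 80-bit register (the variable
names are mine; GCC's -ffloat-store forces the store the draft
describes):

    #include <stdio.h>

    int main(void)
    {
        double a = 1.0 / 3.0;   /* rounded to double */
        double t = a + a + a;   /* may be computed and kept in an
                                   extended-precision register */
        /* In pure double arithmetic t rounds to exactly 1.0; kept
           in an 80-bit register it is 1 - 2^-54, so the comparison
           below can change depending on whether t was spilled to a
           64-bit memory slot.  Widening t's type instead would at
           least make the wider value the documented one. */
        printf("%d\n", t == 1.0);
        return 0;
    }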

 - restrictive, because it requires implementations to supply the same answers
 - on all targets (this is definitely not a goal of the IEC 60559 Standard).

I believe it is a goal for 754r.  There's consensus that it'd be a
good thing to make it possible, but I don't know if everyone wants
to absolutely require it.  

Keep in mind that this is only for the 754 _operations_.  There
are currently a few implementation choices allowed that cause 
side effects to differ, and there is also one poorly defined
case (subnormal REM inf)...  We want to use the 15+ yrs of
relevant experience to remove the choices.

While we may suggest expression evaluation disciplines in 754r, we 
will never require one and only one.  The only way the disciplines 
will show up in 754r is as an informative annex, if we choose that 
route.

 - Classification

s/accuracy/precision/ throughout.  The compiler has no way of knowing
what is really being computed, only how.  Hence the compiler cannot
judge accuracy, only precision.  Pedantic, but it will prove to be
a sanity-saving difference.

 -   1. Rearrangements whose only effect is for a small subset of all inputs.
 - 
 -      Rationale: Users might know the computational effects for those inputs.

 -      Example: Force underflow to zero.

Be sure to note that this destroys the implication between (x-y)==0 
and x == y.  The effects may not be seen for quite some time in the
code.  Code can quickly become impossible to debug.

 -      Savings may be large when denormal
 -      computation has to be emulated in the kernel. Special effects: Do not
 -      divide by underflowed numbers.

Not just in the OS kernel, but also inside the processor.  The
Pentium 4 takes a huge hit from the internal trap hosing the
pipeline.  It really sucks.

(FYI, we're replacing 754's term, denormal, with 854's term, 
subnormal.  A denormal can also be a denormalized number that's
a redundant representation (in an extended precision) for a 
normal number.  We're using subnormal to remove any appearance of
ambiguity.)

 -   4. Rearrangements whose effect is a loss of accuracy on half of the inputs
 -      and a complete loss on the other half of the inputs.

 -      Remark: On some targets, the intermediate calculations will be done in
 -      extended precision (with its extended range). In that case the problem
 -      indicated above does not exist; see however the restriction for ix86
 -      floating point arithmetic in chapter "Preliminaries".

It most certainly does exist.  Even when intermediates are
carried in Intel's extended precision, double-precision
computations can still overflow the extended range.  It does
reduce the region of error, however.  Quad precision (being added
to 754r) may help more, but you know how quickly anyone
implements it.
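
A small illustration of the reduced (but nonempty) region of
error, assuming Intel's extended format with its ~1.2e4932 range;
whether the intermediate x*x is actually carried in extended
precision depends on the target and FLT_EVAL_METHOD:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 1e200;
        /* x*x = 1e400 overflows double; carried in extended it
           survives, so r is 1e200 on an x87-style target but
           +inf under strict double evaluation: */
        double r = sqrt(x * x);

        /* A chain of in-range doubles can still overflow even the
           extended range: (1e300)^17 = 1e5100 > ~1.2e4932. */
        double p = 1e300;
        for (int i = 1; i < 17; i++)
            p *= 1e300;
        printf("r = %g   p = %g\n", r, p);
        return 0;
    }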

Additional:
	5.  Rearrangements that _increase_ precision.

	Rationale: Users might be iterating to convergence using
	expressions that lose precision.  More precise expressions
	will reduce the number of loops and greatly speed the
	program.

	Example: [to be filled in...  an example in Kahan's notes
	on root finders should work.  similarly, various arrangements
	of sin(x)**2 + cos(x)**2 can produce various results.  also,
	iterative refinement and contracting into fma operations.]

	Remark: Rearranging expressions to keep more, higher
	precision intermediates can drastically reduce the
	region of error.  Almost no one analyzes their numeric
	code, even those who know how.  These rearrangements
	can both speed code and make it safer.  Also, the
	extra precision can enable other optimizations.  (A
	small sketch follows these items.)


	6.  Rearrangements that change when, where, and how decimal
	literals are converted into binary floating-point numbers.

	Rationale: The conversions are normally inexact and may
	even cause exceptional situations.  Literal constants in
	a program tend to be related in ways not obvious to a
	compiler, so care needs to be taken in converting them
	into binary.

	Example: The ratio between 25.4 and 2.54 is exactly 10
	in single and Intel's extended, but it differs slightly
	in double.  This may impact unit conversion if the literals
	are rearranged such that they are converted into different
	types.  (See the second sketch below.)
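
First sketch, a stab at item 5's missing example (results depend
on the target's libm and on whether the compiler contracts a*b+c
into an fma):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Mathematically 1, but each operation rounds, so
           different arrangements can give different results: */
        double x = 1.0;
        double s = sin(x), c = cos(x);
        printf("sin^2 + cos^2 - 1 = %g\n", s*s + c*c - 1.0);

        /* fma(a, a, -a*a) is exactly the rounding error of the
           product a*a, so a nonzero result shows that contracting
           into fma changes the computed value: */
        double a = 1.0 + 1.0/3.0;
        printf("fma residual      = %g\n", fma(a, a, -a*a));
        return 0;
    }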

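Second sketch, for item 6's unit-conversion example (the single
and extended outcomes below are the claim made above; I'd expect
them to hold on an x86 target, but check your own):

    #include <stdio.h>

    int main(void)
    {
        /* Whether 25.4/2.54 is exactly 10 depends on the type the
           literals are converted into: */
        printf("float:       %d\n", 25.4f / 2.54f == 10.0f);
        printf("double:      %d\n", 25.4  / 2.54  == 10.0);
        printf("long double: %d\n", 25.4L / 2.54L == 10.0L);
        return 0;
    }
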
I'm sure I can come up with more.  ;)

 - with and without -funsafe-math-optimizations. Obviously, if we want to
 - continue to support this, we have to come up with a classification of inputs
 - to `sin' and `cos' that makes this change "safe".

wheee...  This is not easy.  Some people would say this is nigh
impossible.  However, there may be good compromises.  

I'll try to find the write-up on the costs of correctly rounded
elementary functions if you'd like.  Another student examined these
last semester.  Claims from IBM are missing a few qualifications.

 - We should come up with a further classification for complex arithmetic.

It's the exact same problem.  Dr. Kahan may have a write-up
by Wednesday's 754r meeting on precisely this topic.  I'll send
along a pointer if anyone's interested.

Other points picked up along the threads:
	*) Do not rely on probabilistic arguments.  In practice, they
	almost never work.

	*) For a literature survey on a budget, see Bindel's 
	annotated bibliography linked off the 754 page,
	http://grouper.ieee.org/groups/754/.

	*) Expect streamlined trap handling in 754r.  We're trying 
	to ensure they can be fully asynchronous in all interesting 
	cases.  That should make more optimizations reasonable.

	*) Additional exponent range can help flush-to-zero, but
	only on an individual operation's scale.  Once you combine
	multiple operations in an expression, it still kills you.
	A short note from David Bindel on using a few extra
	bits to implement proper underflow should appear shortly.

	*) Subnormals are truly important.  See Demmel's paper
	for the full numerical analyst's view.  For a simpler
	view, does anyone really want to debug programs where
	you cannot rely on equality?  Consider templates and
	generic programming...  Flush to zero should not be a
	default mode simply for sanity's sake.  The support
	cost will be comparable to the Intel extended fiasco.

	*) Note that -0.0 == 0.0, so -A+B == B-A is always true
	for numerical A and B.  Obviously, NaNs will make it
	always unordered.  (A short sketch follows this list.)

	*) Jim Thomas gave a presentation on C99 support for IEEE 
	754 at the last meeting.  I'll have his slides up on the 754
	site tomorrow unless a bus hits me.  They're quite nice
	for outlining important issues, including many of the
	complex ones.

	*) If anyone wants to see specific, 754-related references
	linked from the official 754 page, send them to me.  Links
	or just references to appropriate (non-Sun) platform 
	documentation would be hugely appreciated.
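
And the sketch for the signed-zero and NaN point above:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = 3.0, b = 3.0;
        double z1 = -(a - b);   /* -0.0 when a == b */
        double z2 = b - a;      /* +0.0 */
        /* Comparison ignores the sign of zero... */
        printf("-0.0 == 0.0: %d   -a+b == b-a: %d   z1 == z2: %d\n",
               -0.0 == 0.0, -a + b == b - a, z1 == z2);
        /* ...but any comparison involving a NaN is unordered: */
        double n = nan("");
        printf("n == n: %d\n", n == n);
        return 0;
    }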

Jason

