This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Second Draft "Unsafe fp optimizations" project description.


L.S.

Attached is the second draft of the proposed description of the "Unsafe
floating point optimizations" project.

Comments, recommendations and critiques welcome.

-- 
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)

Optimizations that change the meaning of floating point expressions

Introduction

The debate on the extent of rearrangement of floating point expressions
allowed to the compiler when optimizing is a recurring theme on GCC's
mailing lists. On this page we try to provide some structure to this
discussion. It is understood that all of the rearrangements described here
are only performed with the express permission of the user (i.e., an
explicitly specified command line option).

Rationale

Why would someone forgo numerical accuracy for speed? Isn't the fast but
wrong answer useless?

Unfortunately, this simple reasoning doesn't cut it in the Real World. In
the Real World, computational problems have been beaten on, simplified and
approximated in a heroic attempt to fit them into the limitations of present
day computers. In particular, the loss of accuracy due to these
approximations can easily overwhelm the loss that results from slightly
rearranging the floating point arithmetic. The most obvious example of this
is the first-person shooter game: while the physics of reflection,
refraction and scattering of electromagnetic radiation with wavelengths
between 400 and 800 nm is heavily approximated, what would render the game
absolutely useless is the frame rate dropping below 20 updates per second.
Rearranging the floating point arithmetic, with an associated loss of a few
units in the last place (ULPs), could compare favourably to further
approximation of the physics involved. Caveat: the loss of accuracy will not
be the only effect - see below.

[As an aside: Truly great warriors think of themselves in the third person;
cf. Julius Caesar's "De Bello Gallico".]

Aim of the project

The project will provide the GCC community with a classification of
rearrangements of floating point expressions. Based on the classification,
recommendations will be made on how to offer the users the possibility to
instruct the compiler to perform rearrangements from a particular class. The
classification will be based on the following criteria (courtesy of Robert
Dewar):

   * The transformation is well-understood.
   * It is definitely an optimization.
   * All of its numerical effects are well-documented (with an emphasis on
     the "special effects").

(Robert's original last criterion was "does not introduce surprises", but it
is more useful to list the "special effects" explicitly, i.e., anything
that is not simply a loss of accuracy.)

Preliminaries

Obviously, it is useless to talk about the ill effects of rearranging
floating point expressions without having a solid reference. To simplify the
analysis below, this project confines itself to the targets that support the
IEEE-754 Standard, using the standard rounding mode for reference.

Another limitation we allow ourselves is to treat only rearrangements of
expressions using +, -, * and /. All other changes fall outside the domain
of the compiler proper.

Unfortunately, at present GCC doesn't guarantee IEEE-754 conformance on all
of these targets by default. A well-known exception is the ix86; the
following summary of the defects is courtesy of Brad Lucier:

   * All temporaries generated for a single expression [should] always [be]
     maintained in extended precision, even when spilled to the stack
   * Each assignment to a variable [should be] stored to memory. (And, if
     the value of that variable is used later by dereferencing its lvalue,
     the value is loaded from memory and the temporary that was stored to
     memory is not re-used.)

where the bracketed [should]s indicate what presently is not the case.

Language requirements

GCC presently supports five languages: C, C++, Objective C, Java and
Fortran. Of these, Fortran has the "loosest" requirements on floating point
operations (basically, one could say that floating point accuracy in Fortran
is a "quality of implementation" issue), while Java has the most
restrictive, because it requires implementations to supply the same answers
on all targets (this is definitely not a goal of the IEEE-754 Standard). It
is understood that users who apply the outcome of this project know the
extent to which they are violating the respective language standard. We
might consider issuing appropriate warning messages.

Classification

Rearrangements which might change -0.0 to +0.0 or vice versa

Example: -A + B -> B - A. Savings: One negation and one temporary (register
pressure).
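To see why the sign of zero matters at all, here is a small C sketch
(illustrative only, with a hypothetical helper name): +0.0 and -0.0 compare
equal, so a changed sign often goes unnoticed, yet a later division can tell
them apart.

```c
#include <assert.h>
#include <math.h>

/* +0.0 and -0.0 compare equal, but they are observable:
   dividing by them yields infinities of opposite sign. */
int zero_sign_observable(void)
{
    double pz = +0.0, nz = -0.0;
    assert(pz == nz);  /* a comparison cannot tell them apart */
    return (1.0 / pz == INFINITY) && (1.0 / nz == -INFINITY);
}
```

So a rearrangement that silently flips the sign of a zero result can change
the sign of an infinity produced much later in the computation.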

Rearrangements whose only effect is for a small subset of all inputs

Rationale: Users might know the computational effects for those inputs.

Example: Force underflow to zero. Savings may be large when denormal
computation has to be emulated in the kernel. Special effects: Do not divide
by underflowed numbers.
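A small C sketch of the hazard (illustrative, with a hypothetical helper
name): dividing by a gradually underflowed denormal is still well-defined,
but dividing by the zero that flush-to-zero would substitute gives infinity.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* DBL_MIN/2 underflows to a denormal; dividing by it is still finite.
   If the compiler flushed it to zero, the divide would give infinity. */
int divide_by_denormal_ok(void)
{
    volatile double tiny = DBL_MIN / 2.0;  /* denormal under IEEE-754 */
    assert(tiny > 0.0);      /* gradual underflow keeps it nonzero */
    double flushed = 0.0;    /* what flush-to-zero would produce */
    return isfinite(1.0 / tiny) && isinf(1.0 / flushed);
}
```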

Rearrangements whose only effect is a loss of accuracy

Rationale: Users might be able to bound the effect of this rearrangement.

Example: A*A*...*A -> different order of evaluation (compare a*a*a*a with
t=a*a; t=t*t). Savings: Potentially many multiplies, at the cost of some
temporaries.
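The repeated-squaring rewrite might be sketched in C as follows (hypothetical
function names; note that for inputs that are powers of two both forms happen
to be exact, while in general their low bits may differ):

```c
#include <assert.h>

/* x^8 the straightforward way: seven dependent multiplies. */
double pow8_linear(double x)
{
    return x * x * x * x * x * x * x * x;
}

/* x^8 by repeated squaring: three multiplies and one temporary,
   but the rounding errors compound differently. */
double pow8_square(double x)
{
    double t = x * x;  /* x^2 */
    t = t * t;         /* x^4 */
    return t * t;      /* x^8 */
}
```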

Rearrangements whose effect is a loss of accuracy on a large subset of the
inputs and a complete loss on a small subset of the inputs

Rationale: Users might know that their computations always fall in the
subset and be able to bound the effect of this rearrangement.

Example: A*B + A*C -> A*(B+C). Will overflow for a small number of choices
for B and C for which the original didn't overflow. Savings: One multiply
and one temporary (register pressure).
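A C sketch of the overflow hazard (illustrative values, hypothetical names):
with A = 0.5 and B = C = DBL_MAX, each product A*B is exactly DBL_MAX/2, so
the distributed form is finite, while the factored form overflows in the
intermediate sum B + C.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* The two forms of A*B + A*C. */
double distributed(double a, double b, double c) { return a * b + a * c; }
double factored(double a, double b, double c)    { return a * (b + c); }
```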

Example: B/A + C/A -> (B+C)/A. Will overflow for a small number of choices
for B and C for which the original didn't overflow. Savings: One divide and
one temporary (register pressure).
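A C sketch of this hazard as well (illustrative values, hypothetical names):
with B = C = DBL_MAX and A = 2.0 the separate divides are finite, but the
combined form overflows in the intermediate sum B + C.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* The two forms of B/A + C/A. */
double separate_divs(double b, double c, double a) { return b / a + c / a; }
double combined_div(double b, double c, double a)  { return (b + c) / a; }
```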

Example: A/B -> A*(1/B). Will overflow if B is a denormal, whereas the
original might not. Savings: One divide changed into a multiply - the gain
can be large when B is a loop invariant, since 1/B is computed only once.
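A C sketch (hypothetical names; DBL_MIN/4 is a denormal under IEEE-754):
the direct quotient is representable, but the reciprocal 1/B overflows to
infinity, dragging the product along with it.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* A/B versus A*(1/B). With B = DBL_MIN/4 (a denormal), 1/B overflows
   even though the true quotient is well within range. */
double direct_div(double a, double b)     { return a / b; }
double reciprocal_div(double a, double b) { return a * (1.0 / b); }
```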

Rearrangements whose effect is a loss of accuracy on half of the inputs and
a complete loss on the other half of the inputs

Rationale: Users might know that their computations always fall in the
subset and be able to bound the effect of this rearrangement.

I thought A/B/C -> A/(B*C) fell into this class, but I am not sure anymore -
does anyone have a good analysis of what happens with this change? Thanks.
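Not a full analysis, but a C sketch of one failure mode (illustrative values,
hypothetical names): the intermediate product B*C can overflow to infinity or
underflow to zero even when both chained divides are unexceptional, turning a
perfectly representable result into 0 or infinity.

```c
#include <assert.h>
#include <math.h>

/* A/B/C versus A/(B*C). The intermediate B*C can overflow or
   underflow even when both chained divides are unexceptional. */
double chained(double a, double b, double c)  { return a / b / c; }
double combined(double a, double b, double c) { return a / (b * c); }
```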

Open issues

We should come up with a classification for complex arithmetic too. Just A/B
with A and B complex already has a couple of different possible evaluations.
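As an illustration of the complex-division point (hypothetical code, not a
proposal): the textbook formula overflows in the denominator c*c + d*d for
operands near sqrt(DBL_MAX), while the scaled variant usually attributed to
Smith avoids that spurious overflow at the cost of a comparison and an extra
divide.

```c
#include <assert.h>
#include <math.h>

/* Textbook complex division (a+bi)/(c+di): the denominator c*c + d*d
   overflows for |c| or |d| near sqrt(DBL_MAX), even though the true
   quotient is well within range. */
void cdiv_naive(double a, double b, double c, double d,
                double *re, double *im)
{
    double denom = c * c + d * d;
    *re = (a * c + b * d) / denom;
    *im = (b * c - a * d) / denom;
}

/* Smith's scaled variant: divide through by the larger of |c|, |d|
   first, so no intermediate exceeds the operands' magnitudes much. */
void cdiv_smith(double a, double b, double c, double d,
                double *re, double *im)
{
    if (fabs(c) >= fabs(d)) {
        double r = d / c, denom = c + d * r;
        *re = (a + b * r) / denom;
        *im = (b - a * r) / denom;
    } else {
        double r = c / d, denom = d + c * r;
        *re = (a * r + b) / denom;
        *im = (b * r - a) / denom;
    }
}
```

For example, (1e200 + 0i)/(1e200 + 0i) is exactly 1, which the scaled
variant returns, while the textbook formula produces NaN from inf/inf.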

Recommendations

None yet.
