This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: What is acceptable for -ffast-math? A numerical viewpoint

• To: gcc at gcc dot gnu dot org
• Subject: Re: What is acceptable for -ffast-math? A numerical viewpoint
• From: Jonathan Thornburg <jthorn at galileo dot thp dot univie dot ac dot at>
• Date: Sat, 4 Aug 2001 12:31:59 +0200
• Cc: Jonathan Thornburg <jthorn at thp dot univie dot ac dot at>
• References: <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00009.html> <url:http://gcc.gnu.org/ml/gcc/2001-07/msg02141.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00176.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00192.html>

```Another interesting example to consider here is how to implement
complex arithmetic, particularly complex division.

Suppose we're in a language where "complex" is a primitive datatype
(eg Fortran or C99).  Then there are several plausible ways to implement
complex division:

a + bi   a*c + b*d   b*c - a*d
------ = --------- + --------- i
c + di   c^2 + d^2   c^2 + d^2

(1) use the above expression with multiplication-by-the-inverse for
the common divisor:
# 3 adds/subtracts, 8 multiplies, 1 divide
temp = 1.0 / (c*c + d*d)
result.real = temp * (a*c + b*d)
result.imag = temp * (b*c - a*d)
(2) use the above expression directly
# cost = 3 adds/subtracts, 6 multiplies, 2 divides
temp = c*c + d*d
result.real = (a*c + b*d) / temp
result.imag = (b*c - a*d) / temp
(3) rescale to avoid the possibility of overflow/underflow in c^2 + d^2:
if (|c| < |d|)
then {
compute
a/d + b/d i
-----------
c/d +  1  i
}
else {
compute
a/c + b/c i
-----------
1  + d/c i
}
doing the computation using either (1) or (2) with multiplications
by 1 elided

On most modern hardware (1) is substantially faster than (2), because
fp multiplies are fast and well-pipelined, but fp divides are *very*
slow and minimally pipelined.

However, both (1) and (2) will overflow/underflow if b^2 + c^2 hits the
range limits of the fp format, i.e. if either |b| or |c| is (roughly)
beyond the square root of the overflow/underflow threshold.  (3) avoids
almost (though not quite) all of the spurious overflows/underflows, at
the cost of a data-dependent conditional branch, which is a major
performance hit on (heavily-pipelined) modern hardware.  (3) also has
larger object code.

[There are also a couple of other variants of (3) which offer similar
robustness against overflow/underflow, e.g. rescale by a suitable power
of 2, or rescale by max(|c|,|d|), which offer somewhat different tradeoffs
of fp operation counts vs conditional move instructions vs full-fledged
conditional branches; different hardware will favor different choices
here.]

IMHO -ffast-math constitutes a request from the user to use (1), and
a "do arithmetic as carefully as you can" mode probably constitutes a
request to use (3) or one of its variants.

--
-- Jonathan Thornburg <jthorn@thp.univie.ac.at>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany             http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam

```

Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next]