This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: What is acceptable for -ffast-math? A numerical viewpoint
- To: gcc at gcc dot gnu dot org
- Subject: Re: What is acceptable for -ffast-math? A numerical viewpoint
- From: Jonathan Thornburg <jthorn at galileo dot thp dot univie dot ac dot at>
- Date: Sat, 4 Aug 2001 12:31:59 +0200
- Cc: Jonathan Thornburg <jthorn at thp dot univie dot ac dot at>
- References: <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00009.html> <url:http://gcc.gnu.org/ml/gcc/2001-07/msg02141.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00176.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00192.html>
Another interesting example to consider here is how to implement
complex arithmetic, particularly complex division.
Suppose we're in a language where "complex" is a primitive datatype
(eg Fortran or C99). Then there are several plausible ways to implement
complex division:
a + bi a*c + b*d b*c - a*d
------ = --------- + --------- i
c + di c^2 + d^2 c^2 + d^2
(1) use the above expression with multiplication-by-the-inverse for
the common divisor:
# 3 adds/subtracts, 8 multiplies, 1 divide
temp = 1.0 / (c*c + d*d)
result.real = temp * (a*c + b*d)
result.imag = temp * (b*c - a*d)
(2) use the above expression directly
# cost = 3 adds/subtracts, 6 multiplies, 2 divides
temp = c*c + d*d
result.real = (a*c + b*d) / temp
result.imag = (b*c - a*d) / temp
(3) rescale to avoid the possibility of overflow/underflow in c^2 + d^2:
if (|c| < |d|)
then {
compute
a/d + b/d i
-----------
c/d + 1 i
}
else {
compute
a/c + b/c i
-----------
1 + d/c i
}
doing the computation using either (1) or (2) with multiplications
by 1 elided
On most modern hardware (1) is substantially faster than (2), because
fp multiplies are fast and well-pipelined, but fp divides are *very*
slow and minimally pipelined.
However, both (1) and (2) will overflow/underflow if b^2 + c^2 hits the
range limits of the fp format, i.e. if either |b| or |c| is (roughly)
beyond the square root of the overflow/underflow threshold. (3) avoids
almost (though not quite) all of the spurious overflows/underflows, at
the cost of a data-dependent conditional branch, which is a major
performance hit on (heavily-pipelined) modern hardware. (3) also has
larger object code.
[There are also a couple of other variants of (3) which offer similar
robustness against overflow/underflow, e.g. rescale by a suitable power
of 2, or rescale by max(|c|,|d|), which offer somewhat different tradeoffs
of fp operation counts vs conditional move instructions vs full-fledged
conditional branches; different hardware will favor different choices
here.]
IMHO -ffast-math constitutes a request from the user to use (1), and
a "do arithmetic as carefully as you can" mode probably constitutes a
request to use (3) or one of its variants.
--
-- Jonathan Thornburg <jthorn@thp.univie.ac.at>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam