This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: What is acceptable for -ffast-math? A numerical viewpoint

To: gcc at gcc dot gnu dot org
Subject: Re: What is acceptable for -ffast-math? A numerical viewpoint
From: Jonathan Thornburg <jthorn at galileo dot thp dot univie dot ac dot at>
Date: Sat, 4 Aug 2001 12:31:59 +0200
Cc: Jonathan Thornburg <jthorn at thp dot univie dot ac dot at>
References: <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00009.html> <url:http://gcc.gnu.org/ml/gcc/2001-07/msg02141.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00176.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00192.html>

Another interesting example to consider here is how to implement
complex arithmetic, particularly complex division.

Suppose we're in a language where "complex" is a primitive datatype
(eg Fortran or C99).  Then there are several plausible ways to implement
complex division:

   a + bi   a*c + b*d   b*c - a*d
   ------ = --------- + --------- i
   c + di   c^2 + d^2   c^2 + d^2

(1) use the above expression with multiplication-by-the-inverse for
    the common divisor:
	# 3 adds/subtracts, 8 multiplies, 1 divide
	temp = 1.0 / (c*c + d*d)
	result.real = temp * (a*c + b*d)
	result.imag = temp * (b*c - a*d)
(2) use the above expression directly
	# cost = 3 adds/subtracts, 6 multiplies, 2 divides
	temp = c*c + d*d
	result.real = (a*c + b*d) / temp
	result.imag = (b*c - a*d) / temp
(3) rescale to avoid the possibility of overflow/underflow in c^2 + d^2:
    if (|c| < |d|)
       then {
	    compute
		a/d + b/d i
		-----------
		c/d +  1  i
	    }
       else {
	    compute
		a/c + b/c i
		-----------
		 1  + d/c i
	    }
    doing the computation using either (1) or (2) with multiplications
    by 1 elided

On most modern hardware (1) is substantially faster than (2), because
fp multiplies are fast and well-pipelined, but fp divides are *very*
slow and minimally pipelined.

However, both (1) and (2) will overflow/underflow if b^2 + c^2 hits the
range limits of the fp format, i.e. if either |b| or |c| is (roughly)
beyond the square root of the overflow/underflow threshold.  (3) avoids
almost (though not quite) all of the spurious overflows/underflows, at
the cost of a data-dependent conditional branch, which is a major
performance hit on (heavily-pipelined) modern hardware.  (3) also has
larger object code.

[There are also a couple of other variants of (3) which offer similar
robustness against overflow/underflow, e.g. rescale by a suitable power
of 2, or rescale by max(|c|,|d|), which offer somewhat different tradeoffs
of fp operation counts vs conditional move instructions vs full-fledged
conditional branches; different hardware will favor different choices
here.]

IMHO -ffast-math constitutes a request from the user to use (1), and
a "do arithmetic as carefully as you can" mode probably constitutes a
request to use (3) or one of its variants.

-- 
-- Jonathan Thornburg <jthorn@thp.univie.ac.at>
   Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
   Golm, Germany             http://www.aei.mpg.de/~jthorn/home.html
   "Washing one's hands of the conflict between the powerful and the
    powerless means to side with the powerful, not to be neutral."
                                      -- quote by Freire / poster by Oxfam

Follow-Ups:
- Re: What is acceptable for -ffast-math? A numerical viewpoint
  - From: Joseph S. Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]