This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |

*To*: gcc at gcc dot gnu dot org*Subject*: Re: What is acceptable for -ffast-math? A numerical viewpoint*From*: Jonathan Thornburg <jthorn at galileo dot thp dot univie dot ac dot at>*Date*: Sat, 4 Aug 2001 12:31:59 +0200*Cc*: Jonathan Thornburg <jthorn at thp dot univie dot ac dot at>*References*: <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00009.html> <url:http://gcc.gnu.org/ml/gcc/2001-07/msg02141.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00176.html> <url:http://gcc.gnu.org/ml/gcc/2001-08/msg00192.html>

Another interesting example to consider here is how to implement complex arithmetic, particularly complex division. Suppose we're in a language where "complex" is a primitive datatype (eg Fortran or C99). Then there are several plausible ways to implement complex division: a + bi a*c + b*d b*c - a*d ------ = --------- + --------- i c + di c^2 + d^2 c^2 + d^2 (1) use the above expression with multiplication-by-the-inverse for the common divisor: # 3 adds/subtracts, 8 multiplies, 1 divide temp = 1.0 / (c*c + d*d) result.real = temp * (a*c + b*d) result.imag = temp * (b*c - a*d) (2) use the above expression directly # cost = 3 adds/subtracts, 6 multiplies, 2 divides temp = c*c + d*d result.real = (a*c + b*d) / temp result.imag = (b*c - a*d) / temp (3) rescale to avoid the possibility of overflow/underflow in c^2 + d^2: if (|c| < |d|) then { compute a/d + b/d i ----------- c/d + 1 i } else { compute a/c + b/c i ----------- 1 + d/c i } doing the computation using either (1) or (2) with multiplications by 1 elided On most modern hardware (1) is substantially faster than (2), because fp multiplies are fast and well-pipelined, but fp divides are *very* slow and minimally pipelined. However, both (1) and (2) will overflow/underflow if b^2 + c^2 hits the range limits of the fp format, i.e. if either |b| or |c| is (roughly) beyond the square root of the overflow/underflow threshold. (3) avoids almost (though not quite) all of the spurious overflows/underflows, at the cost of a data-dependent conditional branch, which is a major performance hit on (heavily-pipelined) modern hardware. (3) also has larger object code. [There are also a couple of other variants of (3) which offer similar robustness against overflow/underflow, e.g. rescale by a suitable power of 2, or rescale by max(|c|,|d|), which offer somewhat different tradeoffs of fp operation counts vs conditional move instructions vs full-fledged conditional branches; different hardware will favor different choices here.] IMHO -ffast-math constitutes a request from the user to use (1), and a "do arithmetic as carefully as you can" mode probably constitutes a request to use (3) or one of its variants. -- -- Jonathan Thornburg <jthorn@thp.univie.ac.at> Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut), Golm, Germany http://www.aei.mpg.de/~jthorn/home.html "Washing one's hands of the conflict between the powerful and the powerless means to side with the powerful, not to be neutral." -- quote by Freire / poster by Oxfam

**Follow-Ups**:**Re: What is acceptable for -ffast-math? A numerical viewpoint***From:*Joseph S. Myers

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |