This is the mail archive of the
mailing list for the GCC project.
IEEE 754 and fused-madd (was: Re: patch: rs6000 specific)
- From: lucier at math dot purdue dot edu
- To: gcc-patches at gcc dot gnu dot org
- Cc: lucier at math dot purdue dot edu
- Date: Thu, 6 Dec 2001 15:46:39 -0500 (EST)
- Subject: IEEE 754 and fused-madd (was: Re: patch: rs6000 specific)
I've just finished going over this month's IEEE floating-point discussion
and the IEEE 754 standard.
First, -ffast-math can and does apply many horrible, ugly transformations
to fp arithmetic, and it would be a shame if all this baggage were invoked
when someone just wants fused multiply-add. I wouldn't mind so much
if there were an fma intrinsic (or is there already one?), but having
to use -ffast-math to get fma (which is the effect if -ffused-madd
implies the fast-math transformations) adds too much uncertainty to
There is actually much flexibility in the IEEE spec, and I would
argue that fma actually adheres to the spec. Here's why.
First, what the PowerPC fma instruction does can be modelled as follows.
It begins by multiplying two single- or double-precision numbers and
putting the result into an (internal) register (a "destination" in IEEE 754
terminology) that has the following characteristics:
Exponent bias: unspecified
Exponent width in bits: infinite
Format width in bits: infinite
What I mean here by infinite is that the destination register has
enough precision and exponent range to represent any product of
single- or double-precision fp arguments. So one could make up some
numbers, but I don't want to work at that.
At any rate, this internal register has a valid IEEE format that
satisfies the requirements of double extended format, which are:
precision: >= 64
Emax: >= 16383
Emin: <= -16382
Exponent bias: unspecified
Exponent width in bits: >= 15
Format width in bits: >= 79
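As a rough way to test a concrete format against these requirements, here is a minimal sketch in C. The helper names are made up for illustration, and only the three requirements that <float.h> exposes are checked (not the exponent or format widths). Whether long double qualifies is implementation-defined.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: checks the three listed requirements that
   <float.h> exposes.  C's *_MAX_EXP and *_MIN_EXP are one larger
   than the IEEE Emax and Emin, hence the "- 1". */
static int meets_double_extended(int mant_dig, int max_exp, int min_exp)
{
    return mant_dig >= 64          /* precision >= 64 */
        && max_exp - 1 >= 16383    /* Emax >= 16383   */
        && min_exp - 1 <= -16382;  /* Emin <= -16382  */
}

/* Whether this platform's long double qualifies.  The macros count
   base-FLT_RADIX digits, so the check only makes sense for radix 2. */
static int long_double_qualifies(void)
{
    assert(FLT_RADIX == 2);
    return meets_double_extended(LDBL_MANT_DIG, LDBL_MAX_EXP, LDBL_MIN_EXP);
}
```

Plain double fails the check (53 bits of precision), which is why the internal register has to be modelled as something wider.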
Because the format is so big, all results in this internal destination
register are computed exactly, so with at most one rounding error, by
definition.
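The exactness claim can be checked in a scaled-down form. The sketch below assumes IEEE single and double formats and uses double as a stand-in for the wide internal register; the function names are hypothetical. Two 24-bit significands multiply to at most 48 bits, which fits in double's 53 bits, so the wide product needs no rounding.

```c
#include <assert.h>

/* Hypothetical helper: multiply two singles in a wider format.
   With IEEE float (24-bit significand) and double (53-bit), the
   product needs at most 48 bits, so no rounding error occurs. */
static double wide_product(float a, float b)
{
    return (double)a * (double)b;
}

/* The same product rounded to single precision.  The volatile
   store forces the rounding even where the compiler keeps
   intermediates in a wider register. */
static float narrow_product(float a, float b)
{
    volatile float p = a * b;
    return p;
}

/* Self-check with a = 1 + 2^-13 and b = 1 - 2^-13, whose exact
   product is 1 - 2^-26. */
static void check_products(void)
{
    float a = 1.0f + 0x1p-13f;
    float b = 1.0f - 0x1p-13f;
    assert(wide_product(a, b) == 1.0 - 0x1p-26);  /* kept exactly   */
    assert(narrow_product(a, b) == 1.0f);         /* low bits lost  */
}
```

The same argument applies to double operands, except that C offers no portable type wide enough to hold every double product exactly.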
It then adds to (or subtracts from, or whatever) this internal register
a single (double) precision register, and gives a single (double)
precision result. It does this in a single rounding.
In both steps the result is rounded to the precision of its destination:
first to the precision of the internal register, then to the precision
of the single (or double) result register.
OK, the IEEE 754 spec says:
4.3 Rounding Precision. Normally a result is rounded to the precision
of its destination. However, some systems deliver results only to
double or extended destinations. On such a system the user, which may
be a high-level language compiler, shall be able to specify that a
result be rounded instead to single precision, though it may be stored
in the double or extended format with its wider exponent range.
Similarly, a system that delivers results only to double extended
destinations shall permit the user to specify rounding to single or
double precision. Note that to meet the specifications in 4.1, the
result cannot suffer more than one rounding error.
Footnote : Control of rounding precision is intended to allow
systems whose destinations are always double or extended to mimic,
in the absence of over/underflow, the precisions of systems with
single and double destinations. An implementation should not
provide operations that combine double or extended operands to
produce a single result, nor operations that combine double
extended operands to produce a double result, with only one
rounding.
Now, the requirement that "On such a system ... the user ... shall
be able to specify that a result be rounded instead to single precision,
though it may be stored in the double or extended format with its wider
exponent range" can be met in gcc with the -fno-fused-madd option, which
simply disallows the use of this special internal register.
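Whether a given build actually used the internal register can be probed from C. A minimal sketch, assuming IEEE double; the function names are hypothetical. Since the outcome depends on the target and on options such as the -fno-fused-madd named above, the probe reports what happened rather than asserting one answer.

```c
#include <assert.h>

/* Written as an ordinary expression, so the compiler decides
   whether to emit a separate multiply and add or one fused
   instruction. */
static double mul_add_expr(double a, double b, double c)
{
    return a * b + c;
}

/* Hypothetical probe: returns 1 if the product was kept exact
   before the add (fused, or held in a wider register), and 0 if
   it was first rounded to double.  The volatile inputs keep the
   compiler from folding the expression at compile time. */
static int product_kept_exact(void)
{
    volatile double a = 1.0 + 0x1p-27;
    volatile double b = 1.0 - 0x1p-27;
    double r = mul_add_expr(a, b, -1.0);
    assert(r == 0.0 || r == -0x1p-54);  /* the only two outcomes */
    return r != 0.0;
}
```

Building the same file with and without the option and comparing the probe's return value shows whether the compiler honoured the request.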
As for the part of the footnote that says "An implementation should not
provide operations ... that combine double extended operands to produce
a double result, with only one rounding", this is only a "should", so
even though the fma instruction combines an extended precision value
and a double precision value to produce a double precision value with
only one rounding, it doesn't violate the standard. Secondly, one
can argue that what fma does in this respect is OK, since one of its
arguments is a double.
So I think that fm[as], as implemented in the PowerPC, does not violate
IEEE 754 arithmetic. And using it should not invoke all the other stuff
that -ffast-math implies.