This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] RFC for patch to add C99 fma/fmaf/fmal builtins


On Tue, Oct 05, 2010 at 10:49:59PM -0400, Geert Bosch wrote:
> On Oct 4, 2010, at 19:39, Joseph S. Myers wrote:
> >> I seem to recall that double-rounding is ok as long as the intermediate
> >> step contains > 2N bits.  Which implies that for IEEE single (23 bits)
> >> we can generically implement it with IEEE double (53 bits > 46):
> >> 
> >>  result = (float)((double)op0 * (double)op1) + (double)op2);
> > 
> > No, that's not safe.  Say (double)op0 * (double)op1 has a value exactly 
> > half way between two representable float values, and (double)op2 is much 
> > smaller in magnitude, so that in round-to-nearest the above becomes 
> > (float)((double)op0 * (double)op1) - but the correct float result depends 
> > on the sign of the tiny op2.
> 
> An algorithm for the correctly rounded FMA using just regular floating
> point instructions has been published  in the IEEE TRANSACTIONS ON 
> COMPUTERS, VOL. 57, NO. 4, APRIL 2008. There is no real short-cut 
> for the correctly rounded sum of three floating point numbers.
> I think it might be possible to devise a slightly simpler
> algorithm for single precision, but as you illustrated, it doesn't
> become trivial.

Looking at that http://www.lri.fr/~melquion/doc/08-tc.pdf,
I believe for float we could though:

#include <fenv.h>
#include <ieee754.h>

float
fmaf (float op0, float op1, float op2)
{
  fenv_t env;
  /* Multiplication is always exact.  */
  double temp = (double) op0 * (double) op1;
  union ieee754_double u;
  feholdexcept (&env);
  fesetround (FE_TOWARDZERO);
  /* Perform addition with round to odd.  */
  u.d = temp + (double) op2;
  /* XXX check for NaNs and Infs?  */
  if ((u.ieee.mantissa1 & 1) == 0)
    u.ieee.mantissa1 |= fetestexcept (FE_INEXACT) != 0;
  fesetenv (&env);
  /* And finally truncation with round to nearest.  */
  return (float) u.d;
}

(and similarly for fma when long double is IEEE quad).  For fma
if long double is less precise than that or for fmal we'd need
to use the slower exact addition (either Dekker's or Knuth's) and
multiplication (Dekker's).

	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]