This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH,spu]: generate inline code for divdf3


Ulrich, Sa,

I've attached an alternate implementation for divdf3 (and divv2df3).
This implementation properly handles Inf, zero, NaN and exponents that
are out of the range [-126..128], except for 1023 and 1024. It doesn't
do anything special for IEEE exceptions (divide-by-zero, etc.).

The current implementation uses frds, which doesn't deal well with
doubles that don't fit in a single precision float.  So, I'm inclined to
replace it with a call to the attached out-of-line version.  Do you have
any objections to that?

Also, what do you think about making the attached version the default
even without -ffinite-math-only? It is much faster and smaller than
the default provided by libgcc.

Trevor

* Trevor Smigiel <trevor_smigiel@playstation.sony.com> [2007-12-17 10:56]:
> OK.
> 
> Trevor
> 
> * Sa Liu <SALIU@de.ibm.com> [2007-12-14 08:30]:
> > Similar to the int-to-double conversion patch 
> > (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01161.html), this patch 
> > generates inline code for double division. The implementation 
> > doesn't handle INF or NAN, therefore it only applies when 
> > -ffinite-math-only is given.
> > 
> > No regression found in gcc test suites. OK for mainline?
> > 
> > Thanks!
> > Sa
#include <spu_intrinsics.h>

qword
__divv2df3_fast (qword x, qword y)
{
  qword y_f;
  qword sign, exp, mant;
  qword sign_mask, exp_mask;
  qword inverse, two;
  qword is_inf, is_zero, m_is_zero;

  two = si_from_double (2.0);
  sign_mask = (qword){0x80, 0, 0, 0, 0, 0, 0, 0, 0x80, 0, 0, 0, 0, 0, 0, 0};
  exp_mask = (qword){0x7f, 0xf0, 0, 0, 0, 0, 0, 0, 0x7f, 0xf0, 0, 0, 0, 0, 0, 0};

  exp = si_and (y, exp_mask);
  sign = si_and (y, sign_mask);

  /* Test for zero and inf */
  m_is_zero = si_ceqi (si_andc (si_andc (y, exp_mask), sign_mask), 0);
  is_inf = si_and (si_ceq (exp, si_ilhu (0x7ff0)), m_is_zero);
  is_inf = si_xswd (si_and (si_rotqbyi (is_inf, -4), m_is_zero));
  is_zero = si_xswd (si_rotqbyi (si_ceqi (exp, 0), -4));

  /* Compute the inverse of the exponent */
  exp = si_sf (exp, si_ilhu (0x7fd0));
  exp = si_selb (si_il (0), exp, si_cgti (exp, -1));

  /* The only part we want from frest/fi is the mantissa.  We use bit
   * manipulation to get 23 bits of mantissa from y, and take the sign
   * and exponent bits from 0x3f80 (sign 0, biased exponent 127). */
  y_f = si_selb (si_shlqbii (y, 3), si_ilhu (0x3f80), si_ilhu (0xff80));
  mant = si_fesd (si_fi (y_f, si_frest (y_f)));

  /* Merge the exponent and mantissa to create a double */
  inverse = si_selb (mant, exp, exp_mask);

  /* Three iterations of x = x * (2.0 - y * x) */
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
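  /* Where y is zero, use the Inf bit pattern as the inverse */
  inverse = si_selb (inverse, exp_mask, is_zero);
  /* Where y is Inf, use zero as the inverse */
  inverse = si_selb (inverse, si_il (0), is_inf);
  /* Flip the sign of x by the sign of y, then multiply by the inverse */
  return si_dfm (si_xor (x, sign), inverse);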
}
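
For anyone reading along without an SPU at hand, the refinement step in
the comment "Three iterations of x = x * (2.0 - y * x)" can be sketched in
portable scalar C. This is only a rough model under stated assumptions, not
the attached routine: the names recip_newton and div_newton are
hypothetical, frexp/ldexp stand in for the exponent bit manipulation, a
single-precision divide stands in for frest/fi, and y is assumed to be
finite, nonzero and normal (no Inf, NaN, zero or denormal handling).

```c
#include <math.h>

/* Hypothetical scalar model (not the SPU code): refine a
   single-precision reciprocal estimate with Newton-Raphson steps.
   Assumes y is finite, nonzero and normal. */
double
recip_newton (double y)
{
  int e;
  /* Split y into m * 2^e with 0.5 <= |m| < 1, so the estimate
     always fits in single precision regardless of y's exponent. */
  double m = frexp (y, &e);

  /* Single-precision estimate of 1/m (roughly 24 good bits). */
  double r = (double) (1.0f / (float) m);

  /* Three iterations of r = r * (2.0 - m * r); each one roughly
     doubles the number of correct bits. */
  r = r * (2.0 - m * r);
  r = r * (2.0 - m * r);
  r = r * (2.0 - m * r);

  /* Reapply the inverted exponent: 1/y = (1/m) * 2^-e. */
  return ldexp (r, -e);
}

/* Division expressed as multiplication by the refined reciprocal. */
double
div_newton (double x, double y)
{
  return x * recip_newton (y);
}
```

Calling div_newton (1.0, 3.0) should agree with 1.0 / 3.0 to within a few
ulps. Separating the exponent first is what lets a single-precision seed
serve for doubles whose exponent would not fit in a float.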

