This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH,spu]: generate inline code for divdf3

From: <trevor_smigiel at playstation dot sony dot com>
To: Sa Liu <SALIU at de dot ibm dot com>, Ulrich Weigand <uweigand at de dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org, Andrew_Pinski at playstation dot sony dot com, russell_olsen at playstation dot sony dot com
Date: Mon, 16 Jun 2008 14:58:55 -0700
Subject: Re: [PATCH,spu]: generate inline code for divdf3
References: <OFC4A8054B.F395E4FB-ONC12573B1.0056021D-C12573B1.005747B0@de.ibm.com> <20071217195630.GR3656@playstation.sony.com>

Ulrich, Sa,

I've attached an alternate implementation for divdf3 (and divv2df3).
This implementation properly handles Inf, zero, NaN, and exponents that
are out of the range [-126..128], except for 1023 and 1024.  It doesn't
do anything special for IEEE exceptions (DBZ, etc.).
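The special cases come out of the final multiply. The attached routine folds y's sign into x with an XOR, forces the reciprocal to +Inf when y's exponent field is zero and to 0 when y is infinite, and then lets the IEEE multiply produce NaN for 0/0 and Inf/Inf. Below is a minimal host-side sketch of just that rule. It assumes IEEE binary64 doubles, the helper names are hypothetical, and the ordinary case uses a plain 1.0 / |y| in place of the estimate plus Newton refinement:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Host-side sketch; assumes IEEE binary64 doubles.  Names are hypothetical. */
#define SIGN_MASK 0x8000000000000000ULL
#define EXP_MASK  0x7ff0000000000000ULL

static uint64_t to_bits (double d) { uint64_t u; memcpy (&u, &d, sizeof u); return u; }
static double from_bits (uint64_t u) { double d; memcpy (&d, &u, sizeof d); return d; }

/* 1/|y| with zero and Inf forced the way __divv2df3_fast forces them. */
static double recip_abs (double y)
{
  uint64_t u = to_bits (y);
  uint64_t e_field = u & EXP_MASK;
  uint64_t m_field = u & ~(SIGN_MASK | EXP_MASK);

  if (e_field == 0)                        /* +/-0 or denormal: 1/0 -> +Inf */
    return from_bits (EXP_MASK);
  if (e_field == EXP_MASK && m_field == 0) /* +/-Inf: 1/Inf -> 0 */
    return 0.0;
  return 1.0 / fabs (y);                   /* ordinary case; NaN stays NaN */
}

/* x / y as (x with y's sign XORed in) * (1/|y|).  The multiply itself
   yields NaN for 0 * Inf, i.e. for 0/0 and Inf/Inf.  */
static double div_special_sketch (double x, double y)
{
  double xs = from_bits (to_bits (x) ^ (to_bits (y) & SIGN_MASK));
  return xs * recip_abs (y);
}
```

As in the attached code, a denormal divisor is treated as zero, so 1.0 / 1e-310 gives +Inf here.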

The current implementation uses frds, which doesn't deal well with
doubles whose values don't fit in a single-precision float.  So I'm
inclined to replace it with a call to the attached out-of-line version.
Do you have any objections to that?
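Here is a quick host-side illustration of the frds problem. Single precision's normal range is only about 1.2e-38 to 3.4e38, so a double outside it overflows to Inf, or underflows to a denormal or zero, when it is narrowed. Any reciprocal estimate computed from that value is then useless. The sketch uses a plain C cast in place of frds. That substitution is an assumption: frds's exact rounding and denormal behavior may differ. The helper names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Narrow to single precision and widen back.  A C cast stands in for
   frds here (assumption: IEEE binary32/binary64 host).  */
static double via_float (double d)
{
  return (double) (float) d;
}

/* True if d is still a normal (finite, nonzero, non-denormal) value
   after narrowing to float.  */
static bool fits_in_float (double d)
{
  float f = (float) d;
  return fpclassify (f) == FP_NORMAL;
}
```

For example, via_float (1e300) is Inf and via_float (1e-300) is 0.0, so neither can serve as the input to a reciprocal estimate.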

Also, what do you think about making the attached version the default,
even without -ffinite-math-only?  It is much faster and smaller than
the default provided by libgcc.

Trevor

* Trevor Smigiel <trevor_smigiel@playstation.sony.com> [2007-12-17 10:56]:
> OK.
> 
> Trevor
> 
> * Sa Liu <SALIU@de.ibm.com> [2007-12-14 08:30]:
> > Similar to the int-to-double conversion patch 
> > (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01161.html), this patch is 
> > about generating inline code for double division. The implementation 
> > doesn't handle INF or NAN, therefore it only applies when 
> > -ffinite-math-only is given.
> > 
> > No regression found in gcc test suites. OK for mainline?
> > 
> > Thanks!
> > Sa

#include <spu_intrinsics.h>

qword
__divv2df3_fast (qword x, qword y)
{
  qword y_f;
  qword sign, exp, mant;
  qword sign_mask, exp_mask;
  qword inverse, two;
  qword is_inf, is_zero, m_is_zero;

  two = si_from_double (2.0);
  sign_mask = (qword){0x80, 0, 0, 0, 0, 0, 0, 0, 0x80, 0, 0, 0, 0, 0, 0, 0};
  exp_mask = (qword){0x7f, 0xf0, 0, 0, 0, 0, 0, 0, 0x7f, 0xf0, 0, 0, 0, 0, 0, 0};

  exp = si_and (y, exp_mask);
  sign = si_and (y, sign_mask);

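  /* Test for zero and Inf.  The compares work per 32-bit word: is_inf
     first requires an all-ones exponent and a zero mantissa in the high
     word.  That result is then rotated into the low word, combined with
     the low word's mantissa test, and spread over the doubleword by
     xswd.  is_zero only checks for a zero exponent field, so denormals
     are treated as zero.  */
  m_is_zero = si_ceqi (si_andc (si_andc (y, exp_mask), sign_mask), 0);
  is_inf = si_and (si_ceq (exp, si_ilhu (0x7ff0)), m_is_zero);
  is_inf = si_xswd (si_and (si_rotqbyi (is_inf, -4), m_is_zero));
  is_zero = si_xswd (si_rotqbyi (si_ceqi (exp, 0), -4));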

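  /* Exponent field of 1/|y|: 0x7fd minus y's biased exponent (the
     mantissa estimate below lies in (0.5, 1]).  Clamp it to 0 if it
     would go negative.  */
  exp = si_sf (exp, si_ilhu (0x7fd0));
  exp = si_selb (si_il (0), exp, si_cgti (exp, -1));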

  /* The only part we want from frest/fi is the mantissa.  We use bit
   * manipulation to get the top 23 bits of mantissa from y, and set the
   * sign and exponent bits to 0x3f8, so y_f is a float in [1, 2). */
  y_f = si_selb (si_shlqbii (y, 3), si_ilhu (0x3f80), si_ilhu (0xff80));
  mant = si_fesd (si_fi (y_f, si_frest (y_f)));

  /* Merge the exponent and mantissa to create a double */
  inverse = si_selb (mant, exp, exp_mask);

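  /* Three Newton-Raphson iterations of x = x * (2.0 - |y| * x).  The
     estimate approximates 1/|y| (y's sign is applied to x below), so the
     refinement must use |y|.  With the signed y it diverges for y < 0.  */
  qword y_abs = si_andc (y, sign_mask);
  inverse = si_dfm (inverse, si_dfnms (inverse, y_abs, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y_abs, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y_abs, two));

  /* y == 0 (or denormal): 1/|y| = +Inf.  y == Inf: 1/|y| = 0.  */
  inverse = si_selb (inverse, exp_mask, is_zero);
  inverse = si_selb (inverse, si_il (0), is_inf);

  /* Give x the sign of the quotient, then multiply.  */
  return si_dfm (si_xor (x, sign), inverse);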
}
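In scalar terms, the routine works in four steps:

1. It forms a roughly single-precision estimate of 1/|y| from y's top mantissa bits.
2. It gives that estimate the exponent of 1/|y|.
3. It refines the estimate with Newton-Raphson steps x = x * (2.0 - |y| * x).
4. It multiplies by x with y's sign folded in.

Because the estimate approximates 1/|y|, the refinement has to use |y| too. With a negative y, the factor 2.0 - y * x would be about 3 rather than 1, and the iteration would diverge.

Below is a portable host-side sketch of that structure, not the SPU code itself. Three substitutions are assumptions: 1.0f / m stands in for frest/fi, and frexp/ldexp replace the exponent splicing, so the clamp of underflowing exponents to zero is not modelled. The special cases are also left out. The function name is hypothetical.

```c
#include <math.h>

/* Hypothetical scalar model of the algorithm in __divv2df3_fast, for
   finite normal y whose reciprocal is also normal.  No special cases.  */
static double div_newton_sketch (double x, double y)
{
  double ay = fabs (y);
  int e;

  /* ay = m * 2^e with m in [1, 2). */
  double m = 2.0 * frexp (ay, &e);
  e -= 1;

  /* Single-precision estimate of 1/m, in (0.5, 1]; stands in for frest/fi. */
  float r = 1.0f / (float) m;

  /* Scale the estimate to the reciprocal's exponent: 1/ay = (1/m) * 2^-e. */
  double inv = ldexp ((double) r, -e);

  /* Three Newton-Raphson steps on |y|.  Each roughly doubles the number of
     correct bits, taking a ~23-bit estimate past double precision.  */
  for (int i = 0; i < 3; i++)
    inv = inv * (2.0 - ay * inv);

  /* Fold y's sign into x (the SPU code uses an XOR), then multiply. */
  return (signbit (y) ? -x : x) * inv;
}
```

Powers of two come out exact, because the estimate is already exact and each Newton step leaves it unchanged. Other quotients agree with x / y to within a few ulps, since the result is x times a rounded reciprocal rather than a correctly rounded division.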

Follow-Ups:
- Re: [PATCH,spu]: generate inline code for divdf3
  - From: Sa Liu
- Re: [PATCH,spu]: generate inline code for divdf3
  - From: Ulrich Weigand
