This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH,spu]: generate inline code for divdf3
- From: <trevor_smigiel at playstation dot sony dot com>
- To: Sa Liu <SALIU at de dot ibm dot com>, Ulrich Weigand <uweigand at de dot ibm dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Andrew_Pinski at playstation dot sony dot com, russell_olsen at playstation dot sony dot com
- Date: Mon, 16 Jun 2008 14:58:55 -0700
- Subject: Re: [PATCH,spu]: generate inline code for divdf3
- References: <OFC4A8054B.F395E4FB-ONC12573B1.0056021D-C12573B1.005747B0@de.ibm.com> <20071217195630.GR3656@playstation.sony.com>
Ulrich, Sa,
I've attached an alternate implementation for divdf3 (and divv2df3).
This implementation properly handles Inf, zero, NaN, and exponents
outside the range [-126..128], except for 1023 and 1024. It doesn't
do anything special for IEEE exceptions (DBZ, etc.).
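The sketch below is for readers without an SPU at hand. It is a minimal portable C illustration of how these special divisors can be told apart from the IEEE 754 bit fields, along the lines of the exp_mask and m_is_zero tests in the attached code. The enum and classify_divisor are hypothetical names and are not part of the patch. Like the attached code, it treats denormal divisors the same as zero. The attached code has no explicit NaN test: it simply lets a NaN divisor propagate through the arithmetic.

```c
#include <math.h>    /* INFINITY and NAN, used in the example calls */
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of a divisor by its IEEE 754 fields. */
enum divisor_class { DIV_NORMAL, DIV_ZERO_OR_DENORMAL, DIV_INF, DIV_NAN };

static enum divisor_class classify_divisor(double y)
{
    const uint64_t exp_mask  = 0x7ff0000000000000ULL;  /* 11 exponent bits */
    const uint64_t mant_mask = 0x000fffffffffffffULL;  /* 52 mantissa bits */
    uint64_t b;
    memcpy(&b, &y, sizeof b);                          /* bit pattern of y */

    if ((b & exp_mask) == 0)                           /* exponent field all zeros */
        return DIV_ZERO_OR_DENORMAL;
    if ((b & exp_mask) == exp_mask)                    /* exponent field all ones */
        return (b & mant_mask) == 0 ? DIV_INF : DIV_NAN;
    return DIV_NORMAL;
}
```

For example, classify_divisor(-0.0) and classify_divisor(4.9e-324) both give DIV_ZERO_OR_DENORMAL. classify_divisor(INFINITY) gives DIV_INF and classify_divisor(NAN) gives DIV_NAN. The sign bit is ignored throughout.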
The current implementation uses frds, which doesn't deal well with
doubles that don't fit in a single-precision float. So I'm inclined to
replace it with a call to the attached out-of-line version. Do you have
any objections to that?
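Here is a concrete view of "doesn't fit in a single precision float". A normal double carries an 11-bit exponent (unbiased -1022..1023). A normal float carries only an 8-bit exponent (-126..127). So anything that first narrows y to single precision, as frds (round double to single) does, cannot represent y faithfully over most of the double range. The helper below is a hypothetical sketch of that check in portable C rather than SPU intrinsics. It only looks at the exponent: values at the very top of the float range can still round up when narrowed.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: does y's exponent lie in the normal single-precision
   range [FLT_MIN_EXP - 1, FLT_MAX_EXP - 1], i.e. [-126, 127]?  Zero counts
   as fitting; denormal doubles, Inf and NaN do not.  Only the exponent is
   checked: the 52-bit mantissa is still rounded to 23 bits on narrowing. */
static bool exponent_fits_in_float(double y)
{
    uint64_t b;
    memcpy(&b, &y, sizeof b);
    if ((b << 1) == 0)                          /* +0.0 or -0.0 */
        return true;
    int e = (int)((b >> 52) & 0x7ff) - 1023;    /* unbiased exponent */
    return e >= FLT_MIN_EXP - 1 && e <= FLT_MAX_EXP - 1;
}
```

For instance, 1.0 and FLT_MAX pass. 1e300 and 1e-300, both perfectly ordinary doubles, do not.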
Also, what do you think about making the attached version the default
even without -ffinite-math-only? It is much faster and smaller than
the default provided by libgcc.
Trevor
* Trevor Smigiel <trevor_smigiel@playstation.sony.com> [2007-12-17 10:56]:
> OK.
>
> Trevor
>
> * Sa Liu <SALIU@de.ibm.com> [2007-12-14 08:30]:
> > Similar to the int-to-double conversion patch
> > (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01161.html), this patch
> > generates inline code for double division. The implementation
> > doesn't handle INF or NAN, therefore it only applies when
> > -ffinite-math-only is given.
> >
> > No regression found in gcc test suites. OK for mainline?
> >
> > Thanks!
> > Sa
#include <spu_intrinsics.h>

/* Divide each of the two doubles in X by the corresponding double in Y.  */
qword
__divv2df3_fast (qword x, qword y)
{
  qword y_f;
  qword sign, exp, mant;
  qword sign_mask, exp_mask;
  qword inverse, two;
  qword is_inf, is_zero, m_is_zero;

  two = si_from_double (2.0);
  sign_mask = (qword){0x80, 0, 0, 0, 0, 0, 0, 0, 0x80, 0, 0, 0, 0, 0, 0, 0};
  exp_mask = (qword){0x7f, 0xf0, 0, 0, 0, 0, 0, 0, 0x7f, 0xf0, 0, 0, 0, 0, 0, 0};

  exp = si_and (y, exp_mask);
  sign = si_and (y, sign_mask);

  /* Test for zero and inf.  The compares work on 32-bit words, so the
     high-word result is rotated into the low word (and, for inf, ANDed
     with the low-word mantissa test) before being sign-extended to a
     full doubleword mask.  A zero exponent field (zero or denormal y)
     counts as zero.  */
  m_is_zero = si_ceqi (si_andc (si_andc (y, exp_mask), sign_mask), 0);
  is_inf = si_and (si_ceq (exp, si_ilhu (0x7ff0)), m_is_zero);
  is_inf = si_xswd (si_and (si_rotqbyi (is_inf, -4), m_is_zero));
  is_zero = si_xswd (si_rotqbyi (si_ceqi (exp, 0), -4));

  /* Compute the biased exponent of 1/y as 0x7fd - exp (2045 - e),
     clamped to 0 when it would go negative.  */
  exp = si_sf (exp, si_ilhu (0x7fd0));
  exp = si_selb (si_il (0), exp, si_cgti (exp, -1));

  /* The only part we want from frest/fi is the mantissa.  We use bit
     manipulation to get 23 bits of mantissa from y, and set the sign
     and exponent to 0x3f8 (sign 0, exponent 127: a float in [1, 2)).  */
  y_f = si_selb (si_shlqbii (y, 3), si_ilhu (0x3f80), si_ilhu (0xff80));
  mant = si_fesd (si_fi (y_f, si_frest (y_f)));

  /* Merge the exponent and mantissa to create a double.  */
  inverse = si_selb (mant, exp, exp_mask);

  /* Three Newton-Raphson iterations of inv = inv * (2.0 - y * inv).  */
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));

  /* 1/0 = inf (exp_mask is the bit pattern of +inf); 1/inf = 0.  */
  inverse = si_selb (inverse, exp_mask, is_zero);
  inverse = si_selb (inverse, si_il (0), is_inf);

  /* Flip x's sign if y is negative, then multiply by the reciprocal.  */
  return si_dfm (si_xor (x, sign), inverse);
}