This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Constant fold sqrt at compile time


On Fri, 25 Oct 2002, Richard Henderson wrote:
> On Wed, Oct 16, 2002 at 11:45:50PM -0600, Roger Sayle wrote:
> > + /* Calculate the square root of X in mode MODE, and store the result
> > +    in R.  Returns true if this doesn't trap and doesn't set errno.  */
> > +
> > + bool
> > + real_sqrt (r, mode, x)
>
> I don't see how "set errno" is relevant.  If you go ahead and
> return NaN, then
>
>       /* Test the result; if it is NaN, set errno=EDOM because
>          the argument was not in the domain.  */
>       emit_cmp_and_jump_insns (target, target, EQ, 0, GET_MODE (target),
>                                0, lab1);
>
> will evaluate to true at compile time, which will arrange for
> errno to get set appropriately.

Ok.  That's good to know.  It certainly simplifies much of the
logic, if we can evaluate everything except sqrt(NaN) at compile
time.  I also suspect that I may have been overly cautious
with "flag_trapping_math".  I don't have access to any of the
relevant standards.  Can this now be replaced with the less
restrictive "flag_signaling_nans", so that we evaluate everything
except "sqrt(sNaN)"?


> > +   /* Newton's iteration for reciprocal sqrt, i */
> > +   for (iter = 0; iter < 16; iter++)
> [...]
> > +   /* Final result is original value, x, multiplied by i.  */
> > +   real_arithmetic (&t, MULT_EXPR, &i, x);
>
> I know that there are iterative integer solutions that generate
> one bit at a time, just like the standard divide loop.  I wonder
> if those produce more accurate results?  Certainly I can see avoiding
> double-rounding issues that way...

My understanding is that with the shift to longer floating point
representations, the bit-at-a-time iterative algorithms become far
less competitive than the quadratically convergent Newton methods.
Indeed most hardware FP division implementations now use Newtonian
methods, which explains why fsqrt and fdiv take the same number
of cycles on pentiumpro, pentium4 and k6.  Not really an issue for
compile-time evaluation, but the 160 bits used in real.c make
a strong case for the selected implementation.

I know that Brad Lucier has ideas on how the accuracy can be
improved.  My understanding is that the current algorithm (perhaps
with a minor tweak) is accurate to 2 ulps, but always
under-estimates the result.  The HP technical report by Markstein
recommends zeroing the last few bits of the mantissa, and then
filling them in with a bit-at-a-time iterative algorithm (like
the one you suggest).

My proposal is more efficient but requires the function:

void real_inc_ulps(REAL_VALUE_TYPE *r, enum machine_mode mode)

which will increment the value "r" by one ulp.  This can then
be used, after we've rounded our result to mode MODE, to search
for the precise result.  Given that we know we are accurate to
2 ulps, we'll need at most four increments to reach a value
that when squared produces a result larger than the input value.

To round towards zero, this method requires a maximum of four
increments, multiplications and comparisons.  To implement
round to nearest requires an extra subtraction and comparison.
With luck, however, I'd expect we'd typically need fewer than
four iterations.


As I mentioned originally, real_sqrt is implemented using the
public routines available in real.h; the need for a real_inc_ulps
to get perfect 0.5 ulp results requires additional support from
the floating point library.

My hope was that "flag_unsafe_math_optimizations" provided a
suitable safety net to get this patch into CVS, and once there
it would provide a framework for further accuracy improvements.

Roger
--

