[RFC]: patch to calculate Nth roots at compile-time for pow(x,1/n)
Kaveh R. Ghazi
ghazi@caip.rutgers.edu
Sun Sep 14 03:10:00 GMT 2003
I've written a patch to GCC so that it can calculate Nth roots at
compile-time using Newton's iterative method for finding the solution.
I expect this to be useful eventually for cbrt(x), exp*(1/n) and
pow(x, 1/n) where x and n are constants and n is an integer value.
Right now I've only done builtin pow for testing purposes.
I'm not a mathematician, so I'll describe some notes and hopefully
someone more comfortable with these issues can tell me if I'm going in
the right direction. :-)
1. I found two variations of "Newton's method", are they
mathematically equivalent? Do they converge equally fast? Is
there a better algorithm? If so, or if not, which one of these
should I prefer? (They both seem to work.)
Here's the one from GMP-4.1 for each iteration:
http://www.gnu.org/manual/gmp-4.1/html_node/Nth-Root-Algorithm.html#Nth%20Root%20Algorithm
assuming val is the value and n is the root, it can be stated as
new = (val/old**(n-1) + (n-1)*old) / n
Here's another one (I forget where I found it) but it's basically
new = old - f(x)/f'(x). Given we're solving "x**n - val = 0", we get:
new = old - ((old**n) - val) / (n * old**(n-1)).
So math experts, which is better?
2. Both algorithms seem to be sensitive to the initial guess, in that
it'll drastically effect how many iterations we take to converge.
I found that I could make decent upper and lower bound guess and
through a binary search of 16 iterations, zero in on a really good
guess. E.g. this guess means that the Newton's method most often
iterates less than 10 times to converge, whereas with bad guesses
it can take thousands or tens of thousands of iterations to
converge. Hopefully this approach is sound.
3. I found that both algorithms converge for most inputs, even wacky
large powers given the good initial guess. However seemingly
innocent values can cause it to go into an oscillating mode when
old and new become almost equal (differ by Xe-47 or Xe-48) but
then new and old start swapping values and never become identical.
If I detect this I pick one randomly and return success. Is this
legitimate with fast-math? How about without it? (I find I'm as
or more accurate than the system pow.)
E.g. I get oscillations for pow(16384, 1.0/4) in one algorithm and
pow(1.1, 1.0/311) for the other algorithm. Both suffer from the
problem for different inputs and pass each other's problematic
inputs. So neither one is clearly better in this regard.
Another idea, how about averaging the two oscillating numbers?
Anyway, here's the preliminary patch. Comments on the above issues
and anything else you notice about the patch are welcome.
Thanks,
--Kaveh
diff -rup orig/egcc-CVS20030913/gcc/builtins.c egcc-CVS20030913/gcc/builtins.c
--- orig/egcc-CVS20030913/gcc/builtins.c 2003-09-11 18:50:19.000000000 -0400
+++ egcc-CVS20030913/gcc/builtins.c 2003-09-13 22:16:55.086832000 -0400
@@ -2128,6 +2128,24 @@ expand_builtin_pow (tree exp, rtx target
return expand_powi (op, mode, n);
}
}
+
+ if (TREE_CODE (arg0) == REAL_CST
+ && ! TREE_CONSTANT_OVERFLOW (arg0))
+ {
+ real_arithmetic (&c, RDIV_EXPR, &dconst1, &c);
+ real_convert (&c, TYPE_MODE (TREE_TYPE (exp)), &c);
+ n = real_to_integer (&c);
+ real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+
+ if (real_identical (&c, &cint))
+ {
+ REAL_VALUE_TYPE result;
+
+ if (real_rooti (&result, TYPE_MODE (TREE_TYPE (exp)),
+ &TREE_REAL_CST (arg0), n))
+ return expand_expr (build_real (TREE_TYPE (exp), result), NULL_RTX, VOIDmode, 0);
+ }
+ }
}
return expand_builtin_mathfn_2 (exp, target, NULL_RTX);
}
@@ -6416,6 +6434,20 @@ fold_builtin (tree exp)
if (flag_unsafe_math_optimizations || !inexact)
return build_real (type, x);
}
+
+ real_arithmetic (&c, RDIV_EXPR, &dconst1, &c);
+ real_convert (&c, TYPE_MODE (type), &c);
+ n = real_to_integer (&c);
+ real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+
+ if (real_identical (&c, &cint))
+ {
+ REAL_VALUE_TYPE result;
+
+ if (real_rooti (&result, TYPE_MODE (TREE_TYPE (exp)),
+ &TREE_REAL_CST (arg0), n))
+ return build_real (TREE_TYPE (exp), result);
+ }
}
}
diff -rup orig/egcc-CVS20030913/gcc/real.c egcc-CVS20030913/gcc/real.c
--- orig/egcc-CVS20030913/gcc/real.c 2003-09-11 17:50:32.000000000 -0400
+++ egcc-CVS20030913/gcc/real.c 2003-09-13 22:24:16.654742000 -0400
@@ -4513,6 +4513,121 @@ real_sqrt (REAL_VALUE_TYPE *r, enum mach
return true;
}
+/* Calculate R as the Nth integer root of X in mode MODE. Returns
+ TRUE on success and FALSE on failure. */
+
+bool
+real_rooti (REAL_VALUE_TYPE *r, enum machine_mode mode,
+ const REAL_VALUE_TYPE *xp, HOST_WIDE_INT n)
+{
+ const unsigned max_iter = 25;
+ const bool negval = real_isneg (xp);
+ const bool negexp = (n < 0) ? (n = -n, true) : false;
+ REAL_VALUE_TYPE old2, old, new, dconstn, dconstnm1, x = *xp;
+ unsigned iter = 0;
+
+ /* When N is even, the Nth root of a negative number is imaginary. */
+ if (negval && n % 2 == 0)
+ {
+ get_canonical_qnan (r, 0);
+ return false;
+ }
+
+ /* Infinity and NaN return themselves. */
+ if (real_isinf (xp) || real_isnan (xp))
+ {
+ *r = *xp;
+ return false;
+ }
+
+ /* This is N and N-1 used in the algorithm below. */
+ REAL_VALUE_FROM_INT (dconstn, n, 0, VOIDmode);
+ REAL_VALUE_FROM_INT (dconstnm1, n-1, 0, VOIDmode);
+
+ /* Positive values converge faster given our initial guess. */
+ if (negval)
+ x.sign = 0;
+
+ /* Initial guess for Nth root of x.
+
+ In order for the algorithm below to converge quickly, the
+ accuracy of the initial guess is crucial. We want to find a
+ guess just above the actual solution. The solution will be
+ between 2**(exp/n) and 2**(exp/n+2), where exp is the exponent of
+ the original value and n is the Nth root we're looking for. Use
+ a clamped binary search to find the value we're looking for. */
+
+ new = dconst1;
+ new.exp = x.exp/n;
+
+ {
+ REAL_VALUE_TYPE pow, low, mid, high;
+ unsigned iter = 0;
+
+ low = new;
+ high = new;
+ high.exp += 2;
+
+ while (iter++ < 16)
+ {
+ do_add (&mid, &high, &low, 0);
+ mid.exp--;
+ real_powi (&pow, VOIDmode, &mid, n);
+ if (real_compare (LT_EXPR, &pow, &x))
+ low = mid;
+ else
+ high = mid;
+ }
+ new = high;
+ }
+
+ /* Ensure the initial value of `old' matches nothing. */
+ get_canonical_qnan (&old, 0);
+
+ do {
+ REAL_VALUE_TYPE t1, t2;
+
+ /* Fail after max iterations. */
+ if (iter++ > max_iter)
+ return false;
+
+ old2 = old;
+ old = new;
+#if 0
+ /* Calculate new = old - ((old**n) - x) / (n * old**(n-1)). */
+ real_powi (&t1, VOIDmode, &old, n);
+ do_add (&t1, &t1, &x, true);
+ real_powi (&t2, VOIDmode, &old, n-1);
+ do_multiply (&t2, &dconstn, &t2);
+ do_divide (&t1, &t1, &t2);
+ do_add (&new, &old, &t1, true);
+#else
+ /* Calculate new = (x/old**(n-1) + (n-1)*old) / n. */
+ real_powi (&t1, VOIDmode, &old, n-1);
+ do_divide (&t1, &x, &t1);
+ do_multiply (&t2, &dconstnm1, &old);
+ do_add (&t1, &t1, &t2, false);
+ do_divide (&new, &t1, &dconstn);
+#endif
+
+ do_add (&t1, &new, &old, 1);
+ } while (! real_identical (&new, &old) && !real_identical (&new, &old2));
+
+
+ /* Now fixup for negative inputs, if any. */
+ if (negexp)
+ do_divide (&new, &dconst1, &new);
+ if (negval)
+ new.sign = 1;
+
+ if (mode == VOIDmode)
+ *r = new;
+ else
+ real_convert (r, mode, &new);
+
+ return true;
+}
+
/* Calculate X raised to the integer exponent N in mode MODE and store
the result in R. Return true if the result may be inexact due to
loss of precision. The algorithm is the classic "left-to-right binary
@@ -4562,7 +4677,11 @@ real_powi (REAL_VALUE_TYPE *r, enum mach
if (neg)
inexact |= do_divide (&t, &dconst1, &t);
- real_convert (r, mode, &t);
+ if (mode == VOIDmode)
+ *r = t;
+ else
+ real_convert (r, mode, &t);
+
return inexact;
}
diff -rup orig/egcc-CVS20030913/gcc/real.h egcc-CVS20030913/gcc/real.h
--- orig/egcc-CVS20030913/gcc/real.h 2003-09-11 17:39:22.000000000 -0400
+++ egcc-CVS20030913/gcc/real.h 2003-09-13 22:16:55.136867000 -0400
@@ -363,6 +363,10 @@ extern bool real_sqrt (REAL_VALUE_TYPE *
extern bool real_powi (REAL_VALUE_TYPE *, enum machine_mode,
const REAL_VALUE_TYPE *, HOST_WIDE_INT);
+/* Calculate R as the Nth integer root of X in mode MODE. */
+extern bool real_rooti (REAL_VALUE_TYPE *, enum machine_mode,
+ const REAL_VALUE_TYPE *, HOST_WIDE_INT);
+
/* Standard round to integer value functions. */
extern void real_trunc (REAL_VALUE_TYPE *, enum machine_mode,
const REAL_VALUE_TYPE *);
More information about the Gcc-patches
mailing list