[RFC]: patch to calculate Nth roots at compile-time for pow(x,1/n)

Kaveh R. Ghazi ghazi@caip.rutgers.edu
Sun Sep 14 03:10:00 GMT 2003


I've written a patch to GCC so that it can calculate Nth roots at
compile-time using Newton's iterative method for finding the solution.
I expect this to be useful eventually for cbrt(x), exp*(1/n) and
pow(x, 1/n) where x and n are constants and n is an integer value.
Right now I've only done builtin pow for testing purposes.

I'm not a mathematician, so I'll describe some notes and hopefully
someone more comfortable with these issues can tell me if I'm going in
the right direction. :-)

1.  I found two variations of "Newton's method", are they
    mathematically equivalent?  Do they converge equally fast?  Is
    there a better algorithm?  If so, or if not, which one of these
    should I prefer?  (They both seem to work.)

    Here's the one from GMP-4.1 for each iteration:
    http://www.gnu.org/manual/gmp-4.1/html_node/Nth-Root-Algorithm.html#Nth%20Root%20Algorithm
    assuming val is the value and n is the root, it can be stated as
    new = (val/old**(n-1) + (n-1)*old) / n

    Here's another one (I forget where I found it) but it's basically
    new = old - f(x)/f'(x).  Given we're solving "x**n - val = 0", we get:
    new = old - ((old**n) - val) / (n * old**(n-1)).

    So math experts, which is better?


2.  Both algorithms seem to be sensitive to the initial guess, in that
    it'll drastically effect how many iterations we take to converge.
    I found that I could make decent upper and lower bound guess and
    through a binary search of 16 iterations, zero in on a really good
    guess.  E.g. this guess means that the Newton's method most often
    iterates less than 10 times to converge, whereas with bad guesses
    it can take thousands or tens of thousands of iterations to
    converge.  Hopefully this approach is sound.


3.  I found that both algorithms converge for most inputs, even wacky
    large powers given the good initial guess.  However seemingly
    innocent values can cause it to go into an oscillating mode when
    old and new become almost equal (differ by Xe-47 or Xe-48) but
    then new and old start swapping values and never become identical.
    If I detect this I pick one randomly and return success.  Is this
    legitimate with fast-math?  How about without it?  (I find I'm as
    or more accurate than the system pow.)

    E.g. I get oscillations for pow(16384, 1.0/4) in one algorithm and
    pow(1.1, 1.0/311) for the other algorithm.  Both suffer from the
    problem for different inputs and pass each other's problematic
    inputs.  So neither one is clearly better in this regard.

    Another idea, how about averaging the two oscillating numbers?



Anyway, here's the preliminary patch.  Comments on the above issues
and anything else you notice about the patch are welcome.

		Thanks,
		--Kaveh


diff -rup orig/egcc-CVS20030913/gcc/builtins.c egcc-CVS20030913/gcc/builtins.c
--- orig/egcc-CVS20030913/gcc/builtins.c	2003-09-11 18:50:19.000000000 -0400
+++ egcc-CVS20030913/gcc/builtins.c	2003-09-13 22:16:55.086832000 -0400
@@ -2128,6 +2128,24 @@ expand_builtin_pow (tree exp, rtx target
 	      return expand_powi (op, mode, n);
 	    }
 	}
+
+      if (TREE_CODE (arg0) == REAL_CST
+	  && ! TREE_CONSTANT_OVERFLOW (arg0))
+        {
+	  real_arithmetic  (&c, RDIV_EXPR, &dconst1, &c);
+	  real_convert (&c, TYPE_MODE (TREE_TYPE (exp)), &c);
+	  n = real_to_integer (&c);
+	  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+	  
+	  if (real_identical (&c, &cint))
+	    {
+	      REAL_VALUE_TYPE result;
+	      
+	      if (real_rooti (&result, TYPE_MODE (TREE_TYPE (exp)),
+			      &TREE_REAL_CST (arg0), n))
+		return expand_expr (build_real (TREE_TYPE (exp), result), NULL_RTX, VOIDmode, 0);
+	    }
+	}
     }
   return expand_builtin_mathfn_2 (exp, target, NULL_RTX);
 }
@@ -6416,6 +6434,20 @@ fold_builtin (tree exp)
 		      if (flag_unsafe_math_optimizations || !inexact)
 			return build_real (type, x);
 		    }
+
+		  real_arithmetic  (&c, RDIV_EXPR, &dconst1, &c);
+		  real_convert (&c, TYPE_MODE (type), &c);
+		  n = real_to_integer (&c);
+		  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+		      
+		  if (real_identical (&c, &cint))
+		    {
+		      REAL_VALUE_TYPE result;
+	      
+		      if (real_rooti (&result, TYPE_MODE (TREE_TYPE (exp)),
+				      &TREE_REAL_CST (arg0), n))
+			return build_real (TREE_TYPE (exp), result);
+		    }
 		}
 	    }
 
diff -rup orig/egcc-CVS20030913/gcc/real.c egcc-CVS20030913/gcc/real.c
--- orig/egcc-CVS20030913/gcc/real.c	2003-09-11 17:50:32.000000000 -0400
+++ egcc-CVS20030913/gcc/real.c	2003-09-13 22:24:16.654742000 -0400
@@ -4513,6 +4513,121 @@ real_sqrt (REAL_VALUE_TYPE *r, enum mach
   return true;
 }
 
+/* Calculate R as the Nth integer root of X in mode MODE.  Returns
+   TRUE on success and FALSE on failure.  */
+
+bool
+real_rooti (REAL_VALUE_TYPE *r, enum machine_mode mode,
+	       const REAL_VALUE_TYPE *xp, HOST_WIDE_INT n)
+{
+  const unsigned max_iter = 25;
+  const bool negval = real_isneg (xp);
+  const bool negexp = (n < 0) ? (n = -n, true) : false;
+  REAL_VALUE_TYPE old2, old, new, dconstn, dconstnm1, x = *xp;
+  unsigned iter = 0;
+
+  /* When N is even, the Nth root of a negative number is imaginary.  */
+  if (negval && n % 2 == 0)
+    {
+      get_canonical_qnan (r, 0);
+      return false;
+    }
+
+  /* Infinity and NaN return themselves.  */
+  if (real_isinf (xp) || real_isnan (xp))
+    {
+      *r = *xp;
+      return false;
+    }
+  
+  /* This is N and N-1 used in the algorithm below.  */
+  REAL_VALUE_FROM_INT (dconstn, n,  0, VOIDmode);
+  REAL_VALUE_FROM_INT (dconstnm1, n-1,  0, VOIDmode);
+
+  /* Positive values converge faster given our initial guess.  */
+  if (negval)
+    x.sign = 0;
+
+  /* Initial guess for Nth root of x.
+
+     In order for the algorithm below to converge quickly, the
+     accuracy of the initial guess is crucial.  We want to find a
+     guess just above the actual solution.  The solution will be
+     between 2**(exp/n) and 2**(exp/n+2), where exp is the exponent of
+     the original value and n is the Nth root we're looking for.  Use
+     a clamped binary search to find the value we're looking for.  */
+
+  new = dconst1;
+  new.exp = x.exp/n;
+
+  {
+    REAL_VALUE_TYPE pow, low, mid, high;
+    unsigned iter = 0;
+
+    low = new;
+    high = new;
+    high.exp += 2;
+    
+    while (iter++ < 16)
+      {
+	do_add (&mid, &high, &low, 0);
+	mid.exp--;
+	real_powi (&pow, VOIDmode, &mid, n);
+	if (real_compare (LT_EXPR, &pow, &x))
+	  low = mid;
+	else
+	  high = mid;
+      }
+    new = high;
+  }
+  
+  /* Ensure the initial value of `old' matches nothing.  */
+  get_canonical_qnan (&old, 0);
+  
+  do {
+    REAL_VALUE_TYPE t1, t2;
+
+    /* Fail after max iterations.  */
+    if (iter++ > max_iter)
+      return false;
+
+    old2 = old;
+    old = new;
+#if 0
+    /* Calculate new = old - ((old**n) - x) / (n * old**(n-1)).  */
+    real_powi (&t1, VOIDmode, &old, n);
+    do_add (&t1, &t1, &x, true);
+    real_powi (&t2, VOIDmode, &old, n-1);
+    do_multiply (&t2, &dconstn, &t2);
+    do_divide (&t1, &t1, &t2);
+    do_add (&new, &old, &t1, true);
+#else
+    /* Calculate new = (x/old**(n-1) + (n-1)*old) / n.  */
+    real_powi (&t1, VOIDmode, &old, n-1);
+    do_divide (&t1, &x, &t1);
+    do_multiply (&t2, &dconstnm1, &old);
+    do_add (&t1, &t1, &t2, false);
+    do_divide (&new, &t1, &dconstn);
+#endif
+
+    do_add (&t1, &new, &old, 1);
+  } while (! real_identical (&new, &old) && !real_identical (&new, &old2));
+  
+
+  /* Now fixup for negative inputs, if any.  */
+  if (negexp)
+    do_divide (&new, &dconst1, &new);
+  if (negval)
+    new.sign = 1;
+      
+  if (mode == VOIDmode)
+    *r = new;
+  else
+    real_convert (r, mode, &new);
+
+  return true;
+}
+
 /* Calculate X raised to the integer exponent N in mode MODE and store
    the result in R.  Return true if the result may be inexact due to
    loss of precision.  The algorithm is the classic "left-to-right binary
@@ -4562,7 +4677,11 @@ real_powi (REAL_VALUE_TYPE *r, enum mach
   if (neg)
     inexact |= do_divide (&t, &dconst1, &t);
 
-  real_convert (r, mode, &t);
+  if (mode == VOIDmode)
+    *r = t;
+  else
+    real_convert (r, mode, &t);
+
   return inexact;
 }
 
diff -rup orig/egcc-CVS20030913/gcc/real.h egcc-CVS20030913/gcc/real.h
--- orig/egcc-CVS20030913/gcc/real.h	2003-09-11 17:39:22.000000000 -0400
+++ egcc-CVS20030913/gcc/real.h	2003-09-13 22:16:55.136867000 -0400
@@ -363,6 +363,10 @@ extern bool real_sqrt (REAL_VALUE_TYPE *
 extern bool real_powi (REAL_VALUE_TYPE *, enum machine_mode,
 		       const REAL_VALUE_TYPE *, HOST_WIDE_INT);
 
+/* Calculate R as the Nth integer root of X in mode MODE.  */
+extern bool real_rooti (REAL_VALUE_TYPE *, enum machine_mode,
+		       const REAL_VALUE_TYPE *, HOST_WIDE_INT);
+
 /* Standard round to integer value functions.  */
 extern void real_trunc (REAL_VALUE_TYPE *, enum machine_mode,
 			const REAL_VALUE_TYPE *);



More information about the Gcc-patches mailing list