This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] RTL expansion of exponentiation by integer constant


The following patch implements support for inlining code to calculate
pow(x,n) [or x**n] where n is known to be an integer constant at compile
time.  This is currently done only with -ffast-math by expanding the pow
builtin into a sequence of floating point multiplications.  For example,
pow(x,6) can be calculated as x2 = x*x; x3 = x2*x; x6 = x3*x3.

As always this expansion is guarded by flag_unsafe_math_optimizations
as on machines without additional bits there may be rounding issues.  On
x86, which has extended precision, a very similar algorithm is used in
the implementation of glibc's pow function.  Similarly, the fortran
front-end also has similar code to implement X**N.  Hopefully, g77
may be able to switch to this code eventually.


The hybrid "window method"/look-up table algorithm significantly
outperforms the more traditional binary method (used in g77, glibc
and gmp).  For example, x**15 requires six multiplications with
the binary method, but can be done with only five.  Indeed, the use
of the "power tree" look-up tables means that we generate optimal
code for all powers less than 256.

Finally, a target macro is provided for platforms that may wish
to limit the number of multiplications that will be inlined, for
example, when evaluating pow(x,INT_MAX).  However, in practice,
exponents in real code are almost always "small", with even the
pathological cases requiring atmost 64 multiplications.  And on
x86, the code we generate always requires fewer mutiplications
that are required by the system library.  But the real reason I
recommend against setting POWI_MAX_MULTS, is that I'm working on
some additional optimizations that will be disabled if the target
provides a value for this macro.


The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all languages except treelang, and regression tested
with a top-level "make -k check" with no new failures.

Ok for mainline?


2003-06-20  Roger Sayle  <roger@eyesopen.com>

	* builtins.c (expand_builtin): Use expand_builtin_pow to expand
	calls for pow, powf, powl and their __builtin_ variants.
	(expand_builtin_pow): If the second argument is a constant
	integer and compiling with -ffast-math, use expand_powi to
	generate RTL if powi_cost is less than POWI_MAX_MULTS.
	(powi_cost): New function to return the number of multiplications
	necessary to evaluate an Nth power, for integer constant N.
	(expand_powi): New function to expand the RTL for evaluating
	the Nth power of a floating point value, for integer constant N.

	* doc/tm.texi (POWI_MAX_MULTS): Document new target macro.

	* gcc.dg/builtins-24.c: New test case.


Index: builtins.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/builtins.c,v
retrieving revision 1.217
diff -c -3 -p -r1.217 builtins.c
*** builtins.c	19 Jun 2003 19:30:34 -0000	1.217
--- builtins.c	20 Jun 2003 15:02:15 -0000
*************** expand_builtin_mathfn_2 (tree exp, rtx t
*** 1936,1941 ****
--- 1936,2187 ----
    return target;
  }

+ /* To evaluate powi(x,n), the floating point value x raised to the
+    constant integer exponent n, we use a hybrid algorithm that
+    combines the "window method" with look-up tables.  For an
+    introduction to exponentiation algorithms and "addition chains",
+    see section 4.6.3, "Evaluation of Powers" of Donald E. Knuth,
+    "Seminumerical Algorithms", Vol. 2, "The Art of Computer Programming",
+    3rd Edition, 1998, and Daniel M. Gordon, "A Survey of Fast Exponentiation
+    Methods", Journal of Algorithms, Vol. 27, pp. 129-146, 1998.  */
+
+ /* Provide a default value for POWI_MAX_MULTS, the maximum number of
+    multiplications to inline before calling the system library's pow
+    function.  powi(x,n) requires at worst 2*bits(n)-2 multiplications,
+    so this default never requires calling pow, powf or powl.  */
+
+ #ifndef POWI_MAX_MULTS
+ #define POWI_MAX_MULTS  (2*HOST_BITS_PER_WIDE_INT-2)
+ #endif
+
+ /* The size of the "optimal power tree" lookup table.  All
+    exponents less than this value are simply looked up in the
+    powi_table below.  This threshold is also used to size the
+    cache of pseudo registers that hold intermediate results.  */
+ #define POWI_TABLE_SIZE 256
+
+ /* The size, in bits of the window, used in the "window method"
+    exponentiation algorithm.  This is equivalent to a radix of
+    (1<<POWI_WINDOW_SIZE) in the corresponding "m-ary method".  */
+ #define POWI_WINDOW_SIZE 3
+
+ /* The following table is an efficient representation of an
+    "optimal power tree".  For each value, i, the corresponding
+    value, j, in the table states than an optimal evaluation
+    sequence for calculating pow(x,i) can be found by evaluating
+    pow(x,j)*pow(x,i-j).  An optimal power tree for the first
+    100 integers is given in Knuth's "Seminumerical algorithms".  */
+
+ static const unsigned char powi_table[POWI_TABLE_SIZE] =
+   {
+       0,   1,   1,   2,   2,   3,   3,   4,  /*   0 -   7 */
+       4,   6,   5,   6,   6,  10,   7,   9,  /*   8 -  15 */
+       8,  16,   9,  16,  10,  12,  11,  13,  /*  16 -  23 */
+      12,  17,  13,  18,  14,  24,  15,  26,  /*  24 -  31 */
+      16,  17,  17,  19,  18,  33,  19,  26,  /*  32 -  39 */
+      20,  25,  21,  40,  22,  27,  23,  44,  /*  40 -  47 */
+      24,  32,  25,  34,  26,  29,  27,  44,  /*  48 -  55 */
+      28,  31,  29,  34,  30,  60,  31,  36,  /*  56 -  63 */
+      32,  64,  33,  34,  34,  46,  35,  37,  /*  64 -  71 */
+      36,  65,  37,  50,  38,  48,  39,  69,  /*  72 -  79 */
+      40,  49,  41,  43,  42,  51,  43,  58,  /*  80 -  87 */
+      44,  64,  45,  47,  46,  59,  47,  76,  /*  88 -  95 */
+      48,  65,  49,  66,  50,  67,  51,  66,  /*  96 - 103 */
+      52,  70,  53,  74,  54, 104,  55,  74,  /* 104 - 111 */
+      56,  64,  57,  69,  58,  78,  59,  68,  /* 112 - 119 */
+      60,  61,  61,  80,  62,  75,  63,  68,  /* 120 - 127 */
+      64,  65,  65, 128,  66, 129,  67,  90,  /* 128 - 135 */
+      68,  73,  69, 131,  70,  94,  71,  88,  /* 136 - 143 */
+      72, 128,  73,  98,  74, 132,  75, 121,  /* 144 - 151 */
+      76, 102,  77, 124,  78, 132,  79, 106,  /* 152 - 159 */
+      80,  97,  81, 160,  82,  99,  83, 134,  /* 160 - 167 */
+      84,  86,  85,  95,  86, 160,  87, 100,  /* 168 - 175 */
+      88, 113,  89,  98,  90, 107,  91, 122,  /* 176 - 183 */
+      92, 111,  93, 102,  94, 126,  95, 150,  /* 184 - 191 */
+      96, 128,  97, 130,  98, 133,  99, 195,  /* 192 - 199 */
+     100, 128, 101, 123, 102, 164, 103, 138,  /* 200 - 207 */
+     104, 145, 105, 146, 106, 109, 107, 149,  /* 208 - 215 */
+     108, 200, 109, 146, 110, 170, 111, 157,  /* 216 - 223 */
+     112, 128, 113, 130, 114, 182, 115, 132,  /* 224 - 231 */
+     116, 200, 117, 132, 118, 158, 119, 206,  /* 232 - 239 */
+     120, 240, 121, 162, 122, 147, 123, 152,  /* 240 - 247 */
+     124, 166, 125, 214, 126, 138, 127, 153,  /* 248 - 255 */
+   };
+
+
+ /* Return the number of multiplications required to calculate
+    powi(x,n) where n is less than POWI_TABLE_SIZE.  This is a
+    subroutine of powi_cost.  CACHE is an array indicating
+    which exponents have already been calculated.  */
+
+ static int
+ powi_lookup_cost (unsigned HOST_WIDE_INT n, char *cache)
+ {
+   /* If we've already calculated this exponent, then this evaluation
+      doesn't require any additional multiplications.  */
+   if (cache[n])
+     return 0;
+
+   cache[n] = 1;
+   return powi_lookup_cost (n - powi_table[n], cache)
+ 	 + powi_lookup_cost (powi_table[n], cache) + 1;
+ }
+
+ /* Return the number of multiplications required to calculate
+    powi(x,n) for an arbitrary x, given the exponent N.  This
+    function needs to be kept in sync with expand_powi below.  */
+
+ static int
+ powi_cost (HOST_WIDE_INT n)
+ {
+   char cache[POWI_TABLE_SIZE];
+   unsigned HOST_WIDE_INT digit;
+   unsigned HOST_WIDE_INT val;
+   int result;
+
+   if (n == 0)
+     return 0;
+
+   /* Ignore the reciprocal when calculating the cost.  */
+   val = (n < 0) ? -n : n;
+
+   /* Initialize the exponent cache.  */
+   memset ((char*) cache, 0, POWI_TABLE_SIZE);
+   cache[1] = 1;
+
+   result = 0;
+
+   while (val >= POWI_TABLE_SIZE)
+     {
+       if (val & 1)
+ 	{
+ 	  digit = val & ((1 << POWI_WINDOW_SIZE) - 1);
+ 	  result += powi_lookup_cost (digit, cache)
+ 		    + POWI_WINDOW_SIZE + 1;
+ 	  val >>= POWI_WINDOW_SIZE;
+ 	}
+       else
+ 	{
+ 	  val >>= 1;
+ 	  result++;
+ 	}
+     }
+
+   return result + powi_lookup_cost (n, cache);
+ }
+
+ /* Recursive subroutine of expand_powi.  This function takes the array,
+    CACHE, of already calculated exponents and an exponent N and returns
+    an RTX that corresponds to CACHE[1]**N, as calculated in mode MODE.  */
+
+ static rtx
+ expand_powi_1 (enum machine_mode mode, unsigned HOST_WIDE_INT n, rtx *cache)
+ {
+   unsigned HOST_WIDE_INT digit;
+   rtx target, result;
+   rtx op0, op1;
+
+   if (n < POWI_TABLE_SIZE)
+     {
+       if (cache[n])
+         return cache[n];
+
+       target = gen_reg_rtx (mode);
+       cache[n] = target;
+
+       op0 = expand_powi_1 (mode, n - powi_table[n], cache);
+       op1 = expand_powi_1 (mode, powi_table[n], cache);
+     }
+   else if (n & 1)
+     {
+       target = gen_reg_rtx (mode);
+       digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
+       op0 = expand_powi_1 (mode, n - digit, cache);
+       op1 = expand_powi_1 (mode, digit, cache);
+     }
+   else
+     {
+       target = gen_reg_rtx (mode);
+       op0 = expand_powi_1 (mode, n >> 1, cache);
+       op1 = op0;
+     }
+
+   result = expand_mult (mode, op0, op1, target, 0);
+   if (result != target)
+     emit_move_insn (target, result);
+   return target;
+ }
+
+ /* Expand the RTL to evaluate powi(x,n) in mode MODE.  X is the
+    floating point operand in mode MODE, and N is the exponent.  This
+    function needs to be kept in sync with powi_cost above.  */
+
+ static rtx
+ expand_powi (rtx x, enum machine_mode mode, HOST_WIDE_INT n)
+ {
+   unsigned HOST_WIDE_INT val;
+   rtx cache[POWI_TABLE_SIZE];
+   rtx result;
+
+   if (n == 0)
+     return CONST1_RTX (mode);
+
+   val = (n < 0) ? -n : n;
+
+   memset ((char*) cache, 0, sizeof(cache));
+   cache[1] = x;
+
+   result = expand_powi_1 (mode, (n < 0) ? -n : n, cache);
+
+   /* If the original exponent was negative, reciprocate the result.  */
+   if (n < 0)
+     result = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
+ 			   result, NULL_RTX, 0, OPTAB_LIB_WIDEN);
+
+   return result;
+ }
+
+ /* Expand a call to the pow built-in mathematical function.  Return 0 if
+    a normal call should be emitted rather than expanding the function
+    in-line.  EXP is the expression that is a call to the builtin
+    function; if convenient, the result should be placed in TARGET.  */
+
+ static rtx
+ expand_builtin_pow (tree exp, rtx target, rtx subtarget)
+ {
+   tree arglist = TREE_OPERAND (exp, 1);
+   tree arg0, arg1;
+
+   if (!validate_arglist (arglist, REAL_TYPE, REAL_TYPE, VOID_TYPE))
+     return 0;
+
+   arg0 = TREE_VALUE (arglist);
+   arg1 = TREE_VALUE (TREE_CHAIN (arglist));
+
+   if (flag_unsafe_math_optimizations
+       && ! flag_errno_math
+       && TREE_CODE (arg1) == REAL_CST
+       && ! TREE_CONSTANT_OVERFLOW (arg1))
+     {
+       REAL_VALUE_TYPE cint;
+       REAL_VALUE_TYPE c;
+       HOST_WIDE_INT n;
+
+       c = TREE_REAL_CST (arg1);
+       n = real_to_integer (&c);
+       real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+       if (real_identical (&c, &cint)
+ 	  && powi_cost (n) <= POWI_MAX_MULTS)
+ 	{
+           enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+           rtx op = expand_expr (arg0, subtarget, VOIDmode, 0);
+           op = force_reg (mode, op);
+           return expand_powi (op, mode, n);
+ 	}
+     }
+   return expand_builtin_mathfn_2 (exp, target, NULL_RTX);
+ }
+
  /* Expand expression EXP which is a call to the strlen builtin.  Return 0
     if we failed the caller should emit a normal call, otherwise
     try to get the result in TARGET, if convenient.  */
*************** expand_builtin (tree exp, rtx target, rt
*** 4490,4495 ****
--- 4736,4748 ----
      case BUILT_IN_POW:
      case BUILT_IN_POWF:
      case BUILT_IN_POWL:
+       if (! flag_unsafe_math_optimizations)
+ 	break;
+       target = expand_builtin_pow (exp, target, subtarget);
+       if (target)
+ 	return target;
+       break;
+
      case BUILT_IN_ATAN2:
      case BUILT_IN_ATAN2F:
      case BUILT_IN_ATAN2L:
Index: doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.239
diff -c -3 -p -r1.239 tm.texi
*** doc/tm.texi	20 Jun 2003 06:41:12 -0000	1.239
--- doc/tm.texi	20 Jun 2003 15:02:18 -0000
*************** true when @var{after_prologue_epilogue_g
*** 9155,9157 ****
--- 9155,9168 ----
  to have to make special provisions in @code{INITIAL_ELIMINATION_OFFSET}
  to reserve space for caller-saved target registers.
  @end deftypefn
+
+ @defmac POWI_MAX_MULTS
+ If defined, this macro is interpreted as a signed integer C expression
+ that specifies the maximum number of floating point multiplications
+ that should be emitted when expanding exponentiation by an integer
+ constant inline.  When this value is defined, exponentiation requiring
+ more than this number of multiplications is implemented by calling the
+ system library's @code{pow}, @code{powf} or @code{powl} routines.
+ The default value places no upper bound on the multiplication count.
+ @end defmac
+


/* Copyright (C) 2003 Free Software Foundation.

   Check that the RTL expansion of floating point exponentiation by
   a constant integer doesn't break anything and produces the expected
   results.

   Written by Roger Sayle, 20th June 2003.  */

/* { dg-do run } */
/* { dg-options "-O2 -ffast-math" } */

extern double pow(double,double);
extern void abort(void);

double foo (double x)
{
  return pow (x, 6);
}

double bar (double x)
{
  return pow (x, -4);
}

int main()
{
  if (foo (2.0) != 64.0)
    abort ();

  if (bar (2.0) != 0.0625)
    abort ();

  return 0;
}


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]