[PATCH] Fix PR46728 (move pow/powi folds to tree phases)

William J. Schmidt wschmidt@linux.vnet.ibm.com
Fri May 13 15:58:00 GMT 2011


This patch addresses PR46728, which notes that pow and powi need to be
lowered in tree phases to restore lost FMA opportunities and expose
vectorization opportunities.

The approach is to move most optimizations from expand_builtin_pow[i]
into fold_builtin_pow[i]. One exception is the rewrite of powi as an
optimal sequence of multiplies, which relies on the ability to insert
new statements. Because fold_builtin_powi is called during front-end
parsing, the gimple machinery for statement manipulation can't be relied
upon there. 

A new pass (tree-ssa-math-opts.c:execute_lower_pow) runs at all
optimization levels to lower calls to pow and powi early in the middle
end. This is where the expansion of powi into multiplies takes place.
The other folds from fold_builtin_pow[i] are applied here as well. In
many cases the front end has already performed these folds during
parsing, but not every front end does so, so the patch doesn't rely on
it.
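
For readers unfamiliar with the powi expansion, the following is a
minimal sketch of the idea using plain square-and-multiply. It is not
the code in the patch: powi_by_mults is a hypothetical helper, and the
real implementation consults powi_table and a window of
POWI_WINDOW_SIZE bits to find shorter multiplication chains, which this
sketch does not attempt.

```c
/* Hypothetical helper, not GCC code: compute x**n for an integer n
   using explicit multiplies, and report how many were needed.  */
static double powi_by_mults(double x, int n, int *mults)
{
    /* Take the magnitude in unsigned arithmetic so that the most
       negative int is handled without overflow.  */
    unsigned int e = n < 0 ? 0u - (unsigned int) n : (unsigned int) n;
    double result = 1.0;
    double base = x;
    int have_result = 0;

    *mults = 0;
    while (e != 0) {
        if (e & 1) {
            if (have_result) {
                result *= base;
                ++*mults;
            } else {
                /* The first factor is a copy, not a multiply.  */
                result = base;
                have_result = 1;
            }
        }
        e >>= 1;
        if (e != 0) {
            base *= base;
            ++*mults;
        }
    }

    /* A negative exponent reciprocates the result, as the removed
       expand_powi did.  */
    return n < 0 ? 1.0 / result : result;
}
```

With this sketch, an exponent of 5 takes three multiplies (x*x, then
x^2*x^2, then x*x^4). The point of doing this in a tree pass rather than
at expand time is that each multiply becomes an ordinary statement that
later passes can combine or vectorize.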

Miscellaneous notes:

      * expand_builtin_pow[i] remain as skeletons, for those cases that
        can't be lowered into another form.
        
      * In some cases, fold_builtin_sqrt attempts to convert sqrt into
        pow[i] (the inverse of what the pow lowering does); this must be
        disabled when a hardware sqrt instruction is available.
        
      * Far fewer pow invocations will exist in tree form; they will
        now be converted to use sqrt, cbrt, powi, etc. where possible.
        Because powi will be more prevalent as a result, I duplicated
        many of the pow folds in fold-const.c to work on powi as well.
        
      * I added 16 new test cases. There is already good test coverage
        for this area, so most of the tests are powerpc-specific to test
        behavior when a hardware sqrt instruction is available.
        
Patch was regression-tested on powerpc64-linux and i686-linux-gnu. OK
for mainline?

Bill


gcc/

2011-04-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* Makefile.in (tree-ssa-math-opts.o): Add dependency on
	tree-ssa-propagate.h.
	* builtins.c (fold_builtin_pow, fold_builtin_powi): Remove forward
	declarations.
	(powi_cost): Update commentary.
	(expand_powi_1): Remove.
	(expand_powi): Remove.
	(expand_builtin_pow_root): Remove.
	(expand_builtin_pow): Remove all folds.
	(expand_builtin_powi): Remove all folds.
	(fold_builtin_sqrt): Restrict fold of sqrt(Nroot(x)) when
	hardware sqrt is available; add fold of sqrt(powi(x,y)) to
	pow(|x|,y*0.5); add fold of sqrt(x*x) to |x|.
	(fold_builtin_cbrt): Restrict fold of cbrt(sqrt(x)) when hardware
	sqrt is available.
	(fold_eval_powi): New function.
	(build_call_expr_loc_strip_sign): New function.
	(fold_builtin_pow_frac_exp): New function.
	(fold_builtin_pow): Remove static declaration; add fold of
	pow(x,0.25) to sqrt(sqrt(x)); add fold of pow(x,0.75) to
	sqrt(x)*sqrt(sqrt(x)); add folds of pow(x,c) to use powi when c,
	2c, or 3c is an integer.
	(powi_as_mults_1): New function.
	(powi_as_mults): New function.
	(tree_expand_builtin_powi): New function.
	(fold_builtin_powi): Remove static declaration; delay handling of
	compile-time constants until after simple folds, and move that
	handling into fold_eval_powi.
	* fold-const.c (fold_binary_loc): Add folds on powi similar to
	existing folds on pow; remove fold of x*x to pow(x,2.0).
	* passes.c (init_optimization_passes): Add pass_lower_pow.
	* tree.h (fold_builtin_pow, fold_builtin_powi): Add declarations.
	* tree-flow.h (tree_expand_builtin_powi): Add declaration.
	* tree-pass.h (pass_lower_pow): Add declaration.
	* tree-ssa-math-opts.c (tree-ssa-propagate.h): New include.
	(execute_lower_pow): New function.
	(pass_lower_pow): New gimple_opt_pass.
	
gcc/testsuite/

2011-05-13  Bill Schmidt <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr46728-1.c: New testcase.
	* gcc.target/powerpc/pr46728-2.c: New testcase.
	* gcc.target/powerpc/pr46728-3.c: New testcase.
	* gcc.target/powerpc/pr46728-4.c: New testcase.
	* gcc.target/powerpc/pr46728-5.c: New testcase.
	* gcc.dg/pr46728-6.c: New testcase.
	* gcc.target/powerpc/pr46728-7.c: New testcase.
	* gcc.target/powerpc/pr46728-8.c: New testcase.
	* gcc.dg/pr46728-9.c: New testcase.
	* gcc.target/powerpc/pr46728-10.c: New testcase.
	* gcc.target/powerpc/pr46728-11.c: New testcase.
	* gcc.dg/pr46728-12.c: New testcase.
	* gcc.target/powerpc/pr46728-13.c: New testcase.
	* gcc.target/powerpc/pr46728-14.c: New testcase.
	* gcc.target/powerpc/pr46728-15.c: New testcase.
	* gcc.target/powerpc/pr46728-16.c: New testcase.
	

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 173730)
+++ gcc/tree.h	(working copy)
@@ -1,6 +1,6 @@
 /* Front-end tree definitions for GNU compiler.
    Copyright (C) 1989, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -5271,6 +5271,8 @@
 extern bool fold_builtin_next_arg (tree, bool);
 extern enum built_in_function builtin_mathfn_code (const_tree);
 extern tree fold_builtin_call_array (location_t, tree, tree, int, tree *);
+extern tree fold_builtin_pow (location_t, tree, tree, tree, tree);
+extern tree fold_builtin_powi (location_t, tree, tree, tree, tree);
 extern tree build_call_expr_loc_array (location_t, tree, int, tree *);
 extern tree build_call_expr_loc_vec (location_t, tree, VEC(tree,gc) *);
 extern tree build_call_expr_loc (location_t, tree, int, ...);
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 173730)
+++ gcc/tree-pass.h	(working copy)
@@ -419,6 +419,7 @@
 extern struct gimple_opt_pass pass_cse_sincos;
 extern struct gimple_opt_pass pass_optimize_bswap;
 extern struct gimple_opt_pass pass_optimize_widening_mul;
+extern struct gimple_opt_pass pass_lower_pow;
 extern struct gimple_opt_pass pass_warn_function_return;
 extern struct gimple_opt_pass pass_warn_function_noreturn;
 extern struct gimple_opt_pass pass_cselim;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 173730)
+++ gcc/builtins.c	(working copy)
@@ -149,8 +149,6 @@
 static rtx expand_builtin_signbit (tree, rtx);
 static tree fold_builtin_sqrt (location_t, tree, tree);
 static tree fold_builtin_cbrt (location_t, tree, tree);
-static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
-static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
 static tree fold_builtin_cos (location_t, tree, tree, tree);
 static tree fold_builtin_cosh (location_t, tree, tree, tree);
 static tree fold_builtin_tan (tree, tree);
@@ -2940,7 +2938,7 @@
 
 /* Return the number of multiplications required to calculate
    powi(x,n) for an arbitrary x, given the exponent N.  This
-   function needs to be kept in sync with expand_powi below.  */
+   function needs to be kept in sync with powi_as_mults, below.  */
 
 static int
 powi_cost (HOST_WIDE_INT n)
@@ -2981,165 +2979,6 @@
   return result + powi_lookup_cost (val, cache);
 }
 
-/* Recursive subroutine of expand_powi.  This function takes the array,
-   CACHE, of already calculated exponents and an exponent N and returns
-   an RTX that corresponds to CACHE[1]**N, as calculated in mode MODE.  */
-
-static rtx
-expand_powi_1 (enum machine_mode mode, unsigned HOST_WIDE_INT n, rtx *cache)
-{
-  unsigned HOST_WIDE_INT digit;
-  rtx target, result;
-  rtx op0, op1;
-
-  if (n < POWI_TABLE_SIZE)
-    {
-      if (cache[n])
-	return cache[n];
-
-      target = gen_reg_rtx (mode);
-      cache[n] = target;
-
-      op0 = expand_powi_1 (mode, n - powi_table[n], cache);
-      op1 = expand_powi_1 (mode, powi_table[n], cache);
-    }
-  else if (n & 1)
-    {
-      target = gen_reg_rtx (mode);
-      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
-      op0 = expand_powi_1 (mode, n - digit, cache);
-      op1 = expand_powi_1 (mode, digit, cache);
-    }
-  else
-    {
-      target = gen_reg_rtx (mode);
-      op0 = expand_powi_1 (mode, n >> 1, cache);
-      op1 = op0;
-    }
-
-  result = expand_mult (mode, op0, op1, target, 0);
-  if (result != target)
-    emit_move_insn (target, result);
-  return target;
-}
-
-/* Expand the RTL to evaluate powi(x,n) in mode MODE.  X is the
-   floating point operand in mode MODE, and N is the exponent.  This
-   function needs to be kept in sync with powi_cost above.  */
-
-static rtx
-expand_powi (rtx x, enum machine_mode mode, HOST_WIDE_INT n)
-{
-  rtx cache[POWI_TABLE_SIZE];
-  rtx result;
-
-  if (n == 0)
-    return CONST1_RTX (mode);
-
-  memset (cache, 0, sizeof (cache));
-  cache[1] = x;
-
-  result = expand_powi_1 (mode, (n < 0) ? -n : n, cache);
-
-  /* If the original exponent was negative, reciprocate the result.  */
-  if (n < 0)
-    result = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-			   result, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-
-  return result;
-}
-
-/* Fold a builtin function call to pow, powf, or powl into a series of sqrts or
-   cbrts.  Return NULL_RTX if no simplification can be made or expand the tree
-   if we can simplify it.  */
-static rtx
-expand_builtin_pow_root (location_t loc, tree arg0, tree arg1, tree type,
-			 rtx subtarget)
-{
-  if (TREE_CODE (arg1) == REAL_CST
-      && !TREE_OVERFLOW (arg1)
-      && flag_unsafe_math_optimizations)
-    {
-      enum machine_mode mode = TYPE_MODE (type);
-      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
-      REAL_VALUE_TYPE c = TREE_REAL_CST (arg1);
-      tree op = NULL_TREE;
-
-      if (sqrtfn)
-	{
-	  /* Optimize pow (x, 0.5) into sqrt.  */
-	  if (REAL_VALUES_EQUAL (c, dconsthalf))
-	    op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-
-	  /* Don't do this optimization if we don't have a sqrt insn.  */
-	  else if (optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
-	    {
-	      REAL_VALUE_TYPE dconst1_4 = dconst1;
-	      REAL_VALUE_TYPE dconst3_4;
-	      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
-
-	      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
-	      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
-
-	      /* Optimize pow (x, 0.25) into sqrt (sqrt (x)).  Assume on most
-		 machines that a builtin sqrt instruction is smaller than a
-		 call to pow with 0.25, so do this optimization even if
-		 -Os.  */
-	      if (REAL_VALUES_EQUAL (c, dconst1_4))
-		{
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, op);
-		}
-
-	      /* Optimize pow (x, 0.75) = sqrt (x) * sqrt (sqrt (x)) unless we
-		 are optimizing for space.  */
-	      else if (optimize_insn_for_speed_p ()
-		       && !TREE_SIDE_EFFECTS (arg0)
-		       && REAL_VALUES_EQUAL (c, dconst3_4))
-		{
-		  tree sqrt1 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
-		  tree sqrt2 = builtin_save_expr (sqrt1);
-		  tree sqrt3 = build_call_expr_loc (loc, sqrtfn, 1, sqrt1);
-		  op = fold_build2_loc (loc, MULT_EXPR, type, sqrt2, sqrt3);
-		}
-	    }
-	}
-
-      /* Check whether we can do cbrt insstead of pow (x, 1./3.) and
-	 cbrt/sqrts instead of pow (x, 1./6.).  */
-      if (cbrtfn && ! op
-	  && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
-	{
-	  /* First try 1/3.  */
-	  REAL_VALUE_TYPE dconst1_3
-	    = real_value_truncate (mode, dconst_third ());
-
-	  if (REAL_VALUES_EQUAL (c, dconst1_3))
-	    op = build_call_nofold_loc (loc, cbrtfn, 1, arg0);
-
-	      /* Now try 1/6.  */
-	  else if (optimize_insn_for_speed_p ()
-		   && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
-	    {
-	      REAL_VALUE_TYPE dconst1_6 = dconst1_3;
-	      SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
-
-	      if (REAL_VALUES_EQUAL (c, dconst1_6))
-		{
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-		  op = build_call_nofold_loc (loc, cbrtfn, 1, op);
-		}
-	    }
-	}
-
-      if (op)
-	return expand_expr (op, subtarget, mode, EXPAND_NORMAL);
-    }
-
-  return NULL_RTX;
-}
-
 /* Expand a call to the pow built-in mathematical function.  Return NULL_RTX if
    a normal call should be emitted rather than expanding the function
    in-line.  EXP is the expression that is a call to the builtin
@@ -3148,147 +2987,9 @@
 static rtx
 expand_builtin_pow (tree exp, rtx target, rtx subtarget)
 {
-  tree arg0, arg1;
-  tree fn, narg0;
-  tree type = TREE_TYPE (exp);
-  REAL_VALUE_TYPE cint, c, c2;
-  HOST_WIDE_INT n;
-  rtx op, op2;
-  enum machine_mode mode = TYPE_MODE (type);
-
   if (! validate_arglist (exp, REAL_TYPE, REAL_TYPE, VOID_TYPE))
     return NULL_RTX;
 
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-
-  if (TREE_CODE (arg1) != REAL_CST
-      || TREE_OVERFLOW (arg1))
-    return expand_builtin_mathfn_2 (exp, target, subtarget);
-
-  /* Handle constant exponents.  */
-
-  /* For integer valued exponents we can expand to an optimal multiplication
-     sequence using expand_powi.  */
-  c = TREE_REAL_CST (arg1);
-  n = real_to_integer (&c);
-  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-  if (real_identical (&c, &cint)
-      && ((n >= -1 && n <= 2)
-	  || (flag_unsafe_math_optimizations
-	      && optimize_insn_for_speed_p ()
-	      && powi_cost (n) <= POWI_MAX_MULTS)))
-    {
-      op = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL);
-      if (n != 1)
-	{
-	  op = force_reg (mode, op);
-	  op = expand_powi (op, mode, n);
-	}
-      return op;
-    }
-
-  narg0 = builtin_save_expr (arg0);
-
-  /* If the exponent is not integer valued, check if it is half of an integer.
-     In this case we can expand to sqrt (x) * x**(n/2).  */
-  fn = mathfn_built_in (type, BUILT_IN_SQRT);
-  if (fn != NULL_TREE)
-    {
-      real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
-      n = real_to_integer (&c2);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      if (real_identical (&c2, &cint)
-	  && ((flag_unsafe_math_optimizations
-	       && optimize_insn_for_speed_p ()
-	       && powi_cost (n/2) <= POWI_MAX_MULTS)
-	      /* Even the c == 0.5 case cannot be done unconditionally
-	         when we need to preserve signed zeros, as
-		 pow (-0, 0.5) is +0, while sqrt(-0) is -0.  */
-	      || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
-	      /* For c == 1.5 we can assume that x * sqrt (x) is always
-	         smaller than pow (x, 1.5) if sqrt will not be expanded
-		 as a call.  */
-	      || (n == 3
-		  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
-	{
-	  tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
-						  narg0);
-	  /* Use expand_expr in case the newly built call expression
-	     was folded to a non-call.  */
-	  op = expand_expr (call_expr, subtarget, mode, EXPAND_NORMAL);
-	  if (n != 1)
-	    {
-	      op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
-	      op2 = force_reg (mode, op2);
-	      op2 = expand_powi (op2, mode, abs (n / 2));
-	      op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
-					0, OPTAB_LIB_WIDEN);
-	      /* If the original exponent was negative, reciprocate the
-		 result.  */
-	      if (n < 0)
-		op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-				   op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-	    }
-	  return op;
-	}
-    }
-
-  /* Check whether we can do a series of sqrt or cbrt's instead of the pow
-     call.  */
-  op = expand_builtin_pow_root (EXPR_LOCATION (exp), arg0, arg1, type,
-				subtarget);
-  if (op)
-    return op;
-
-  /* Try if the exponent is a third of an integer.  In this case
-     we can expand to x**(n/3) * cbrt(x)**(n%3).  As cbrt (x) is
-     different from pow (x, 1./3.) due to rounding and behavior
-     with negative x we need to constrain this transformation to
-     unsafe math and positive x or finite math.  */
-  fn = mathfn_built_in (type, BUILT_IN_CBRT);
-  if (fn != NULL_TREE
-      && flag_unsafe_math_optimizations
-      && (tree_expr_nonnegative_p (arg0)
-	  || !HONOR_NANS (mode)))
-    {
-      REAL_VALUE_TYPE dconst3;
-      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
-      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
-      real_round (&c2, mode, &c2);
-      n = real_to_integer (&c2);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
-      real_convert (&c2, mode, &c2);
-      if (real_identical (&c2, &c)
-	  && ((optimize_insn_for_speed_p ()
-	       && powi_cost (n/3) <= POWI_MAX_MULTS)
-	      || n == 1))
-	{
-	  tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
-						  narg0);
-	  op = expand_builtin (call_expr, NULL_RTX, subtarget, mode, 0);
-	  if (abs (n) % 3 == 2)
-	    op = expand_simple_binop (mode, MULT, op, op, op,
-				      0, OPTAB_LIB_WIDEN);
-	  if (n != 1)
-	    {
-	      op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
-	      op2 = force_reg (mode, op2);
-	      op2 = expand_powi (op2, mode, abs (n / 3));
-	      op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
-					0, OPTAB_LIB_WIDEN);
-	      /* If the original exponent was negative, reciprocate the
-		 result.  */
-	      if (n < 0)
-		op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-				   op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-	    }
-	  return op;
-	}
-    }
-
-  /* Fall back to optab expansion.  */
   return expand_builtin_mathfn_2 (exp, target, subtarget);
 }
 
@@ -3312,27 +3013,6 @@
   arg1 = CALL_EXPR_ARG (exp, 1);
   mode = TYPE_MODE (TREE_TYPE (exp));
 
-  /* Handle constant power.  */
-
-  if (TREE_CODE (arg1) == INTEGER_CST
-      && !TREE_OVERFLOW (arg1))
-    {
-      HOST_WIDE_INT n = TREE_INT_CST_LOW (arg1);
-
-      /* If the exponent is -1, 0, 1 or 2, then expand_powi is exact.
-	 Otherwise, check the number of multiplications required.  */
-      if ((TREE_INT_CST_HIGH (arg1) == 0
-	   || TREE_INT_CST_HIGH (arg1) == -1)
-	  && ((n >= -1 && n <= 2)
-	      || (optimize_insn_for_speed_p ()
-		  && powi_cost (n) <= POWI_MAX_MULTS)))
-	{
-	  op0 = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);
-	  op0 = force_reg (mode, op0);
-	  return expand_powi (op0, mode, n);
-	}
-    }
-
   /* Emit a libcall to libgcc.  */
 
   /* Mode of the 2nd argument must match that of an int.  */
@@ -7195,8 +6875,13 @@
       return build_call_expr_loc (loc, expfn, 1, arg);
     }
 
-  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  */
-  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode))
+  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  However, for N=2, only 
+     do this if there is no hardware sqrt instruction.  (For N=3, this
+     has the effect of canonicalizing sqrt(cbrt(x)) as cbrt(sqrt(x)),
+     due to folding on pow(x,1/6).)  */
+  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode)
+      && (!BUILTIN_SQRT_P (fcode)
+	  || optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing))
     {
       tree powfn = mathfn_built_in (type, BUILT_IN_POW);
 
@@ -7238,6 +6923,39 @@
       return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
     }
 
+  /* Optimize sqrt(powi(x,y)) = pow(|x|,y*0.5).  */
+  if (flag_unsafe_math_optimizations
+      && (fcode == BUILT_IN_POWI
+	  || fcode == BUILT_IN_POWIF
+	  || fcode == BUILT_IN_POWIL))
+    {
+      tree powfn = mathfn_built_in (type, BUILT_IN_POW);
+      tree arg0 = CALL_EXPR_ARG (arg, 0);
+      tree arg1 = CALL_EXPR_ARG (arg, 1);
+      tree narg1;
+      if (!tree_expr_nonnegative_p (arg0))
+	arg0 = build1 (ABS_EXPR, type, arg0);
+      narg1 = fold_convert_loc (loc, type, arg1);
+      narg1 = fold_build2_loc (loc, MULT_EXPR, type, narg1,
+			   build_real (type, dconsthalf));
+      return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
+    }
+
+  /* Optimize sqrt(x*x) = |x|.  */
+  if (flag_unsafe_math_optimizations
+      && TREE_CODE (arg) == MULT_EXPR)
+    {
+      tree arg0 = TREE_OPERAND (arg, 0);
+      tree arg1 = TREE_OPERAND (arg, 1);
+
+      if (operand_equal_p (arg0, arg1, 0))
+	{
+	  if (!tree_expr_nonnegative_p (arg0))
+	    arg0 = build1 (ABS_EXPR, type, arg0);
+	  return fold_convert_loc (loc, type, arg0);
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -7271,8 +6989,11 @@
 	  return build_call_expr_loc (loc, expfn, 1, arg);
 	}
 
-      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6).  */
-      if (BUILTIN_SQRT_P (fcode))
+      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6), but only if there is no
+	 native square root instruction or we are optimizing for size.  */
+      if (BUILTIN_SQRT_P (fcode)
+	  && (optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing
+	      || !optimize_function_for_speed_p (cfun)))
 	{
 	  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
 
@@ -8010,11 +7731,177 @@
 }
 
 
+/* Attempt to evaluate powi(arg0,n) at compile time, unless this should
+   raise an exception.  */
+static tree
+fold_eval_powi (tree arg0, HOST_WIDE_INT n, tree type, enum machine_mode mode)
+{
+  if (TREE_CODE (arg0) == REAL_CST
+      && !TREE_OVERFLOW (arg0)
+      && (n > 0
+	  || (!flag_trapping_math && !flag_errno_math)
+	  || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
+    {
+      REAL_VALUE_TYPE x;
+      bool inexact;
+      
+      x = TREE_REAL_CST (arg0);
+      inexact = real_powi (&x, mode, &x, n);
+      if (flag_unsafe_math_optimizations || !inexact)
+	return build_real (type, x);
+    }
+
+  return NULL_TREE;
+}
+
+
+/* Build a call to FNDECL with location LOC and arguments ARG0 and ARG1.
+   If N is even, strip the sign from ARG0 before building the call.  */
+static tree
+build_call_expr_loc_strip_sign (HOST_WIDE_INT n, location_t loc, tree fndecl,
+				tree arg0, tree arg1)
+{
+  if ((n & 1) == 0 && flag_unsafe_math_optimizations)
+    {
+      tree narg0 = fold_strip_sign_ops (arg0);
+      if (narg0)
+	return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
+    }
+
+  return build_call_expr_loc (loc, fndecl, 2, arg0, arg1);
+}
+
+
+/* Attempt to optimize pow(ARG0, C), where C is a real constant not equal
+   to any integer.  When 2C or 3C is an integer, we can sometimes improve
+   the code using sqrt and/or cbrt.  */
+static tree
+fold_builtin_pow_frac_exp (location_t loc, tree arg0, REAL_VALUE_TYPE c,
+			   tree type, enum machine_mode mode)
+{
+  REAL_VALUE_TYPE c2, cint;
+  HOST_WIDE_INT n;
+  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
+  tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
+  tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
+  
+  /* Optimize pow(x,c), where c = floor(c) + 0.5, into
+     sqrt(x) * powi(x, floor(c)).  */
+
+  real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
+  n = real_to_integer (&c2);
+  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+
+  if (real_identical (&c2, &cint)
+      && ((flag_unsafe_math_optimizations
+	   && sqrtfn != NULL_TREE
+	   && powi_cost (n/2) <= POWI_MAX_MULTS)
+	  /* pow(x,0.5) can be done unconditionally provided signed
+	     zeros must not be maintained.  pow(-0,0.5) = +0, but 
+	     sqrt(-0) = -0.  */
+	  || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
+	  /* pow(x,1.5)=x*sqrt(x) is safe, and smaller than pow(x,1.5)
+	     provided sqrt will not be expanded as a call.  */
+	  || (n == 3
+	      && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
+    {
+      tree narg0 = builtin_save_expr (arg0);
+      tree powi_x_floor_c = NULL_TREE;
+      HOST_WIDE_INT floor_c = n / 2;
+      if (n <= 0)
+	floor_c--;
+
+      /* Attempt to fold powi(arg0, floor_c) into a constant.  */
+      powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
+
+      if (!powi_x_floor_c && powifn)
+	{
+	  tree tree_floor_c = build_int_cst (integer_type_node, floor_c);
+	  powi_x_floor_c = build_call_expr_loc_strip_sign (floor_c, loc, powifn,
+							   narg0, tree_floor_c);
+	}
+
+      if (powi_x_floor_c)
+	{
+	  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, narg0);
+	  return fold_build2_loc (loc, MULT_EXPR, type,
+				  sqrt_arg0, powi_x_floor_c);
+	}
+    }
+
+  /* Optimize pow(x,c), where 3c = n for some integer n, into
+     powi(x, floor(c)) * powi(cbrt(x), n%3).  */
+  if (cbrtfn != NULL_TREE
+      && powifn != NULL_TREE
+      && flag_unsafe_math_optimizations
+      && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
+    {
+      REAL_VALUE_TYPE dconst3;
+      
+      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
+      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
+      real_round (&c2, mode, &c2);
+      n = real_to_integer (&c2);
+      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
+      real_convert (&c2, mode, &c2);
+      if (real_identical (&c2, &c)
+	  && ((optimize_function_for_speed_p (cfun)
+	       && powi_cost (n / 3) <= POWI_MAX_MULTS)
+	      || n == 1))
+	{
+	  HOST_WIDE_INT floor_c = n / 3;
+	  tree narg0 = builtin_save_expr (arg0);
+	  tree powi_x_floor_c;
+
+	  if (n <= 0)
+	    floor_c--;
+
+	  /* Attempt to fold powi(x, floor(c)) into a constant.  */
+	  powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
+
+	  if (!powi_x_floor_c)
+	    {
+	      tree tree_floor_c =
+		build_int_cst (integer_type_node, floor_c);
+
+	      powi_x_floor_c = 
+		build_call_expr_loc_strip_sign (floor_c, loc, powifn,
+						narg0, tree_floor_c);
+	    }
+
+	  if (powi_x_floor_c)
+	    {
+	      HOST_WIDE_INT n_mod_3 = n % 3;
+	      tree tree_n_mod_3, powi_cbrt_x, cbrt_arg0;
+	      
+	      if (n <= 0)
+		n_mod_3 = n_mod_3 + 3;
+
+	      tree_n_mod_3 = build_int_cst (integer_type_node, n_mod_3);
+
+	      cbrt_arg0 = build_call_nofold_loc (loc, cbrtfn, 1, narg0);
+	      powi_cbrt_x =
+		build_call_expr_loc_strip_sign (n_mod_3, loc, powifn,
+						cbrt_arg0, tree_n_mod_3);
+
+	      if (powi_cbrt_x)
+		return fold_build2_loc (loc, MULT_EXPR, type,
+					powi_x_floor_c, powi_cbrt_x);
+	    }
+	}
+    }
+
+  return NULL_TREE;
+}
+
+
 /* Fold a builtin function call to pow, powf, or powl.  Return
    NULL_TREE if no simplification can be made.  */
-static tree
+tree
 fold_builtin_pow (location_t loc, tree fndecl, tree arg0, tree arg1, tree type)
 {
+  enum machine_mode mode = TYPE_MODE (type);
   tree res;
 
   if (!validate_arg (arg0, REAL_TYPE)
@@ -8032,9 +7919,10 @@
   if (TREE_CODE (arg1) == REAL_CST
       && !TREE_OVERFLOW (arg1))
     {
-      REAL_VALUE_TYPE cint;
       REAL_VALUE_TYPE c;
-      HOST_WIDE_INT n;
+      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
+      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
+      REAL_VALUE_TYPE dconst1_4, dconst3_4;
 
       c = TREE_REAL_CST (arg1);
 
@@ -8054,56 +7942,63 @@
 
       /* Optimize pow(x,0.5) = sqrt(x).  */
       if (flag_unsafe_math_optimizations
-	  && REAL_VALUES_EQUAL (c, dconsthalf))
+	  && REAL_VALUES_EQUAL (c, dconsthalf)
+	  && sqrtfn != NULL_TREE)
+	return build_call_expr_loc (loc, sqrtfn, 1, arg0);
+
+      /* Optimize pow(x,0.25) = sqrt(sqrt(x)).  */
+      dconst1_4 = dconst1;
+      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
+
+      if (flag_unsafe_math_optimizations
+	  && REAL_VALUES_EQUAL (c, dconst1_4)
+	  && sqrtfn != NULL_TREE
+	  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	{
-	  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-
-	  if (sqrtfn != NULL_TREE)
-	    return build_call_expr_loc (loc, sqrtfn, 1, arg0);
+	  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
+	  return build_call_nofold_loc (loc, sqrtfn, 1, sqrt_arg0);
 	}
 
-      /* Optimize pow(x,1.0/3.0) = cbrt(x).  */
-      if (flag_unsafe_math_optimizations)
+      /* Optimize pow(x,0.75) = sqrt(x) * sqrt(sqrt(x)) unless we are
+	 optimizing for space.  */
+      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
+      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
+
+      if (flag_unsafe_math_optimizations
+	  && optimize_function_for_speed_p (cfun)
+	  && !TREE_SIDE_EFFECTS (arg0)
+	  && REAL_VALUES_EQUAL (c, dconst3_4)
+	  && sqrtfn != NULL_TREE
+	  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	{
-	  const REAL_VALUE_TYPE dconstroot
-	    = real_value_truncate (TYPE_MODE (type), dconst_third ());
-
-	  if (REAL_VALUES_EQUAL (c, dconstroot))
-	    {
-	      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
-	      if (cbrtfn != NULL_TREE)
-		return build_call_expr_loc (loc, cbrtfn, 1, arg0);
-	    }
+	  tree sqrt_arg0 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
+	  tree sqrt_save = builtin_save_expr (sqrt_arg0);
+	  tree sqrt_sqrt = build_call_expr_loc (loc, sqrtfn, 1, sqrt_arg0);
+	  return fold_build2_loc (loc, MULT_EXPR, type, sqrt_save, sqrt_sqrt);
 	}
 
-      /* Check for an integer exponent.  */
-      n = real_to_integer (&c);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      if (real_identical (&c, &cint))
+      /* Optimize pow(x,1.0/3.0) = cbrt(x), and pow(x,1.0/6.0) =
+	 cbrt(sqrt(x)).  */
+      if (flag_unsafe_math_optimizations && cbrtfn != NULL_TREE)
 	{
-	  /* Attempt to evaluate pow at compile-time, unless this should
-	     raise an exception.  */
-	  if (TREE_CODE (arg0) == REAL_CST
-	      && !TREE_OVERFLOW (arg0)
-	      && (n > 0
-		  || (!flag_trapping_math && !flag_errno_math)
-		  || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
-	    {
-	      REAL_VALUE_TYPE x;
-	      bool inexact;
+	  const REAL_VALUE_TYPE dconst1_3
+	    = real_value_truncate (mode, dconst_third ());
 
-	      x = TREE_REAL_CST (arg0);
-	      inexact = real_powi (&x, TYPE_MODE (type), &x, n);
-	      if (flag_unsafe_math_optimizations || !inexact)
-		return build_real (type, x);
-	    }
+	  if (REAL_VALUES_EQUAL (c, dconst1_3))
+	    return build_call_expr_loc (loc, cbrtfn, 1, arg0);
 
-	  /* Strip sign ops from even integer powers.  */
-	  if ((n & 1) == 0 && flag_unsafe_math_optimizations)
+	  if (optimize_function_for_speed_p (cfun)
+	      && sqrtfn != NULL_TREE
+	      && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	    {
-	      tree narg0 = fold_strip_sign_ops (arg0);
-	      if (narg0)
-		return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
+	      REAL_VALUE_TYPE dconst1_6 = dconst1_3;
+	      SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
+
+	      if (REAL_VALUES_EQUAL (c, dconst1_6))
+		{
+		  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
+		  return build_call_nofold_loc (loc, cbrtfn, 1, sqrt_arg0);
+		}
 	    }
 	}
     }
@@ -8137,7 +8032,7 @@
 	  if (tree_expr_nonnegative_p (arg))
 	    {
 	      const REAL_VALUE_TYPE dconstroot
-		= real_value_truncate (TYPE_MODE (type), dconst_third ());
+		= real_value_truncate (mode, dconst_third ());
 	      tree narg1 = fold_build2_loc (loc, MULT_EXPR, type, arg1,
 					build_real (type, dconstroot));
 	      return build_call_expr_loc (loc, fndecl, 2, arg, narg1);
@@ -8159,12 +8054,148 @@
 	}
     }
 
+  if (TREE_CODE (arg1) == REAL_CST
+      && !TREE_OVERFLOW (arg1)
+      /* If we weren't able to fold a constant expression as reals,
+	 don't convert into a different form.  */
+      && TREE_CODE (arg0) != REAL_CST)
+    {
+      REAL_VALUE_TYPE c, cint;
+      HOST_WIDE_INT n;
+
+      c = TREE_REAL_CST (arg1);
+
+      /* Check for an integer exponent.  */
+      n = real_to_integer (&c);
+      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+      if (real_identical (&c, &cint)
+	  && powi_cost (n) <= POWI_MAX_MULTS)
+	{
+	  /* Convert to powi, which will be processed into an optimal
+	     number of multiplications.  */
+	  tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
+
+	  if (powifn)
+	    {
+	      tree power = build_int_cst (integer_type_node, n);
+	      return build_call_expr_loc (loc, powifn, 2, arg0, power);
+	    }
+	}
+
+      /* Check for specific fractional exponents we can optimize.  */
+      else
+	{
+	  tree opt_tree =
+	    fold_builtin_pow_frac_exp (loc, arg0, c, type, mode);
+
+	  if (opt_tree)
+	    return opt_tree;
+	}
+    }
+
   return NULL_TREE;
 }
 
+/* Recursive subroutine of powi_as_mults.  This function takes the
+   array, CACHE, of already calculated exponents and an exponent N and
+   returns a tree that corresponds to CACHE[1]**N, with type TYPE.  */
+
+static tree
+powi_as_mults_1 (gimple_stmt_iterator *gsi, location_t loc, tree type,
+		 HOST_WIDE_INT n, tree *cache)
+{
+  tree op0, op1, target;
+  unsigned HOST_WIDE_INT digit;
+  gimple mult_stmt;
+
+  if (n < POWI_TABLE_SIZE)
+    {
+      if (cache[n])
+	return cache[n];
+
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      cache[n] = target;
+
+      op0 = powi_as_mults_1 (gsi, loc, type, n - powi_table[n], cache);
+      op1 = powi_as_mults_1 (gsi, loc, type, powi_table[n], cache);
+    }
+  else if (n & 1)
+    {
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
+      op0 = powi_as_mults_1 (gsi, loc, type, n - digit, cache);
+      op1 = powi_as_mults_1 (gsi, loc, type, digit, cache);
+    }
+  else
+    {
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      op0 = powi_as_mults_1 (gsi, loc, type, n >> 1, cache);
+      op1 = op0;
+    }
+
+  mult_stmt = gimple_build_assign_with_ops (MULT_EXPR, target, op0, op1);
+  SSA_NAME_DEF_STMT (target) = mult_stmt;
+  gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT);
+
+  return target;
+}
+
+/* Convert ARG0**N to a tree of multiplications of ARG0 with itself.
+   This function needs to be kept in sync with powi_cost, above.  */
+
+static tree
+powi_as_mults (gimple_stmt_iterator *gsi, location_t loc,
+	       tree arg0, HOST_WIDE_INT n)
+{
+  tree cache[POWI_TABLE_SIZE], result, type = TREE_TYPE (arg0);
+
+  if (n == 0)
+    return omit_one_operand_loc (loc, type, build_real (type, dconst1), arg0);
+
+  memset (cache, 0, sizeof (cache));
+  cache[1] = arg0;
+
+  result = powi_as_mults_1 (gsi, loc, type, (n < 0) ? -n : n, cache);
+
+  /* If the original exponent was negative, reciprocate the result.  */
+  if (n < 0)
+    result = build2_loc (loc, RDIV_EXPR, type,
+			 build_real (type, dconst1), result);
+  return result;
+}
+
+/* ARGS are the two arguments to a powi builtin in GSI with location info
+   LOC.  If the arguments are appropriate, create an equivalent set of
+   statements prior to GSI using an optimal number of multiplications,
+   and return an expression holding the result.  */
+
+tree
+tree_expand_builtin_powi (gimple_stmt_iterator *gsi, location_t loc, tree *args)
+{
+  HOST_WIDE_INT n = TREE_INT_CST_LOW (args[1]);
+  HOST_WIDE_INT n_hi = TREE_INT_CST_HIGH (args[1]);
+
+  if ((n_hi == 0 || n_hi == -1)
+      /* Avoid largest negative number.  */
+      && (n != -n)
+      && ((n >= -1 && n <= 2)
+	  || (optimize_function_for_speed_p (cfun)
+	      && powi_cost (n) <= POWI_MAX_MULTS)))
+    return powi_as_mults (gsi, loc, args[0], n);
+
+  return NULL_TREE;
+}
+
 /* Fold a builtin function call to powi, powif, or powil with argument ARG.
    Return NULL_TREE if no simplification can be made.  */
-static tree
+
+tree
 fold_builtin_powi (location_t loc, tree fndecl ATTRIBUTE_UNUSED,
 		   tree arg0, tree arg1, tree type)
 {
@@ -8179,17 +8210,8 @@
   if (host_integerp (arg1, 0))
     {
       HOST_WIDE_INT c = TREE_INT_CST_LOW (arg1);
+      tree powi_const;
 
-      /* Evaluate powi at compile-time.  */
-      if (TREE_CODE (arg0) == REAL_CST
-	  && !TREE_OVERFLOW (arg0))
-	{
-	  REAL_VALUE_TYPE x;
-	  x = TREE_REAL_CST (arg0);
-	  real_powi (&x, TYPE_MODE (type), &x, c);
-	  return build_real (type, x);
-	}
-
       /* Optimize pow(x,0) = 1.0.  */
       if (c == 0)
 	return omit_one_operand_loc (loc, type, build_real (type, dconst1),
@@ -8203,6 +8225,12 @@
       if (c == -1)
 	return fold_build2_loc (loc, RDIV_EXPR, type,
 			   build_real (type, dconst1), arg0);
+
+      /* Attempt to evaluate powi at compile time.  */
+      powi_const = fold_eval_powi (arg0, c, type, TYPE_MODE (type));
+
+      if (powi_const)
+	return powi_const;
     }
 
   return NULL_TREE;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 173730)
+++ gcc/fold-const.c	(working copy)
@@ -10460,6 +10460,36 @@
 		    }
 		}
 
+	      /* Optimizations of powi(...)*powi(...).  */
+	      if ((fcode0 == BUILT_IN_POWI && fcode1 == BUILT_IN_POWI)
+		  || (fcode0 == BUILT_IN_POWIF && fcode1 == BUILT_IN_POWIF)
+		  || (fcode0 == BUILT_IN_POWIL && fcode1 == BUILT_IN_POWIL))
+		{
+		  tree arg00 = CALL_EXPR_ARG (arg0, 0);
+		  tree arg01 = CALL_EXPR_ARG (arg0, 1);
+		  tree arg10 = CALL_EXPR_ARG (arg1, 0);
+		  tree arg11 = CALL_EXPR_ARG (arg1, 1);
+
+		  /* Optimize powi(x,y)*powi(z,y) as powi(x*z,y).  */
+		  if (operand_equal_p (arg01, arg11, 0))
+		    {
+		      tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      tree arg = fold_build2_loc (loc, MULT_EXPR, type,
+					      arg00, arg10);
+		      return build_call_expr_loc (loc, powfn, 2, arg, arg01);
+		    }
+
+		  /* Optimize powi(x,y)*powi(x,z) as powi(x,y+z).  */
+		  if (operand_equal_p (arg00, arg10, 0))
+		    {
+		      tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      tree inttype = TREE_TYPE (arg01);
+		      tree arg = fold_build2_loc (loc, PLUS_EXPR, inttype,
+					      arg01, arg11);
+		      return build_call_expr_loc (loc, powfn, 2, arg00, arg);
+		    }
+		}
+
 	      /* Optimize tan(x)*cos(x) as sin(x).  */
 	      if (((fcode0 == BUILT_IN_TAN && fcode1 == BUILT_IN_COS)
 		   || (fcode0 == BUILT_IN_TANF && fcode1 == BUILT_IN_COSF)
@@ -10521,16 +10551,61 @@
 		    }
 		}
 
-	      /* Optimize x*x as pow(x,2.0), which is expanded as x*x.  */
-	      if (optimize_function_for_speed_p (cfun)
-		  && operand_equal_p (arg0, arg1, 0))
+	      /* Optimize x*powi(x,c) as powi(x,c+1).  */
+	      if (fcode1 == BUILT_IN_POWI
+		  || fcode1 == BUILT_IN_POWIF
+		  || fcode1 == BUILT_IN_POWIL)
 		{
-		  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
+		  tree arg10 = CALL_EXPR_ARG (arg1, 0);
+		  tree arg11 = CALL_EXPR_ARG (arg1, 1);
+		  if (TREE_CODE (arg11) == INTEGER_CST
+		      && !TREE_OVERFLOW (arg11)
+		      && operand_equal_p (arg0, arg10, 0))
+		    {
+		      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
+		      HOST_WIDE_INT n, n_hi, n_plus_1;
+		      tree arg;
 
-		  if (powfn)
+		      n = TREE_INT_CST_LOW (arg11);
+		      n_hi = TREE_INT_CST_HIGH (arg11);
+		      n_plus_1 = n + 1;
+		      if ((n_hi == 0 || n_hi == -1)
+			  /* Avoid overflow.  */
+			  && n_plus_1 > n)
+			{
+			  arg = build_int_cst (TREE_TYPE (arg11), n + 1);
+			  return build_call_expr_loc (loc, powifn, 2,
+						      arg0, arg);
+			}
+		    }
+		}
+
+	      /* Optimize powi(x,c)*x as powi(x,c+1).  */
+	      if (fcode0 == BUILT_IN_POWI
+		  || fcode0 == BUILT_IN_POWIF
+		  || fcode0 == BUILT_IN_POWIL)
+		{
+		  tree arg00 = CALL_EXPR_ARG (arg0, 0);
+		  tree arg01 = CALL_EXPR_ARG (arg0, 1);
+		  if (TREE_CODE (arg01) == INTEGER_CST
+		      && !TREE_OVERFLOW (arg01)
+		      && operand_equal_p (arg1, arg00, 0))
 		    {
-		      tree arg = build_real (type, dconst2);
-		      return build_call_expr_loc (loc, powfn, 2, arg0, arg);
+		      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      HOST_WIDE_INT n, n_hi, n_plus_1;
+		      tree arg;
+
+		      n = TREE_INT_CST_LOW (arg01);
+		      n_hi = TREE_INT_CST_HIGH (arg01);
+		      n_plus_1 = n + 1;
+		      if ((n_hi == 0 || n_hi == -1)
+			  /* Avoid overflow.  */
+			  && n_plus_1 > n)
+			{
+			  arg = build_int_cst (TREE_TYPE (arg01), n + 1);
+			  return build_call_expr_loc (loc, powifn, 2,
+						      arg1, arg);
+			}
 		    }
 		}
 	    }
@@ -11457,6 +11532,34 @@
 		}
 	    }
 
+	  /* Optimize powi(x,c)/x as powi(x,c-1).  */
+	  if (fcode0 == BUILT_IN_POWI
+	      || fcode0 == BUILT_IN_POWIF
+	      || fcode0 == BUILT_IN_POWIL)
+	    {
+	      tree arg00 = CALL_EXPR_ARG (arg0, 0);
+	      tree arg01 = CALL_EXPR_ARG (arg0, 1);
+	      if (TREE_CODE (arg01) == INTEGER_CST
+		  && !TREE_OVERFLOW (arg01)
+		  && operand_equal_p (arg1, arg00, 0))
+		{
+		  tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		  HOST_WIDE_INT n, n_hi, n_minus_1;
+		  tree arg;
+
+		  n = TREE_INT_CST_LOW (arg01);
+		  n_hi = TREE_INT_CST_HIGH (arg01);
+		  n_minus_1 = n - 1;
+		  if ((n_hi == 0 || n_hi == -1)
+		      /* Avoid overflow.  */
+		      && n_minus_1 < n)
+		    {
+		      arg = build_int_cst (TREE_TYPE (arg01), n - 1);
+		      return build_call_expr_loc (loc, powifn, 2, arg1, arg);
+		    }
+		}
+	    }
+
 	  /* Optimize a/root(b/c) into a*root(c/b).  */
 	  if (BUILTIN_ROOT_P (fcode1))
 	    {
@@ -11499,6 +11602,20 @@
 	      arg1 = build_call_expr_loc (loc, powfn, 2, arg10, neg11);
 	      return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
 	    }
+
+	  /* Optimize x/powi(y,z) into x*powi(y,-z).  */
+	  if (fcode1 == BUILT_IN_POWI
+	      || fcode1 == BUILT_IN_POWIF
+	      || fcode1 == BUILT_IN_POWIL)
+	    {
+	      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
+	      tree arg10 = CALL_EXPR_ARG (arg1, 0);
+	      tree arg11 = CALL_EXPR_ARG (arg1, 1);
+	      tree neg11 = fold_convert_loc (loc, integer_type_node,
+					     negate_expr (arg11));
+	      arg1 = build_call_expr_loc (loc, powifn, 2, arg10, neg11);
+	      return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
+	    }
 	}
       return NULL_TREE;
 
Index: gcc/testsuite/gcc.target/powerpc/pr46728-13.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-13.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-13.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 6.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-3.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.75);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (values[i]) * sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqrt" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-14.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-14.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-14.c	(revision 0)
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 1.5);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 2.5);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -0.5);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, 10.5);
+}
+
+static double
+convert_it_5 (double x)
+{
+  return pow (x, -3.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = .999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 1);
+      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 2);
+      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -1);
+      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 10);
+      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -4);
+      if (fabs (convert_it_5 (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-4.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (values[i]))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-15.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-15.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-15.c	(revision 0)
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 10.0 / 3.0);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 11.0 / 3.0);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -7.0 / 3.0);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, -8.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = .999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+
+      x = __builtin_powi (values[i], 3);
+      y = __builtin_powi (cbrt (values[i]), 1);
+      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], 3);
+      y = __builtin_powi (cbrt (values[i]), 2);
+      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], -3);
+      y = __builtin_powi (cbrt (values[i]), 2);
+      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], -3);
+      y = __builtin_powi (cbrt (values[i]), 1);
+      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-5.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 6.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not " pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-16.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-16.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-16.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mcpu=power6" } */
+
+double foo (double x, double y)
+{
+  return __builtin_pow (x, 0.75) + y;
+}
+
+
+/* { dg-final { scan-assembler "fmadd" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-7.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 1.5);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 2.5);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -0.5);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, 10.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      if (convert_it_1 (values[i]) != sqrt (values[i]) * powi (values[i], 1))
+	abort ();
+      if (convert_it_2 (values[i]) != sqrt (values[i]) * powi (values[i], 2))
+	abort ();
+      if (convert_it_3 (values[i]) != sqrt (values[i]) * powi (values[i], -1))
+	abort ();
+      if (convert_it_4 (values[i]) != sqrt (values[i]) * powi (values[i], 10))
+	abort ();
+    }
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqrt" 5 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "powi" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-10.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-10.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-10.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.25);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/powerpc/pr46728-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-8.c	(revision 0)
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 10.0 / 3.0);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 11.0 / 3.0);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -7.0 / 3.0);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, -8.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      if (convert_it_1 (values[i]) != 
+	  powi (values[i], 3) * powi (cbrt (values[i]), 1))
+	abort ();
+      if (convert_it_2 (values[i]) != 
+	  powi (values[i], 3) * powi (cbrt (values[i]), 2))
+	abort ();
+      if (convert_it_3 (values[i]) != 
+	  powi (values[i], -3) * powi (cbrt (values[i]), 2))
+	abort ();
+      if (convert_it_4 (values[i]) !=
+	  powi (values[i], -3) * powi (cbrt (values[i]), 1))
+	abort ();
+    }
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "powi" 8 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "cbrt" 5 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-11.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-11.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-11.c	(revision 0)
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.75);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+      x = sqrt (values[i]);
+      y = sqrt (sqrt (values[i]));
+  
+      if (fabs (convert_it (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-1.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (values[i]))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "fsqrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.25);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "fsqrt" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/pr46728-9.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-9.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-9.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (fabs (convert_it (values[i]) / sqrt (values[i])) < PREC)
+      abort ();
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.dg/pr46728-12.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-12.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-12.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (fabs (convert_it (values[i]) / cbrt (values[i])) < PREC)
+      abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr46728-6.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-6.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -lm" } */
+
+#include <math.h>
+
+int
+main (int argc, char *argv[])
+{
+  volatile double result;
+
+  result = pow (-0.0, 3.0);
+  result = pow (26.47, -2.0);
+  result = pow (0.0, 0.0);
+  result = pow (22.3, 1.0);
+  result = pow (33.2, -1.0);
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-not "pow" } } */
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	(revision 173730)
+++ gcc/tree-ssa-math-opts.c	(working copy)
@@ -1,5 +1,5 @@
 /* Global, SSA-based optimizations using mathematical identities.
-   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -103,6 +103,7 @@
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
 #include "expr.h"		/* Because optabs.h wants sepops.  */
 #include "optabs.h"
+#include "tree-ssa-propagate.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -1854,3 +1855,123 @@
   | TODO_update_ssa                     /* todo_flags_finish */
  }
 };
+
+/* Simplify built-in calls to pow and powi.  This is done prior to
+   the vectorizer to expose vector square root and multiplication
+   series opportunities.  */
+
+static unsigned int
+execute_lower_pow (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    {
+      gimple_stmt_iterator gsi;
+
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
+        {
+	  gimple stmt = gsi_stmt (gsi);
+
+	  if (is_gimple_call (stmt))
+	    {
+	      tree fndecl = gimple_call_fndecl (stmt);
+	      tree result = NULL_TREE;
+
+	      if (!fndecl
+		  || TREE_CODE (fndecl) != FUNCTION_DECL
+		  || !DECL_BUILT_IN (fndecl)
+		  || gimple_call_va_arg_pack_p (stmt)
+		  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+		{
+		  gsi_next (&gsi);
+		  continue;
+		}
+
+	      switch (DECL_FUNCTION_CODE (fndecl))
+		{
+		case BUILT_IN_POW:
+		case BUILT_IN_POWF:
+		case BUILT_IN_POWL:
+		  {
+		    location_t loc = gimple_location (stmt);
+		    tree *args = gimple_call_arg_ptr (stmt, 0);
+		    tree type = TREE_TYPE (TREE_TYPE (fndecl));
+		    result = fold_builtin_pow (loc, fndecl, args[0],
+					       args[1], type);
+		    break;
+		  }
+		case BUILT_IN_POWI:
+		case BUILT_IN_POWIF:
+		case BUILT_IN_POWIL:
+		  {
+		    location_t loc = gimple_location (stmt);
+		    tree *args = gimple_call_arg_ptr (stmt, 0);
+		    tree type = TREE_TYPE (TREE_TYPE (fndecl));
+		    result = fold_builtin_powi (loc, fndecl, args[0],
+						args[1], type);
+
+		    /* Expanding powi into an optimal number of 
+		       multiplications requires adding statements,
+		       so handle that separately.  */
+		    if (result == NULL_TREE
+			&& host_integerp (args[1], 0)
+			&& !TREE_OVERFLOW (args[1]))
+		      result = tree_expand_builtin_powi (&gsi, loc, args);
+
+		    break;
+		  }
+		default:
+		  break;
+		}
+
+	      if (result)
+		{
+		  /* Propagate location information from original call to
+		     expansion of builtin.  Otherwise things like
+		     maybe_emit_chk_warning, that operate on the expansion
+		     of a builtin, will use the wrong location information.  */
+		  if (gimple_has_location (stmt))
+		    {
+		      tree realret = result;
+		      if (TREE_CODE (result) == NOP_EXPR)
+			realret = TREE_OPERAND (result, 0);
+		      if (CAN_HAVE_LOCATION_P (realret)
+			  && !EXPR_HAS_LOCATION (realret))
+			SET_EXPR_LOCATION (realret, gimple_location (stmt));
+		      result = realret;
+		    }
+		}
+
+	      if (result && !update_call_from_tree (&gsi, result))
+		gimplify_and_update_call_from_tree (&gsi, result);
+	    }
+
+	  gsi_next (&gsi);
+	}
+    }
+
+  return 0;
+}
+
+struct gimple_opt_pass pass_lower_pow =
+{
+ {
+  GIMPLE_PASS,
+  "lower_pow",				/* name */
+  NULL,					/* gate */
+  execute_lower_pow,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  PROP_ssa,				/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_verify_stmts
+  | TODO_dump_func
+  | TODO_update_ssa                     /* todo_flags_finish */
+ }
+};
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 173730)
+++ gcc/tree-flow.h	(working copy)
@@ -856,4 +856,7 @@
 
 void swap_tree_operands (gimple, tree *, tree *);
 
+/* In builtins.c  */
+tree tree_expand_builtin_powi (gimple_stmt_iterator *, location_t, tree *);
+
 #endif /* _TREE_FLOW_H  */
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 173730)
+++ gcc/Makefile.in	(working copy)
@@ -2639,7 +2639,8 @@
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   tree-ssa-propagate.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 173730)
+++ gcc/passes.c	(working copy)
@@ -1,7 +1,7 @@
 /* Top level of GCC compilers (cc1, cc1plus, etc.)
    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
-   Free Software Foundation, Inc.
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
+   2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -812,6 +812,7 @@
      output to the assembler file.  */
   p = &all_passes;
   NEXT_PASS (pass_lower_eh_dispatch);
+  NEXT_PASS (pass_lower_pow);
   NEXT_PASS (pass_all_optimizations);
     {
       struct opt_pass **p = &pass_all_optimizations.pass.sub;
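
For readers following the powi expansion in powi_as_mults_1, here is a
minimal standalone sketch of the idea, not the patch's implementation.
It uses plain binary square-and-multiply as a simpler stand-in for the
table-driven scheme in the patch (powi_table and POWI_WINDOW_SIZE are
not reproduced here), and it computes a value instead of emitting
GIMPLE multiply statements.  The names powi_sketch and powi_sketch_1
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of the patch: compute X**N for N >= 0.
   An odd exponent peels off one factor of X; an even exponent squares
   the half-power, like the final else branch of powi_as_mults_1.  */
static double
powi_sketch_1 (double x, unsigned long n)
{
  double half;

  if (n == 0)
    return 1.0;
  if (n == 1)
    return x;
  if (n & 1)
    return x * powi_sketch_1 (x, n - 1);

  half = powi_sketch_1 (x, n >> 1);
  return half * half;
}

/* Hypothetical wrapper: handle the sign of N the way powi_as_mults
   does, by computing X**|N| and reciprocating for negative N.  */
static double
powi_sketch (double x, long n)
{
  double result;

  /* The most negative value cannot be negated, so it is rejected here,
     analogous to the "Avoid largest negative number" check.  */
  assert (n != LONG_MIN);

  result = powi_sketch_1 (x, (unsigned long) (n < 0 ? -n : n));
  return n < 0 ? 1.0 / result : result;
}
```

With this sketch, powi_sketch (x, 10) needs four multiplications
(x**2, x**4, x**5, x**10) rather than nine.  The table-driven version
in the patch can do better than plain binary splitting for some
exponents.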



