This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] PR tree-optimization/20216: Rewrite tree_fold_binomial


The following patch resolves PR tree-optimization/20216 which is a
compile-time and memory hog regression in 4.0 and 4.1.  The problem
is that the new chains of recurrences functionality uses an inefficient
and inappropriate method to calculate the Binomial Coefficients required
in its analysis.

Consider the following simple test program from the bugzilla PR.


void foo(unsigned int *dst, unsigned int *src)
{
 int i, j;
 for (i = 0; i < 4; i++)
  for (j = 0; j < 1600000; j++)
   *dst++ = src[j];
}

During the analysis of whether these loops can be (profittably)
interchanged, the chrec analyser requires the evaluation of the
binomial coefficient binom(1599999,0).

The first problem is that tree-chrec.c contains a very literal
implementation of this function, and implements binom(n,k) using
the formula n!/((n-k)!k!), so in this case it evaluates this via
tree_fold_factorial(1599999)/tree_fold_factorial(1599999).

The second problem is the current implementation of tree_fold_factorial
is implemented literally as a recursive function, that at each of the
1.6 million levels of recursion uses fold (build2 (...)) to perform the
compile-time evaluation of arithmetic operations with constant arguments.

The net effect of this strategy is that for this simple two line testcase
above, GCC allocates over 600 Megabytes of garbage collected memory and
similarly impressive amount of stack and requires 8.23 seconds of user
time.  The complete rewrite of tree_fold_binomial below provides a 800x
speed-up, compiling this testcase in 0.01 seconds (total compilation
time), and no longer performs memory allocations in tree_fold_binomial.


The first realization is that in chrec_evlauate we're interested in
binomial co-efficients with small values of k, often with values of
k never larger than three or four.  Rather than use a tree to hold
this value, an unsigned int is more appropriate.  The most commonly
evaluated value of k in these calls is zero, and for all binom(n,0) = 1,
the next most common value of k is one, and for all binom(n,1) = n.

For the remaining cases, the code below uses an optimized iterative
algorithm to calculate n!/(n-k)!/k!, by performing the equivalent of

		prod(i=1 to k) { n-i+1 } / prod(i=1 to k) { i }
which is	prod(i=1 to k) { n-i+1 } / factorial(k)


i.e. for k=2, we calculate just (n*(n-1)) / 2,
and for k=3, we calculate (n*(n-1)*(n-2)) / 6.

For all of these calcuations, instead of pointlessly constructing
trees to pass to fold, we call mul_double and div_and_round_double
directly, to avoid any iterative memory allocation.  The existing
checks already confirm that n is known to be positive.  The previous
code always performed arithmetic in integer_type_node, we now
calculate intermediates at 2*precision(HOST_WIDE_INT) and just
check that the final result fits the appropriate type (much like
GCC's real.c software floating point emulation).

Although there are alternative implementations that can further reduce
overflow of the numerator, by dividing during each iteration, I decided
that this approach is already much more numerically stable than the
previous implementation, and that better support for the extremely rare
larger  values of k, typically isn't worth the additional effort to
avoid slowing down the common cases.

One additional/obvious enhancement below, that is a change of
functionality from the original, is that by using mul_double we can
now detect overflow for ourselves and stop early.  The strategy is
to return NULL_TREE from tree_fold_binomial if our calculation ever
overflows an intermediate (always the numerator), and the function
chrec_evaluate detects this condition and returns chrec_dont_know.


The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.

Ok for mainline?  Seb, could you also confirm you're happy with this?



2005-02-27  Roger Sayle  <roger@eyesopen.com>

	PR tree-optimization/20216
	* tree-chrec.c (tree_fold_factorial): Delete.
	(tree_fold_binomial): Change argument list to take a return type
	and change the type of K to unsigned int.  Rewrite to avoid explicit
	evaluation of factorials, and (recursively) calling fold to perform
	compile-time arithmetic.  Return NULL on (internal) overflow.
	(chrec_evaluate): Change type of K to an unsigned int.  Avoid
	calling tree_fold_binomial unnecessarily.  Return chrec_dont_know
	if any intermediate calculation overflows.
	(chrec_apply): Update call to chrec_evaluate.


Index: tree-chrec.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-chrec.c,v
retrieving revision 2.12
diff -c -3 -p -r2.12 tree-chrec.c
*** tree-chrec.c	9 Dec 2004 16:17:00 -0000	2.12
--- tree-chrec.c	27 Feb 2005 04:20:57 -0000
***************
*** 1,5 ****
  /* Chains of recurrences.
!    Copyright (C) 2003, 2004 Free Software Foundation, Inc.
     Contributed by Sebastian Pop <s.pop@laposte.net>

  This file is part of GCC.
--- 1,5 ----
  /* Chains of recurrences.
!    Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc.
     Contributed by Sebastian Pop <s.pop@laposte.net>

  This file is part of GCC.
*************** chrec_fold_multiply (tree type,
*** 383,445 ****

  /* Operations.  */

! /* The factorial.  */
!
  static tree
! tree_fold_factorial (tree f)
  {
!   if (tree_int_cst_sgn (f) <= 0)
!     return integer_one_node;
    else
!     return fold
!       (build (MULT_EXPR, integer_type_node, f,
! 	      tree_fold_factorial (fold (build (MINUS_EXPR, integer_type_node,
! 						f, integer_one_node)))));
! }

! /* The binomial coefficient.  */

! static tree
! tree_fold_binomial (tree n,
! 		    tree k)
! {
!   return fold
!     (build (EXACT_DIV_EXPR, integer_type_node, tree_fold_factorial (n),
! 	    fold (build (MULT_EXPR, integer_type_node,
! 			 tree_fold_factorial (k),
! 			 tree_fold_factorial
! 			 (fold (build (MINUS_EXPR, integer_type_node,
! 				       n, k)))))));
  }

  /* Helper function.  Use the Newton's interpolating formula for
     evaluating the value of the evolution function.  */

  static tree
! chrec_evaluate (unsigned var,
! 		tree chrec,
! 		tree n,
! 		tree k)
  {
!   tree type = chrec_type (chrec);
!   tree binomial_n_k = tree_fold_binomial (n, k);
!
!   if (TREE_CODE (chrec) == POLYNOMIAL_CHREC)
      {
!       if (CHREC_VARIABLE (chrec) > var)
! 	return chrec_evaluate (var, CHREC_LEFT (chrec), n, k);
!
!       if (CHREC_VARIABLE (chrec) == var)
! 	return chrec_fold_plus
! 	  (type,
! 	   fold (build (MULT_EXPR, type, binomial_n_k, CHREC_LEFT (chrec))),
! 	   chrec_evaluate (var, CHREC_RIGHT (chrec), n,
! 			   fold (build (PLUS_EXPR, type, k, integer_one_node))));
!
!       return fold (build (MULT_EXPR, type, binomial_n_k, chrec));
      }
!   else
!     return fold (build (MULT_EXPR, type, binomial_n_k, chrec));
  }

  /* Evaluates "CHREC (X)" when the varying variable is VAR.
--- 383,493 ----

  /* Operations.  */

! /* Evaluate the binomial coefficient.  Return NULL_TREE if the intermediate
!    calculation overflows, otherwise return C(n,k) with type TYPE.  */
!
  static tree
! tree_fold_binomial (tree type, tree n, unsigned int k)
  {
!   unsigned HOST_WIDE_INT lidx, lnum, ldenom, lres, ldum;
!   HOST_WIDE_INT hidx, hnum, hdenom, hres, hdum;
!   unsigned int i;
!   tree res;
!
!   /* Handle the most frequent cases.  */
!   if (k == 0)
!     return build_int_cst (type, 1);
!   if (k == 1)
!     return fold_convert (type, n);
!
!   /* Check that k <= n.  */
!   if (TREE_INT_CST_HIGH (n) == 0
!       && TREE_INT_CST_LOW (n) < k)
!     return NULL_TREE;
!
!   /* Numerator = n.  */
!   lnum = TREE_INT_CST_LOW (n);
!   hnum = TREE_INT_CST_HIGH (n);
!
!   /* Denominator = 2.  */
!   ldenom = 2;
!   hdenom = 0;
!
!   /* Index = Numerator-1.  */
!   if (lnum == 0)
!     {
!       hidx = hnum - 1;
!       lidx = ~ (unsigned HOST_WIDE_INT) 0;
!     }
    else
!     {
!       hidx = hnum;
!       lidx = lnum - 1;
!     }

!   /* Numerator = Numerator*Index = n*(n-1).  */
!   if (mul_double (lnum, hnum, lidx, hidx, &lnum, &hnum))
!     return NULL_TREE;

!   for (i = 3; i <= k; i++)
!     {
!       /* Index--.  */
!       if (lidx == 0)
! 	{
! 	  hidx--;
! 	  lidx = ~ (unsigned HOST_WIDE_INT) 0;
! 	}
!       else
!         lidx--;
!
!       /* Numerator *= Index.  */
!       if (mul_double (lnum, hnum, lidx, hidx, &lnum, &hnum))
! 	return NULL_TREE;
!
!       /* Denominator *= i.  */
!       mul_double (ldenom, hdenom, i, 0, &ldenom, &hdenom);
!     }
!
!   /* Result = Numerator / Denominator.  */
!   div_and_round_double (EXACT_DIV_EXPR, 1, lnum, hnum, ldenom, hdenom,
! 			&lres, &hres, &ldum, &hdum);
!
!   res = build_int_cst_wide (type, lres, hres);
!   return int_fits_type_p (res, type) ? res : NULL_TREE;
  }

  /* Helper function.  Use the Newton's interpolating formula for
     evaluating the value of the evolution function.  */

  static tree
! chrec_evaluate (unsigned var, tree chrec, tree n, unsigned int k)
  {
!   tree arg0, arg1, binomial_n_k;
!   tree type = TREE_TYPE (chrec);
!
!   while (TREE_CODE (chrec) == POLYNOMIAL_CHREC
! 	 && CHREC_VARIABLE (chrec) > var)
!     chrec = CHREC_LEFT (chrec);
!
!   if (TREE_CODE (chrec) == POLYNOMIAL_CHREC
!       && CHREC_VARIABLE (chrec) == var)
      {
!       arg0 = chrec_evaluate (var, CHREC_RIGHT (chrec), n, k + 1);
!       if (arg0 == chrec_dont_know)
! 	return chrec_dont_know;
!       binomial_n_k = tree_fold_binomial (type, n, k);
!       if (!binomial_n_k)
! 	return chrec_dont_know;
!       arg1 = fold (build2 (MULT_EXPR, type,
! 			   CHREC_LEFT (chrec), binomial_n_k));
!       return chrec_fold_plus (type, arg0, arg1);
      }
!
!   binomial_n_k = tree_fold_binomial (type, n, k);
!   if (!binomial_n_k)
!     return chrec_dont_know;
!
!   return fold (build2 (MULT_EXPR, type, chrec, binomial_n_k));
  }

  /* Evaluates "CHREC (X)" when the varying variable is VAR.
*************** chrec_apply (unsigned var,
*** 493,499 ****
    else if (TREE_CODE (x) == INTEGER_CST
  	   && tree_int_cst_sgn (x) == 1)
      /* testsuite/.../ssa-chrec-38.c.  */
!     res = chrec_evaluate (var, chrec, x, integer_zero_node);

    else
      res = chrec_dont_know;
--- 541,547 ----
    else if (TREE_CODE (x) == INTEGER_CST
  	   && tree_int_cst_sgn (x) == 1)
      /* testsuite/.../ssa-chrec-38.c.  */
!     res = chrec_evaluate (var, chrec, x, 0);

    else
      res = chrec_dont_know;


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]