This is the mail archive of the
`gcc-patches@gcc.gnu.org`
mailing list for the GCC project.

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |

Other format: | [Raw text] |

*From*: Roger Sayle <roger at eyesopen dot com>*To*: gcc-patches at gcc dot gnu dot org*Cc*: Sebastian Pop <pop at cri dot ensmp dot fr>*Date*: Sun, 27 Feb 2005 08:21:51 -0700 (MST)*Subject*: [PATCH] PR tree-optimization/20216: Rewrite tree_fold_binomial

The following patch resolves PR tree-optimization/20216 which is a compile-time and memory hog regression in 4.0 and 4.1. The problem is that the new chains of recurrences functionality uses an inefficient and inappropriate method to calculate the Binomial Coefficients required in its analysis. Consider the following simple test program from the bugzilla PR. void foo(unsigned int *dst, unsigned int *src) { int i, j; for (i = 0; i < 4; i++) for (j = 0; j < 1600000; j++) *dst++ = src[j]; } During the analysis of whether these loops can be (profittably) interchanged, the chrec analyser requires the evaluation of the binomial coefficient binom(1599999,0). The first problem is that tree-chrec.c contains a very literal implementation of this function, and implements binom(n,k) using the formula n!/((n-k)!k!), so in this case it evaluates this via tree_fold_factorial(1599999)/tree_fold_factorial(1599999). The second problem is the current implementation of tree_fold_factorial is implemented literally as a recursive function, that at each of the 1.6 million levels of recursion uses fold (build2 (...)) to perform the compile-time evaluation of arithmetic operations with constant arguments. The net effect of this strategy is that for this simple two line testcase above, GCC allocates over 600 Megabytes of garbage collected memory and similarly impressive amount of stack and requires 8.23 seconds of user time. The complete rewrite of tree_fold_binomial below provides a 800x speed-up, compiling this testcase in 0.01 seconds (total compilation time), and no longer performs memory allocations in tree_fold_binomial. The first realization is that in chrec_evlauate we're interested in binomial co-efficients with small values of k, often with values of k never larger than three or four. Rather than use a tree to hold this value, an unsigned int is more appropriate. The most commonly evaluated value of k in these calls is zero, and for all binom(n,0) = 1, the next most common value of k is one, and for all binom(n,1) = n. For the remaining cases, the code below uses an optimized iterative algorithm to calculate n!/(n-k)!/k!, by performing the equivalent of prod(i=1 to k) { n-i+1 } / prod(i=1 to k) { i } which is prod(i=1 to k) { n-i+1 } / factorial(k) i.e. for k=2, we calculate just (n*(n-1)) / 2, and for k=3, we calculate (n*(n-1)*(n-2)) / 6. For all of these calcuations, instead of pointlessly constructing trees to pass to fold, we call mul_double and div_and_round_double directly, to avoid any iterative memory allocation. The existing checks already confirm that n is known to be positive. The previous code always performed arithmetic in integer_type_node, we now calculate intermediates at 2*precision(HOST_WIDE_INT) and just check that the final result fits the appropriate type (much like GCC's real.c software floating point emulation). Although there are alternative implementations that can further reduce overflow of the numerator, by dividing during each iteration, I decided that this approach is already much more numerically stable than the previous implementation, and that better support for the extremely rare larger values of k, typically isn't worth the additional effort to avoid slowing down the common cases. One additional/obvious enhancement below, that is a change of functionality from the original, is that by using mul_double we can now detect overflow for ourselves and stop early. The strategy is to return NULL_TREE from tree_fold_binomial if our calculation ever overflows an intermediate (always the numerator), and the function chrec_evaluate detects this condition and returns chrec_dont_know. The following patch has been tested on i686-pc-linux-gnu with a full "make bootstrap", all default languages, and regression tested with a top-level "make -k check" with no new failures. Ok for mainline? Seb, could you also confirm you're happy with this? 2005-02-27 Roger Sayle <roger@eyesopen.com> PR tree-optimization/20216 * tree-chrec.c (tree_fold_factorial): Delete. (tree_fold_binomial): Change argument list to take a return type and change the type of K to unsigned int. Rewrite to avoid explicit evaluation of factorials, and (recursively) calling fold to perform compile-time arithmetic. Return NULL on (internal) overflow. (chrec_evaluate): Change type of K to an unsigned int. Avoid calling tree_fold_binomial unnecessarily. Return chrec_dont_know if any intermediate calculation overflows. (chrec_apply): Update call to chrec_evaluate. Index: tree-chrec.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-chrec.c,v retrieving revision 2.12 diff -c -3 -p -r2.12 tree-chrec.c *** tree-chrec.c 9 Dec 2004 16:17:00 -0000 2.12 --- tree-chrec.c 27 Feb 2005 04:20:57 -0000 *************** *** 1,5 **** /* Chains of recurrences. ! Copyright (C) 2003, 2004 Free Software Foundation, Inc. Contributed by Sebastian Pop <s.pop@laposte.net> This file is part of GCC. --- 1,5 ---- /* Chains of recurrences. ! Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc. Contributed by Sebastian Pop <s.pop@laposte.net> This file is part of GCC. *************** chrec_fold_multiply (tree type, *** 383,445 **** /* Operations. */ ! /* The factorial. */ ! static tree ! tree_fold_factorial (tree f) { ! if (tree_int_cst_sgn (f) <= 0) ! return integer_one_node; else ! return fold ! (build (MULT_EXPR, integer_type_node, f, ! tree_fold_factorial (fold (build (MINUS_EXPR, integer_type_node, ! f, integer_one_node))))); ! } ! /* The binomial coefficient. */ ! static tree ! tree_fold_binomial (tree n, ! tree k) ! { ! return fold ! (build (EXACT_DIV_EXPR, integer_type_node, tree_fold_factorial (n), ! fold (build (MULT_EXPR, integer_type_node, ! tree_fold_factorial (k), ! tree_fold_factorial ! (fold (build (MINUS_EXPR, integer_type_node, ! n, k))))))); } /* Helper function. Use the Newton's interpolating formula for evaluating the value of the evolution function. */ static tree ! chrec_evaluate (unsigned var, ! tree chrec, ! tree n, ! tree k) { ! tree type = chrec_type (chrec); ! tree binomial_n_k = tree_fold_binomial (n, k); ! ! if (TREE_CODE (chrec) == POLYNOMIAL_CHREC) { ! if (CHREC_VARIABLE (chrec) > var) ! return chrec_evaluate (var, CHREC_LEFT (chrec), n, k); ! ! if (CHREC_VARIABLE (chrec) == var) ! return chrec_fold_plus ! (type, ! fold (build (MULT_EXPR, type, binomial_n_k, CHREC_LEFT (chrec))), ! chrec_evaluate (var, CHREC_RIGHT (chrec), n, ! fold (build (PLUS_EXPR, type, k, integer_one_node)))); ! ! return fold (build (MULT_EXPR, type, binomial_n_k, chrec)); } ! else ! return fold (build (MULT_EXPR, type, binomial_n_k, chrec)); } /* Evaluates "CHREC (X)" when the varying variable is VAR. --- 383,493 ---- /* Operations. */ ! /* Evaluate the binomial coefficient. Return NULL_TREE if the intermediate ! calculation overflows, otherwise return C(n,k) with type TYPE. */ ! static tree ! tree_fold_binomial (tree type, tree n, unsigned int k) { ! unsigned HOST_WIDE_INT lidx, lnum, ldenom, lres, ldum; ! HOST_WIDE_INT hidx, hnum, hdenom, hres, hdum; ! unsigned int i; ! tree res; ! ! /* Handle the most frequent cases. */ ! if (k == 0) ! return build_int_cst (type, 1); ! if (k == 1) ! return fold_convert (type, n); ! ! /* Check that k <= n. */ ! if (TREE_INT_CST_HIGH (n) == 0 ! && TREE_INT_CST_LOW (n) < k) ! return NULL_TREE; ! ! /* Numerator = n. */ ! lnum = TREE_INT_CST_LOW (n); ! hnum = TREE_INT_CST_HIGH (n); ! ! /* Denominator = 2. */ ! ldenom = 2; ! hdenom = 0; ! ! /* Index = Numerator-1. */ ! if (lnum == 0) ! { ! hidx = hnum - 1; ! lidx = ~ (unsigned HOST_WIDE_INT) 0; ! } else ! { ! hidx = hnum; ! lidx = lnum - 1; ! } ! /* Numerator = Numerator*Index = n*(n-1). */ ! if (mul_double (lnum, hnum, lidx, hidx, &lnum, &hnum)) ! return NULL_TREE; ! for (i = 3; i <= k; i++) ! { ! /* Index--. */ ! if (lidx == 0) ! { ! hidx--; ! lidx = ~ (unsigned HOST_WIDE_INT) 0; ! } ! else ! lidx--; ! ! /* Numerator *= Index. */ ! if (mul_double (lnum, hnum, lidx, hidx, &lnum, &hnum)) ! return NULL_TREE; ! ! /* Denominator *= i. */ ! mul_double (ldenom, hdenom, i, 0, &ldenom, &hdenom); ! } ! ! /* Result = Numerator / Denominator. */ ! div_and_round_double (EXACT_DIV_EXPR, 1, lnum, hnum, ldenom, hdenom, ! &lres, &hres, &ldum, &hdum); ! ! res = build_int_cst_wide (type, lres, hres); ! return int_fits_type_p (res, type) ? res : NULL_TREE; } /* Helper function. Use the Newton's interpolating formula for evaluating the value of the evolution function. */ static tree ! chrec_evaluate (unsigned var, tree chrec, tree n, unsigned int k) { ! tree arg0, arg1, binomial_n_k; ! tree type = TREE_TYPE (chrec); ! ! while (TREE_CODE (chrec) == POLYNOMIAL_CHREC ! && CHREC_VARIABLE (chrec) > var) ! chrec = CHREC_LEFT (chrec); ! ! if (TREE_CODE (chrec) == POLYNOMIAL_CHREC ! && CHREC_VARIABLE (chrec) == var) { ! arg0 = chrec_evaluate (var, CHREC_RIGHT (chrec), n, k + 1); ! if (arg0 == chrec_dont_know) ! return chrec_dont_know; ! binomial_n_k = tree_fold_binomial (type, n, k); ! if (!binomial_n_k) ! return chrec_dont_know; ! arg1 = fold (build2 (MULT_EXPR, type, ! CHREC_LEFT (chrec), binomial_n_k)); ! return chrec_fold_plus (type, arg0, arg1); } ! ! binomial_n_k = tree_fold_binomial (type, n, k); ! if (!binomial_n_k) ! return chrec_dont_know; ! ! return fold (build2 (MULT_EXPR, type, chrec, binomial_n_k)); } /* Evaluates "CHREC (X)" when the varying variable is VAR. *************** chrec_apply (unsigned var, *** 493,499 **** else if (TREE_CODE (x) == INTEGER_CST && tree_int_cst_sgn (x) == 1) /* testsuite/.../ssa-chrec-38.c. */ ! res = chrec_evaluate (var, chrec, x, integer_zero_node); else res = chrec_dont_know; --- 541,547 ---- else if (TREE_CODE (x) == INTEGER_CST && tree_int_cst_sgn (x) == 1) /* testsuite/.../ssa-chrec-38.c. */ ! res = chrec_evaluate (var, chrec, x, 0); else res = chrec_dont_know; Roger -- Roger Sayle, E-mail: roger@eyesopen.com OpenEye Scientific Software, WWW: http://www.eyesopen.com/ Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385 Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833

**Follow-Ups**:**Re: [PATCH] PR tree-optimization/20216: Rewrite tree_fold_binomial***From:*Sebastian Pop

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |