This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR18589


On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote:
>> On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> >
>> > Unfortunately this seems to be necessary if I name the two passes
>> > "reassoc1" and "reassoc2". ?If I try to name both of them "reassoc" I
>> > get failures in other tests like gfortran.dg/reassoc_4, where
>> > -fdump-tree-reassoc1 doesn't work. ?Unless I'm missing something
>> > obvious, I think I need to keep that change.
>>
>> Hm, naming them "reassoc1" and "reassoc2" is a hack. ?Naming both
>> "reassoc" will not trigger re-naming them to reassoc1 and reassoc2
>> I think. ?How ugly. ?Especially that -fdump-tree-reassoc will no longer
>> work. ?Maybe instead of using two pass structs resort to using
>> the existing hack with using first_pass_instance and TODO_mark_first_instance.
>
> OK, that seems to be the best among evils. ?Using the
> first_pass_instance hack, the patch is transformed as below.
> Regstrapped on powerpc64-linux, no additional failures. ?OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> gcc:
>
> 2012-04-05 ?Bill Schmidt ?<wschmidt@linux.vnet.ibm.com>
>
> ? ? ? ?PR tree-optimization/18589
> ? ? ? ?* tree-ssa-reassoc.c (reassociate_stats): Add two fields.
> ? ? ? ?(operand_entry): Add count field.
> ? ? ? ?(add_repeat_to_ops_vec): New function.
> ? ? ? ?(completely_remove_stmt): Likewise.
> ? ? ? ?(remove_def_if_absorbed_call): Likewise.
> ? ? ? ?(remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
> ? ? ? ?(acceptable_pow_call): New function.
> ? ? ? ?(linearize_expr_tree): Look for builtin pow/powi calls and add operand
> ? ? ? ?entries with repeat counts when found.
> ? ? ? ?(repeat_factor_d): New struct and associated typedefs.
> ? ? ? ?(repeat_factor_vec): New static vector variable.
> ? ? ? ?(compare_repeat_factors): New function.
> ? ? ? ?(get_reassoc_pow_ssa_name): Likewise.
> ? ? ? ?(attempt_builtin_powi): Likewise.
> ? ? ? ?(reassociate_bb): Call attempt_builtin_powi.
> ? ? ? ?(fini_reassoc): Two new calls to statistics_counter_event.
>
> gcc/testsuite:
>
> 2012-04-05 ?Bill Schmidt ?<wschmidt@linux.vnet.ibm.com>
>
> ? ? ? ?PR tree-optimization/18589
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-1.c: New test.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-2.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-3.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-4.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-5.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-6.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-7.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-8.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-9.c: Likewise.
> ? ? ? ?* gcc.dg/tree-ssa/pr18589-10.c: Likewise.
>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y, double z, double u)
> +{
> + ?return x * x * y * y * y * z * z * z * z * u;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y, double z, double u)
> +{
> + ?return x * x * x * y * y * y * z * z * z * z * u * u * u * u;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y)
> +{
> + ?return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +float baz (float x, float y)
> +{
> + ?return x * x * x * x * y * y * y * y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +long double baz (long double x, long double y)
> +{
> + ?return x * x * x * x * y * y * y * y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c ? (revision 0)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y, double z)
> +{
> + ?return (__builtin_pow (x, 3.0) * __builtin_pow (y, 2.0)
> + ? ? ? ? * __builtin_pow (z, 5.0));
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c ?(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c ?(revision 0)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y, double z)
> +{
> + ?return (__builtin_pow (x, 4.0) * __builtin_pow (y, 2.0)
> + ? ? ? ? * __builtin_pow (z, 4.0));
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y)
> +{
> + ?return x * x * x * x * y * y * y * y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y)
> +{
> + ?return x * y * y * x * y * x * x * y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c ? (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c ? (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
> +
> +double baz (double x, double y, double z)
> +{
> + ?return x * x * y * y * y * z * z * z * z;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c ? ? ?(revision 186108)
> +++ gcc/tree-ssa-reassoc.c ? ? ?(working copy)
> @@ -61,6 +61,10 @@ along with GCC; see the file COPYING3. ?If not see
> ? ? 3. Optimization of the operand lists, eliminating things like a +
> ? ? -a, a & a, etc.
>
> + ? ?3a. Combine repeated factors with the same occurrence counts
> + ? ?into a __builtin_powi call that will later be optimized into
> + ? ?an optimal number of multiplies.
> +
> ? ? 4. Rewrite the expression trees we linearized and optimized so
> ? ? they are in proper rank order.
>
> @@ -169,6 +173,8 @@ static struct
> ? int constants_eliminated;
> ? int ops_eliminated;
> ? int rewritten;
> + ?int pows_encountered;
> + ?int pows_created;
> ?} reassociate_stats;
>
> ?/* Operator, rank pair. ?*/
> @@ -177,6 +183,7 @@ typedef struct operand_entry
> ? unsigned int rank;
> ? int id;
> ? tree op;
> + ?unsigned int count;
> ?} *operand_entry_t;
>
> ?static alloc_pool operand_entry_pool;
> @@ -515,9 +522,28 @@ add_to_ops_vec (VEC(operand_entry_t, heap) **ops,
> ? oe->op = op;
> ? oe->rank = get_rank (op);
> ? oe->id = next_operand_entry_id++;
> + ?oe->count = 1;
> ? VEC_safe_push (operand_entry_t, heap, *ops, oe);
> ?}
>
> +/* Add an operand entry to *OPS for the tree operand OP with repeat
> + ? count REPEAT. ?*/
> +
> +static void
> +add_repeat_to_ops_vec (VEC(operand_entry_t, heap) **ops, tree op,
> + ? ? ? ? ? ? ? ? ? ? ?HOST_WIDE_INT repeat)
> +{
> + ?operand_entry_t oe = (operand_entry_t) pool_alloc (operand_entry_pool);
> +
> + ?oe->op = op;
> + ?oe->rank = get_rank (op);
> + ?oe->id = next_operand_entry_id++;
> + ?oe->count = repeat;
> + ?VEC_safe_push (operand_entry_t, heap, *ops, oe);
> +
> + ?reassociate_stats.pows_encountered++;
> +}
> +
> ?/* Return true if STMT is reassociable operation containing a binary
> ? ?operation with tree code CODE, and is inside LOOP. ?*/
>
> @@ -2049,6 +2075,32 @@ is_phi_for_stmt (gimple stmt, tree operand)
> ? return false;
> ?}
>
> +/* Remove STMT, unlink its virtual defs, and release its SSA defs. ?*/
> +
> +static inline void
> +completely_remove_stmt (gimple stmt)
> +{
> + ?gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> + ?gsi_remove (&gsi, true);
> + ?unlink_stmt_vdef (stmt);
> + ?release_defs (stmt);
> +}
> +
> +/* If OP is defined by a builtin call that has been absorbed by
> + ? reassociation, remove its defining statement completely. ?*/
> +
> +static inline void
> +remove_def_if_absorbed_call (tree op)
> +{
> + ?gimple stmt;
> +
> + ?if (TREE_CODE (op) == SSA_NAME
> + ? ? ?&& has_zero_uses (op)
> + ? ? ?&& is_gimple_call ((stmt = SSA_NAME_DEF_STMT (op)))
> + ? ? ?&& gimple_visited_p (stmt))
> + ? ?completely_remove_stmt (stmt);
> +}
> +
> ?/* Remove def stmt of VAR if VAR has zero uses and recurse
> ? ?on rhs1 operand if so. ?*/
>
> @@ -2057,19 +2109,31 @@ remove_visited_stmt_chain (tree var)
> ?{
> ? gimple stmt;
> ? gimple_stmt_iterator gsi;
> + ?tree var2;
>
> ? while (1)
> ? ? {
> ? ? ? if (TREE_CODE (var) != SSA_NAME || !has_zero_uses (var))
> ? ? ? ?return;
> ? ? ? stmt = SSA_NAME_DEF_STMT (var);
> - ? ? ?if (!is_gimple_assign (stmt)
> - ? ? ? ? || !gimple_visited_p (stmt))
> + ? ? ?if (is_gimple_assign (stmt) && gimple_visited_p (stmt))
> + ? ? ? {
> + ? ? ? ? var = gimple_assign_rhs1 (stmt);
> + ? ? ? ? var2 = gimple_assign_rhs2 (stmt);
> + ? ? ? ? gsi = gsi_for_stmt (stmt);
> + ? ? ? ? gsi_remove (&gsi, true);
> + ? ? ? ? release_defs (stmt);
> + ? ? ? ? /* A multiply whose operands are both fed by builtin pow/powi
> + ? ? ? ? ? ?calls must check whether to remove rhs2 as well. ?*/
> + ? ? ? ? remove_def_if_absorbed_call (var2);
> + ? ? ? }
> + ? ? ?else if (is_gimple_call (stmt) && gimple_visited_p (stmt))
> + ? ? ? {
> + ? ? ? ? completely_remove_stmt (stmt);
> + ? ? ? ? return;
> + ? ? ? }
> + ? ? ?else
> ? ? ? ?return;
> - ? ? ?var = gimple_assign_rhs1 (stmt);
> - ? ? ?gsi = gsi_for_stmt (stmt);
> - ? ? ?gsi_remove (&gsi, true);
> - ? ? ?release_defs (stmt);
> ? ? }
> ?}
>
> @@ -2564,6 +2628,75 @@ break_up_subtract (gimple stmt, gimple_stmt_iterat
> ? update_stmt (stmt);
> ?}
>
> +/* Determine whether STMT is a builtin call that raises an SSA name
> + ? to an integer power and has only one use. ?If so, and this is early
> + ? reassociation and unsafe math optimizations are permitted, place
> + ? the SSA name in *BASE and the exponent in *EXPONENT, and return TRUE.
> + ? If any of these conditions does not hold, return FALSE. ?*/
> +
> +static bool
> +acceptable_pow_call (gimple stmt, tree *base, HOST_WIDE_INT *exponent)
> +{
> + ?tree fndecl, arg1;
> + ?REAL_VALUE_TYPE c, cint;
> +
> + ?if (!first_pass_instance
> + ? ? ?|| !flag_unsafe_math_optimizations
> + ? ? ?|| !is_gimple_call (stmt)
> + ? ? ?|| !has_single_use (gimple_call_lhs (stmt)))
> + ? ?return false;
> +
> + ?fndecl = gimple_call_fndecl (stmt);
> +
> + ?if (!fndecl
> + ? ? ?|| DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
> + ? ?return false;
> +
> + ?switch (DECL_FUNCTION_CODE (fndecl))
> + ? ?{
> + ? ?CASE_FLT_FN (BUILT_IN_POW):
> + ? ? ?*base = gimple_call_arg (stmt, 0);
> + ? ? ?arg1 = gimple_call_arg (stmt, 1);
> +
> + ? ? ?if (TREE_CODE (arg1) != REAL_CST)
> + ? ? ? return false;
> +
> + ? ? ?c = TREE_REAL_CST (arg1);
> +
> + ? ? ?if (REAL_EXP (&c) > HOST_BITS_PER_WIDE_INT)
> + ? ? ? return false;
> +
> + ? ? ?*exponent = real_to_integer (&c);
> + ? ? ?real_from_integer (&cint, VOIDmode, *exponent,
> + ? ? ? ? ? ? ? ? ? ? ? ?*exponent < 0 ? -1 : 0, 0);
> + ? ? ?if (!real_identical (&c, &cint))
> + ? ? ? return false;
> +
> + ? ? ?break;
> +
> + ? ?CASE_FLT_FN (BUILT_IN_POWI):
> + ? ? ?*base = gimple_call_arg (stmt, 0);
> + ? ? ?arg1 = gimple_call_arg (stmt, 1);
> +
> + ? ? ?if (!host_integerp (arg1, 0))
> + ? ? ? return false;
> +
> + ? ? ?*exponent = TREE_INT_CST_LOW (arg1);
> + ? ? ?break;
> +
> + ? ?default:
> + ? ? ?return false;
> + ? ?}
> +
> + ?/* Expanding negative exponents is generally unproductive, so we don't
> + ? ? complicate matters with those. ?Exponents of zero and one should
> + ? ? have been handled by expression folding. ?*/
> + ?if (*exponent < 2 || TREE_CODE (*base) != SSA_NAME)
> + ? ?return false;
> +
> + ?return true;
> +}
> +
> ?/* Recursively linearize a binary expression that is the RHS of STMT.
> ? ?Place the operands of the expression tree in the vector named OPS. ?*/
>
> @@ -2573,11 +2706,13 @@ linearize_expr_tree (VEC(operand_entry_t, heap) **
> ?{
> ? tree binlhs = gimple_assign_rhs1 (stmt);
> ? tree binrhs = gimple_assign_rhs2 (stmt);
> - ?gimple binlhsdef, binrhsdef;
> + ?gimple binlhsdef = NULL, binrhsdef = NULL;
> ? bool binlhsisreassoc = false;
> ? bool binrhsisreassoc = false;
> ? enum tree_code rhscode = gimple_assign_rhs_code (stmt);
> ? struct loop *loop = loop_containing_stmt (stmt);
> + ?tree base = NULL_TREE;
> + ?HOST_WIDE_INT exponent = 0;
>
> ? if (set_visited)
> ? ? gimple_set_visited (stmt, true);
> @@ -2615,8 +2750,26 @@ linearize_expr_tree (VEC(operand_entry_t, heap) **
>
> ? ? ? if (!binrhsisreassoc)
> ? ? ? ?{
> - ? ? ? ? add_to_ops_vec (ops, binrhs);
> - ? ? ? ? add_to_ops_vec (ops, binlhs);
> + ? ? ? ? if (rhscode == MULT_EXPR
> + ? ? ? ? ? ? && TREE_CODE (binrhs) == SSA_NAME
> + ? ? ? ? ? ? && acceptable_pow_call (binrhsdef, &base, &exponent))
> + ? ? ? ? ? {
> + ? ? ? ? ? ? add_repeat_to_ops_vec (ops, base, exponent);
> + ? ? ? ? ? ? gimple_set_visited (binrhsdef, true);
> + ? ? ? ? ? }
> + ? ? ? ? else
> + ? ? ? ? ? add_to_ops_vec (ops, binrhs);
> +
> + ? ? ? ? if (rhscode == MULT_EXPR
> + ? ? ? ? ? ? && TREE_CODE (binlhs) == SSA_NAME
> + ? ? ? ? ? ? && acceptable_pow_call (binlhsdef, &base, &exponent))
> + ? ? ? ? ? {
> + ? ? ? ? ? ? add_repeat_to_ops_vec (ops, base, exponent);
> + ? ? ? ? ? ? gimple_set_visited (binlhsdef, true);
> + ? ? ? ? ? }
> + ? ? ? ? else
> + ? ? ? ? ? add_to_ops_vec (ops, binlhs);
> +
> ? ? ? ? ?return;
> ? ? ? ?}
>
> @@ -2655,7 +2808,16 @@ linearize_expr_tree (VEC(operand_entry_t, heap) **
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?rhscode, loop));
> ? linearize_expr_tree (ops, SSA_NAME_DEF_STMT (binlhs),
> ? ? ? ? ? ? ? ? ? ? ? is_associative, set_visited);
> - ?add_to_ops_vec (ops, binrhs);
> +
> + ?if (rhscode == MULT_EXPR
> + ? ? ?&& TREE_CODE (binrhs) == SSA_NAME
> + ? ? ?&& acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), &base, &exponent))
> + ? ?{
> + ? ? ?add_repeat_to_ops_vec (ops, base, exponent);
> + ? ? ?gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true);
> + ? ?}
> + ?else
> + ? ?add_to_ops_vec (ops, binrhs);
> ?}
>
> ?/* Repropagate the negates back into subtracts, since no other pass
> @@ -2815,6 +2977,347 @@ break_up_subtract_bb (basic_block bb)
> ? ? break_up_subtract_bb (son);
> ?}
>
> +/* Used for repeated factor analysis. ?*/
> +struct repeat_factor_d
> +{
> + ?/* An SSA name that occurs in a multiply chain. ?*/
> + ?tree factor;
> +
> + ?/* Cached rank of the factor. ?*/
> + ?unsigned rank;
> +
> + ?/* Number of occurrences of the factor in the chain. ?*/
> + ?HOST_WIDE_INT count;
> +
> + ?/* An SSA name representing the product of this factor and
> + ? ? all factors appearing later in the repeated factor vector. ?*/
> + ?tree repr;
> +};
> +
> +typedef struct repeat_factor_d repeat_factor, *repeat_factor_t;
> +typedef const struct repeat_factor_d *const_repeat_factor_t;
> +
> +DEF_VEC_O (repeat_factor);
> +DEF_VEC_ALLOC_O (repeat_factor, heap);
> +
> +static VEC (repeat_factor, heap) *repeat_factor_vec;
> +
> +/* Used for sorting the repeat factor vector. ?Sort primarily by
> + ? ascending occurrence count, secondarily by descending rank. ?*/
> +
> +static int
> +compare_repeat_factors (const void *x1, const void *x2)
> +{
> + ?const_repeat_factor_t rf1 = (const_repeat_factor_t) x1;
> + ?const_repeat_factor_t rf2 = (const_repeat_factor_t) x2;
> +
> + ?if (rf1->count != rf2->count)
> + ? ?return rf1->count - rf2->count;
> +
> + ?return rf2->rank - rf1->rank;
> +}
> +
> +/* Get a new SSA name for register variable *TARGET of type TYPE.
> + ? If *TARGET is null or incompatible with TYPE, create the variable
> + ? first. ?*/
> +
> +static tree
> +get_reassoc_pow_ssa_name (tree *target, tree type)
> +{
> + ?if (!*target || !types_compatible_p (type, TREE_TYPE (*target)))
> + ? ?{
> + ? ? ?*target = create_tmp_reg (type, "reassocpow");
> + ? ? ?add_referenced_var (*target);
> + ? ?}
> +
> + ?return make_ssa_name (*target, NULL);
> +}
> +
> +/* Look for repeated operands in OPS in the multiply tree rooted at
> + ? STMT. ?Replace them with an optimal sequence of multiplies and powi
> + ? builtin calls, and remove the used operands from OPS. ?Push new
> + ? SSA names onto OPS that represent the introduced multiplies and
> + ? builtin calls. ?*/
> +
> +static void
> +attempt_builtin_powi (gimple stmt, VEC(operand_entry_t, heap) **ops,
> + ? ? ? ? ? ? ? ? ? ? tree *target)
> +{
> + ?unsigned i, j, vec_len;
> + ?int ii;
> + ?operand_entry_t oe;
> + ?repeat_factor_t rf1, rf2;
> + ?repeat_factor rfnew;
> + ?tree target_ssa, iter_result;
> + ?tree type = TREE_TYPE (gimple_get_lhs (stmt));
> + ?tree powi_fndecl = mathfn_built_in (type, BUILT_IN_POWI);
> + ?gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> + ?gimple mul_stmt, pow_stmt;
> +
> + ?/* Nothing to do if BUILT_IN_POWI doesn't exist for this type and
> + ? ? target. ?*/
> + ?if (!powi_fndecl)
> + ? ?return;
> +
> + ?/* Allocate the repeated factor vector. ?*/
> + ?repeat_factor_vec = VEC_alloc (repeat_factor, heap, 10);
> +
> + ?/* Scan the OPS vector for all SSA names in the product and build
> + ? ? up a vector of occurrence counts for each factor. ?*/
> + ?FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe)
> + ? ?{
> + ? ? ?if (TREE_CODE (oe->op) == SSA_NAME)
> + ? ? ? {
> + ? ? ? ? FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1)
> + ? ? ? ? ? {
> + ? ? ? ? ? ? if (rf1->factor == oe->op)
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? rf1->count += oe->count;
> + ? ? ? ? ? ? ? ? break;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? }
> +
> + ? ? ? ? if (j >= VEC_length (repeat_factor, repeat_factor_vec))
> + ? ? ? ? ? {
> + ? ? ? ? ? ? rfnew.factor = oe->op;
> + ? ? ? ? ? ? rfnew.rank = oe->rank;
> + ? ? ? ? ? ? rfnew.count = oe->count;
> + ? ? ? ? ? ? rfnew.repr = NULL_TREE;
> + ? ? ? ? ? ? VEC_safe_push (repeat_factor, heap, repeat_factor_vec, &rfnew);
> + ? ? ? ? ? }
> + ? ? ? }
> + ? ?}
> +
> + ?/* Sort the repeated factor vector by (a) increasing occurrence count,
> + ? ? and (b) decreasing rank. ?*/
> + ?VEC_qsort (repeat_factor, repeat_factor_vec, compare_repeat_factors);
> +
> + ?/* It is generally best to combine as many base factors as possible
> + ? ? into a product before applying __builtin_powi to the result.
> + ? ? However, the sort order chosen for the repeated factor vector
> + ? ? allows us to cache partial results for the product of the base
> + ? ? factors for subsequent use. ?When we already have a cached partial
> + ? ? result from a previous iteration, it is best to make use of it
> + ? ? before looking for another __builtin_pow opportunity.
> +
> + ? ? As an example, consider x * x * y * y * y * z * z * z * z.
> + ? ? We want to first compose the product x * y * z, raise it to the
> + ? ? second power, then multiply this by y * z, and finally multiply
> + ? ? by z. ?This can be done in 5 multiplies provided we cache y * z
> + ? ? for use in both expressions:
> +
> + ? ? ? ?t1 = y * z
> + ? ? ? t2 = t1 * x
> + ? ? ? t3 = t2 * t2
> + ? ? ? t4 = t1 * t3
> + ? ? ? result = t4 * z
> +
> + ? ? If we instead ignored the cached y * z and first multiplied by
> + ? ? the __builtin_pow opportunity z * z, we would get the inferior:
> +
> + ? ? ? ?t1 = y * z
> + ? ? ? t2 = t1 * x
> + ? ? ? t3 = t2 * t2
> + ? ? ? t4 = z * z
> + ? ? ? t5 = t3 * t4
> + ? ? ? ?result = t5 * y ?*/
> +
> + ?vec_len = VEC_length (repeat_factor, repeat_factor_vec);
> +
> + ?/* Repeatedly look for opportunities to create a builtin_powi call. ?*/
> + ?while (true)
> + ? ?{
> + ? ? ?HOST_WIDE_INT power;
> +
> + ? ? ?/* First look for the largest cached product of factors from
> + ? ? ? ?preceding iterations. ?If found, create a builtin_powi for
> + ? ? ? ?it if the minimum occurrence count for its factors is at
> + ? ? ? ?least 2, or just use this cached product as our next
> + ? ? ? ?multiplicand if the minimum occurrence count is 1. ?*/
> + ? ? ?FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1)
> + ? ? ? {
> + ? ? ? ? if (rf1->repr && rf1->count > 0)
> + ? ? ? ? ? break;
> + ? ? ? }
> +
> + ? ? ?if (j < vec_len)
> + ? ? ? {
> + ? ? ? ? power = rf1->count;
> +
> + ? ? ? ? if (power == 1)
> + ? ? ? ? ? {
> + ? ? ? ? ? ? iter_result = rf1->repr;
> +
> + ? ? ? ? ? ? if (dump_file && (dump_flags & TDF_DETAILS))
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? unsigned elt;
> + ? ? ? ? ? ? ? ? repeat_factor_t rf;
> + ? ? ? ? ? ? ? ? fputs ("Multiplying by cached product ", dump_file);
> + ? ? ? ? ? ? ? ? for (elt = j; elt < vec_len; elt++)
> + ? ? ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? ? ? rf = VEC_index (repeat_factor, repeat_factor_vec, elt);
> + ? ? ? ? ? ? ? ? ? ? print_generic_expr (dump_file, rf->factor, 0);
> + ? ? ? ? ? ? ? ? ? ? if (elt < vec_len - 1)
> + ? ? ? ? ? ? ? ? ? ? ? fputs (" * ", dump_file);
> + ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? fputs ("\n", dump_file);
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? }
> + ? ? ? ? else
> + ? ? ? ? ? {
> + ? ? ? ? ? ? iter_result = get_reassoc_pow_ssa_name (target, type);
> + ? ? ? ? ? ? pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? build_int_cst (integer_type_node,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?power));
> + ? ? ? ? ? ? gimple_call_set_lhs (pow_stmt, iter_result);
> + ? ? ? ? ? ? gimple_set_location (pow_stmt, gimple_location (stmt));
> + ? ? ? ? ? ? gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT);
> +
> + ? ? ? ? ? ? if (dump_file && (dump_flags & TDF_DETAILS))
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? unsigned elt;
> + ? ? ? ? ? ? ? ? repeat_factor_t rf;
> + ? ? ? ? ? ? ? ? fputs ("Building __builtin_pow call for cached product (",
> + ? ? ? ? ? ? ? ? ? ? ? ?dump_file);
> + ? ? ? ? ? ? ? ? for (elt = j; elt < vec_len; elt++)
> + ? ? ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? ? ? rf = VEC_index (repeat_factor, repeat_factor_vec, elt);
> + ? ? ? ? ? ? ? ? ? ? print_generic_expr (dump_file, rf->factor, 0);
> + ? ? ? ? ? ? ? ? ? ? if (elt < vec_len - 1)
> + ? ? ? ? ? ? ? ? ? ? ? fputs (" * ", dump_file);
> + ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? fprintf (dump_file, ")^%ld\n", power);
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? }
> + ? ? ? }
> + ? ? ?else
> + ? ? ? {
> + ? ? ? ? /* Otherwise, find the first factor in the repeated factor
> + ? ? ? ? ? ?vector whose occurrence count is at least 2. ?If no such
> + ? ? ? ? ? ?factor exists, there are no builtin_powi opportunities
> + ? ? ? ? ? ?remaining. ?*/
> + ? ? ? ? FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1)
> + ? ? ? ? ? {
> + ? ? ? ? ? ? if (rf1->count >= 2)
> + ? ? ? ? ? ? ? break;
> + ? ? ? ? ? }
> +
> + ? ? ? ? if (j >= vec_len)
> + ? ? ? ? ? break;
> +
> + ? ? ? ? power = rf1->count;
> +
> + ? ? ? ? if (dump_file && (dump_flags & TDF_DETAILS))
> + ? ? ? ? ? {
> + ? ? ? ? ? ? unsigned elt;
> + ? ? ? ? ? ? repeat_factor_t rf;
> + ? ? ? ? ? ? fputs ("Building __builtin_pow call for (", dump_file);
> + ? ? ? ? ? ? for (elt = j; elt < vec_len; elt++)
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? rf = VEC_index (repeat_factor, repeat_factor_vec, elt);
> + ? ? ? ? ? ? ? ? print_generic_expr (dump_file, rf->factor, 0);
> + ? ? ? ? ? ? ? ? if (elt < vec_len - 1)
> + ? ? ? ? ? ? ? ? ? fputs (" * ", dump_file);
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? fprintf (dump_file, ")^%ld\n", power);
> + ? ? ? ? ? }
> +
> + ? ? ? ? reassociate_stats.pows_created++;
> +
> + ? ? ? ? /* Visit each element of the vector in reverse order (so that
> + ? ? ? ? ? ?high-occurrence elements are visited first, and within the
> + ? ? ? ? ? ?same occurrence count, lower-ranked elements are visited
> + ? ? ? ? ? ?first). ?Form a linear product of all elements in this order
> + ? ? ? ? ? ?whose occurrencce count is at least that of element J.
> + ? ? ? ? ? ?Record the SSA name representing the product of each element
> + ? ? ? ? ? ?with all subsequent elements in the vector. ?*/
> + ? ? ? ? if (j == vec_len - 1)
> + ? ? ? ? ? rf1->repr = rf1->factor;
> + ? ? ? ? else
> + ? ? ? ? ? {
> + ? ? ? ? ? ? for (ii = vec_len - 2; ii >= (int)j; ii--)
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? tree op1, op2;
> +
> + ? ? ? ? ? ? ? ? rf1 = VEC_index (repeat_factor, repeat_factor_vec, ii);
> + ? ? ? ? ? ? ? ? rf2 = VEC_index (repeat_factor, repeat_factor_vec, ii + 1);
> +
> + ? ? ? ? ? ? ? ? /* Init the last factor's representative to be itself. ?*/
> + ? ? ? ? ? ? ? ? if (!rf2->repr)
> + ? ? ? ? ? ? ? ? ? rf2->repr = rf2->factor;
> +
> + ? ? ? ? ? ? ? ? op1 = rf1->factor;
> + ? ? ? ? ? ? ? ? op2 = rf2->repr;
> +
> + ? ? ? ? ? ? ? ? target_ssa = get_reassoc_pow_ssa_name (target, type);
> + ? ? ? ? ? ? ? ? mul_stmt = gimple_build_assign_with_ops (MULT_EXPR,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?target_ssa,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?op1, op2);
> + ? ? ? ? ? ? ? ? gimple_set_location (mul_stmt, gimple_location (stmt));
> + ? ? ? ? ? ? ? ? gsi_insert_before (&gsi, mul_stmt, GSI_SAME_STMT);
> + ? ? ? ? ? ? ? ? rf1->repr = target_ssa;
> +
> + ? ? ? ? ? ? ? ? /* Don't reprocess the multiply we just introduced. ?*/
> + ? ? ? ? ? ? ? ? gimple_set_visited (mul_stmt, true);
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? }
> +
> + ? ? ? ? /* Form a call to __builtin_powi for the maximum product
> + ? ? ? ? ? ?just formed, raised to the power obtained earlier. ?*/
> + ? ? ? ? rf1 = VEC_index (repeat_factor, repeat_factor_vec, j);
> + ? ? ? ? iter_result = get_reassoc_pow_ssa_name (target, type);
> + ? ? ? ? pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? build_int_cst (integer_type_node,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?power));
> + ? ? ? ? gimple_call_set_lhs (pow_stmt, iter_result);
> + ? ? ? ? gimple_set_location (pow_stmt, gimple_location (stmt));
> + ? ? ? ? gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT);
> + ? ? ? }
> +
> + ? ? ?/* Append the result of this iteration to the ops vector. ?*/
> + ? ? ?add_to_ops_vec (ops, iter_result);
> +
> + ? ? ?/* Decrement the occurrence count of each element in the product
> + ? ? ? ?by the count found above, and remove this many copies of each
> + ? ? ? ?factor from OPS. ?*/
> + ? ? ?for (i = j; i < vec_len; i++)
> + ? ? ? {
> + ? ? ? ? unsigned k = power;
> + ? ? ? ? unsigned n;
> +
> + ? ? ? ? rf1 = VEC_index (repeat_factor, repeat_factor_vec, i);
> + ? ? ? ? rf1->count -= power;
> +
> + ? ? ? ? FOR_EACH_VEC_ELT_REVERSE (operand_entry_t, *ops, n, oe)
> + ? ? ? ? ? {
> + ? ? ? ? ? ? if (oe->op == rf1->factor)
> + ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? if (oe->count <= k)
> + ? ? ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? ? ? VEC_ordered_remove (operand_entry_t, *ops, n);
> + ? ? ? ? ? ? ? ? ? ? k -= oe->count;
> +
> + ? ? ? ? ? ? ? ? ? ? if (k == 0)
> + ? ? ? ? ? ? ? ? ? ? ? break;
> + ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? else
> + ? ? ? ? ? ? ? ? ? {
> + ? ? ? ? ? ? ? ? ? ? oe->count -= k;
> + ? ? ? ? ? ? ? ? ? ? break;
> + ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? }
> + ? ? ? }
> + ? ?}
> +
> + ?/* At this point all elements in the repeated factor vector have a
> + ? ? remaining occurrence count of 0 or 1, and those with a count of 1
> + ? ? don't have cached representatives. ?Re-sort the ops vector and
> + ? ? clean up. ?*/
> + ?VEC_qsort (operand_entry_t, *ops, sort_by_operand_rank);
> + ?VEC_free (repeat_factor, heap, repeat_factor_vec);
> +}
> +
> ?/* Reassociate expressions in basic block BB and its post-dominator as
> ? ?children. ?*/
>
> @@ -2823,6 +3326,7 @@ reassociate_bb (basic_block bb)
> ?{
> ? gimple_stmt_iterator gsi;
> ? basic_block son;
> + ?tree target = NULL_TREE;
>
> ? for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> ? ? {
> @@ -2904,6 +3408,11 @@ reassociate_bb (basic_block bb)
> ? ? ? ? ? ? ?if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR)
> ? ? ? ? ? ? ? ?optimize_range_tests (rhs_code, &ops);
>
> + ? ? ? ? ? ? if (first_pass_instance
> + ? ? ? ? ? ? ? ? && rhs_code == MULT_EXPR
> + ? ? ? ? ? ? ? ? && flag_unsafe_math_optimizations)
> + ? ? ? ? ? ? ? attempt_builtin_powi (stmt, &ops, &target);
> +
> ? ? ? ? ? ? ?if (VEC_length (operand_entry_t, ops) == 1)
> ? ? ? ? ? ? ? ?{
> ? ? ? ? ? ? ? ? ?if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -3054,6 +3563,10 @@ fini_reassoc (void)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ?reassociate_stats.ops_eliminated);
> ? statistics_counter_event (cfun, "Statements rewritten",
> ? ? ? ? ? ? ? ? ? ? ? ? ? ?reassociate_stats.rewritten);
> + ?statistics_counter_event (cfun, "Built-in pow[i] calls encountered",
> + ? ? ? ? ? ? ? ? ? ? ? ? ? reassociate_stats.pows_encountered);
> + ?statistics_counter_event (cfun, "Built-in powi calls created",
> + ? ? ? ? ? ? ? ? ? ? ? ? ? reassociate_stats.pows_created);
>
> ? pointer_map_destroy (operand_rank);
> ? free_alloc_pool (operand_entry_pool);
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]