[PATCH] Straight-line strength reduction, stage 1
Richard Guenther
richard.guenther@gmail.com
Fri Nov 4 13:58:00 GMT 2011
On Sun, Oct 30, 2011 at 5:10 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Greetings,
>
> IVOPTS handles strength reduction of induction variables, but GCC does
> not currently perform strength reduction in straight-line code. This
> has been noted before in PR22586 and PR35308. PR46556 is also a case
> that could be handled with strength reduction. This patch adds a pass
> to perform strength reduction along dominator paths on the easiest
> cases, where replacements are obviously profitable.
>
> My intent is to add subsequent installments to handle more involved
> cases, as described in the code commentary. The cases not yet covered
> will require target-specific cost analysis to determine profitability.
> It is likely that this will leverage some of the cost function
> infrastructure in tree-ssa-loop-ivopts.c.
>
> I've bootstrapped and tested the code on powerpc64-linux with no
> regressions. I've also run SPEC CPU2006 to compare results. 32-bit
> PowerPC gains about 1% on integer code. Other results are in the noise
> range. 64-bit integer code would have also shown gains, except for one
> bad result (400.perlbench degraded 4%). Initial analysis shows that
> very different control flow is created for regmatch with the patch than
> without. I will be investigating further this week, but presumably some
> touchy threshold was no longer met for a downstream optimization --
> probably an indirect effect.
>
> My hope is to commit this first stage as part of 4.7, with the remainder
> to follow in 4.8.
>
> Thanks for your consideration,
I've had a quick look for now and noted two things:
1) You already try to handle casting - for the specific patterns, it seems
the constraints are the same as those for detecting when a widening
multiplication/add can be used? I think we should have some common
code for the legality checks.
2) You do not handle POINTER_PLUS_EXPR - which looks most
interesting for the pure pointer arithmetic case. (And you do not
treat PLUS_EXPR as commutative: you handle a + b*c but
not b*c + a(?); for a*b + c*d we'd have two candidates.)
I'm not sure I follow the dominance checks - usually instead of
+ || !dominated_by_p (CDI_DOMINATORS,
+ gimple_bb (c->cand_stmt),
+ gimple_bb (use_stmt)))
you'd special-case gimple_bb (c->cand_stmt) == gimple_bb (use_stmt)
and use stmt uids to resolve dominance inside a basic block.
For the overall structure I see you are matching the MULT_EXPRs
and looking at all their immediate uses for candidates with +-.
I'd have expected you to walk the BBs in dominator order, visiting
statements and looking for the +- instead. That would avoid any
issue about dominance (because you walk the BBs properly) and
wouldn't require immediate-use walks either (I couldn't convince
myself 100% that you are not visiting stmts multiple times that way...).
But maybe I missed something in my quick look.
Most interesting (and one reason why strength reduction from PRE
didn't work out) is the ability to look up, from an A + B*C stmt, all
candidates for computing the value in a more efficient way.
You do not seem to transform stmts on-the-fly, so for
a1 = c + d;
a2 = c + 2*d;
a3 = c + 3*d;
are you able to generate
a1 = c + d;
a2 = a1 + d;
a3 = a2 + d;
? On-the-fly operation would also handle this, if the candidate info
for a2 is kept as c + 2*d. Though it's probably worth looking at
a1 = c + d;
a2 = a1 + d;
a3 = c + 3*d;
and seeing if that still figures out that a3 = a2 + d (thus, when you
find the candidate a1 + 1*d, do you fold in the candidate information
for a1, i.e. try to reduce it to an ultimate base?)
Thanks,
Richard.
> Bill
>
>
> 2011-10-30 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> gcc:
>
> * tree-pass.h (pass_strength_reduction): New declaration.
> * timevar.def (TV_TREE_SLSR): New time-var.
> * tree-ssa-strength-reduction.c: New file.
> * Makefile.in: New dependences.
> * passes.c (init_optimization_passes): Add new pass.
>
> gcc/testsuite:
>
> * gcc.dg/tree-ssa/slsr-1.c: New test case.
> * gcc.dg/tree-ssa/slsr-2.c: New test case.
> * gcc.dg/tree-ssa/slsr-3.c: New test case.
> * gcc.dg/tree-ssa/slsr-4.c: New test case.
>
>
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h (revision 180617)
> +++ gcc/tree-pass.h (working copy)
> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_tracer;
> extern struct gimple_opt_pass pass_warn_unused_result;
> extern struct gimple_opt_pass pass_split_functions;
> extern struct gimple_opt_pass pass_feedback_split_functions;
> +extern struct gimple_opt_pass pass_strength_reduction;
>
> /* IPA Passes */
> extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, unsigned int n)
> +{
> + foo (*(p + n * 4));
> + foo (*(p + 32 + n * 4));
> + if (n > 3)
> + foo (*(p + 16 + n * 4));
> + else
> + foo (*(p + 48 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 128" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 64" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 192" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, int n)
> +{
> + foo (*(p + n++ * 4));
> + foo (*(p + 32 + n++ * 4));
> + foo (*(p + 16 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 144" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 96" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +foo (int a[], int b[], int i)
> +{
> + a[i] = b[i] + 2;
> + i++;
> + a[i] = b[i] + 2;
> + i++;
> + a[i] = b[i] + 2;
> + i++;
> + a[i] = b[i] + 2;
> + i++;
> + return i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 4" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 8" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 12" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c (revision 0)
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-slsr -fdump-tree-optimized" } */
> +
> +void foo (int);
> +
> +int
> +f (int i)
> +{
> + int x, y;
> +
> + x = i * 4;
> + y = x * 10;
> + foo (y);
> +
> + i = i + 5;
> + x = i * 4;
> + y = x * 10;
> + foo (y);
> +
> + i = i - 4;
> + x = i * 4;
> + y = x * 10;
> + foo (y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "slsr" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def (revision 180617)
> +++ gcc/timevar.def (working copy)
> @@ -252,6 +252,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE , "tree if-co
> DEFTIMEVAR (TV_TREE_UNINIT , "uninit var analysis")
> DEFTIMEVAR (TV_PLUGIN_INIT , "plugin initialization")
> DEFTIMEVAR (TV_PLUGIN_RUN , "plugin execution")
> +DEFTIMEVAR (TV_TREE_SLSR , "straight-line strength reduction")
>
> /* Everything else in rest_of_compilation not included above. */
> DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
> Index: gcc/tree-ssa-strength-reduction.c
> ===================================================================
> --- gcc/tree-ssa-strength-reduction.c (revision 0)
> +++ gcc/tree-ssa-strength-reduction.c (revision 0)
> @@ -0,0 +1,813 @@
> +/* Straight-line strength reduction.
> + Copyright (C) 2011 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* There are many algorithms for performing strength reduction on
> + loops. This is not one of them. IVOPTS handles strength reduction
> + of induction variables just fine. This pass is intended to pick
> + up the crumbs it leaves behind, by considering opportunities for
> + strength reduction along dominator paths.
> +
> + Strength reduction will be implemented in four stages, gradually
> + adding more complex candidates:
> +
> + 1) Explicit multiplies, known constant multipliers, no
> + conditional increments.
> + 2) Explicit multiplies, unknown constant multipliers,
> + no conditional increments.
> + 3) Explicit multiplies, conditional increments.
> + 4) Implicit multiplies in addressing expressions.
> +
> + It would also be possible to apply strength reduction to divisions
> + and modulos, but such opportunities are relatively uncommon.
> +
> + Strength reduction is also currently restricted to integer operations.
> + If desired, it could be extended to floating-point operations under
> + control of something like -funsafe-math-optimizations. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "basic-block.h"
> +#include "tree-pass.h"
> +#include "timevar.h"
> +#include "cfgloop.h"
> +#include "tree-pretty-print.h"
> +#include "gimple-pretty-print.h"
> +#include "alloc-pool.h"
> +#include "tree-flow.h"
> +
> +/* Index into the candidate vector, offset by 1. VECs are zero-based,
> + while cand_idx's are one-based, with zero indicating null. */
> +
> +typedef unsigned int cand_idx;
> +
> +/* Information about a strength reduction candidate. A candidate
> + is a statement S1 of the expanded form
> +
> + S1: LHS = (B + c1) * M,
> +
> + where B is an SSA name, c1 is a constant that may be zero, and
> + M is either an SSA name or a nonzero constant. A candidate may
> + only be strength reduced if a previous "basis" statement
> +
> + S0: S = (B' + c0) * M
> +
> + of the same form exists previously in the same block or in a
> + dominator. Assuming replacement is profitable, there are two
> + cases:
> +
> + (1) If B = B', then S1 can be replaced with:
> +
> + S1': LHS = S + (c1 - c0) * M,
> +
> + where (c1 - c0) * M is folded to the extent possible.
> +
> + (2) If B differs from B', then we require that B = B' + c0,
> + and S1 can be replaced with:
> +
> + S1': LHS = S + c1 * M,
> +
> + where c1 * M is folded if possible.
> +
> + Candidate records are allocated from an allocation pool. They are
> + addressed both from a hash table keyed on S1, and from a vector of
> + candidate pointers arranged in predominator order. */
> +
> +typedef struct slsr_cand_d
> +{
> + /* The candidate statement S1. */
> + gimple cand_stmt;
> +
> + /* The base SSA name B. */
> + tree base_name;
> +
> + /* The stride M. */
> + tree stride;
> +
> + /* The index constant c1. */
> + double_int index;
> +
> + /* Index of this candidate in the candidate vector. */
> + cand_idx cand_num;
> +
> + /* Index of the basis statement S0, if any, in the candidate vector. */
> + cand_idx basis;
> +
> + /* First candidate for which this candidate is a basis, if one exists. */
> + cand_idx dependent;
> +
> + /* Next candidate having the same basis as this one. */
> + cand_idx sibling;
> +
> + /* If this is a conditional candidate, the defining PHI statement
> + for the base SSA name B. */
> + gimple def_phi;
> +
> + /* Access to the statement for subsequent modification. Cached to
> + save compile time. */
> + gimple_stmt_iterator cand_gsi;
> +
> +} slsr_cand, *slsr_cand_t;
> +
> +typedef const struct slsr_cand_d *const_slsr_cand_t;
> +
> +/* Candidates are maintained in a vector. If candidate X dominates
> + candidate Y, then X appears before Y in the vector; but the
> + converse does not necessarily hold. */
> +DEF_VEC_P (slsr_cand_t);
> +DEF_VEC_ALLOC_P (slsr_cand_t, heap);
> +static VEC (slsr_cand_t, heap) *cand_vec;
> +
> +/* Hash table embodying a mapping from statements to candidates. */
> +static htab_t stmt_cand_map;
> +
> +/* Allocation pool for candidates. */
> +static alloc_pool cand_pool;
> +
> +
> +/* Produce a pointer to the IDX'th candidate in the candidate vector. */
> +
> +static slsr_cand_t
> +lookup_cand (cand_idx idx)
> +{
> + return VEC_index (slsr_cand_t, cand_vec, idx - 1);
> +}
> +
> +/* Callback to produce a hash value for a candidate. */
> +
> +static hashval_t
> +stmt_cand_hash (const void *p)
> +{
> + return htab_hash_pointer (((const_slsr_cand_t)p)->cand_stmt);
> +}
> +
> +/* Callback when an element is removed from the hash table.
> + We never remove entries until the entire table is released. */
> +
> +static void
> +stmt_cand_free (void *p ATTRIBUTE_UNUSED)
> +{
> +}
> +
> +/* Callback to return true if two candidates are equal. */
> +
> +static int
> +stmt_cand_eq (const void *p1, const void *p2)
> +{
> + const_slsr_cand_t const cand1 = (const_slsr_cand_t)p1;
> + const_slsr_cand_t const cand2 = (const_slsr_cand_t)p2;
> + return (cand1->cand_stmt == cand2->cand_stmt);
> +}
> +
> +/* Return TRUE if GS is a statement that defines an SSA name from
> + a NOP_EXPR and is legal for us to combine an add and multiply
> + across. This is legitimate for casts from a signed type to
> + a signed or unsigned type of the same or larger size. It is not
> + legitimate to convert any unsigned type to a signed type, or
> + to an unsigned type of a different size.
> +
> + The reasoning here is that signed integer overflow is undefined,
> + so any program that was expecting overflow that no longer occurs
> + is not correct. Unsigned integers, however, have wrap semantics,
> + and it is reasonable for programs to assume an overflowing add
> + will wrap. */
> +
> +static bool
> +legal_cast_p (gimple gs)
> +{
> + tree lhs, rhs;
> + unsigned int src_size, dst_size, src_uns, dst_uns;
> +
> + if (!is_gimple_assign (gs)
> + || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> + || gimple_assign_rhs_code (gs) != NOP_EXPR)
> + return false;
> +
> + lhs = gimple_assign_lhs (gs);
> + rhs = gimple_assign_rhs1 (gs);
> +
> + if (TREE_CODE (rhs) != SSA_NAME)
> + return false;
> +
> + src_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (rhs)));
> + src_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (rhs)));
> + dst_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (lhs)));
> + dst_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (lhs)));
> +
> + if (dst_size < src_size)
> + return false;
> +
> + if (src_uns && !dst_uns)
> + return false;
> +
> + if (src_uns && dst_uns && src_size != dst_size)
> + return false;
> +
> + return true;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> + CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> + return the source SSA name. */
> +
> +static tree
> +source_of_legal_cast (gimple gs)
> +{
> + if (legal_cast_p (gs))
> + return gimple_assign_rhs1 (gs);
> +
> + return NULL_TREE;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> + CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> + return the target SSA name. */
> +
> +static tree
> +target_of_legal_cast (gimple gs)
> +{
> + if (legal_cast_p (gs))
> + return gimple_assign_lhs (gs);
> +
> + return NULL_TREE;
> +}
> +
> +/* Return TRUE if GS consists of an add or subtract of BASE_NAME
> + with a constant, with the result placed in an SSA name. GS is
> + known to be a gimple assignment. CODE is the RHS code for GS. */
> +
> +static bool
> +stmt_adds_base_to_const (gimple gs, enum tree_code code, tree base_name)
> +{
> + return (TREE_CODE (gimple_assign_lhs (gs)) == SSA_NAME
> + && (code == PLUS_EXPR || code == MINUS_EXPR)
> + && operand_equal_p (gimple_assign_rhs1 (gs), base_name, 0)
> + && TREE_CODE (gimple_assign_rhs2 (gs)) == INTEGER_CST);
> +}
> +
> +/* Recursive helper for find_basis_for_candidate. To find a
> + possible basis, first try the immediate uses of the given
> + BASE_NAME. If a particular use doesn't appear in a multiply,
> + see if it appears in an add that feeds one or more multiplies.
> + Recurse using the SSA name defined on the add. Each potential
> + basis found in this manner must also appear in a block that
> + dominates the candidate statement, be present in the candidate
> + table, and have the same stride M. If more than one possible
> + basis exists, pick the one with highest index in the vector,
> + which will be the most immediately dominating basis. */
> +
> +static slsr_cand_t
> +find_basis_for_candidate_1 (slsr_cand_t c, tree base_name, slsr_cand_t basis)
> +{
> + imm_use_iterator use_iter;
> + use_operand_p use_p;
> +
> + FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> + {
> + gimple use_stmt = USE_STMT (use_p);
> + slsr_cand_t use_basis = NULL;
> + enum tree_code code;
> + tree cast_target;
> +
> + if (!is_gimple_assign (use_stmt))
> + continue;
> +
> + /* When revising a candidate's basis, make sure not to pick itself. */
> + if (c->cand_stmt == use_stmt)
> + continue;
> +
> + /* Look through casts where legal. */
> + cast_target = target_of_legal_cast (use_stmt);
> + if (cast_target)
> + return find_basis_for_candidate_1 (c, cast_target, basis);
> +
> + /* If the use statement is a multiply, it's a potential basis.
> + Otherwise, we have to look at the uses of the use statement
> + to find any multiplies that may be potential bases. */
> + code = gimple_assign_rhs_code (use_stmt);
> +
> + if (code == MULT_EXPR)
> + {
> + slsr_cand mapping_key;
> + mapping_key.cand_stmt = use_stmt;
> + use_basis = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key);
> + }
> +
> + /* Recurse if this is an add of a constant that might feed a
> + multiply. */
> + else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> + {
> + tree lhs = gimple_assign_lhs (use_stmt);
> + use_basis = find_basis_for_candidate_1 (c, lhs, basis);
> + }
> +
> + if (!use_basis
> + || !operand_equal_p (c->stride, use_basis->stride, 0)
> + || !dominated_by_p (CDI_DOMINATORS,
> + gimple_bb (c->cand_stmt),
> + gimple_bb (use_stmt)))
> + continue;
> +
> + /* When revising a candidate's basis, be careful not to pick
> + a basis that occurs after the candidate. */
> + if (use_basis->cand_num > c->cand_num)
> + continue;
> +
> + if (!basis || basis->cand_num < use_basis->cand_num)
> + basis = use_basis;
> + }
> +
> + return basis;
> +}
> +
> +/* Look for the nearest dominating statement in the candidate
> + vector that can serve as a basis for the new CANDIDATE. If
> + found, adjust the dependent links for the basis entry and
> + return the index of the basis entry. Otherwise return zero. */
> +
> +static unsigned int
> +find_basis_for_candidate (slsr_cand_t c)
> +{
> + slsr_cand_t basis = find_basis_for_candidate_1 (c, c->base_name, NULL);
> +
> + if (basis)
> + {
> + c->sibling = basis->dependent;
> + basis->dependent = c->cand_num;
> + return basis->cand_num;
> + }
> +
> + return 0;
> +}
> +
> +/* Starting with statement GS, look backwards through any
> + intervening legal casts to find one or more
> + adds of an SSA name and a constant. Return the earliest
> + SSA name in the chain in *BASE, and the sum of all constants
> + in the chain in *INDEX. */
> +
> +static void
> +combine_feeding_adds (gimple gs, tree *base, double_int *index)
> +{
> + double_int new_index;
> + tree cast_source;
> +
> + while ((cast_source = source_of_legal_cast (gs)))
> + {
> + gs = SSA_NAME_DEF_STMT (cast_source);
> + *base = cast_source;
> + }
> +
> + /* If there aren't any more adds, we're done. */
> + if (!is_gimple_assign (gs)
> + || (gimple_assign_rhs_code (gs) != PLUS_EXPR
> + && gimple_assign_rhs_code (gs) != MINUS_EXPR)
> + || TREE_CODE (gimple_assign_rhs1 (gs)) != SSA_NAME
> + || TREE_CODE (gimple_assign_rhs2 (gs)) != INTEGER_CST)
> + return;
> +
> + *base = gimple_assign_rhs1 (gs);
> + new_index = tree_to_double_int (gimple_assign_rhs2 (gs));
> +
> + if (gimple_assign_rhs_code (gs) == MINUS_EXPR)
> + new_index = double_int_neg (new_index);
> +
> + *index = double_int_add (*index, new_index);
> + combine_feeding_adds (SSA_NAME_DEF_STMT (*base), base, index);
> +}
> +
> +/* Given statement GS containing an integer multiply, determine
> + whether it's a possible strength-reduction candidate. If so,
> + add it to the candidate vector and the statement-candidate
> + mapping. */
> +
> +static void
> +process_possible_candidate (gimple_stmt_iterator gsi, gimple gs)
> +{
> + tree rhs1 = gimple_assign_rhs1 (gs);
> + tree rhs2 = gimple_assign_rhs2 (gs);
> + tree base;
> + double_int index;
> + slsr_cand_t c;
> + void **slot;
> +
> + /* TODO: Unknown constant multipliers will be dealt with in
> + stage 2. */
> + if (TREE_CODE (rhs2) != INTEGER_CST)
> + return;
> +
> + /* If RHS1 isn't fed by an addition or subtraction of an SSA_NAME
> + and a constant, then treat it as an add of zero. Look through
> + legal casts and combine multiple chained adds to find
> + the true base name and index. */
> + gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
> + base = rhs1;
> + index = double_int_zero;
> + combine_feeding_adds (SSA_NAME_DEF_STMT (rhs1), &base, &index);
> +
> + c = (slsr_cand_t)pool_alloc (cand_pool);
> + c->cand_stmt = gs;
> + c->base_name = base;
> + c->stride = rhs2;
> + c->index = index;
> + c->cand_num = VEC_length (slsr_cand_t, cand_vec) + 1;
> + c->dependent = 0;
> + c->sibling = 0;
> + c->def_phi = NULL;
> + c->cand_gsi = gsi;
> + c->basis = find_basis_for_candidate (c);
> +
> + VEC_safe_push (slsr_cand_t, heap, cand_vec, c);
> +
> + slot = htab_find_slot (stmt_cand_map, c, INSERT);
> + gcc_assert (!*slot);
> + *slot = c;
> +}
> +
> +/* Find strength-reduction candidates in block BB. */
> +
> +static void
> +find_candidates_in_block (basic_block bb)
> +{
> + gimple_stmt_iterator gsi;
> + basic_block son;
> +
> + /* Process this block. */
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gimple gs = gsi_stmt (gsi);
> + if (is_gimple_assign (gs)
> + && gimple_assign_rhs_code (gs) == MULT_EXPR
> + && INTEGRAL_MODE_P (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> + process_possible_candidate (gsi, gs);
> + }
> +
> + /* Process dominated blocks. */
> + for (son = first_dom_son (CDI_DOMINATORS, bb);
> + son;
> + son = next_dom_son (CDI_DOMINATORS, son))
> + find_candidates_in_block (son);
> +}
> +
> +/* Dump a candidate for debug. */
> +
> +static void
> +dump_candidate (slsr_cand_t c)
> +{
> + fprintf (dump_file, "%3d [%d] ", c->cand_num,
> + gimple_bb (c->cand_stmt)->index);
> + print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> + fputs (" base: ", dump_file);
> + print_generic_expr (dump_file, c->base_name, 0);
> + fputs ("\n index: ", dump_file);
> + dump_double_int (dump_file, c->index, false);
> + fputs ("\n stride: ", dump_file);
> + print_generic_expr (dump_file, c->stride, 0);
> + fprintf (dump_file, "\n basis: %3d", c->basis);
> + fprintf (dump_file, "\n dependent: %3d", c->dependent);
> + fprintf (dump_file, "\n sibling: %3d", c->sibling);
> + fputs ("\n phi: ", dump_file);
> + if (c->def_phi)
> + print_gimple_stmt (dump_file, c->def_phi, 0, 0);
> + else
> + fputs ("n/a", dump_file);
> + fputs ("\n\n", dump_file);
> +}
> +
> +/* Dump the candidate vector for debug. */
> +
> +static void
> +dump_cand_vec (void)
> +{
> + unsigned int i;
> + slsr_cand_t c;
> +
> + fprintf (dump_file, "\nStrength reduction candidate vector:\n\n");
> +
> + FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> + dump_candidate (c);
> +}
> +
> +/* Recursive helper for unconditional_cands_with_known_stride_p.
> + Returns TRUE iff C, its siblings, and its dependents are all
> + unconditional candidates. */
> +
> +static bool
> +unconditional_cands (slsr_cand_t c)
> +{
> + if (c->def_phi)
> + return false;
> +
> + if (c->sibling && !unconditional_cands (lookup_cand (c->sibling)))
> + return false;
> +
> + if (c->dependent && !unconditional_cands (lookup_cand (c->dependent)))
> + return false;
> +
> + return true;
> +}
> +
> +/* Determine whether or not the tree of candidates rooted at
> + ROOT consists entirely of unconditional increments with
> + an INTEGER_CST stride. */
> +
> +static bool
> +unconditional_cands_with_known_stride_p (slsr_cand_t root)
> +{
> + /* The stride is identical for all related candidates, so
> + check it once. */
> + if (TREE_CODE (root->stride) != INTEGER_CST)
> + return false;
> +
> + return unconditional_cands (lookup_cand (root->dependent));
> +}
> +
> +/* GS is an add or subtract that used to be a multiply. Follow
> + its immediate uses to update the candidate table accordingly. */
> +
> +static void
> +update_chained_candidates (gimple gs)
> +{
> + tree base_name = gimple_assign_lhs (gs);
> + imm_use_iterator use_iter;
> + use_operand_p use_p;
> + tree cast_target;
> +
> + FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> + {
> + gimple use_stmt = USE_STMT (use_p);
> + enum tree_code code;
> + slsr_cand_t c;
> +
> + if (!is_gimple_assign (use_stmt))
> + continue;
> +
> + /* Look forward through converts that don't change semantics. */
> + cast_target = target_of_legal_cast (use_stmt);
> + if (cast_target)
> + {
> + update_chained_candidates (use_stmt);
> + continue;
> + }
> +
> + code = gimple_assign_rhs_code (use_stmt);
> +
> + /* Case 1: The name defined by the add appears in a
> + multiply that exists in the candidate table. */
> + if (code == MULT_EXPR)
> + {
> + slsr_cand mapping_key;
> + tree base;
> + double_int index;
> +
> + mapping_key.cand_stmt = use_stmt;
> + if (!(c = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key)))
> + continue;
> +
> + base = base_name;
> + index = double_int_zero;
> + combine_feeding_adds (gs, &base, &index);
> + c->base_name = base;
> + c->index = index;
> +
> + if (!c->basis)
> + c->basis = find_basis_for_candidate (c);
> + else
> + /* If all has worked correctly, the modified candidate has
> + the same basis as before. */
> + gcc_assert (lookup_cand (c->basis)
> + == find_basis_for_candidate_1 (c, base, NULL));
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fputs ("\nRevised candidate:\n", dump_file);
> + dump_candidate (c);
> + }
> + }
> +
> + /* Case 2: The name defined by the add appears in another
> + add with a constant, which in turn feeds one or more multiplies
> + in the candidate table. Recurse to find the multiplies. */
> + else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> + update_chained_candidates (use_stmt);
> + }
> +}
> +
> +/* Replace candidate C, each sibling of candidate C, and each
> + dependent of candidate C with an add or subtract. */
> +
> +static void
> +replace_dependents (slsr_cand_t c)
> +{
> + slsr_cand_t basis = lookup_cand (c->basis);
> + tree basis_name = gimple_assign_lhs (basis->cand_stmt);
> + tree incr_type = TREE_TYPE (gimple_assign_rhs1 (c->cand_stmt));
> + double_int increment, stride;
> +
> + if (operand_equal_p (c->base_name, basis->base_name, 0))
> + increment = double_int_sub (c->index, basis->index);
> + else
> + increment = c->index;
> +
> + stride = tree_to_double_int (c->stride);
> + increment = double_int_mul (increment, stride);
> +
> + /* It is highly unlikely, but possible, that the resulting
> + increment doesn't fit in a HWI. Abandon the replacement
> + in this case. This does not affect siblings or dependents
> + of C. */
> + /* Restriction to signed HWI is conservative for unsigned types
> + but allows for safe negation without twisted logic. */
> + if (double_int_fits_in_shwi_p (increment))
> + {
> + enum tree_code code = PLUS_EXPR;
> + tree orig_type
> + = TREE_TYPE (SSA_NAME_VAR (gimple_assign_rhs1 (c->cand_stmt)));
> + tree repl_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
> + tree incr_tree;
> +
> + if (double_int_negative_p (increment))
> + {
> + code = MINUS_EXPR;
> + increment = double_int_neg (increment);
> + }
> +
> + incr_tree = double_int_to_tree (incr_type, increment);
> +
> + /* A widening cast may be necessary. */
> + if (orig_type != repl_type)
> + {
> + tree new_basis_name = create_tmp_reg (orig_type, "slsr");
> + gimple nop_stmt;
> + add_referenced_var (new_basis_name);
> + new_basis_name = make_ssa_name (new_basis_name, NULL);
> + nop_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_basis_name,
> + basis_name, NULL_TREE);
> + gimple_set_location (nop_stmt, gimple_location (c->cand_stmt));
> + gsi_insert_before (&c->cand_gsi, nop_stmt, GSI_SAME_STMT);
> + basis_name = new_basis_name;
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fputs ("Inserting: ", dump_file);
> + print_gimple_stmt (dump_file, nop_stmt, 0, 0);
> + }
> + }
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fputs ("Replacing: ", dump_file);
> + print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> + }
> +
> + gimple_assign_set_rhs_with_ops (&c->cand_gsi, code,
> + basis_name, incr_tree);
> + update_stmt (c->cand_stmt);
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fputs ("With: ", dump_file);
> + print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> + fputs ("\n", dump_file);
> + }
> +
> + /* The multiply we just converted to an add might well feed
> + into one or more other multiplies in the candidate table.
> + If so, we need to update those candidate entries. */
> + update_chained_candidates (c->cand_stmt);
> + }
> +
> + if (c->sibling)
> + replace_dependents (lookup_cand (c->sibling));
> +
> + if (c->dependent)
> + replace_dependents (lookup_cand (c->dependent));
> +}
> +
> +/* Analyze costs of related candidates in the candidate vector,
> +   and make beneficial replacements.  */
> +
> +static void
> +analyze_candidates_and_replace (void)
> +{
> +  unsigned int i;
> +  slsr_cand_t c;
> +
> +  /* Each candidate that has a null basis and a non-null
> +     dependent is the root of a tree of related statements.
> +     Analyze each tree to determine a subset of those
> +     statements that can be replaced with maximum benefit.  */
> +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> +    {
> +      if (c->basis != 0 || c->dependent == 0)
> +        continue;
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +        fprintf (dump_file, "\nProcessing dependency tree rooted at %d.\n",
> +                 c->cand_num);
> +
> +      /* If the common stride of all related candidates is a
> +         known constant, and none of these has a phi-dependence,
> +         then all replacements are considered profitable.
> +         Each replaces a multiply by a single add, with the
> +         possibility that a feeding add also goes dead as a
> +         result.  */
> +      if (unconditional_cands_with_known_stride_p (c))
> +        replace_dependents (lookup_cand (c->dependent));
> +
> +      /* TODO: When the stride is an SSA name, it may still
> +         be profitable to replace some or all of the dependent
> +         candidates, depending on whether the introduced increments
> +         can be reused.  */
> +
> +      /* TODO: When conditional increments occur so that a
> +         candidate is dependent upon a phi-basis, the cost of
> +         introducing a temporary must be accounted for.  */
> +
> +      /* TODO: Strength reduction candidates hidden in
> +         addressing expressions are not yet handled, and will
> +         have more complex cost functions.  */
> +    }
> +}
> +
> +/* Main entry point for straight-line strength reduction.  */
> +
> +static unsigned int
> +execute_strength_reduction (void)
> +{
> +  /* Create the allocation pool where candidates will reside.  */
> +  cand_pool = create_alloc_pool ("Strength reduction pool",
> +                                 sizeof (slsr_cand), 128);
> +
> +  /* Allocate the candidate vector.  */
> +  cand_vec = VEC_alloc (slsr_cand_t, heap, 128);
> +
> +  /* Allocate the mapping from statements to candidate indices.  */
> +  stmt_cand_map = htab_create (500, stmt_cand_hash,
> +                               stmt_cand_eq, stmt_cand_free);
> +
> +  /* Initialize the loop optimizer.  We need to detect flow across
> +     back edges, and this gives us dominator information as well.  */
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
> +  /* Walk the CFG in predominator order looking for strength reduction
> +     candidates.  */
> +  find_candidates_in_block (ENTRY_BLOCK_PTR);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    dump_cand_vec ();
> +
> +  /* Analyze costs and make appropriate replacements.  */
> +  analyze_candidates_and_replace ();
> +
> +  /* Free resources.  */
> +  loop_optimizer_finalize ();
> +  htab_delete (stmt_cand_map);
> +  VEC_free (slsr_cand_t, heap, cand_vec);
> +  free_alloc_pool (cand_pool);
> +
> +  return 0;
> +}
> +
> +static bool
> +gate_strength_reduction (void)
> +{
> +  return optimize > 0;
> +}
> +
> +struct gimple_opt_pass pass_strength_reduction =
> +{
> + {
> +  GIMPLE_PASS,
> +  "slsr",                              /* name */
> +  gate_strength_reduction,             /* gate */
> +  execute_strength_reduction,          /* execute */
> +  NULL,                                /* sub */
> +  NULL,                                /* next */
> +  0,                                   /* static_pass_number */
> +  TV_TREE_SLSR,                        /* tv_id */
> +  PROP_cfg | PROP_ssa,                 /* properties_required */
> +  0,                                   /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_dump_func | TODO_verify_ssa     /* todo_flags_finish */
> + }
> +};
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in (revision 180617)
> +++ gcc/Makefile.in (working copy)
> @@ -1475,6 +1475,7 @@ OBJS = \
> tree-ssa-reassoc.o \
> tree-ssa-sccvn.o \
> tree-ssa-sink.o \
> + tree-ssa-strength-reduction.o \
> tree-ssa-strlen.o \
> tree-ssa-structalias.o \
> tree-ssa-tail-merge.o \
> @@ -2519,6 +2520,10 @@ tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H)
> alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) langhooks.h $(HASHTAB_H) $(GIMPLE_H) \
> $(TREE_INLINE_H) tree-iterator.h tree-ssa-propagate.h tree-ssa-sccvn.h \
> $(PARAMS_H) tree-pretty-print.h gimple-pretty-print.h gimple-fold.h
> +tree-ssa-strength-reduction.o : tree-ssa-strength-reduction.c $(CONFIG_H) \
> + $(SYSTEM_H) coretypes.h $(TREE_H) $(GIMPLE_H) $(BASIC_BLOCK_H) \
> + $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) tree-pretty-print.h alloc-pool.h \
> + $(TREE_FLOW_H)
> tree-vrp.o : tree-vrp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(GGC_H) \
> $(BASIC_BLOCK_H) tree-ssa-propagate.h $(FLAGS_H) $(TREE_DUMP_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c (revision 180617)
> +++ gcc/passes.c (working copy)
> @@ -1368,6 +1368,7 @@ init_optimization_passes (void)
> }
> NEXT_PASS (pass_lower_vector_ssa);
> NEXT_PASS (pass_cse_reciprocals);
> + NEXT_PASS (pass_strength_reduction);
> NEXT_PASS (pass_reassoc);
> NEXT_PASS (pass_vrp);
> NEXT_PASS (pass_dominator);
More information about the Gcc-patches mailing list