[PATCH] Straight-line strength reduction, stage 1

Richard Guenther richard.guenther@gmail.com
Fri Nov 4 13:58:00 GMT 2011


On Sun, Oct 30, 2011 at 5:10 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Greetings,
>
> IVOPTS handles strength reduction of induction variables, but GCC does
> not currently perform strength reduction in straight-line code.  This
> has been noted before in PR22586 and PR35308.  PR46556 is also a case
> that could be handled with strength reduction.  This patch adds a pass
> to perform strength reduction along dominator paths on the easiest
> cases, where replacements are obviously profitable.
>
> My intent is to add subsequent installments to handle more involved
> cases, as described in the code commentary.  The cases not yet covered
> will require target-specific cost analysis to determine profitability.
> It is likely that this will leverage some of the cost function
> infrastructure in tree-ssa-loop-ivopts.c.
>
> I've bootstrapped and tested the code on powerpc64-linux with no
> regressions.  I've also run SPEC CPU2006 to compare results.  32-bit
> PowerPC gains about 1% on integer code.  Other results are in the noise
> range.  64-bit integer code would have also shown gains, except for one
> bad result (400.perlbench degraded 4%).  Initial analysis shows that
> very different control flow is created for regmatch with the patch than
> without.  I will be investigating further this week, but presumably some
> touchy threshold was no longer met for a downstream optimization --
> probably an indirect effect.
>
> My hope is to commit this first stage as part of 4.7, with the remainder
> to follow in 4.8.
>
> Thanks for your consideration,

I've had a quick look for now and noted two things

1) you try to handle casting already - for the specific patterns, it seems
the constraints are the same as for detecting when using a widening
multiplication/add is possible?  I think we should have some common
code for the legality checks.

2) you do not handle POINTER_PLUS_EXPR - which looks most
interesting for the pure pointer arithmetic case.  (and you do not
treat PLUS_EXPR as commutative, thus a + b*c you handle but
not b*c + a(?), for a*b + c*d we'd have two candidates)

I'm not sure I follow the dominance checks - usually instead of

+         || !dominated_by_p (CDI_DOMINATORS,
+                             gimple_bb (c->cand_stmt),
+                             gimple_bb (use_stmt)))

you'd special case gimple_bb (c->cand_stmt) == gimple_bb (use_stmt)
and have stmt uids to resolve dominance inside a basic block.

For the overall structure I see you are matching the MULT_EXPRs
and are looking at all immediate uses of it for candidates with +-.
I'd have expected you walk the BBs in dominator order, visiting
statements and looking for the +- instead.  That would avoid any
issue about dominance (because you walk BBs properly) and
wouldn't require immediate use walks either (I couldn't convince
myself 100% you are not visiting stmts multiple times that way...)
But maybe I missed sth in my quick look.

Most interesting (and one reason why strength reduction from PRE
didn't work out) is the from a A + B*C stmt lookup all candidates
for computing the value in a more efficient way.

You do not seem to transform stmts on-the-fly, so for

a1 = c + d;
a2 = c + 2*d;
a3 = c + 3*d;

are you able to generate

a1 = c + d;
a2 = a1 + d;
a3 = a2 + d;

?  On-the-fly operation would also handle this if the candidate info
for a2 is kept as c + 2*d.  Though it's probably worth lookign at

a1 = c + d;
a2 = a1 + d;
a3 = c + 3*d;

and see if that still figures out that a3 = a2 + d (thus, do you,
when you find the candidate a1 + 1 * d, fold in candidate information
for a1?  thus, try to reduce it to an ultimate base?)

Thanks,
Richard.


> Bill
>
>
> 2011-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> gcc:
>
>        * tree-pass.h (pass_strength_reduction): New declaration.
>        * timevar.def (TV_TREE_SLSR): New time-var.
>        * tree-ssa-strength-reduction.c: New file.
>        * Makefile.in: New dependences.
>        * passes.c (init_optimization_passes): Add new pass.
>
> gcc/testsuite:
>
>        * gcc.dg/tree-ssa/slsr-1.c: New test case.
>        * gcc.dg/tree-ssa/slsr-2.c: New test case.
>        * gcc.dg/tree-ssa/slsr-3.c: New test case.
>        * gcc.dg/tree-ssa/slsr-4.c: New test case.
>
>
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     (revision 180617)
> +++ gcc/tree-pass.h     (working copy)
> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_tracer;
>  extern struct gimple_opt_pass pass_warn_unused_result;
>  extern struct gimple_opt_pass pass_split_functions;
>  extern struct gimple_opt_pass pass_feedback_split_functions;
> +extern struct gimple_opt_pass pass_strength_reduction;
>
>  /* IPA Passes */
>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, unsigned int n)
> +{
> +  foo (*(p + n * 4));
> +  foo (*(p + 32 + n * 4));
> +  if (n > 3)
> +    foo (*(p + 16 + n * 4));
> +  else
> +    foo (*(p + 48 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 128" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 64" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 192" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, int n)
> +{
> +  foo (*(p + n++ * 4));
> +  foo (*(p + 32 + n++ * 4));
> +  foo (*(p + 16 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 144" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 96" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +foo (int a[], int b[], int i)
> +{
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  return i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 4" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 8" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 12" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-slsr -fdump-tree-optimized" } */
> +
> +void foo (int);
> +
> +int
> +f (int i)
> +{
> +  int x, y;
> +
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +
> +  i = i + 5;
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +
> +  i = i - 4;
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "slsr" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def     (revision 180617)
> +++ gcc/timevar.def     (working copy)
> @@ -252,6 +252,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> +DEFTIMEVAR (TV_TREE_SLSR             , "straight-line strength reduction")
>
>  /* Everything else in rest_of_compilation not included above.  */
>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> Index: gcc/tree-ssa-strength-reduction.c
> ===================================================================
> --- gcc/tree-ssa-strength-reduction.c   (revision 0)
> +++ gcc/tree-ssa-strength-reduction.c   (revision 0)
> @@ -0,0 +1,813 @@
> +/* Straight-line strength reduction.
> +   Copyright (C) 2011  Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* There are many algorithms for performing strength reduction on
> +   loops.  This is not one of them.  IVOPTS handles strength reduction
> +   of induction variables just fine.  This pass is intended to pick
> +   up the crumbs it leaves behind, by considering opportunities for
> +   strength reduction along dominator paths.
> +
> +   Strength reduction will be implemented in four stages, gradually
> +   adding more complex candidates:
> +
> +   1) Explicit multiplies, known constant multipliers, no
> +      conditional increments.
> +   2) Explicit multiplies, unknown constant multipliers,
> +      no conditional increments.
> +   3) Explicit multiplies, conditional increments.
> +   4) Implicit multiplies in addressing expressions.
> +
> +   It would also be possible to apply strength reduction to divisions
> +   and modulos, but such opportunities are relatively uncommon.
> +
> +   Strength reduction is also currently restricted to integer operations.
> +   If desired, it could be extended to floating-point operations under
> +   control of something like -funsafe-math-optimizations.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "basic-block.h"
> +#include "tree-pass.h"
> +#include "timevar.h"
> +#include "cfgloop.h"
> +#include "tree-pretty-print.h"
> +#include "gimple-pretty-print.h"
> +#include "alloc-pool.h"
> +#include "tree-flow.h"
> +
> +/* Index into the candidate vector, offset by 1.  VECs are zero-based,
> +   while cand_idx's are one-based, with zero indicating null.  */
> +
> +typedef unsigned int cand_idx;
> +
> +/* Information about a strength reduction candidate.  A candidate
> +   is a statement S1 of the expanded form
> +
> +     S1:  LHS = (B + c1) * M,
> +
> +   where B is an SSA name, c1 is a constant that may be zero, and
> +   M is either an SSA name or a nonzero constant.  A candidate may
> +   only be strength reduced if a previous "basis" statement
> +
> +     S0:  S = (B' + c0) * M
> +
> +   of the same form exists previously in the same block or in a
> +   dominator.  Assuming replacement is profitable, there are two
> +   cases:
> +
> +   (1) If B = B', then S1 can be replaced with:
> +
> +         S1':  LHS = S + (c1 - c0) * M,
> +
> +       where (c1 - c0) * M is folded to the extent possible.
> +
> +   (2) If B differs from B', then we require that B = B' + c0,
> +       and S1 can be replaced with:
> +
> +         S1':  LHS = S + c1 * M,
> +
> +       where c1 * M is folded if possible.
> +
> +   Candidate records are allocated from an allocation pool.  They are
> +   addressed both from a hash table keyed on S1, and from a vector of
> +   candidate pointers arranged in predominator order.  */
> +
> +typedef struct slsr_cand_d
> +{
> +  /* The candidate statement S1.  */
> +  gimple cand_stmt;
> +
> +  /* The base SSA name B.  */
> +  tree base_name;
> +
> +  /* The stride M.  */
> +  tree stride;
> +
> +  /* The index constant c1.  */
> +  double_int index;
> +
> +  /* Index of this candidate in the candidate vector.  */
> +  cand_idx cand_num;
> +
> +  /* Index of the basis statement S0, if any, in the candidate vector.  */
> +  cand_idx basis;
> +
> +  /* First candidate for which this candidate is a basis, if one exists.  */
> +  cand_idx dependent;
> +
> +  /* Next candidate having the same basis as this one.  */
> +  cand_idx sibling;
> +
> +  /* If this is a conditional candidate, the defining PHI statement
> +     for the base SSA name B.  */
> +  gimple def_phi;
> +
> +  /* Access to the statement for subsequent modification.  Cached to
> +     save compile time.  */
> +  gimple_stmt_iterator cand_gsi;
> +
> +} slsr_cand, *slsr_cand_t;
> +
> +typedef const struct slsr_cand_d *const_slsr_cand_t;
> +
> +/* Candidates are maintained in a vector.  If candidate X dominates
> +   candidate Y, then X appears before Y in the vector; but the
> +   converse does not necessarily hold.  */
> +DEF_VEC_P (slsr_cand_t);
> +DEF_VEC_ALLOC_P (slsr_cand_t, heap);
> +static VEC (slsr_cand_t, heap) *cand_vec;
> +
> +/* Hash table embodying a mapping from statements to candidates.  */
> +static htab_t stmt_cand_map;
> +
> +/* Allocation pool for candidates.  */
> +static alloc_pool cand_pool;
> +
> +
> +/* Produce a pointer to the IDX'th candidate in the candidate vector.  */
> +
> +static slsr_cand_t
> +lookup_cand (cand_idx idx)
> +{
> +  return VEC_index (slsr_cand_t, cand_vec, idx - 1);
> +}
> +
> +/* Callback to produce a hash value for a candidate.  */
> +
> +static hashval_t
> +stmt_cand_hash (const void *p)
> +{
> +  return htab_hash_pointer (((const_slsr_cand_t)p)->cand_stmt);
> +}
> +
> +/* Callback when an element is removed from the hash table.
> +   We never remove entries until the entire table is released.  */
> +
> +static void
> +stmt_cand_free (void *p ATTRIBUTE_UNUSED)
> +{
> +}
> +
> +/* Callback to return true if two candidates are equal.  */
> +
> +static int
> +stmt_cand_eq (const void *p1, const void *p2)
> +{
> +  const_slsr_cand_t const cand1 = (const_slsr_cand_t)p1;
> +  const_slsr_cand_t const cand2 = (const_slsr_cand_t)p2;
> +  return (cand1->cand_stmt == cand2->cand_stmt);
> +}
> +
> +/* Return TRUE if GS is a statement that defines an SSA name from
> +   a NOP_EXPR and is legal for us to combine an add and multiply
> +   across.  This is legitimate for casts from a signed type to
> +   a signed or unsigned type of the same or larger size.  It is not
> +   legitimate to convert any unsigned type to a signed type, or
> +   to an unsigned type of a different size.
> +
> +   The reasoning here is that signed integer overflow is undefined,
> +   so any program that was expecting overflow that no longer occurs
> +   is not correct.  Unsigned integers, however, have wrap semantics,
> +   and it is reasonable for programs to assume an overflowing add
> +   will wrap.  */
> +
> +static bool
> +legal_cast_p (gimple gs)
> +{
> +  tree lhs, rhs;
> +  unsigned int src_size, dst_size, src_uns, dst_uns;
> +
> +  if (!is_gimple_assign (gs)
> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
> +    return false;
> +
> +  lhs = gimple_assign_lhs (gs);
> +  rhs = gimple_assign_rhs1 (gs);
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  src_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (rhs)));
> +  src_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (rhs)));
> +  dst_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (lhs)));
> +  dst_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (lhs)));
> +
> +  if (dst_size < src_size)
> +    return false;
> +
> +  if (src_uns && !dst_uns)
> +    return false;
> +
> +  if (src_uns && dst_uns && src_size != dst_size)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> +   return the source SSA name.  */
> +
> +static tree
> +source_of_legal_cast (gimple gs)
> +{
> +  if (legal_cast_p (gs))
> +    return gimple_assign_rhs1 (gs);
> +
> +  return NULL_TREE;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> +   return the target SSA name.  */
> +
> +static tree
> +target_of_legal_cast (gimple gs)
> +{
> +  if (legal_cast_p (gs))
> +    return gimple_assign_lhs (gs);
> +
> +  return NULL_TREE;
> +}
> +
> +/* Return TRUE if GS consists of an add or subtract of BASE_NAME
> +   with a constant, with the result placed in an SSA name.  GS is
> +   known to be a gimple assignment.  CODE is the RHS code for GS.  */
> +
> +static bool
> +stmt_adds_base_to_const (gimple gs, enum tree_code code, tree base_name)
> +{
> +  return (TREE_CODE (gimple_assign_lhs (gs)) == SSA_NAME
> +         && (code == PLUS_EXPR || code == MINUS_EXPR)
> +         && operand_equal_p (gimple_assign_rhs1 (gs), base_name, 0)
> +         && TREE_CODE (gimple_assign_rhs2 (gs)) == INTEGER_CST);
> +}
> +
> +/* Recursive helper for find_basis_for_candidate.  To find a
> +   possible basis, first try the immediate uses of the given
> +   BASE_NAME.  If a particular use doesn't appear in a multiply,
> +   see if it appears in an add that feeds one or more multiplies.
> +   Recurse using the SSA name defined on the add.  Each potential
> +   basis found in this manner must also appear in a block that
> +   dominates the candidate statement, be present in the candidate
> +   table, and have the same stride M.  If more than one possible
> +   basis exists, pick the one with highest index in the vector,
> +   which will be the most immediately dominating basis.  */
> +
> +static slsr_cand_t
> +find_basis_for_candidate_1 (slsr_cand_t c, tree base_name, slsr_cand_t basis)
> +{
> +  imm_use_iterator use_iter;
> +  use_operand_p use_p;
> +
> +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> +    {
> +      gimple use_stmt = USE_STMT (use_p);
> +      slsr_cand_t use_basis = NULL;
> +      enum tree_code code;
> +      tree cast_target;
> +
> +      if (!is_gimple_assign (use_stmt))
> +       continue;
> +
> +      /* When revising a candidate's basis, make sure not to pick itself.  */
> +      if (c->cand_stmt == use_stmt)
> +       continue;
> +
> +      /* Look through casts where legal.  */
> +      cast_target = target_of_legal_cast (use_stmt);
> +      if (cast_target)
> +       return find_basis_for_candidate_1 (c, cast_target, basis);
> +
> +      /* If the use statement is a multiply, it's a potential basis.
> +        Otherwise, we have to look at the uses of the use statement
> +        to find any multiplies that may be potential bases.  */
> +      code = gimple_assign_rhs_code (use_stmt);
> +
> +      if (code == MULT_EXPR)
> +       {
> +         slsr_cand mapping_key;
> +         mapping_key.cand_stmt = use_stmt;
> +         use_basis = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key);
> +       }
> +
> +      /* Recurse if this is an add of a constant that might feed a
> +        multiply.  */
> +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> +       {
> +         tree lhs = gimple_assign_lhs (use_stmt);
> +         use_basis = find_basis_for_candidate_1 (c, lhs, basis);
> +       }
> +
> +      if (!use_basis
> +         || !operand_equal_p (c->stride, use_basis->stride, 0)
> +         || !dominated_by_p (CDI_DOMINATORS,
> +                             gimple_bb (c->cand_stmt),
> +                             gimple_bb (use_stmt)))
> +       continue;
> +
> +      /* When revising a candidate's basis, be careful not to pick
> +        a basis that occurs after the candidate.  */
> +      if (use_basis->cand_num > c->cand_num)
> +       continue;
> +
> +      if (!basis || basis->cand_num < use_basis->cand_num)
> +       basis = use_basis;
> +    }
> +
> +  return basis;
> +}
> +
> +/* Look for the nearest dominating statement in the candidate
> +   vector that can serve as a basis for the new CANDIDATE.  If
> +   found, adjust the dependent links for the basis entry and
> +   return the index of the basis entry.  Otherwise return zero.  */
> +
> +static unsigned int
> +find_basis_for_candidate (slsr_cand_t c)
> +{
> +  slsr_cand_t basis = find_basis_for_candidate_1 (c, c->base_name, NULL);
> +
> +  if (basis)
> +    {
> +      c->sibling = basis->dependent;
> +      basis->dependent = c->cand_num;
> +      return basis->cand_num;
> +    }
> +
> +  return 0;
> +}
> +
> +/* Starting with statement GS, look backwards through any
> +   intervening legal casts to find one or more
> +   adds of an SSA name and a constant.  Return the earliest
> +   SSA name in the chain as *BASE, and the sum of all constants
> +   in the chain as INDEX.  */
> +
> +static void
> +combine_feeding_adds (gimple gs, tree *base, double_int *index)
> +{
> +  double_int new_index;
> +  tree cast_source;
> +
> +  while ((cast_source = source_of_legal_cast (gs)))
> +    {
> +      gs = SSA_NAME_DEF_STMT (cast_source);
> +      *base = cast_source;
> +    }
> +
> +  /* If there aren't any more adds, we're done.  */
> +  if (!is_gimple_assign (gs)
> +      || (gimple_assign_rhs_code (gs) != PLUS_EXPR
> +         && gimple_assign_rhs_code (gs) != MINUS_EXPR)
> +      || TREE_CODE (gimple_assign_rhs1 (gs)) != SSA_NAME
> +      || TREE_CODE (gimple_assign_rhs2 (gs)) != INTEGER_CST)
> +    return;
> +
> +  *base = gimple_assign_rhs1 (gs);
> +  new_index = tree_to_double_int (gimple_assign_rhs2 (gs));
> +
> +  if (gimple_assign_rhs_code (gs) == MINUS_EXPR)
> +    new_index = double_int_neg (new_index);
> +
> +  *index = double_int_add (*index, new_index);
> +  combine_feeding_adds (SSA_NAME_DEF_STMT (*base), base, index);
> +}
> +
> +/* Given statement GS containing an integer multiply, determine
> +   whether it's a possible strength-reduction candidate.  If so,
> +   add it to the candidate vector and the statement-candidate
> +   mapping.  */
> +
> +static void
> +process_possible_candidate (gimple_stmt_iterator gsi, gimple gs)
> +{
> +  tree rhs1 = gimple_assign_rhs1 (gs);
> +  tree rhs2 = gimple_assign_rhs2 (gs);
> +  tree base;
> +  double_int index;
> +  slsr_cand_t c;
> +  void **slot;
> +
> +  /* TODO:  Unknown constant multipliers will be dealt with in
> +     stage 2.  */
> +  if (TREE_CODE (rhs2) != INTEGER_CST)
> +    return;
> +
> +  /* If RHS1 isn't fed by an addition or subtraction of an SSA_NAME
> +     and a constant, then treat it as an add of zero.  Look through
> +     legal casts and combine multiple chained adds to find
> +     the true base name and index.  */
> +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
> +  base = rhs1;
> +  index = double_int_zero;
> +  combine_feeding_adds (SSA_NAME_DEF_STMT (rhs1), &base, &index);
> +
> +  c = (slsr_cand_t)pool_alloc (cand_pool);
> +  c->cand_stmt = gs;
> +  c->base_name = base;
> +  c->stride = rhs2;
> +  c->index = index;
> +  c->cand_num = VEC_length (slsr_cand_t, cand_vec) + 1;
> +  c->dependent = 0;
> +  c->sibling = 0;
> +  c->def_phi = NULL;
> +  c->cand_gsi = gsi;
> +  c->basis = find_basis_for_candidate (c);
> +
> +  VEC_safe_push (slsr_cand_t, heap, cand_vec, c);
> +
> +  slot = htab_find_slot (stmt_cand_map, c, INSERT);
> +  gcc_assert (!*slot);
> +  *slot = c;
> +}
> +
> +/* Find strength-reduction candidates in block BB.  */
> +
> +static void
> +find_candidates_in_block (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +  basic_block son;
> +
> +  /* Process this block.  */
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple gs = gsi_stmt (gsi);
> +      if (is_gimple_assign (gs)
> +         && gimple_assign_rhs_code (gs) == MULT_EXPR
> +         && INTEGRAL_MODE_P (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> +       process_possible_candidate (gsi, gs);
> +    }
> +
> +  /* Process dominated blocks.  */
> +  for (son = first_dom_son (CDI_DOMINATORS, bb);
> +       son;
> +       son = next_dom_son (CDI_DOMINATORS, son))
> +    find_candidates_in_block (son);
> +}
> +
> +/* Dump a candidate for debug.  */
> +
> +static void
> +dump_candidate (slsr_cand_t c)
> +{
> +  fprintf (dump_file, "%3d  [%d] ", c->cand_num,
> +          gimple_bb (c->cand_stmt)->index);
> +  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +  fputs ("     base:        ", dump_file);
> +  print_generic_expr (dump_file, c->base_name, 0);
> +  fputs ("\n     index:       ", dump_file);
> +  dump_double_int (dump_file, c->index, false);
> +  fputs ("\n     stride:      ", dump_file);
> +  print_generic_expr (dump_file, c->stride, 0);
> +  fprintf (dump_file, "\n     basis:     %3d", c->basis);
> +  fprintf (dump_file, "\n     dependent: %3d", c->dependent);
> +  fprintf (dump_file, "\n     sibling:   %3d", c->sibling);
> +  fputs ("\n     phi:         ", dump_file);
> +  if (c->def_phi)
> +    print_gimple_stmt (dump_file, c->def_phi, 0, 0);
> +  else
> +    fputs ("n/a", dump_file);
> +  fputs ("\n\n", dump_file);
> +}
> +
> +/* Dump the candidate vector for debug.  */
> +
> +static void
> +dump_cand_vec (void)
> +{
> +  unsigned int i;
> +  slsr_cand_t c;
> +
> +  fprintf (dump_file, "\nStrength reduction candidate vector:\n\n");
> +
> +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> +    dump_candidate (c);
> +}
> +
> +/* Recursive helper for unconditional_cands_with_known_stride_p.
> +   Returns TRUE iff C, its siblings, and its dependents are all
> +   unconditional candidates.  */
> +
> +static bool
> +unconditional_cands (slsr_cand_t c)
> +{
> +  if (c->def_phi)
> +    return false;
> +
> +  if (c->sibling && !unconditional_cands (lookup_cand (c->sibling)))
> +    return false;
> +
> +  if (c->dependent && !unconditional_cands (lookup_cand (c->dependent)))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Determine whether or not the tree of candidates rooted at
> +   ROOT consists entirely of unconditional increments with
> +   an INTEGER_CST stride.  */
> +
> +static bool
> +unconditional_cands_with_known_stride_p (slsr_cand_t root)
> +{
> +  /* The stride is identical for all related candidates, so
> +     check it once.  */
> +  if (TREE_CODE (root->stride) != INTEGER_CST)
> +    return false;
> +
> +  return unconditional_cands (lookup_cand (root->dependent));
> +}
> +
> +/* GS is an add or subtract that used to be a multiply.  Follow
> +   its immediate uses to update the candidate table accordingly.  */
> +
> +static void
> +update_chained_candidates (gimple gs)
> +{
> +  tree base_name = gimple_assign_lhs (gs);
> +  imm_use_iterator use_iter;
> +  use_operand_p use_p;
> +  tree cast_target;
> +
> +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> +    {
> +      gimple use_stmt = USE_STMT (use_p);
> +      enum tree_code code;
> +      slsr_cand_t c;
> +
> +      if (!is_gimple_assign (use_stmt))
> +       continue;
> +
> +      /* Look forward through converts that don't change semantics.  */
> +      cast_target = target_of_legal_cast (use_stmt);
> +      if (cast_target)
> +       {
> +         update_chained_candidates (use_stmt);
> +         continue;
> +       }
> +
> +      code = gimple_assign_rhs_code (use_stmt);
> +
> +      /* Case 1:  The name defined by the add appears in a
> +        multiply that exists in the candidate table.  */
> +      if (code == MULT_EXPR)
> +       {
> +         slsr_cand mapping_key;
> +         tree base;
> +         double_int index;
> +
> +         mapping_key.cand_stmt = use_stmt;
> +         if (!(c = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key)))
> +           continue;
> +
> +         base = base_name;
> +         index = double_int_zero;
> +         combine_feeding_adds (gs, &base, &index);
> +         c->base_name = base;
> +         c->index = index;
> +
> +         if (!c->basis)
> +           c->basis = find_basis_for_candidate (c);
> +         else
> +           /* If all has worked correctly, the modified candidate has
> +              the same basis as before.  */
> +           gcc_assert (lookup_cand (c->basis)
> +                       == find_basis_for_candidate_1 (c, base, NULL));
> +
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           {
> +             fputs ("\nRevised candidate:\n", dump_file);
> +             dump_candidate (c);
> +           }
> +       }
> +
> +      /* Case 2: The name defined by the add appears in another
> +        add with a constant, which in turn feeds one or more multiplies
> +        in the candidate table.  Recurse to find the multiplies.  */
> +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> +       update_chained_candidates (use_stmt);
> +    }
> +}
> +
> +/* Replace candidate C, each sibling of candidate C, and each
> +   dependent of candidate C with an add or subtract.  */
> +
> +static void
> +replace_dependents (slsr_cand_t c)
> +{
> +  slsr_cand_t basis = lookup_cand (c->basis);
> +  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
> +  tree incr_type = TREE_TYPE (gimple_assign_rhs1 (c->cand_stmt));
> +  double_int increment, stride;
> +
> +  if (operand_equal_p (c->base_name, basis->base_name, 0))
> +    increment = double_int_sub (c->index, basis->index);
> +  else
> +    increment = c->index;
> +
> +  stride = tree_to_double_int (c->stride);
> +  increment = double_int_mul (increment, stride);
> +
> +  /* It is highly unlikely, but possible, that the resulting
> +     increment doesn't fit in a HWI.  Abandon the replacement
> +     in this case.  This does not affect siblings or dependents
> +     of C.  */
> +  /* Restriction to signed HWI is conservative for unsigned types
> +     but allows for safe negation without twisted logic.  */
> +  if (double_int_fits_in_shwi_p (increment))
> +    {
> +      enum tree_code code = PLUS_EXPR;
> +      tree orig_type
> +       = TREE_TYPE (SSA_NAME_VAR (gimple_assign_rhs1 (c->cand_stmt)));
> +      tree repl_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
> +      tree incr_tree;
> +
> +      if (double_int_negative_p (increment))
> +       {
> +         code = MINUS_EXPR;
> +         increment = double_int_neg (increment);
> +       }
> +
> +      incr_tree = double_int_to_tree (incr_type, increment);
> +
> +      /* A widening cast may be necessary.  */
> +      if (orig_type != repl_type)
> +       {
> +         tree new_basis_name = create_tmp_reg (orig_type, "slsr");
> +         gimple nop_stmt;
> +         add_referenced_var (new_basis_name);
> +         new_basis_name = make_ssa_name (new_basis_name, NULL);
> +         nop_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_basis_name,
> +                                                  basis_name, NULL_TREE);
> +         gimple_set_location (nop_stmt, gimple_location (c->cand_stmt));
> +         gsi_insert_before (&c->cand_gsi, nop_stmt, GSI_SAME_STMT);
> +         basis_name = new_basis_name;
> +
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           {
> +             fputs ("Inserting: ", dump_file);
> +             print_gimple_stmt (dump_file, nop_stmt, 0, 0);
> +           }
> +       }
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fputs ("Replacing: ", dump_file);
> +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +       }
> +
> +      gimple_assign_set_rhs_with_ops (&c->cand_gsi, code,
> +                                     basis_name, incr_tree);
> +      update_stmt (c->cand_stmt);
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fputs ("With: ", dump_file);
> +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +         fputs ("\n", dump_file);
> +       }
> +
> +      /* The multiply we just converted to an add might well feed
> +        into one or more other multiplies in the candidate table.
> +        If so, we need to update those candidate entries.  */
> +      update_chained_candidates (c->cand_stmt);
> +    }
> +
> +  if (c->sibling)
> +    replace_dependents (lookup_cand (c->sibling));
> +
> +  if (c->dependent)
> +    replace_dependents (lookup_cand (c->dependent));
> +}
> +
> +/* Analyze costs of related candidates in the candidate vector,
> +   and make beneficial replacements.  */
> +
> +static void
> +analyze_candidates_and_replace (void)
> +{
> +  unsigned int i;
> +  slsr_cand_t c;
> +
> +  /* Each candidate that has a null basis and a non-null
> +     dependent is the root of a tree of related statements.
> +     Analyze each tree to determine a subset of those
> +     statements that can be replaced with maximum benefit.  */
> +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> +    {
> +      if (c->basis != 0 || c->dependent == 0)
> +       continue;
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "\nProcessing dependency tree rooted at %d.\n",
> +                c->cand_num);
> +
> +      /* If the common stride of all related candidates is a
> +        known constant, and none of these has a phi-dependence,
> +        then all replacements are considered profitable.
> +        Each replaces a multiply by a single add, with the
> +        possibility that a feeding add also goes dead as a
> +        result.  */
> +      if (unconditional_cands_with_known_stride_p (c))
> +       replace_dependents (lookup_cand (c->dependent));
> +
> +      /* TODO:  When the stride is an SSA name, it may still
> +        be profitable to replace some or all of the dependent
> +        candidates, depending on whether the introduced increments
> +        can be reused.  */
> +
> +      /* TODO:  When conditional increments occur so that a
> +        candidate is dependent upon a phi-basis, the cost of
> +        introducing a temporary must be accounted for.  */
> +
> +      /* TODO:  Strength reduction candidates hidden in
> +        addressing expressions are not yet handled, and will
> +        have more complex cost functions.  */
> +    }
> +}
> +
> +/* Main entry point for straight-line strength reduction.  */
> +
> +static unsigned int
> +execute_strength_reduction (void)
> +{
> +  /* Create the allocation pool where candidates will reside.  */
> +  cand_pool = create_alloc_pool ("Strength reduction pool",
> +                                sizeof (slsr_cand), 128);
> +
> +  /* Allocate the candidate vector.  */
> +  cand_vec = VEC_alloc (slsr_cand_t, heap, 128);
> +
> +  /* Allocate the mapping from statements to candidate indices.  */
> +  stmt_cand_map = htab_create (500, stmt_cand_hash,
> +                              stmt_cand_eq, stmt_cand_free);
> +
> +  /* Initialize the loop optimizer.  We need to detect flow across
> +     back edges, and this gives us dominator information as well.  */
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
> +  /* Walk the CFG in predominator order looking for strength reduction
> +     candidates.  */
> +  find_candidates_in_block (ENTRY_BLOCK_PTR);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    dump_cand_vec ();
> +
> +  /* Analyze costs and make appropriate replacements.  */
> +  analyze_candidates_and_replace ();
> +
> +  /* Free resources.  */
> +  loop_optimizer_finalize ();
> +  htab_delete (stmt_cand_map);
> +  VEC_free (slsr_cand_t, heap, cand_vec);
> +  free_alloc_pool (cand_pool);
> +
> +  return 0;
> +}
> +
> +static bool
> +gate_strength_reduction (void)
> +{
> +  return optimize > 0;
> +}
> +
> +struct gimple_opt_pass pass_strength_reduction =
> +{
> + {
> +  GIMPLE_PASS,
> +  "slsr",                              /* name */
> +  gate_strength_reduction,             /* gate */
> +  execute_strength_reduction,          /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_TREE_SLSR,                                /* tv_id */
> +  PROP_cfg | PROP_ssa,                 /* properties_required */
> +  0,                                   /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_dump_func | TODO_verify_ssa     /* todo_flags_finish */
> + }
> +};
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     (revision 180617)
> +++ gcc/Makefile.in     (working copy)
> @@ -1475,6 +1475,7 @@ OBJS = \
>        tree-ssa-reassoc.o \
>        tree-ssa-sccvn.o \
>        tree-ssa-sink.o \
> +       tree-ssa-strength-reduction.o \
>        tree-ssa-strlen.o \
>        tree-ssa-structalias.o \
>        tree-ssa-tail-merge.o \
> @@ -2519,6 +2520,10 @@ tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H)
>    alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) langhooks.h $(HASHTAB_H) $(GIMPLE_H) \
>    $(TREE_INLINE_H) tree-iterator.h tree-ssa-propagate.h tree-ssa-sccvn.h \
>    $(PARAMS_H) tree-pretty-print.h gimple-pretty-print.h gimple-fold.h
> +tree-ssa-strength-reduction.o : tree-ssa-strength-reduction.c $(CONFIG_H) \
> +   $(SYSTEM_H) coretypes.h $(TREE_H) $(GIMPLE_H) $(BASIC_BLOCK_H) \
> +   $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) tree-pretty-print.h alloc-pool.h \
> +   $(TREE_FLOW_H)
>  tree-vrp.o : tree-vrp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
>    $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(GGC_H) \
>    $(BASIC_BLOCK_H) tree-ssa-propagate.h $(FLAGS_H) $(TREE_DUMP_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c        (revision 180617)
> +++ gcc/passes.c        (working copy)
> @@ -1368,6 +1368,7 @@ init_optimization_passes (void)
>        }
>       NEXT_PASS (pass_lower_vector_ssa);
>       NEXT_PASS (pass_cse_reciprocals);
> +      NEXT_PASS (pass_strength_reduction);
>       NEXT_PASS (pass_reassoc);
>       NEXT_PASS (pass_vrp);
>       NEXT_PASS (pass_dominator);
>
>
>



More information about the Gcc-patches mailing list